Talk 1: Deep Learning Architectures for Protein Bioinformatics
Speaker: Prof. Jianlin Cheng, University of Missouri - Columbia, USA
Abstract: Deep learning has emerged as the most powerful machine learning method for big data analysis in many areas, such as computer vision, image processing, speech recognition, and natural language processing. Since deep learning was introduced into the field of bioinformatics in 2012 by my group and others, it has significantly advanced the analysis and modeling of protein sequence, structure, and function. In this talk, I will present our recent deep learning methods for predicting protein structural features and folds from sequences. In particular, I will focus on novel deep convolutional neural network architectures that automatically learn multi-level features from unstructured raw sequence data of any length to recognize structural patterns in proteins. These architectures can be applied widely to classification and clustering problems in protein bioinformatics.
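As a rough illustration of how a convolutional model can accept sequences of any length, the sketch below slides a few filters over a one-hot encoded protein sequence and keeps only the maximum activation of each filter. This is not the architecture presented in the talk; the filter weights are random, and the names `one_hot` and `conv1d_global_max` are hypothetical helpers introduced only for this example. The point is simply that pooling over positions yields a fixed-size feature vector regardless of sequence length.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Encode a protein sequence as a list of 20-dimensional one-hot rows."""
    return [[1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]
            for residue in sequence]


def conv1d_global_max(encoded, filters):
    """Apply each filter at every position, then ReLU and global max pooling.

    Returns one value per filter, so the output size does not depend on
    the sequence length (the sequence must be at least as long as a filter).
    """
    features = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(encoded) - width + 1):
            window = encoded[start:start + width]
            activation = sum(w * x
                             for row_w, row_x in zip(filt, window)
                             for w, x in zip(row_w, row_x))
            best = max(best, activation)
        features.append(best)
    return features


# Assumption: 4 random filters of width 3; a trained model would learn these.
rng = random.Random(0)
filters = [[[rng.uniform(-1, 1) for _ in AMINO_ACIDS] for _ in range(3)]
           for _ in range(4)]

short_features = conv1d_global_max(one_hot("MKTAYIAK"), filters)
long_features = conv1d_global_max(one_hot("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"), filters)
print(len(short_features), len(long_features))
```

Both calls return a vector with one entry per filter, which is what allows a fixed-size classifier to sit on top of inputs of different lengths.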
Speaker bio: Dr. Jianlin Cheng is a full professor in the Department of Electrical Engineering and Computer Science at the University of Missouri - Columbia, USA. He earned his PhD in computer science from the University of California, Irvine in 2006. His research focuses on bioinformatics, machine learning, data mining, and big data. Dr. Cheng has authored or co-authored more than 120 publications, which have been cited more than 7,300 times. He has also developed dozens of bioinformatics software tools for 3D genome structure modeling, protein structure and function prediction, and biological network modeling, which are widely used around the world. His research has been supported by the National Institutes of Health (NIH), the National Science Foundation (NSF), and the Department of Energy (DoE), USA. Dr. Cheng is a recipient of a 2012 NSF CAREER award.
Talk 2: Large-scale Multilabel Learning and its applications in Bioinformatics
Abstract: Multi-label learning deals with classification problems in which each instance can be assigned multiple class labels simultaneously. In large-scale multi-label learning, there are thousands of labels or more. Many important problems in bioinformatics can be modeled as large-scale multi-label learning problems, such as MeSH indexing, drug-target interaction prediction, and protein function prediction. Using a learning-to-rank framework, we have developed MeSHLabeler and DeepMeSH for large-scale MeSH indexing, DrugE-Rank for drug-target interaction prediction, and GOLabeler for protein function prediction. DeepMeSH achieved first place in both the BioASQ4 and BioASQ5 challenges, and MeSHLabeler achieved first place in both the BioASQ2 and BioASQ3 challenges. Specifically, DeepMeSH achieved a Micro F-measure of 0.6323, 2% higher than the 0.6218 of MeSHLabeler and 12% higher than the 0.5637 of MTI (NLM's official solution), on BioASQ3 challenge data with 6000 citations. In addition, experimental results on benchmark data from DrugBank show that DrugE-Rank significantly outperforms competing methods, achieving in particular more than 30% improvement in the area under the precision-recall curve for FDA-approved new drugs and FDA experimental drugs. Finally, according to the initial evaluation of CAFA3 (the Critical Assessment of protein Function Annotation algorithms) in July 2017, GOLabeler achieved first place in terms of F-max out of nearly 200 submissions by around 50 labs from all over the world.
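For readers unfamiliar with the Micro F-measure quoted above, the following minimal sketch shows how it is usually computed for multi-label predictions: true positives, false positives, and false negatives are pooled over all instances before precision and recall are taken. The function name `micro_f_measure` and the toy label sets are hypothetical and serve only as an illustration; they are not BioASQ data or the official evaluation code.

```python
def micro_f_measure(true_labels, predicted_labels):
    """Micro-averaged F-measure over a collection of multi-label instances."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # correct labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy example: two citations with MeSH-like label sets.
true_labels = [{"Humans", "Neoplasms"}, {"Mice", "Proteins", "Animals"}]
predicted_labels = [{"Humans", "Neoplasms", "Adult"}, {"Mice", "Animals"}]

score = micro_f_measure(true_labels, predicted_labels)
print(round(score, 4))  # 4 TP, 1 FP, 1 FN -> precision = recall = 0.8 -> 0.8
```

Because counts are pooled across instances, frequent labels weigh more heavily in this measure than rare ones.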
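Speaker bio: Shanfeng Zhu is an associate professor and doctoral supervisor in the School of Computer Science, Fudan University. He received his PhD from City University of Hong Kong (2003), was a postdoctoral researcher at Kyoto University, Japan (2004-2008), a Japan Society for the Promotion of Science invited visiting scholar (JSPS Invitation Fellowship 2012), a visiting scholar at the University of Illinois at Urbana-Champaign, USA (2013-2014), and a visiting associate professor at Kyoto University, Japan (2016). His main research interests are bioinformatics, information retrieval, and data mining. He has published more than 40 papers as first or corresponding author in well-known international journals and conferences in these fields, such as KDD, IJCAI, ISMB, Bioinformatics, NAR, and Briefings in Bioinformatics. He has served on the program committees of international bioinformatics conferences including BIBM 2014-2017, InCoB 2012-2017, GIW 2015-2017, and APBC 2014-2018. From 2014 to 2017, he took part in the BioASQ international challenge on large-scale automatic biomedical text annotation four consecutive times and won first place each time, with accuracy about 12% higher than that of the software used by the US National Library of Medicine.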
Talk 3: Finding remote homologous proteins: alignment-based, alignment-free and cross-modal methods
Abstract: Proteins function in living organisms as enzymes, antibodies, sensors, and transporters, among myriad other roles. Understanding protein function has great implications for the biological and medical sciences. It is widely accepted that protein function is largely determined by protein structure, and that proteins with similar sequences tend to fold into similar structures. Moreover, protein structures are more conserved than protein sequences over the course of evolution. Therefore, finding remote homologous proteins, which share conserved structural similarities but only limited sequence similarity, is a fundamental yet challenging problem in computational biology. Indeed, it is an indispensable step towards understanding protein function. Here, three novel methods for finding remote homologous proteins are presented, each with a different goal: (1) the PROtein Structure Alignment (PROSTA) methods, which automatically determine and align homologous structures of protein pockets and interaction interfaces; (2) the ContactLib method, which scans tens of thousands of protein structures for homologous structures in seconds; and (3) the CMsearch method, which simultaneously explores the sequence space and the structure space to perform cross-modal search for homologous proteins. Experiments show that our methods improve not only the accuracy of finding homologous proteins but also the accuracy of predicting protein structures. Moreover, we present case studies in which our methods discover, for the first time, structural similarities between pairs of functionally related protein-DNA complexes.
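Speaker bio: Xuefeng Cui is a tenure-track assistant professor (doctoral supervisor) at the Institute for Interdisciplinary Information Sciences, Tsinghua University. He studied under Prof. Ming Li, Fellow of the Royal Society of Canada, and earned his bachelor's, master's, and doctoral degrees in computer science at the University of Waterloo, Canada, accumulating 17 years of overseas study and research experience. He joined the Institute for Interdisciplinary Information Sciences at Tsinghua University in September 2016, where he continues his research in computational biology, deep learning, and parallel algorithms. His recent research topics include Cryo-EM protein structure determination, nanopore DNA sequencing, and synthetic bioinformatics. The computational methods involved include deep neural networks, generative adversarial networks, and parallel computing (CUDA programming). His main research results have been published in top conferences and journals in the field (e.g., ISMB, Bioinformatics, Nucleic Acids Research, ACS Synthetic Biology, Pattern Recognition).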
Talk 4: Bio-macromolecular image alignment in cryo-electron microscopy reconstruction
Abstract: Cryo-electron microscopy (Cryo-EM), whose development was recognized with the 2017 Nobel Prize in Chemistry, is becoming the premier method for determining the three-dimensional (3D) structure of protein complexes at molecular resolution. In this talk, I will start with a brief review of computational methods in Cryo-EM 3D reconstruction. I will then focus on image alignment methods in this research field and present three image alignment algorithms: feature-based alignment, marker-based alignment, and alignment based on a Gaussian mixture model. I will also briefly introduce our research progress on deep learning in Cryo-EM 3D reconstruction.