Introduction
Computer Vision
- image recognition
- object detection
- image inpainting
- super-resolution
Natural Language Processing
- text classification
- speech recognition
- machine translation
Bioinformatics
sequence analysis
- predict the effect of noncoding sequence variants
- model the transcription factor binding affinity landscape (note: there is a related paper worth reading)
- improve DNA sequencing and peptide sequencing
- analyze DNA sequence modifications
- model various post-transcription regulation events
structure prediction and reconstruction
- predict protein secondary structure
- model the protein structure when it interacts with other molecules
- predict protein contact maps and the structure of membrane proteins
- accelerate fluorescence microscopy super-resolution
biomolecular property and function prediction
- predict detailed enzyme function by predicting the Enzyme Commission (EC) number
- predict protein Gene Ontology (GO) terms
- predict protein subcellular location
biomedical image processing and diagnosis
- classifying skin cancer
- predict fluorescent labels from transmitted-light images of unlabeled biological samples
- analyze cell imaging data
biomolecule interaction prediction and systems biology
- model the hierarchical structure and the function of the whole cell
- predict novel drug-target interactions
- model polypharmacy side effects (multi-modal graph convolutional networks)
Deep Learning Methods
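Train a neural network (with nonlinear functions) to express the implicit relationship between features and labels.
The parameters W must be trained so that the model fits the data. The algorithm for this is forward-backward propagation,
which minimizes the difference (loss or error) between the forward output and the labels until the model converges.
Common activation functions: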
hidden layer — ReLU
output layer — softmax
Common loss functions:
classification — cross-entropy
regression — mean squared error
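As a rough illustration of the activation and loss functions listed above, here is a minimal pure-Python sketch. The function names are illustrative assumptions, not taken from the paper, and a real model would use a deep learning framework instead.

```python
import math

def relu(x):
    """Hidden-layer activation: pass positives through, clamp negatives to 0."""
    return max(0.0, x)

def softmax(logits):
    """Output-layer activation: turn raw scores into probabilities summing to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Classification loss: negative log-probability of the correct class."""
    return -math.log(probs[true_class])

def mean_squared_error(pred, target):
    """Regression loss: average squared difference."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, true_class=0)
```

The loss shrinks as the probability assigned to the correct class approaches 1.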
optimizer:
stochastic gradient descent (SGD)
Momentum with learning rate decay (when you understand the problem well)
Adam (when you are not familiar with the problem)
RMSprop
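To make the difference between these update rules concrete, the sketch below applies momentum with learning rate decay and Adam to a toy one-dimensional loss f(w) = (w - 3)^2. The toy loss and all hyperparameter values are assumptions chosen for illustration only.

```python
import math

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, which is minimised at w = 3.
    return 2.0 * (w - 3.0)

def sgd_momentum(w, steps=300, lr=0.1, beta=0.9, decay=0.99):
    """Gradient descent with a velocity term and a decaying learning rate."""
    v = 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(w)  # accumulate a running velocity
        w += v
        lr *= decay                  # shrink the step size over time
    return w

def adam(w, steps=2000, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: per-parameter step sizes from running gradient moments."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g        # first moment (mean of gradients)
        v = b2 * v + (1 - b2) * g * g    # second moment (mean of squared gradients)
        m_hat = m / (1 - b1 ** t)        # bias correction for the zero initialisation
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_momentum = sgd_momentum(0.0)
w_adam = adam(0.0)
```

Both runs should end close to the minimum at w = 3. Momentum needs a sensible learning rate and decay schedule, whereas Adam adapts its step size from the gradient history, which fits the rule of thumb in the list above.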
CNN Deep Learning Architecture
local connectivity 局部连接
weight sharing 权重共享
Model architectures:
AlexNet, VGG, GoogLeNet, ResNet, SENet, DenseNet, DPN
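Local connectivity and weight sharing can be seen in a one-dimensional convolution: each output depends only on a small window of the input, and the same kernel weights are reused at every position. This is a minimal sketch with a hypothetical edge-detecting kernel, not code from any of the architectures named above.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation: slide one shared kernel over the input."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))  # local window only
        for i in range(len(signal) - k + 1)
    ]

signal = [0, 0, 1, 1, 1, 0, 0]
edge_kernel = [-1, 1]  # the same two weights are reused at every position
features = conv1d(signal, edge_kernel)
# features == [0, 1, 0, 0, -1, 0]: nonzero only where the signal changes
```

A fully connected layer mapping 7 inputs to 6 outputs would need 42 weights, while this layer has 2.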
RNN Deep Learning Architecture
Model architectures:
LSTM, Bi-RNN, GRU
Graph Neural Networks
Primary task:
extract and encode the topological and connectivity information from the network
To preserve the information of each node in the network (its neighbor information), build a neighbor tree
Generative models: GAN and VAE — unsupervised learning
Learn the data distribution and generate new data points with some variation
GAN
Variational Autoencoder
A plain autoencoder cannot generate new data
Variational autoencoder
Applications of deep learning in bioinformatics
1. Identifying enzymes using multi-layer neural networks
identify enzyme sequences from sequence information using deep-learning-based methods
encode the protein sequences into numbers → feedforward neural network
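One simple way to turn a protein sequence into numbers is one-hot encoding over the 20 standard amino acids. The sketch below assumes this encoding for illustration; the paper's own example may use a different feature representation, and the helper name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_protein(seq):
    """Return one 20-dimensional indicator vector per residue."""
    rows = []
    for aa in seq.upper():
        row = [0] * len(AMINO_ACIDS)
        if aa in INDEX:  # unknown residues (for example X) stay all-zero
            row[INDEX[aa]] = 1
        rows.append(row)
    return rows

encoded = one_hot_protein("MKV")  # 3 residues -> 3 rows of length 20
```

The resulting rows can be flattened or pooled into a fixed-length vector before being fed to a feedforward network.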
2. Gene expression regression
different genes’ expression can be highly correlated
profile around 1000 carefully selected landmark genes, then computationally predict the expression of the remaining target genes from the landmark gene expression
3. RNA-protein binding sites prediction with CNN
RNA-binding proteins (RBP)
Encode the RNA sequences as 2D tensors
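A sketch of how an RNA sequence might become a 2D tensor for a CNN: one row per position, one column per base, padded to a fixed length. Filling padded and unknown positions with 0.25 per base is an assumed convention here, and `encode_rna` is a hypothetical helper.

```python
BASES = "ACGU"

def encode_rna(seq, length):
    """Return a length x 4 matrix; padding and unknown bases get 0.25 each."""
    seq = seq.upper().replace("T", "U")[:length]  # accept DNA-style input, truncate
    matrix = []
    for base in seq:
        if base in BASES:
            matrix.append([1.0 if base == b else 0.0 for b in BASES])
        else:
            matrix.append([0.25] * 4)
    # Pad short sequences so every input has the same shape.
    matrix.extend([[0.25] * 4 for _ in range(length - len(seq))])
    return matrix

tensor = encode_rna("AUGC", length=6)  # shape 6 x 4
```

A fixed shape lets many sequences be stacked into one batch for convolution.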
4. DNA sequence function prediction with CNN and RNN
predict the functionality of non-coding DNA sequences
5. Biomedical image classification using transfer learning and ResNet
6. Graph embedding for novel protein interaction prediction using GCN
graph embedding — PPI networks
Use a GCN to learn an embedding for each node (protein), then apply the interaction operation (inner product) to each pair of nodes
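The two steps above can be sketched on a toy graph. This is a deliberately simplified stand-in for a GCN: it does one round of mean aggregation over each node and its neighbors, with no learned weights or nonlinearity, and then scores node pairs with an inner product. The graph, the features, and the function names are all hypothetical.

```python
import math

# Hypothetical toy PPI graph: 4 proteins connected in a chain.
edges = [(0, 1), (1, 2), (2, 3)]
features = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]

def neighbors(n_nodes, edges):
    """Adjacency sets with self-loops, so a node keeps its own features."""
    adj = {i: {i} for i in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def aggregate(features, adj):
    """One propagation step: each node takes the mean of itself and its neighbors."""
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in adj[i]) / len(adj[i]) for d in range(dim)]
        for i in range(len(features))
    ]

def interaction_score(z_u, z_v):
    """Inner product of two embeddings, squashed to (0, 1) with a sigmoid."""
    dot = sum(a * b for a, b in zip(z_u, z_v))
    return 1.0 / (1.0 + math.exp(-dot))

adj = neighbors(len(features), edges)
embeddings = aggregate(features, adj)
score_01 = interaction_score(embeddings[0], embeddings[1])  # directly linked pair
score_03 = interaction_score(embeddings[0], embeddings[3])  # opposite ends of the chain
```

In this toy setup the linked pair scores higher than the distant pair. A trained GCN would learn weight matrices so that the scores reflect known interactions.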
7. Biology image super-resolution using GAN
8. High dimensional biological data embedding and generation with VAE
Perspectives: limitations and suggestions
1. Lack of data
transfer learning
use a well-trained model from another similar task and fine-tune the last one or two layers using the limited real data
data augmentation
simulated data
2. Overfitting
Techniques that act on the model parameters and the model architecture:
- dropout
- batch normalization
- weight decay
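Two of the techniques listed above are short enough to sketch directly: inverted dropout, and an SGD step with L2 weight decay. The function names and hyperparameter values are illustrative assumptions.

```python
import random

def dropout(activations, p, rng):
    """Inverted dropout: zero each unit with probability p, rescale the rest."""
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def sgd_step_with_weight_decay(w, grad, lr=0.1, weight_decay=0.01):
    """L2 weight decay adds weight_decay * w to each gradient component."""
    return [wi - lr * (gi + weight_decay * wi) for wi, gi in zip(w, grad)]

rng = random.Random(0)  # fixed seed so the example is repeatable
dropped = dropout([1.0] * 10, p=0.5, rng=rng)
shrunk = sgd_step_with_weight_decay([1.0, -2.0], grad=[0.0, 0.0])
```

Rescaling by 1 / (1 - p) keeps the expected activation unchanged, so dropout can simply be switched off at inference time. With a zero gradient, weight decay alone pulls every weight slightly toward zero.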
3. Imbalanced data
- use the right criteria to evaluate the prediction results and the loss
- upsample smaller classes
- downsample larger classes
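A small sketch of two of the points above: upsampling the smaller class with replacement, and using a metric that does not reward ignoring the minority class. F1 is used here as one example of a suitable criterion; the helper names are hypothetical.

```python
import random
from collections import Counter

def upsample(samples, labels, rng):
    """Resample smaller classes with replacement up to the largest class size."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

samples = list(range(10))
labels = [0] * 9 + [1]  # 9 negatives, 1 positive
bal_x, bal_y = upsample(samples, labels, random.Random(0))
class_counts = Counter(bal_y)

always_negative = [0] * 10
accuracy = sum(t == p for t, p in zip(labels, always_negative)) / len(labels)
f1 = f1_score(labels, always_negative)
```

A classifier that always predicts the majority class reaches 90% accuracy on this data but an F1 of 0, which is why accuracy alone is misleading on imbalanced sets.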
4. Interpretability
Inspect the importance score of each part of the input
Perturbation-based approaches
Backpropagation-based methods
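A perturbation-based importance score can be sketched as follows: mask one input position at a time and record how much the model output changes. The `toy_model` below is a hypothetical fixed linear scorer standing in for a trained network.

```python
def toy_model(x):
    # Hypothetical stand-in for a trained network: a fixed linear scorer.
    weights = [0.1, 2.0, -0.5, 0.0]
    return sum(w * xi for w, xi in zip(weights, x))

def perturbation_importance(model, x, baseline=0.0):
    """Score each position by how much the output changes when it is masked."""
    reference = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline  # replace one position with the baseline value
        scores.append(abs(reference - model(perturbed)))
    return scores

scores = perturbation_importance(toy_model, [1.0, 1.0, 1.0, 1.0])
most_important = scores.index(max(scores))
```

This needs one forward pass per input position. Backpropagation-based methods instead derive all scores from gradient information in a single backward pass.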
5. Uncertainty scaling
Platt scaling
histogram binning
isotonic regression
Bayesian Binning into Quantiles
temperature scaling
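Of the calibration methods above, temperature scaling is the simplest to sketch: divide the logits by a single scalar T chosen to minimize the negative log-likelihood on held-out data. The grid search and the toy validation logits below are simplifying assumptions; in practice T is usually fitted with a gradient-based optimizer.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of the true labels."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, candidates):
    """Pick the candidate temperature with the lowest validation NLL."""
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))

# Hypothetical overconfident validation outputs: the last one is confidently wrong.
val_logits = [[8.0, 0.0], [8.0, 0.0], [0.0, 8.0], [8.0, 0.0]]
val_labels = [0, 0, 1, 1]
best_t = fit_temperature(val_logits, val_labels, [0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
calibrated = softmax(val_logits[0], best_t)
```

A temperature above 1 softens overconfident probabilities. Because every logit is divided by the same positive number, the predicted class never changes, only the confidence.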
6. Catastrophic forgetting
- regularization: EWC
- dynamic neural networks
- rehearsal training methods: iCaRL
7. Reducing computational requirements and model compression
- parameter pruning: reduces redundant parameters
- knowledge distillation
- use compact convolutional filters to save parameters
- low rank factorization
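Parameter pruning, the first item above, is often done by weight magnitude. The sketch below zeroes out the smallest weights of a flat list; the criterion and the function name are assumptions for illustration, since many other pruning criteria exist.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted from the smallest to the largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

pruned = magnitude_prune([0.8, -0.05, 0.3, 0.01, -1.2, 0.02], sparsity=0.5)
# pruned == [0.8, 0.0, 0.3, 0.0, -1.2, 0.0]
```

Pruned models are usually fine-tuned afterwards to recover any lost accuracy.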
- Author: Kelly Liu
- Permalink: http://tiantianliu2018.github.io/2019/09/08/论文阅读《Deep-Learning-in-bioinformatics-introduction-application-and-perspective-in-big-data-era》/
- License: Unless otherwise stated, all posts on this blog are released under the MIT License. Please credit the source when reposting.