Introduction page

Service description:

As shown in Figure 1, our model processes raw variable-length protein sequences using one-hot encoding. The model has two channels: an LSTM module and a CONV module.
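The one-hot encoding step can be sketched as follows. This is a minimal illustration and not the DeepAVP code: the function name `one_hot_encode`, the residue ordering in `AMINO_ACIDS`, and the choice to zero-pad up to a `max_len` are assumptions made for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (ordering is an assumption)
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_len):
    """Return (matrix, true_length); matrix is max_len x 20 and zero-padded."""
    if len(sequence) > max_len:
        raise ValueError("sequence is longer than max_len")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(sequence.upper()):
        # Raises KeyError for residues outside the 20 standard amino acids.
        matrix[pos][AA_INDEX[residue]] = 1
    return matrix, len(sequence)


# Hypothetical 4-residue peptide padded to length 6.
encoded, length = one_hot_encode("ACDK", max_len=6)
```

Returning the true length alongside the padded matrix lets later layers ignore the padded positions.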

In this paper, we propose a computational method based on a deep neural network for predicting antiviral peptides, which also fine-tunes the substitution matrix for a specific class of functional peptides. Our model is a dual-channel deep neural network designed to extract features of different dimensions from the original variable-length sequence data. The LSTM module uses the peptide sequence length as an important element in classifying antiviral peptides. Its bi-directional LSTM network (BLSTM) can capture long-term dependencies, which makes it effective for learning from sequential data. The CONV module applies the substitution matrix as convolution kernels to extract convolutional features. Because the network is dynamic, it can handle variable-length sequence data when analyzing local evolutionary information. The final joint module concatenates the outputs of the LSTM and CONV channels and passes them through two fully connected layers, which integrate the evidence to classify the antiviral peptide.
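As a rough illustration of using a substitution matrix as a kernel, the sketch below multiplies a one-hot sequence by a substitution matrix, so each residue position is replaced by its row of substitution scores. The three-letter alphabet and the values in `TOY_SUBSTITUTION` are made up for the example and are not BLOSUM entries; `substitution_profile` is a hypothetical helper name. In a trainable model the matrix entries would be weights initialized from BLOSUM and updated during training, which this fixed-matrix sketch does not show.

```python
ALPHABET = "AKL"  # toy 3-residue alphabet, for illustration only

# Made-up symmetric scores; NOT taken from any real BLOSUM matrix.
TOY_SUBSTITUTION = [
    [2, -1, 0],
    [-1, 3, -2],
    [0, -2, 2],
]


def substitution_profile(one_hot_rows, matrix):
    """Multiply an L x k one-hot matrix by a k x k substitution matrix."""
    n_cols = len(matrix[0])
    return [
        [sum(row[k] * matrix[k][c] for k in range(len(matrix))) for c in range(n_cols)]
        for row in one_hot_rows
    ]


# The peptide "AL" padded with one all-zero row.
one_hot = [[1, 0, 0], [0, 0, 1], [0, 0, 0]]
profile = substitution_profile(one_hot, TOY_SUBSTITUTION)
```

Padded positions are all-zero rows, so they produce all-zero score rows and contribute no evolutionary signal.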

Our predictive model has several key competitive advantages. First, it processes sequence data directly, with no need for manual feature extraction: the LSTM and CONV channels analyze the peptide sequence at the sequential and evolutionary levels, respectively. Furthermore, the PSSM feature extraction layer in the CONV channel can transform the original BLOSUM matrix into an evolutionary substitution matrix specific to the antiviral peptide dataset. The same strategy can be used to generate a refined BLOSUM matrix for other peptide sequence learning tasks. Most importantly, the model accepts variable-length input: a peptide of any length, from several residues to hundreds or thousands of residues. Notably, although each sequence is encoded as a one-hot code padded to the maximum length, the model is trained only on the true length of each peptide. In the LSTM channel, we use the state output at the time step corresponding to the sequence length. In the CONV channel, we add an AvBlock layer that performs block averaging on the PSSM matrix over the sequence length.
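The two length-aware steps can be sketched in plain Python. This is an illustration under stated assumptions, not the DeepAVP implementation: the page does not define AvBlock in detail, so `average_blocks` assumes it splits the true-length rows into a fixed number of contiguous blocks and averages each one. Both function names and the `n_blocks` parameter are hypothetical.

```python
def state_at_true_length(states, length):
    """Return the recurrent state at the last real residue, skipping padded steps."""
    if not 1 <= length <= len(states):
        raise ValueError("length out of range")
    return states[length - 1]


def average_blocks(rows, length, n_blocks):
    """Average the first `length` rows in `n_blocks` contiguous blocks.

    The result always has n_blocks rows, whatever the peptide length
    (an assumed reading of what the AvBlock layer does).
    """
    if not 1 <= length <= len(rows):
        raise ValueError("length out of range")
    n_cols = len(rows[0])
    pooled = []
    for b in range(n_blocks):
        start = b * length // n_blocks
        # Guarantee at least one row per block for very short peptides.
        end = max((b + 1) * length // n_blocks, start + 1)
        block = rows[start:end]
        pooled.append([sum(r[c] for r in block) / len(block) for c in range(n_cols)])
    return pooled


# Toy per-step states and a toy score matrix; trailing rows are padding.
states = [[0.1], [0.2], [0.3], [0.0]]
last_state = state_at_true_length(states, length=3)

scores = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [0.0, 0.0], [0.0, 0.0]]
pooled = average_blocks(scores, length=4, n_blocks=2)
```

Both helpers read only the first `length` positions, so the zero padding added to reach the maximum length never influences the result.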

Fig.1. DeepAVP Model.