architecture

DL architectures based on CNN Virus and used in this project

The paper “A multi-task CNN learning model for taxonomic assignment of human viruses” proposes a multi-task CNN model (CNN Virus) for the taxonomic assignment of human viruses.

“A multi-task CNN learning model for taxonomic assignment of human viruses”

by Haoran Ma, Tin Wee Tan and Kenneth Hon Kim Ban

From 19th International Conference on Bioinformatics 2020 (InCoB2020) - Virtual. 25-29 November 2020

The architecture module includes:

Original Architecture

Below are the functions to create a model with the same architecture as in the original paper.



create_model_original

 create_model_original (load_parameters:bool=True,
                        path2parameters:pathlib.Path|None=None,
                        to_gpu:bool=True)

Build a CNN model as per CNN Virus paper

Parameter        Type                 Default  Details
load_parameters  bool                 True     Load pretrained weights when True
path2parameters  pathlib.Path | None  None     Path to pretrained weights, defaults to project CNN Virus weights
to_gpu           bool                 True     Move model to GPU if possible when True
Returns          Model                         New instance of an original paper architecture
ProjectFileSystem().data
Path('/home/vtec/projects/bio/metagentorch/data')

This is the original paper’s model, taking 50-mer sequences and predicting:

  1. the logits over 187 virus classes
  2. the logits over 10 regions of the original virus genome (the region the 50-mer comes from)
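The model summary on this page reports an input shape of (None, 50, 5), that is, one 5-channel vector per base of a 50-mer. The sketch below shows one way such an input could be built with one-hot encoding. The channel order `ACGTN` and the helper name `one_hot_encode` are assumptions for illustration only; the project's own encoder may use a different order or a different representation.

```python
from __future__ import annotations

# ASSUMPTION: channel order A, C, G, T, N. The project may order channels differently.
ALPHABET = "ACGTN"
KMER_LENGTH = 50


def one_hot_encode(read: str) -> list[list[int]]:
    """Encode a 50-mer as a 50 x 5 one-hot matrix (hypothetical helper)."""
    read = read.upper()
    if len(read) != KMER_LENGTH:
        raise ValueError(f"expected a {KMER_LENGTH}-mer, got length {len(read)}")
    index = {base: i for i, base in enumerate(ALPHABET)}
    unknown = index["N"]
    encoded = []
    for base in read:
        row = [0] * len(ALPHABET)
        # Any symbol outside A, C, G, T is mapped to the N channel.
        row[index.get(base, unknown)] = 1
        encoded.append(row)
    return encoded


read = "ACGTN" * 10              # toy 50-mer, not real data
matrix = one_hot_encode(read)
print(len(matrix), len(matrix[0]))  # 50 5
```

A batch of such matrices has shape (batch, 50, 5), which matches the `input-seq` layer of the summary.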
model = create_model_original(load_parameters=False)
Creating CNN Model (Original)
Created randomly initialized model
model.summary()
Model: "CNN_Virus"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)        ┃ Output Shape      ┃    Param # ┃ Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ input-seq           │ (None, 50, 5)     │          0 │ -                 │
│ (InputLayer)        │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv-1 (Conv1D)     │ (None, 50, 512)   │     13,312 │ input-seq[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ bn-1                │ (None, 50, 512)   │      2,048 │ conv-1[0][0]      │
│ (BatchNormalizatio… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ maxpool-1           │ (None, 25, 512)   │          0 │ bn-1[0][0]        │
│ (MaxPooling1D)      │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv-2 (Conv1D)     │ (None, 25, 512)   │  1,311,232 │ maxpool-1[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ bn-2                │ (None, 25, 512)   │      2,048 │ conv-2[0][0]      │
│ (BatchNormalizatio… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ maxpool-2           │ (None, 13, 512)   │          0 │ bn-2[0][0]        │
│ (MaxPooling1D)      │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv-3 (Conv1D)     │ (None, 13, 1024)  │  3,671,040 │ maxpool-2[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv-4 (Conv1D)     │ (None, 13, 1024)  │  7,341,056 │ conv-3[0][0]      │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ bn-3                │ (None, 13, 1024)  │      4,096 │ conv-4[0][0]      │
│ (BatchNormalizatio… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ maxpool-3           │ (None, 7, 1024)   │          0 │ bn-3[0][0]        │
│ (MaxPooling1D)      │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ flatten (Flatten)   │ (None, 7168)      │          0 │ maxpool-3[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense-1 (Dense)     │ (None, 1024)      │  7,341,056 │ flatten[0][0]     │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ bn-4                │ (None, 1024)      │      4,096 │ dense-1[0][0]     │
│ (BatchNormalizatio… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ do-1 (Dropout)      │ (None, 1024)      │          0 │ bn-4[0][0]        │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ labels (Dense)      │ (None, 187)       │    191,675 │ do-1[0][0]        │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concat              │ (None, 1211)      │          0 │ do-1[0][0],       │
│ (Concatenate)       │                   │            │ labels[0][0]      │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense-2 (Dense)     │ (None, 1024)      │  1,241,088 │ concat[0][0]      │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ bn-5                │ (None, 1024)      │      4,096 │ dense-2[0][0]     │
│ (BatchNormalizatio… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ pos (Dense)         │ (None, 10)        │     10,250 │ bn-5[0][0]        │
└─────────────────────┴───────────────────┴────────────┴───────────────────┘
 Total params: 21,137,093 (80.63 MB)
 Trainable params: 21,128,901 (80.60 MB)
 Non-trainable params: 8,192 (32.00 KB)
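The figures in the summary can be cross-checked by hand. The sketch below recomputes them from the layer shapes alone. The kernel sizes (5, 5, 7, 7) are not stated on this page; they are inferred here because they are the values that reproduce the reported parameter counts. Likewise, the sequence lengths 50, 25, 13, 7 are consistent with a pooling stride of 2 that rounds up, which is an assumption about the pooling configuration.

```python
import math


def conv1d_params(in_channels: int, filters: int, kernel_size: int) -> int:
    # One kernel of size (kernel_size x in_channels) per filter, plus one bias per filter.
    return in_channels * kernel_size * filters + filters


def dense_params(in_features: int, units: int) -> int:
    # Weight matrix plus one bias per unit.
    return in_features * units + units


def batchnorm_params(channels: int) -> int:
    # gamma and beta (trainable) plus moving mean and variance (non-trainable).
    return 4 * channels


def pooled_length(length: int, stride: int = 2) -> int:
    # ASSUMPTION: stride 2 with round-up, consistent with 50 -> 25 -> 13 -> 7.
    return math.ceil(length / stride)


lengths = [50]
for _ in range(3):
    lengths.append(pooled_length(lengths[-1]))

layer_params = {
    "conv-1": conv1d_params(5, 512, 5),        # kernel size 5 inferred
    "bn-1": batchnorm_params(512),
    "conv-2": conv1d_params(512, 512, 5),      # kernel size 5 inferred
    "bn-2": batchnorm_params(512),
    "conv-3": conv1d_params(512, 1024, 7),     # kernel size 7 inferred
    "conv-4": conv1d_params(1024, 1024, 7),    # kernel size 7 inferred
    "bn-3": batchnorm_params(1024),
    "dense-1": dense_params(lengths[-1] * 1024, 1024),  # flatten: 7 * 1024 = 7168
    "bn-4": batchnorm_params(1024),
    "labels": dense_params(1024, 187),
    "dense-2": dense_params(1024 + 187, 1024),  # concat of do-1 and labels
    "bn-5": batchnorm_params(1024),
    "pos": dense_params(1024, 10),
}

total = sum(layer_params.values())
# Only the batch-norm moving statistics are non-trainable: 2 values per channel.
non_trainable = 2 * (512 + 512 + 1024 + 1024 + 1024)

print(lengths)                # [50, 25, 13, 7]
print(total)                  # 21137093
print(total - non_trainable)  # 21128901
print(non_trainable)          # 8192
```

The totals agree with the summary, which also confirms that `dense-2` receives the 1,211-wide concatenation of the dropout output and the 187 label logits.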
model = create_model_original(load_parameters=True)
Creating CNN Model (Original)
Loading parameters from pretrained_model.h5
Created pretrained model
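
Each 50-mer yields two score vectors, one from the `labels` head (187 values) and one from the `pos` head (10 values). The sketch below shows how a pair of such vectors could be turned into a predicted class index and region index. The helper name `decode_prediction` and the toy scores are hypothetical, and the mapping from a class index to a virus name is project-specific and not shown here.

```python
from typing import Sequence, Tuple

N_LABELS = 187   # size of the `labels` head in the summary
N_REGIONS = 10   # size of the `pos` head in the summary


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def decode_prediction(
    label_scores: Sequence[float], pos_scores: Sequence[float]
) -> Tuple[int, int]:
    """Pick the most likely class index and region index (hypothetical helper)."""
    if len(label_scores) != N_LABELS or len(pos_scores) != N_REGIONS:
        raise ValueError("unexpected output sizes for the two heads")
    # Softmax is monotonic, so argmax gives the same result on logits or probabilities.
    return argmax(label_scores), argmax(pos_scores)


# Toy scores for illustration only, not real model output.
label_scores = [0.0] * N_LABELS
label_scores[42] = 3.5
pos_scores = [0.0] * N_REGIONS
pos_scores[7] = 1.2

print(decode_prediction(label_scores, pos_scores))  # (42, 7)
```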