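The paper “A multi-task CNN learning model for taxonomic assignment of human viruses” proposes a multi-task convolutional neural network (CNN) that assigns short viral sequences to a virus taxon and, as a second task, predicts their position in the viral genome.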
“A multi-task CNN learning model for taxonomic assignment of human viruses”
by Haoran Ma, Tin Wee Tan and Kenneth Hon Kim Ban
Presented at the 19th International Conference on Bioinformatics 2020 (InCoB2020), held virtually, 25-29 November 2020.
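The model architecture includes a convolutional feature extractor (four Conv1D layers arranged in three blocks, each block ending with batch normalization and max pooling), a fully connected layer with dropout, and two output heads: `labels`, which predicts the virus label, and `pos`, which predicts the read position from the shared features concatenated with the label predictions.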
The functions below create a model with the same architecture as the one in the original paper.
create_model_original (load_parameters:bool=True, path2parameters:pathlib.Path|None=None, to_gpu:bool=True)
Build a CNN model as per the CNN Virus paper
Parameter | Type | Default | Details |
---|---|---|---|
load_parameters | bool | True | Load pretrained weights when True |
path2parameters | pathlib.Path \| None | None | Path to pretrained weights; defaults to the project's CNN Virus weights |
to_gpu | bool | True | Move the model to the GPU if possible when True |
Returns | Model | | New instance of the original paper's architecture |
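This is the original paper’s model. It takes 50-mer sequences encoded as arrays of shape (50, 5) and predicts two outputs: a virus label (`labels`, 187 classes) and a position class (`pos`, 10 classes) indicating where the read lies in the viral genome.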
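The input layer (`input-seq`) expects arrays of shape (None, 50, 5): a batch of 50-mer reads with five channels per base. A minimal sketch of how one read could be turned into such an array, assuming a one-hot encoding over the channels A, C, G, T and N with any other character mapped to N. The channel order, the handling of ambiguous bases and the helper name `encode_read` are assumptions for illustration, not the metagentorch implementation.

```python
# Hypothetical encoder: the channel order (A, C, G, T, N) and the mapping of
# unknown characters to N are assumptions, not taken from metagentorch.
ALPHABET = "ACGTN"
READ_LENGTH = 50


def encode_read(seq: str) -> list[list[int]]:
    """One-hot encode a 50-mer into a (50, 5) nested list."""
    if len(seq) != READ_LENGTH:
        raise ValueError(f"expected a {READ_LENGTH}-mer, got length {len(seq)}")
    encoded = []
    for base in seq.upper():
        # Any character outside A/C/G/T is treated as the ambiguous base N.
        idx = ALPHABET.index(base) if base in "ACGT" else ALPHABET.index("N")
        row = [0] * len(ALPHABET)
        row[idx] = 1
        encoded.append(row)
    return encoded


x = encode_read("ACGT" * 12 + "NN")
print(len(x), len(x[0]))  # 50 5
print(x[0], x[-1])        # [1, 0, 0, 0, 0] [0, 0, 0, 0, 1]
```

To score several reads at once, their encodings would be stacked into a batch of shape (batch_size, 50, 5), for example with numpy.array.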
Creating CNN Model (Original)
Created randomly initialized model
Model: "CNN_Virus"
Layer (type) | Output Shape | Param # | Connected to |
---|---|---|---|
input-seq (InputLayer) | (None, 50, 5) | 0 | - |
conv-1 (Conv1D) | (None, 50, 512) | 13,312 | input-seq[0][0] |
bn-1 (BatchNormalization) | (None, 50, 512) | 2,048 | conv-1[0][0] |
maxpool-1 (MaxPooling1D) | (None, 25, 512) | 0 | bn-1[0][0] |
conv-2 (Conv1D) | (None, 25, 512) | 1,311,232 | maxpool-1[0][0] |
bn-2 (BatchNormalization) | (None, 25, 512) | 2,048 | conv-2[0][0] |
maxpool-2 (MaxPooling1D) | (None, 13, 512) | 0 | bn-2[0][0] |
conv-3 (Conv1D) | (None, 13, 1024) | 3,671,040 | maxpool-2[0][0] |
conv-4 (Conv1D) | (None, 13, 1024) | 7,341,056 | conv-3[0][0] |
bn-3 (BatchNormalization) | (None, 13, 1024) | 4,096 | conv-4[0][0] |
maxpool-3 (MaxPooling1D) | (None, 7, 1024) | 0 | bn-3[0][0] |
flatten (Flatten) | (None, 7168) | 0 | maxpool-3[0][0] |
dense-1 (Dense) | (None, 1024) | 7,341,056 | flatten[0][0] |
bn-4 (BatchNormalization) | (None, 1024) | 4,096 | dense-1[0][0] |
do-1 (Dropout) | (None, 1024) | 0 | bn-4[0][0] |
labels (Dense) | (None, 187) | 191,675 | do-1[0][0] |
concat (Concatenate) | (None, 1211) | 0 | do-1[0][0], labels[0][0] |
dense-2 (Dense) | (None, 1024) | 1,241,088 | concat[0][0] |
bn-5 (BatchNormalization) | (None, 1024) | 4,096 | dense-2[0][0] |
pos (Dense) | (None, 10) | 10,250 | bn-5[0][0] |
Total params: 21,137,093 (80.63 MB)
Trainable params: 21,128,901 (80.60 MB)
Non-trainable params: 8,192 (32.00 KB)
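The totals above can be checked by hand. The sketch below recomputes the flattened length after the three pooling layers and the parameter counts from the standard formulas: a Conv1D layer has kernel * c_in * c_out weights plus c_out biases, a Dense layer has n_in * n_out weights plus n_out biases, and a BatchNormalization layer has 4 * c parameters, of which the moving mean and variance (2 * c) are non-trainable. The kernel sizes (5, 5, 7, 7) and the pooling behaviour (size 2 with "same" padding, i.e. ceiling division) are inferred from the shapes and counts in the summary, not read from the metagentorch source.

```python
import math

# Kernel sizes and pooling behaviour are inferred from the summary above
# (parameter counts and output shapes), not read from the metagentorch source.


def conv1d_params(kernel: int, c_in: int, c_out: int) -> int:
    return kernel * c_in * c_out + c_out  # weights + biases


def dense_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out  # weights + biases


def bn_params(c: int) -> tuple[int, int]:
    # (trainable gamma + beta, non-trainable moving mean + variance)
    return 2 * c, 2 * c


def pool_len(length: int) -> int:
    return math.ceil(length / 2)  # pool size 2 with "same" padding


length = 50
for _ in range(3):  # maxpool-1, maxpool-2, maxpool-3: 50 -> 25 -> 13 -> 7
    length = pool_len(length)
flat = length * 1024  # flatten output
n_labels, n_pos = 187, 10

trainable = (
    conv1d_params(5, 5, 512)                 # conv-1
    + conv1d_params(5, 512, 512)             # conv-2
    + conv1d_params(7, 512, 1024)            # conv-3
    + conv1d_params(7, 1024, 1024)           # conv-4
    + dense_params(flat, 1024)               # dense-1
    + dense_params(1024, n_labels)           # labels
    + dense_params(1024 + n_labels, 1024)    # dense-2 on concat(do-1, labels)
    + dense_params(1024, n_pos)              # pos
)
non_trainable = 0
for c in (512, 512, 1024, 1024, 1024):  # bn-1 ... bn-5
    t, nt = bn_params(c)
    trainable += t
    non_trainable += nt

total = trainable + non_trainable
print(flat)                         # 7168
print(total)                        # 21137093
print(trainable, non_trainable)     # 21128901 8192
print(round(total * 4 / 2**20, 2))  # 80.63, assuming 4-byte float32 weights
```

The last line shows that the reported size is consistent with 4 bytes per float32 parameter expressed in units of 2**20 bytes.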