architecture

DL architectures based on CNN Virus and used in this project

The paper “A multi-task CNN learning model for taxonomic assignment of human viruses” proposes a multi-task CNN model for the taxonomic assignment of human viruses.

“A multi-task CNN learning model for taxonomic assignment of human viruses”

by Haoran Ma, Tin Wee Tan and Kenneth Hon Kim Ban

From the 19th International Conference on Bioinformatics 2020 (InCoB2020), Virtual, 25-29 November 2020

The architecture module includes:

The architecture used in the original paper's model, refactored to use the latest version of Keras and based on the authors’ GitHub repo.
All other variants under development, whether they keep the same architecture and are simply retrained or fine-tuned, or adopt a new architecture.

Original Architecture

Below are the functions to create a model with the same architecture as in the original paper.


create_model_original

 create_model_original (load_parameters:bool=True,
                        path2parameters:pathlib.Path|None=None,
                        to_gpu:bool=True)

Build a CNN model as per CNN Virus paper

	Type	Default	Details
load_parameters	bool	True	Load pretrained weights when True
path2parameters	pathlib.Path | None	None	Path to pretrained weights, defaults to project CNN Virus weights
to_gpu	bool	True	Move model to GPU if possible when True
Returns	Model		New instance of an original paper architecture

ProjectFileSystem().data

Path('/home/vtec/projects/bio/metagentorch/data')

This is the original paper’s model, taking 50-mer sequences and predicting:

the logits for 187 viruses
the logits over the 10 regions of the original virus genome from which the sequence may come
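The input layer shape (None, 50, 5) reported by model.summary() suggests that each 50-mer is one-hot encoded over five symbols. The sketch below shows one way to do that. It assumes the alphabet is A, C, G, T plus N for any other base, in that channel order. The project's actual channel order may differ, and the helper name one_hot_read is hypothetical.

```python
from typing import List

# Assumed channel order (hypothetical); the project's encoding may differ.
ALPHABET = "ACGTN"
READ_LENGTH = 50


def one_hot_read(seq: str) -> List[List[int]]:
    """Encode a 50-mer as a 50 x 5 one-hot matrix (hypothetical helper).

    Bases are upper-cased; anything other than A, C, G or T goes to the N channel.
    """
    if len(seq) != READ_LENGTH:
        raise ValueError(f"expected a {READ_LENGTH}-mer, got length {len(seq)}")
    matrix = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        # Known bases use their own channel; everything else falls back to N.
        channel = ALPHABET.index(base) if base in "ACGT" else ALPHABET.index("N")
        row[channel] = 1
        matrix.append(row)
    return matrix


x = one_hot_read("ACGT" * 12 + "NA")
print(len(x), len(x[0]))  # 50 5
```

Stacking such matrices for a batch of reads gives the (batch, 50, 5) tensor that the input layer expects.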

model = create_model_original(load_parameters=False)

Creating CNN Model (Original)
Created randomly initialized model

model.summary()

Model: "CNN_Virus"

┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)        ┃ Output Shape      ┃    Param # ┃ Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ input-seq           │ (None, 50, 5)     │          0 │ -                 │
│ (InputLayer)        │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv-1 (Conv1D)     │ (None, 50, 512)   │     13,312 │ input-seq[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ bn-1                │ (None, 50, 512)   │      2,048 │ conv-1[0][0]      │
│ (BatchNormalizatio… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ maxpool-1           │ (None, 25, 512)   │          0 │ bn-1[0][0]        │
│ (MaxPooling1D)      │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv-2 (Conv1D)     │ (None, 25, 512)   │  1,311,232 │ maxpool-1[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ bn-2                │ (None, 25, 512)   │      2,048 │ conv-2[0][0]      │
│ (BatchNormalizatio… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ maxpool-2           │ (None, 13, 512)   │          0 │ bn-2[0][0]        │
│ (MaxPooling1D)      │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv-3 (Conv1D)     │ (None, 13, 1024)  │  3,671,040 │ maxpool-2[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv-4 (Conv1D)     │ (None, 13, 1024)  │  7,341,056 │ conv-3[0][0]      │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ bn-3                │ (None, 13, 1024)  │      4,096 │ conv-4[0][0]      │
│ (BatchNormalizatio… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ maxpool-3           │ (None, 7, 1024)   │          0 │ bn-3[0][0]        │
│ (MaxPooling1D)      │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ flatten (Flatten)   │ (None, 7168)      │          0 │ maxpool-3[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense-1 (Dense)     │ (None, 1024)      │  7,341,056 │ flatten[0][0]     │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ bn-4                │ (None, 1024)      │      4,096 │ dense-1[0][0]     │
│ (BatchNormalizatio… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ do-1 (Dropout)      │ (None, 1024)      │          0 │ bn-4[0][0]        │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ labels (Dense)      │ (None, 187)       │    191,675 │ do-1[0][0]        │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concat              │ (None, 1211)      │          0 │ do-1[0][0],       │
│ (Concatenate)       │                   │            │ labels[0][0]      │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense-2 (Dense)     │ (None, 1024)      │  1,241,088 │ concat[0][0]      │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ bn-5                │ (None, 1024)      │      4,096 │ dense-2[0][0]     │
│ (BatchNormalizatio… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ pos (Dense)         │ (None, 10)        │     10,250 │ bn-5[0][0]        │
└─────────────────────┴───────────────────┴────────────┴───────────────────┘

 Total params: 21,137,093 (80.63 MB)

 Trainable params: 21,128,901 (80.60 MB)

 Non-trainable params: 8,192 (32.00 KB)
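The parameter counts above follow the standard Keras formulas, so they can be used to recover details the summary does not print. The sketch below rebuilds the totals from those formulas. It assumes Conv1D kernel sizes of 5, 5, 7 and 7, which are inferred by solving the reported counts rather than read from the authors' code. It also assumes max-pooling with stride 2 and 'same' padding, inferred from the 50 -> 25 -> 13 -> 7 sequence lengths. Each BatchNormalization layer holds four values per channel. Gamma and beta are trained; the moving mean and variance are not, which accounts for the 8,192 non-trainable parameters. The sizes in MB assume 4-byte float32 weights.

```python
import math


def conv1d_params(kernel: int, c_in: int, c_out: int) -> int:
    # kernel x c_in x c_out weights plus one bias per filter
    return kernel * c_in * c_out + c_out


def dense_params(n_in: int, n_out: int) -> int:
    # n_in x n_out weights plus one bias per unit
    return n_in * n_out + n_out


def bn_params(channels: int) -> int:
    # gamma, beta (trainable) + moving mean, moving variance (non-trainable)
    return 4 * channels


def pooled_length(length: int, stride: int = 2) -> int:
    # With 'same' padding the output length is ceil(length / stride)
    return math.ceil(length / stride)


# Sequence length after each of the three max-pooling stages
lengths = [50]
for _ in range(3):
    lengths.append(pooled_length(lengths[-1]))

flat = lengths[-1] * 1024  # output size of the flatten layer

layers = {
    "conv-1": conv1d_params(5, 5, 512),      # kernel size 5 inferred
    "bn-1": bn_params(512),
    "conv-2": conv1d_params(5, 512, 512),    # kernel size 5 inferred
    "bn-2": bn_params(512),
    "conv-3": conv1d_params(7, 512, 1024),   # kernel size 7 inferred
    "conv-4": conv1d_params(7, 1024, 1024),  # kernel size 7 inferred
    "bn-3": bn_params(1024),
    "dense-1": dense_params(flat, 1024),
    "bn-4": bn_params(1024),
    "labels": dense_params(1024, 187),
    "dense-2": dense_params(1024 + 187, 1024),  # input is concat(do-1, labels)
    "bn-5": bn_params(1024),
    "pos": dense_params(1024, 10),
}

total = sum(layers.values())
# Half of each BatchNormalization layer's parameters are moving statistics
non_trainable = sum(v // 2 for k, v in layers.items() if k.startswith("bn-"))
trainable = total - non_trainable
size_mb = total * 4 / 2**20  # float32 weights, 1 MB = 2**20 bytes

print(lengths, flat)                                     # [50, 25, 13, 7] 7168
print(f"{total:,} {trainable:,} {non_trainable:,}")     # 21,137,093 21,128,901 8,192
print(f"{size_mb:.2f} MB")                               # 80.63 MB
```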

model = create_model_original(load_parameters=True)

Creating CNN Model (Original)
Loading parameters from pretrained_model.h5
Created pretrained model
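With the pretrained model loaded, each read yields two output vectors: 187 virus logits and 10 region logits. The sketch below turns one read's pair of vectors into a predicted virus index and region index, each with a softmax probability. It assumes the outputs arrive in the order labels, pos and are unnormalized logits, as described above. If the loaded heads already apply a softmax, the argmax indices are unchanged. The helpers softmax and decode_prediction are hypothetical. Mapping the virus index back to a name would need the project's label table, which is not shown here.

```python
import math
from typing import List, Sequence, Tuple

N_VIRUSES = 187  # size of the "labels" output
N_REGIONS = 10   # size of the "pos" output


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(
    label_logits: Sequence[float], pos_logits: Sequence[float]
) -> Tuple[int, float, int, float]:
    """Return (virus index, its probability, region index, its probability)."""
    if len(label_logits) != N_VIRUSES or len(pos_logits) != N_REGIONS:
        raise ValueError("expected 187 label logits and 10 region logits")
    label_probs = softmax(label_logits)
    pos_probs = softmax(pos_logits)
    # argmax: index of the largest probability (first one on ties)
    virus = max(range(N_VIRUSES), key=label_probs.__getitem__)
    region = max(range(N_REGIONS), key=pos_probs.__getitem__)
    return virus, label_probs[virus], region, pos_probs[region]


# Example with made-up logits for a single read
label_logits = [0.0] * N_VIRUSES
label_logits[42] = 5.0
pos_logits = [0.0] * N_REGIONS
pos_logits[3] = 2.0
print(decode_prediction(label_logits, pos_logits)[::2])  # (42, 3)
```

For a batch, the same function can be applied row by row to the two arrays returned for the batch.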