ID CHLYSE RUT's Reflection /
PSM 2 - HOST AND INFECTIVITY PREDICTION OF COVID-19 USING CONVOLUTIONAL NEURAL NETWORK AND LONG SHORT-TERM MEMORY

Abstract

A novel virus outbreak, now known as Coronavirus Disease 2019 (COVID-19), began in December 2019 in Wuhan, Hubei Province, China. This paper delves into virus-host prediction for this virus’s sequences, extracted from GenBank at the National Center for Biotechnology Information (NCBI). Several studies on virus-host prediction have been published in recent years; here, “VIDHOP”, a prediction tool created by Florian Mock, is applied to COVID-19 to obtain potential host results. The coronavirus datasets retrieved from Kaggle include host information and nucleotide sequences. After the sequences are aligned using Clustal Omega, the data is processed in Google Colaboratory to prepare cleaned input for VIDHOP: the data is cleaned, sorted into subsets by host, the nucleotide sequences are extracted, and the result is passed to VIDHOP in its required syntax. VIDHOP uses its CNN+LSTM architecture to process the 2375 sequences and assign each to the host it calculates to be a potential reservoir of COVID-19. This paper records an approximate accuracy of 80% for VIDHOP’s predictions. The result remains doubtful, however, because the phylogenetic tree does not agree with the prediction obtained in this research; the cause could be faulty or limited data, or underdeveloped neural networks. Greater computing power or updated datasets could be used in future work to improve the accuracy of the result.
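The cleaning and host-sorting steps described in the abstract can be illustrated with a minimal sketch. It is not the project's actual Colaboratory notebook: the record layout (dictionaries with "host" and "sequence" keys), the function names, and the rule of replacing ambiguous symbols with N are all assumptions made for illustration, and the final step of formatting the input for VIDHOP is deliberately left out.

```python
from collections import defaultdict

# Assumption: only these symbols are kept; anything else becomes "N".
VALID_BASES = set("ACGTN")


def clean_sequence(raw):
    """Uppercase a sequence, drop alignment gaps and whitespace,
    and replace any remaining unexpected symbol with 'N'."""
    seq = raw.upper().replace("-", "")
    seq = "".join(seq.split())  # removes spaces and line breaks
    return "".join(base if base in VALID_BASES else "N" for base in seq)


def group_by_host(records):
    """Sort records into subsets keyed by a normalised host name.

    `records` is a hypothetical list of dicts with 'host' and
    'sequence' keys; records missing either value are skipped.
    """
    subsets = defaultdict(list)
    for rec in records:
        host = (rec.get("host") or "").strip().lower()
        seq = clean_sequence(rec.get("sequence") or "")
        if host and seq:
            subsets[host].append(seq)
    return dict(subsets)


# Toy records standing in for rows of the Kaggle dataset.
records = [
    {"host": "Homo sapiens", "sequence": "atg-cc\nta"},
    {"host": "homo sapiens ", "sequence": "ATGRCC"},
    {"host": "", "sequence": "ATGC"},  # no host label: skipped
    {"host": "Rhinolophus affinis", "sequence": "ttga"},
]

subsets = group_by_host(records)
for host, seqs in subsets.items():
    print(host, len(seqs))
```

Normalising host names before grouping keeps spelling variants such as trailing spaces or different capitalisation from splitting one host into several subsets.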

 

Two videos are attached to this post:

1. Presentation Video (Concept-Based): https://youtu.be/vZUVlpXj5us

2. Demo Video: https://youtu.be/4uySgIr5ohM