Genome sequencing arsenal of India

Aswini | 16 May 2020

Coronaviruses are known to affect the respiratory system in humans and generally originate in bats. Though they can infect us directly, in some cases, such as MERS-CoV, intermediate hosts have been observed. While coronavirus infections are usually mild, some strains like MERS-CoV can kill up to 30% of those infected. Amongst the many coronaviruses, we have now witnessed a new virus that has caused a pandemic.

The new virus, SARS-CoV-2, has been a mystery to date. It shares only 76% genome similarity with SARS-CoV. While SARS-CoV was known to affect the respiratory system and caused death in ~10% of the affected cases, the disease manifestation of SARS-CoV-2 has differed across the world. A few countries, such as Italy and Spain, have witnessed more deaths, while others have seen fewer. Various social and logistical factors play a part, but this disparity also makes us question whether there are multiple strains of SARS-CoV-2 or whether the virus is mutating rapidly in response to its environment.

It is also observed that SARS-CoV-2 spreads quite fast. Scientists are trying to understand the virus better and to develop vaccines and treatments against it. To do that, knowing the genome of the virus is crucial: it helps us determine whether the virus is mutating, which in turn is necessary to judge whether vaccines and drugs will be effective against all the strains. A recent study published in The Lancet speculates that there might be an intermediate host for SARS-CoV-2 and that there is an urgent need to understand the evolution, adaptation and spread of the virus (1).
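
To make the idea of "checking whether the virus is mutating" concrete, here is a minimal sketch in Python. It assumes the two sequences have already been aligned to the same length, and it uses two short made-up sequences (REFERENCE and SAMPLE are invented for illustration and are not real viral data). The function name compare_aligned is also hypothetical. Real analyses use dedicated alignment and variant-calling tools on the full genome; this only shows the underlying comparison.

```python
def compare_aligned(reference: str, sample: str):
    """Compare two pre-aligned sequences of equal length.

    Returns (percent_identity, substitutions), where substitutions is a
    list of (position, reference_base, sample_base) using 1-based positions.
    Gaps ("-") and ambiguous bases ("N") are skipped rather than counted.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    matches = 0
    substitutions = []
    for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1):
        if ref_base in "-N" or sample_base in "-N":
            continue  # no reliable base call at this position
        compared += 1
        if ref_base == sample_base:
            matches += 1
        else:
            substitutions.append((position, ref_base, sample_base))

    identity = 100.0 * matches / compared if compared else 0.0
    return identity, substitutions


# Toy sequences, invented for this example only.
REFERENCE = "ATGGCTAGCTAGGATCCGTA"
SAMPLE = "ATGGCTAGTTAGGATCCGCA"

identity, substitutions = compare_aligned(REFERENCE, SAMPLE)
print(f"identity: {identity:.1f}%")
for position, ref_base, sample_base in substitutions:
    print(f"position {position}: {ref_base} -> {sample_base}")
```

Tracking which positions change, and in how many samples, is what lets researchers say whether a vaccine or drug target is staying stable across strains.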

Many nations are sequencing SARS-CoV-2 genomes in huge numbers. India is the fifth nation to sequence the viral genome and share the data with the international community (2). Early in April, CSIR, through a partnership between the National Centre for Disease Control (NCDC), New Delhi, and the CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), submitted 53 SARS-CoV-2 genome sequences to the global coronavirus genome database of the Global Initiative on Sharing All Influenza Data (GISAID). This is the largest submission of sequences from India by any group so far. The joint NCDC-CSIR program plans to accelerate India's molecular epidemiology and viral surveillance efforts (3).

That being said, multiple questions might be running through our minds. How are we sequencing the genomes? What resources are being used? What is the capacity? India is using several kinds of sequencing technology to understand the genetic make-up of SARS-CoV-2. These include MinION, MiSeq, HiSeq, NextSeq, MiniSeq, Sanger sequencing and Ion Torrent sequencing. Each of these technologies has a different capacity and yields a different amount of data.

Sanger sequencing is the classical method, with a sequencing read length of 400 to 900 bp. MiSeq, for example, is a short-read sequencing technique generally used for high-throughput sequencing. It has a sequencing read length of 50 to 600 bp and yields up to 25 million reads. MinION, a pocket-sized device, on the other hand, has a sequencing read length of up to 50,000 bases and yields up to 50 Gb of data. This approach is known as long-read sequencing.
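
To see how read length and read count translate into capacity, a rough back-of-the-envelope calculation helps: mean sequencing depth is approximately (number of reads × read length) / genome length. The sketch below is only an illustration under stated assumptions. The genome length of 29,903 bases is that of the commonly used SARS-CoV-2 reference genome and does not come from this article, the function names are hypothetical, and the calculation idealistically assumes that every read comes from the virus and that reads are spread evenly across the genome. In real samples, much of the data comes from the host and other material.

```python
import math

# Assumption: length of the commonly used SARS-CoV-2 reference genome, in bases.
GENOME_LENGTH = 29_903


def mean_depth(read_count: int, read_length: int, genome_length: int = GENOME_LENGTH) -> float:
    """Idealised mean depth: total sequenced bases divided by genome length."""
    return read_count * read_length / genome_length


def reads_needed(target_depth: float, read_length: int, genome_length: int = GENOME_LENGTH) -> int:
    """Smallest number of reads that reaches the target mean depth."""
    return math.ceil(target_depth * genome_length / read_length)


# Hypothetical run: 1,000,000 viral reads of 150 bases each.
print(f"mean depth: {mean_depth(1_000_000, 150):.0f}x")

# Reads required for 100x depth with short (150-base) and long (5,000-base) reads.
print("150-base reads needed:", reads_needed(100, 150))
print("5,000-base reads needed:", reads_needed(100, 5_000))
```

Because the viral genome is so small, even a modest share of a single run can cover it many times over, which is why many samples are usually pooled into one run.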

More details of the sequencing technologies used in India are summarised in a highly informative visual representation (see below) by Vigneshwar Senthivel, a PhD student at CSIR-IGIB.



Vigneshwar is a PhD student at CSIR-IGIB. He is trying to understand inherited cardiac disorders as part of his doctoral work. He is passionate about Sci-Art and has also created a series of COVID-19 myth busters called ‘Foolproof April’. In his free time, he enjoys drawing cartoons.