Update 5 | 2020.04.28

Yale SARS-CoV-2 Genomic Surveillance Initiative

In this report, we present preliminary results from 17 new SARS-CoV-2 genomes collected from COVID-19 patients in Connecticut between April 2nd and April 17th, 2020. For the analysis described below, the results of which can be found here, we combined the new genomes with 85 genomes previously sequenced by our team, for a total of 102 genomes. For more information on how genomic epidemiology works, please check out this page.

⚠️ WARNING: These results should be considered preliminary data, as they may change in light of new evidence.

Public Health implications

Our analysis of the 17 new SARS-CoV-2 genomes indicates that local transmission is being sustained within Connecticut (CT). We previously observed that new COVID-19 outbreaks in CT arose mainly from interstate and coast-to-coast spread of the virus. More recently, our genomic data suggest that the majority of new viral lineages in CT are coming from New York. The genomes we sequenced in the last 7 days cluster with other genomes collected in CT, indicating that, after spreading from New York, sustained local transmission of SARS-CoV-2 is occurring among CT residents.

Over the past 30 days in Connecticut, most genomes sequenced by our team were associated with repeated viral introductions from NY into CT, but introductions from other US states were also observed, especially in March (see the A.1/Washington cluster, Figure 1). So far, we have not found evidence of viruses coming from Massachusetts into CT. The few MA genomes are grouped in a specific cluster, far from most CT genomes (see arrow in Figure 1).


Evidence-based policies

Variation in case numbers, trends, and attitudes towards social distancing has caused differences in policies between states and counties. Given how frequently the virus spreads from state to state, each state's policies have implications for its neighbors. It is important that policymakers work together across municipal and state boundaries to coordinate the most effective interventions and limit further spread. In addition, given that outbreaks driven by local transmission are still unfolding in major urban centers, and considering that most of the population in CT (and in the US) is still susceptible to the virus, policymakers must consider epidemiological data and information on viral dynamics to plan safe reopening strategies. Such data, coupled with large-scale testing, are essential to minimize the current and later waves of viral transmission.

Data interpretation

SARS-CoV-2: lineages A and B

Genomes sequenced by our team are found in both major SARS-CoV-2 lineages: A and B. These lineages are further divided into nested sub-lineages. In lineage A, most of our viral genomes cluster within the lineage A1 and the sub-lineage A1.2 (see Figure 2). In lineage B, the largest SARS-CoV-2 group, our sequenced genomes cluster within the lineages B1 (sub-lineages B1.3 and B1.11) and B2 (Figure 3).

Introductions of lineage A viruses, and their transmission chains in Connecticut

Among the 102 genomes sequenced by our team since March 18th, only 20 (around 19.6%) belong to lineage A (Figure 1). In this lineage, at least two introductions (most likely from Eastern Asia) have led to sustained transmission chains in North America. The largest one consists of genomes from viruses circulating in Canada and all four major US regions, mainly in the West (WA state). Local introductions of lineage A1 viruses into Connecticut took place mostly in March. In the current update, only one new genome (Yale-102), collected on April 6th (Figure 2, see arrow), was found to group with other genomes from CT and NY in lineage A.


Lineage B1: predominant in New York state and Connecticut

In the current update, most sequenced genomes (15 out of 17) belong to viruses from lineage B1 (red circles in Figure 3), a large group of genetically related viruses found mainly in the US. About 76.5% (78 out of 102) of all genomes sequenced by our group are found in this cluster. Genomes from the West, Midwest, and Southern US are also observed in the B1 lineage, indicating constant spread between the Northeast and other US regions. Among the newly sequenced genomes presented here, 10 out of 17 fall specifically in the sub-lineage B1.3, a growing cluster of genomes closely related to NY genomes. Based on the data available so far, this growing B1 cluster reinforces the central role of NY as a source of viruses causing outbreaks in CT. Finally, we now observe that, in addition to viral spread from NY, the epidemic in Connecticut is also being sustained by transmission chains among CT residents.
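The lineage shares quoted in this section and the previous one can be recomputed directly from the counts given in the text. The snippet below is only a minimal sketch of that arithmetic; the counts come from this report, while the function and variable names are illustrative and not part of our analysis pipeline.

```python
# Counts taken from the text of this report.
TOTAL_GENOMES = 102  # all genomes sequenced by the team so far
NEW_GENOMES = 17     # genomes added in this update


def share(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


lineage_a_share = share(20, TOTAL_GENOMES)   # lineage A among all genomes
lineage_b1_share = share(78, TOTAL_GENOMES)  # lineage B1 among all genomes
new_b1_share = share(15, NEW_GENOMES)        # B1 among the new genomes
new_b13_share = share(10, NEW_GENOMES)       # B1.3 among the new genomes

print(lineage_a_share, lineage_b1_share, new_b1_share, new_b13_share)
```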



We combined our new dataset of 17 SARS-CoV-2 genomes with the 85 genomes reported by our team in Update 4. These samples were sequenced on a MinION platform, following an ARTIC protocol. For this preliminary analysis, we also downloaded 483 additional genomes available on GISAID, from around the world and the US, to uncover recent patterns of viral spread within and from the Northeastern US over the past weeks. Sequence alignment and phylogenetic analysis were performed using the nextstrain pipeline.
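As a rough illustration of the dataset-combination step described above, the sketch below merges several FASTA datasets by sequence name and drops duplicated names. It is not the code used for this report: the helper names are hypothetical, the records are tiny made-up stand-ins for real consensus genomes, and "Yale-001" and "USA/NY-example/2020" are invented labels. With the real inputs and no duplicated names, the merged dataset would hold 17 + 85 + 483 = 585 genomes.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The full header (without '>') is used as the sequence name.
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def combine(*datasets):
    """Merge datasets in order; the first copy of a duplicated name wins."""
    merged = {}
    for dataset in datasets:
        for name, seq in dataset.items():
            merged.setdefault(name, seq)
    return merged


# Made-up miniature inputs (hypothetical names and sequences).
new_genomes = parse_fasta(">Yale-102\nACGT\nACGT\n")
previous = parse_fasta(">Yale-001\nACGTTTTT\n")
context = parse_fasta(">USA/NY-example/2020\nACGTACGA\n>Yale-001\nACGTTTTT\n")

dataset = combine(new_genomes, previous, context)
print(len(dataset), "unique genomes")
```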

Data availability

The directories consensus_genomes and metadata in our GitHub repository contain all of our current SARS-CoV-2 genomes and metadata. The directory auspice contains a JSON file that was produced using the nextstrain pipeline. A list of GISAID accession numbers of the genomes used in this report can be downloaded from a link at the bottom of our nextstrain page.
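For readers who want to explore the JSON file in the auspice directory programmatically, the sketch below walks a nested tree and counts tips per location. It assumes the file follows the single-file auspice "v2" layout (a top-level "tree" object whose nodes carry "children" and "node_attrs", with attributes such as "division" stored as {"value": ...}); that layout is an assumption here, and an older or different export would need adjusting. The miniature tree is made up for illustration.

```python
import json
from collections import Counter


def iter_tips(node):
    """Yield every leaf (node without children) below `node`, iteratively."""
    stack = [node]
    while stack:
        current = stack.pop()
        children = current.get("children", [])
        if children:
            stack.extend(children)
        else:
            yield current


def count_tips_by(tree, attr):
    """Count tips per value of a dict-valued node attribute such as 'division'."""
    counts = Counter()
    for tip in iter_tips(tree):
        value = tip.get("node_attrs", {}).get(attr, {}).get("value", "unknown")
        counts[value] += 1
    return counts


# Made-up miniature dataset in the assumed layout; a real file would be
# loaded from disk with json.load() instead of json.loads().
example = json.loads("""
{"tree": {"name": "root", "children": [
  {"name": "tip1", "node_attrs": {"division": {"value": "Connecticut"}}},
  {"name": "internal", "children": [
    {"name": "tip2", "node_attrs": {"division": {"value": "New York"}}},
    {"name": "tip3", "node_attrs": {"division": {"value": "Connecticut"}}}
  ]}
]}}
""")

counts = count_tips_by(example["tree"], "division")
print(dict(counts))
```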


We thank Mary Petrone, Anne Wyllie, Chantal Vogels, and Isabel Ott, who performed the viral RNA extractions; Tara Alpert and Joseph Fauver, for preparing samples for sequencing and assembling the SARS-CoV-2 genomes; Cole Jensen and Anderson Brito, for performing the phylogenetic analysis; Cole Jensen, Chaney Kalinich, Mary Petrone, Anderson Brito, and Nathan Grubaugh, for writing and reviewing this report; Chaney Kalinich and Peter Neugebauer, for developing and maintaining the COVIDTracker website; and Nathan Grubaugh, for leading this SARS-CoV-2 genomic epidemiology project. Finally, we also thank the authors of the genomes in our complementary dataset: a list of authors is provided at the bottom of our dedicated nextstrain page.

Grubaugh Lab | Yale School of Public Health (YSPH) | https://grubaughlab.com/
