Update 6 | 2020.05.06

Yale SARS-CoV-2 Genomic Surveillance Initiative

In this report, we present our preliminary analysis of 19 SARS-CoV-2 genomes collected from COVID-19 patients in Connecticut between March 13th and April 9th, 2020. In the analysis described below, the results for which can be found here, we combined these 19 new genomes with 102 genomes previously sequenced by our team, for a total of 121 genomes. For more information on how Genomic Epidemiology works, please check out this page.

⚠️ WARNING: These results should be considered preliminary data, as they may change in light of new evidence.

Public Health Significance

Our analysis of the 19 new SARS-CoV-2 genomes leads us to believe that there is still a significant amount of local transmission in Connecticut (CT). As in previous updates, we observed that many of the newly sequenced COVID-19 cases in CT trace back to interstate spread from New York. However, as the outbreak progresses and we gather more genomic data, we have seen an increasing proportion of new sequences grouping with other CT genomes rather than with out-of-state genomes. This indicates that most infections are likely the result of local transmission in Connecticut.

We found a number of early introductions to CT in March that resulted from coast-to-coast spread (see the A.1/Washington cluster, Figure 1), but we do not observe further interstate spread from the West Coast among these new genomes. Based on genomic data, we do not find any evidence of virus introductions into CT from Massachusetts (MA). The MA genomes group together in a cluster that is only distantly related to most CT genomes (see arrow in Figure 1), indicating that none of the samples we have sequenced have ancestral origins in MA.

Evidence-based policies

Variation in case numbers, trends, and attitudes towards social distancing has caused differences in policies between states and counties. Considering the frequent occurrence of state-to-state viral spread, it is clear that state policies have implications for their neighbors. It is important that, as they look towards relaxing social distancing measures, policymakers work together across municipal and state boundaries to coordinate interventions and limit further spread. In addition, given that outbreaks driven by local transmission are still unfolding in major urban centers, and considering that most of the population in CT (and in the US) is still susceptible to the virus, policymakers must consider epidemiological data and information on viral dynamics to plan safe reopening strategies. Such data, coupled with large-scale testing, are essential to minimize the current wave and later waves of viral transmission.


SARS-CoV-2: lineages A and B

Genomes sequenced by our team can be found in both major lineages of SARS-CoV-2: A and B. A lineage is a classification of a group of relatively closely related viruses, based on a common ancestor. These lineages are labeled in the interactive phylogeny on our Nextstrain page. In lineage A, most of the viral genomes that our lab has sequenced cluster with lineage A.1 and within the sub-lineage A.1.2 (see Figure 2). In lineage B, the biggest group of SARS-CoV-2, most genomes sequenced by our lab belong to B.1. The majority of our genomes within that lineage cluster in B.1.3 and B.1.1. There are also smaller clusters that can be seen in B.2 (see Figure 3).

| Lineage | Sub-lineage | Introduced from | Sampling dates | Number of sequences |
|---|---|---|---|---|
| A | A.1 | Initially introduced to the Western U.S./Canada at least twice, now sustained in CT | March 8 to April 9 | 21 (3 new) |
| A | None | East Asia/Oceania to U.S. Northeast | March 13 to April 6 | 4 (0 new) |
| B | B.1 (mostly B.1.3 and B.1.1) | New York | March 11 to April 17 | 95 (16 new) |
| B | B.2 | Southeastern Asia to U.S. Northeast | March 6 to April 10 | 3 (0 new) |
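
The lineage labels in this report are hierarchical: each dot-separated field names a sub-group of the label to its left, so B.1.3 sits inside B.1, which sits inside B. The sketch below illustrates that reading in Python. The helper names `is_sublineage` and `top_level` are hypothetical and are not part of our pipeline; the code assumes that every label encodes its full ancestry in this dot-separated way.

```python
def is_sublineage(lineage: str, ancestor: str) -> bool:
    """Return True if `lineage` equals or descends from `ancestor`.

    Assumes dot-separated labels where each field refines the one before it.
    """
    child_parts = lineage.split(".")
    parent_parts = ancestor.split(".")
    # Compare whole fields, not raw string prefixes, so that a label such as
    # "B.11" is not mistaken for a descendant of "B.1".
    return child_parts[: len(parent_parts)] == parent_parts


def top_level(lineage: str) -> str:
    """Return the major lineage (the first field) of a label."""
    return lineage.split(".")[0]


labels = ["A.1", "B.1", "B.1.1", "B.1.3", "B.2"]
in_b1 = [name for name in labels if is_sublineage(name, "B.1")]
print(in_b1)  # ['B.1', 'B.1.1', 'B.1.3']
```

Splitting on the dots rather than testing string prefixes is the one design choice that matters here, since labels with multi-digit fields would otherwise be grouped incorrectly.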

New York and Connecticut’s fates are linked

In the current update, most of the new genomes (16 out of 19) belong to viruses from lineage B.1 (Figure 2; new genomes are in red). This lineage is a large group of genetically related viruses found mainly in the US. It is the predominant lineage found in New York, and about 79% (96 out of 121) of all genomes sequenced by our group are found in this cluster. Other genomes from throughout the US also cluster in the B.1 lineage, indicating constant spread from the Northeast to other US regions and other countries. Among the newly sequenced genomes presented here, 9 out of 19 fall specifically in sub-lineage B.1.3 and are more closely related to other CT genomes than to those from NY. This indicates that, in addition to viral spread from NY, the epidemic in Connecticut is also being sustained by transmission chains among CT residents. Based on the available data, the growth of the B.1 cluster (i) reinforces the central role of NY as an early source of viruses seeding CT outbreaks, and (ii) highlights persistent local transmission, which is now sustaining the outbreaks in CT.

Coast-to-Coast transmission persists, albeit with reduced frequency

Among the 121 genomes sequenced by our team since March 18th, 25 (around 20.7%) belong to lineage A (Figure 3). In this lineage, at least two introductions (most likely from Eastern Asia) have led to sustained transmission in North America. The largest grouping, A.1, consists of genomes from viruses circulating in Canada and all four major US regions, with the earliest sequences mainly in the West. Many of the genomes most closely linked to those found in CT are from Washington State, indicating that the virus spread coast-to-coast, from the West Coast to the East Coast. Most of the samples that indicated introductions of lineage A to CT were collected throughout March, and the three new sequences in this lineage either cluster with known introductions, indicating spread within CT after the introduction, or represent previously unsampled introductions in this earlier time frame. We did not find evidence of introductions more recent than mid-March; however, the number of more recently sampled sequences in CT is limited, so we can’t yet draw a definitive conclusion as to whether there has been further spread from distant outbreaks.
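
The lineage A share quoted above follows directly from the two counts in the text. The short Python check below shows the arithmetic. The variable names are illustrative only, and the remainder is attributed to lineage B on the assumption that every genome falls in either lineage A or lineage B.

```python
total_genomes = 121  # genomes sequenced by the team (from the text)
lineage_a = 25       # genomes assigned to lineage A (from the text)

# Percentage of all genomes that belong to lineage A, to one decimal place.
share_a = round(100 * lineage_a / total_genomes, 1)

# Genomes left for lineage B, assuming A and B are the only two lineages.
remaining = total_genomes - lineage_a

print(share_a, remaining)  # 20.7 96
```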


We combined our new dataset of 19 SARS-CoV-2 genomes with the 102 genomes reported by our team in Update 5. These samples were sequenced on a MinION platform, following an ARTIC protocol. For this preliminary analysis, we also downloaded 653 additional genomes available on GISAID, from the US and around the world, to uncover recent patterns of viral spread within and from the Northeastern US over the past weeks. Sequence alignment and phylogenetic analysis were performed using a Nextstrain pipeline.

Data availability

The directories consensus_genomes and metadata in our GitHub repository contain all of our current SARS-CoV-2 genomes and metadata. The directory auspice contains a JSON file that was produced using the Nextstrain pipeline. A list of GISAID accession numbers of the genomes used in this report can be downloaded from a link at the bottom of our Nextstrain page.
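
For readers who want to explore the JSON file in the auspice directory, the Python sketch below counts the sampled genomes (the tips of the tree) by one attribute. It is a minimal illustration, not a description of our pipeline. It assumes the file stores the phylogeny as nested `children` lists under a top-level `tree` key, with traits under `node_attrs` as `{"value": ...}` entries; that layout and the `division` attribute name are assumptions that should be checked against the actual file. The function name `count_tips` is hypothetical, and the tiny tree in the example is invented for demonstration.

```python
import json
from collections import Counter
from typing import Optional


def count_tips(node: dict, attr: str, counts: Optional[Counter] = None) -> Counter:
    """Count leaf nodes (sampled genomes) by the value of one node attribute."""
    if counts is None:
        counts = Counter()
    children = node.get("children", [])
    if not children:
        # A node without children is a tip, i.e. one sequenced genome.
        value = node.get("node_attrs", {}).get(attr, {}).get("value", "unknown")
        counts[value] += 1
    for child in children:
        count_tips(child, attr, counts)
    return counts


# Hand-made stand-in for the real file; node names and values are invented.
example = json.loads("""
{"tree": {"name": "root", "children": [
  {"name": "tip1", "node_attrs": {"division": {"value": "Connecticut"}}},
  {"name": "internal", "children": [
    {"name": "tip2", "node_attrs": {"division": {"value": "Connecticut"}}},
    {"name": "tip3", "node_attrs": {"division": {"value": "New York"}}}
  ]}
]}}
""")

tip_counts = count_tips(example["tree"], "division")
print(tip_counts)
```

To run this on the real data, load the file with `json.load` and pass its tree object in place of `example["tree"]`. The recursion is fine for a tree of a few hundred genomes, but a much deeper tree would call for an explicit stack instead.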


Mary Petrone, Anne Wyllie, Chantal Vogels, and Isabel Ott performed the viral RNA extractions. Tara Alpert and Joseph Fauver prepared samples for sequencing and assembled the SARS-CoV-2 genomes. Cole Jensen and Anderson Brito performed the phylogenetic analysis. Cole Jensen, Chaney Kalinich, Mary Petrone, Anderson Brito, and Nathan Grubaugh wrote and reviewed this report. Chaney Kalinich and Peter Neugebauer developed and maintain the COVIDTrackerCT website. Nathan Grubaugh leads the Yale SARS-CoV-2 Genomic Epidemiology Initiative. Finally, we also thank the authors of the genomes in our complementary dataset: a list of authors is provided at the bottom of our dedicated Nextstrain page.
