Update 11 | 2020.06.24

Yale SARS-CoV-2 Genomic Surveillance Initiative

What’s new

This week, we added 40 new SARS-CoV-2 genomes collected in late March and early April from Connecticut (CT) and one genome from New York (NY). We have now sequenced 231 SARS-CoV-2 genomes from CT and 10 from NY. Including work from the University of Washington and the CDC, there are 306 genomes from CT in total. Of the new genomes, 36 cluster in the NY-clade near other CT genomes, the same pattern we have seen since mid-March: local transmission linked with New York (Figure 1). The genomes in our analysis predate the SARS-CoV-2 peak in CT, which occurred in late April. Since this peak, the number of new cases reported per day has consistently declined. We also included more SARS-CoV-2 genomes from Massachusetts (MA, n=275) and all available genomes from New Hampshire (NH, n=4), Rhode Island (RI, n=4), and Vermont (VT, n=1).

Figure 1. Set of virus genomes sequenced by the Yale SARS-CoV-2 Genomic Surveillance Initiative. Each circle represents one or more genomes (with size being proportional to the number of samples), and the edges connecting them represent spread between countries, states, and towns. The resolution in the CT map is based on a scheme that takes CT zip code areas and the population of those areas to define different geographic locations.

⚠️ WARNING: These results should be considered preliminary, as they may change in light of new data.


We previously showed that COVID-19 outbreaks in CT were likely seeded by domestic introductions from the West Coast and NY. This is supported by travel and case data. We also showed that most virus genomes sequenced from COVID-19 cases in CT from mid-March to late April were more closely related to other SARS-CoV-2 genomes in CT and NY than to those of viruses introduced from international or other domestic sources.

SARS-CoV-2 genomes sequenced from CT can be separated into many groups, called clades, based on their ancestral history. Within clades, groups of closely related viruses form lineages. The clusters of genomes are labeled in the interactive phylogeny on our nextstrain page. The viruses that first circulated on the West Coast are part of the ‘A’ lineage (found in a group referred to as the WA-Clade, because the earliest detected virus in this group was collected in Washington state). These genomes were collected from some of the earliest cases identified in CT. Genomes collected in March and early to mid-April are in the ‘B.1’ lineage, in the NY-Clade (so called because New York is the most likely recent origin of this group). As we collect more sequences, we are finding more and more genomes from the ‘B.1’ lineage. Among the 210 sequences we sampled from this lineage, there appear to have been a number of separate introductions followed by sustained transmission clusters later in March and in early April.

SARS-CoV-2 lineages in CT

Lineage | Sub-lineage | Introduced to CT from | Sampling dates | Number of sequences
A | A.1 | Initially introduced to the Western U.S./Canada at least twice, now sustained in CT | March 8 to April 9 | 24 (1 new)
A | none | East Asia/Oceania to U.S. Northeast | March 13 to April 6 | 6 (3 new)
B | B.1 (mostly B.1.3 and B.1.1) | New York | March 11 to April 17 | 210 (36 new)
B.2 | B.2.1 | Southeastern Asia to U.S. Northeast | March 6 to April 10 | 3
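
As a quick consistency check, the counts in the table above can be tallied in a few lines. This is a minimal sketch: the numbers are copied from the table, while the variable names and row labels are illustrative shorthand rather than anything used in the analysis itself.

```python
# Counts copied from the lineage table: (label, total sequences, newly added).
# The labels are illustrative shorthand, not official lineage names.
lineage_counts = [
    ("A.1", 24, 1),
    ("A (no sub-lineage)", 6, 3),
    ("B.1", 210, 36),
    ("B.2.1", 3, 0),  # the table lists no new sequences for this row
]

total = sum(n for _, n, _ in lineage_counts)
new = sum(k for _, _, k in lineage_counts)
b1_share = 210 / total  # fraction of tabulated sequences in the B.1 lineage

print(f"total sequences in table: {total}")
print(f"new this week: {new}")
print(f"B.1 share: {b1_share:.1%}")
```

The new-sequence tally comes to 40, matching the 40 new CT genomes reported this week, and B.1 accounts for roughly 86% of the sequences in the table.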

New England & Connecticut

Surprisingly, our genomic data do not yet show evidence of interstate SARS-CoV-2 spread between CT and RI, NH, or VT. We include all available SARS-CoV-2 genomes from these states, and they are not currently interspersed with the CT genomes, which is consistent with a lack of interstate spread (Figure 2). However, sampling from these states is very limited, so we may be missing many patterns of spread throughout New England, and further data are needed to verify these findings.

Figure 2. The relationship among SARS-CoV-2 genomes collected in Connecticut (mint green), Rhode Island, New Hampshire, Vermont, and Maine (all in pink).

We are now seeing interstate SARS-CoV-2 spread between CT and MA. In previous updates, the MA genomes clustered together and far from CT genomes. Now, with the addition of the new SARS-CoV-2 genomes, MA genomes in our dataset are interspersed among CT genomes throughout the tree (Figure 3). This indicates interstate spread between CT and MA. More information about the situation in MA can be found here.

Figure 3. The relationship among SARS-CoV-2 genomes collected in Connecticut (mint green) and Massachusetts (all in pink).

New virus genomes in CT

We added 40 new SARS-CoV-2 genomes sequenced from CT to our analysis and found that 39 of the sequences are closely related to other sequences sampled in NY and CT, as can be seen in Figure 4. The continued grouping of sequences also highlights persistent local transmission in CT: an increasing proportion of new CT virus genomes cluster with each other rather than with out-of-state genomes. Thirty-six of the new genomes are part of the NY-clade, which indicates that the NY-clade has become the predominant clade in CT. We already knew that NY and other places seeded early CT COVID-19 outbreaks; this finding reinforces NY’s central role in the continuation of CT outbreaks.

Figure 4. The new SARS-CoV-2 genomes in Connecticut. Of these, 37 fall in major clades: 36 in the NY-clade (the mint dots connected to the green lines at the top of the phylogenetic tree) and one in the WA-clade (the mint dot connected to the red line at the bottom of the phylogenetic tree). The geographic location of each genome can be seen on the map.

Public Health Significance

Analysis of the CT SARS-CoV-2 genomes collected mainly in March and April indicates a significant amount of local transmission occurring during those months. As in previous updates, we observed that many of the newly sequenced COVID-19 cases in CT trace back to interstate spread from NY. However, as the COVID-19 outbreak progresses and we gather more genomic data, we have seen increasing proportions of new virus sequences grouping with other CT genomes rather than out-of-state genomes. This indicates that most infections are likely the result of local transmission in CT.

Policy Implications

Our earlier analyses demonstrated the vital importance of inter-municipality and inter-state cooperation and coordination with regard to testing, contact tracing, and social distancing interventions. It is clear that viruses spread across municipal and state boundaries, meaning that state and municipal policies have implications for their neighbors. As policymakers relax social distancing measures, it is important to work together across geographic boundaries and coordinate interventions to limit further spread. For example, the governors of CT, NY, and New Jersey announced a coordinated quarantine policy today. With new COVID-19 cases in CT decreasing, understanding the role of public health measures in policy, and how they are implemented, will help prevent new spikes in the number of COVID-19 cases.

As we sequence more SARS-CoV-2 genomes, we will be looking to understand at what geographic level most spread occurs (e.g. within cities, within a single state, across a few states, nationwide, or through new international introductions). These data will be coupled with case numbers to determine how longer-term intervention strategies are working and where they can improve.


We analyzed 200 SARS-CoV-2 genomes that we previously sequenced and added 41 newly sequenced genomes. The 241 total genomes were generated using either the Oxford Nanopore MinION platform or the Illumina MiSeq platform. We used a targeted amplicon sequencing approach adapted from the ARTIC Network protocol. For this preliminary analysis, we also downloaded other genomes available on GISAID and NCBI, from around the world and the US, to uncover patterns of viral spread within and from the Northeastern USA in recent months. Sequence alignment and phylogenetic analysis were performed using a nextstrain pipeline. Geographic information for each sequence was aggregated by zip code areas with more than 50,000 inhabitants, mostly matching existing CT towns and county borders.
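
The exact aggregation scheme is not spelled out here, so the following is only one plausible reading of the 50,000-inhabitant rule, sketched under stated assumptions: zip code areas are grouped by town, a town keeps its own label when its zip areas together exceed the threshold, and smaller towns are rolled up to their county. The field names, the fallback to county, and the toy records are all hypothetical and are not taken from the report's data.

```python
from collections import defaultdict

MIN_POPULATION = 50_000  # threshold stated in the text


def assign_locations(zip_areas, min_population=MIN_POPULATION):
    """Map each zip code to a location label (town or county).

    zip_areas: list of dicts with hypothetical keys
    'zip', 'town', 'county', and 'population'.
    """
    # Sum the population of all zip areas belonging to the same town.
    town_population = defaultdict(int)
    for area in zip_areas:
        town_population[area["town"]] += area["population"]

    # Towns above the threshold keep their name; others use the county.
    locations = {}
    for area in zip_areas:
        if town_population[area["town"]] > min_population:
            locations[area["zip"]] = area["town"]
        else:
            locations[area["zip"]] = area["county"]
    return locations


# Toy records with placeholder names and made-up populations.
toy_areas = [
    {"zip": "Z1", "town": "Town A", "county": "County X", "population": 40_000},
    {"zip": "Z2", "town": "Town A", "county": "County X", "population": 30_000},
    {"zip": "Z3", "town": "Town B", "county": "County X", "population": 12_000},
]
locations = assign_locations(toy_areas)
print(locations)
```

In this toy case the two zip areas of Town A together exceed the threshold and keep the town label, while Town B is folded into County X.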

Data availability

The directories consensus_genomes and metadata in our GitHub repository contain all of our current SARS-CoV-2 genomes and metadata. The directory auspice contains a JSON file that was produced using the nextstrain pipeline. A list of GISAID/NCBI accession numbers of the genomes used in this report can be downloaded from a link at the bottom of our nextstrain page.
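
For readers who want to explore the JSON file programmatically, here is a minimal sketch of counting tips by location. It assumes the file follows the Auspice v2 layout, where the tree is nested under a top-level "tree" key, each node may carry a "children" list, and attributes are stored as node_attrs[<name>]["value"]; whether the file in the repository uses exactly this layout and a "division" attribute is an assumption. The toy tree below is invented for illustration.

```python
from collections import Counter


def count_tips(node, attr="division"):
    """Count tips (leaf nodes) by a node attribute in an Auspice-style tree."""
    counts = Counter()
    stack = [node]  # iterative traversal avoids recursion limits on deep trees
    while stack:
        current = stack.pop()
        children = current.get("children", [])
        if children:
            stack.extend(children)
        else:
            # Assumed layout: node_attrs[attr] is a dict with a 'value' key.
            value = current.get("node_attrs", {}).get(attr, {}).get("value", "unknown")
            counts[value] += 1
    return counts


# Invented toy tree mimicking the assumed layout.
toy_tree = {
    "name": "root",
    "children": [
        {"name": "tip_1", "node_attrs": {"division": {"value": "Connecticut"}}},
        {
            "name": "internal_1",
            "children": [
                {"name": "tip_2", "node_attrs": {"division": {"value": "Connecticut"}}},
                {"name": "tip_3", "node_attrs": {"division": {"value": "New York"}}},
            ],
        },
    ],
}
tip_counts = count_tips(toy_tree)
print(tip_counts)
```

With a real file, the tree would be loaded with json.load and the value under its "tree" key passed to count_tips in place of toy_tree.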


Mary Petrone, Anne Wyllie, Chantal Vogels, Ed Courchiane, Sarah Prophet, Isabel Ott, and Chaney Kalinich performed the viral RNA extractions. Tara Alpert and Joseph Fauver prepared samples for sequencing and assembled the SARS-CoV-2 genomes. Cole Jensen and Anderson Brito performed the phylogenetic analysis. Cole Jensen, Chaney Kalinich, Mary Petrone, Anderson Brito, and Nathan Grubaugh wrote and reviewed this report. Chaney Kalinich and Peter Neugebauer developed and maintain this COVIDTrackerCT website. Mario Peña-Hernández leads all Spanish translations. Nathan Grubaugh leads the Yale SARS-CoV-2 Genomic Epidemiology Initiative. Finally, we also thank the authors of the genomes in our complementary dataset for making their data freely available to other researchers: a full list of authors is provided at the bottom of our dedicated nextstrain page.
