Update 9 | 2020.06.10

Yale SARS-CoV-2 Genomic Surveillance Initiative

What’s new

This week, we added 37 new genomes that were collected in late March and early April from Connecticut (CT). We also included 47 newly released genomes from Massachusetts (MA), adding to the 19 genomes that were previously available and significantly increasing our ability to observe virus spread between CT and MA. These additional genomes will improve our understanding of the spread of SARS-CoV-2 in CT while strict social distancing measures were in place and provide insight into spread between states. We found that most of the new sequences from CT are in the NY-Clade. Importantly, the close relatedness of some of these genomes suggests the presence of local transmission chains. We also found the first evidence of spread between CT and MA.

⚠️ WARNING: These results should be considered preliminary, as they may change in light of new data.


We previously showed that outbreaks in CT were likely seeded by domestic introductions from the West Coast and New York (NY). We also showed that the genetic sequences collected from COVID-19 cases in southern CT in late April were more closely related to other sequences in CT and NY than to introductions from international or other domestic sources.

Genomes collected in CT can be divided into two groups, called clades, based on their ancestral history. Each clade is made up of lineages: smaller groups of relatively closely related viruses that share a common ancestor. These lineages and clades are labeled in the interactive phylogeny on our nextstrain page. The genomes that first circulated on the West Coast (referred to as the WA-Clade, because the earliest detected case in this group was in Washington state) are part of the ‘A’ lineage. These genomes were collected from some of the earliest cases identified in CT. Most genomes collected in March and early to mid-April are in the ‘B.1’ lineage, which falls in the NY-Clade (so named because the earliest detected case in this group was in New York). As we collect more sequences, we are finding more and more genomes from the ‘B.1’ lineage. Among the 136 sequences we sampled within this lineage, there appear to have been a number of separate introductions followed by sustained transmission clusters later in March and in early April.

SARS-CoV-2 lineages in CT

| Lineage | Sub-lineage | Introduced to CT from | Sampling dates | Number of sequences |
| --- | --- | --- | --- | --- |
| A | A.1 | Initially introduced to the Western U.S./Canada at least twice, now sustained in CT | March 8 to April 9 | 22 (1 new) |
| A | None | East Asia/Oceania to U.S. Northeast | March 13 to April 6 | 4 |
| B | B.1 (mostly B.1.3 and B.1.1) | New York | March 11 to April 17 | 136 (36 new) |
| B | B.2 | Southeastern Asia to U.S. Northeast | March 6 to April 10 | 3 |
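
The counts in the table above can be cross-checked against the totals given elsewhere in this report (128 previously sequenced genomes plus 37 new ones). The following is a minimal sketch in Python. The counts are transcribed from the table, and rows without a "(new)" annotation are assumed to contain no new sequences; the variable and function names are illustrative only.

```python
# Counts transcribed from the lineage table in this report.
# Each row: (lineage, sub-lineage, total sequences, newly added sequences).
# Assumption: rows with no "(new)" annotation contain 0 new sequences.
LINEAGE_COUNTS = [
    ("A", "A.1", 22, 1),
    ("A", "None", 4, 0),
    ("B", "B.1", 136, 36),
    ("B", "B.2", 3, 0),
]


def summarize(rows):
    """Return (total sequences, total new sequences, totals per lineage)."""
    total = sum(count for _lineage, _sub, count, _new in rows)
    new = sum(n for _lineage, _sub, _count, n in rows)
    by_lineage = {}
    for lineage, _sub, count, _new in rows:
        by_lineage[lineage] = by_lineage.get(lineage, 0) + count
    return total, new, by_lineage


total, new, by_lineage = summarize(LINEAGE_COUNTS)
b1_share = 136 / total  # fraction of CT genomes in sub-lineage B.1

print(f"total CT genomes: {total}")      # 128 previous + 37 new
print(f"new CT genomes:   {new}")
print(f"per lineage:      {by_lineage}")
print(f"B.1 share:        {b1_share:.1%}")
```

Under these assumptions the table sums to 165 genomes, 37 of them new, which agrees with the 128 + 37 genomes described in this update.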

Massachusetts & Connecticut

With this update, we see the spread of SARS-CoV-2 between CT and MA for the first time. One reason we did not previously observe this spread is the small number of MA SARS-CoV-2 genomes that were available. We collected new MA genomes that were published through NCBI’s Virus Variation resource and included them in the current update. Previously, MA genomes clustered closely together and far from CT genomes. The new MA genomes, collected from March through early April, are found in both the WA-Clade and the NY-Clade. They are interspersed among CT genomes throughout the tree, which indicates interstate spread between CT and MA. More information about the situation in MA can be found here.

New genomes in CT

We added 37 new sequences from CT to our analysis and found that 36 of them are most closely related to other sequences sampled in NY and CT. We know that NY and other places seeded early CT outbreaks, and this finding reinforces NY’s central role in the continuation of CT outbreaks. The continued grouping of sequences also highlights persistent local transmission within CT: an increasing proportion of new CT genomes cluster with each other rather than with out-of-state genomes, which indicates local transmission.

Public Health Significance

We showed in previous updates that outbreaks in CT were mostly seeded by domestic introductions from the West Coast and New York, with varying frequencies between the two sources throughout March and early April. The CT genomes collected mainly in March and April indicate a significant amount of local transmission occurring during those months.

Policy implications

Our earlier analyses demonstrated the vital importance of inter-municipality and inter-state cooperation and coordination with regard to testing, contact tracing, and social distancing interventions. It is clear that viruses spread across municipal and state boundaries, meaning that state- and municipality-level policies have implications for their neighbors. As policymakers look towards relaxing social distancing measures, it is important that they work together across geographic boundaries and coordinate interventions to limit further spread.

As we sequence more genomes, we will be looking to understand at what geographic level most spread occurs (e.g. within cities, a single state, a few states, nationwide, or through new international introductions). These data will be coupled with case numbers to determine how longer-term intervention strategies are working and where the strategies can improve.


We used 128 genomes that we previously sequenced and added 37 newly sequenced genomes. The previous 128 genomes were sequenced using the Nanopore MinION platform, and the 37 new genomes were sequenced with an Illumina MiSeq. We used a targeted amplicon sequencing approach following an adapted ARTIC Network protocol. To perform a preliminary analysis, we also downloaded 515 other genomes available on GISAID, from around the world and the US, to uncover recent patterns of viral spread within and from the Northeastern USA in the past weeks. Sequence alignment and phylogenetic analysis were performed using a nextstrain pipeline. Geographic information for each sequence was aggregated by zip code areas with more than 50,000 inhabitants, mostly matching existing CT town and county borders.
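
The geographic aggregation step can be illustrated with a short sketch. The report only states that zip code areas with more than 50,000 inhabitants were used; the rule below (group zip codes by town, and fall back to the county when a town is too small) is an assumption made for illustration, not the procedure actually applied. All zip codes, place names, and populations are hypothetical placeholders.

```python
from collections import defaultdict

MIN_POPULATION = 50_000  # threshold stated in the report

# Hypothetical records: (zip code, town, county, population).
# These are placeholders, not real CT data.
ZIP_RECORDS = [
    ("00001", "Town X", "County 1", 60_000),
    ("00002", "Town Y", "County 1", 20_000),
    ("00003", "Town Y", "County 1", 15_000),
    ("00004", "Town Z", "County 2", 30_000),
    ("00005", "Town Z", "County 2", 25_000),
]


def assign_areas(records, min_population=MIN_POPULATION):
    """Map each zip code to its town if the town exceeds the population
    threshold, otherwise to its county (assumed fallback rule)."""
    town_population = defaultdict(int)
    for _zip_code, town, _county, population in records:
        town_population[town] += population

    areas = {}
    for zip_code, town, county, _population in records:
        if town_population[town] > min_population:
            areas[zip_code] = town
        else:
            areas[zip_code] = county
    return areas


areas = assign_areas(ZIP_RECORDS)
for zip_code, area in sorted(areas.items()):
    print(zip_code, "->", area)
```

In this toy input, the two zip codes of "Town Y" together hold 35,000 inhabitants, so they are reported at the county level, while the larger towns keep their own label.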

Data availability

The directories consensus_genomes and metadata in our GitHub repository contain all of our current SARS-CoV-2 genomes and metadata. The directory auspice contains a JSON file that was produced using the nextstrain pipeline. A list of GISAID accession numbers of the genomes used in this report can be downloaded from a link at the bottom of our nextstrain page.
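
For readers who want to explore the JSON file in the auspice directory, the following is a minimal sketch of how tips could be counted by location. It assumes the file follows the Auspice v2 dataset layout (a top-level "tree" object whose nodes carry "node_attrs" and optional "children") and that location is stored under a trait named "division"; both are assumptions about this particular file. The small inline tree and its strain names are hypothetical stand-ins for the real file, which would be read with json.load instead.

```python
import json
from collections import Counter


def count_tips_by_trait(root, trait):
    """Count leaf nodes (tips) of an Auspice-style tree by a node_attrs trait."""
    counts = Counter()
    stack = [root]  # iterative traversal avoids recursion limits on deep trees
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        if children:
            stack.extend(children)
        else:
            value = node.get("node_attrs", {}).get(trait, {}).get("value", "unknown")
            counts[value] += 1
    return counts


# Hypothetical miniature dataset in the assumed Auspice v2 layout.
SAMPLE = json.loads("""
{
  "version": "v2",
  "meta": {},
  "tree": {
    "name": "NODE_0",
    "children": [
      {"name": "tip_1", "node_attrs": {"division": {"value": "Connecticut"}}},
      {"name": "NODE_1", "children": [
        {"name": "tip_2", "node_attrs": {"division": {"value": "Massachusetts"}}},
        {"name": "tip_3", "node_attrs": {"division": {"value": "Connecticut"}}}
      ]}
    ]
  }
}
""")

counts = count_tips_by_trait(SAMPLE["tree"], "division")
print(dict(counts))
```

The same function could be pointed at any other trait stored in node_attrs, provided the file uses that trait name.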


Mary Petrone, Anne Wyllie, Chantal Vogels, Ed Courchiane, Sarah Prophet, Isabel Ott, and Chaney Kalinich performed the viral RNA extractions. Tara Alpert and Joseph Fauver prepared samples for sequencing and assembled the SARS-CoV-2 genomes. Cole Jensen and Anderson Brito performed the phylogenetic analysis. Cole Jensen, Chaney Kalinich, Mary Petrone, Anderson Brito, and Nathan Grubaugh wrote and reviewed this report. Chaney Kalinich and Peter Neugebauer developed and maintain this COVIDTrackerCT website. Mario Peña-Hernández leads all Spanish translation. Nathan Grubaugh leads the Yale SARS-CoV-2 Genomic Epidemiology Initiative. Finally, we also thank the authors of the genomes in our complementary dataset for making their data freely available to other researchers: a full list of authors is provided at the bottom of our dedicated nextstrain page.
