Update 8 | 2020.05.28

Yale SARS-CoV-2 Genomic Surveillance Initiative

We added 5 new genomes this week from late April, which are now the most recent genomes from Connecticut (CT). These new genomes, sampled just after the peak in CT (April 20-22), will provide some insight into the transmission that was still happening with strict social distancing measures in place.


We previously showed how cases sampled in March through early April were closely related to other domestic outbreaks in the US. Specifically, 21 of our very early cases (in March) were related to SARS-CoV-2 circulating on the West Coast. These fall in a group we refer to as the WA-clade, because its earliest origin is linked to Washington state. This clade is within the ‘A’ lineage of the virus globally. Within these 21, there are 12 genomes or clusters of CT cases that are more closely connected to WA cases than to other CT cases, indicating that there were several separate introductions from the West Coast during the early epidemic in CT. This is supported by travel and case data.

SARS-CoV-2 lineages in CT

| Lineage | Sub-lineage | Introduced to CT from | Sampling Dates | Number of sequences |
|---|---|---|---|---|
| A | A.1 | Initially introduced to the Western U.S./Canada at least twice, now sustained in CT | March 8 to April 9 | 21 |
| A | None | East Asia/Oceania to U.S. Northeast | March 13 to April 6 | 4 |
| B | B.1 (mostly B.1.3 and B.1.1) | New York | March 11 to April 17 | 100 |
| B | B.2 | Southeastern Asia to U.S. Northeast | March 6 to April 10 | 3 |
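As a rough illustration of how the table translates into lineage shares, the sketch below computes each group's fraction of the sampled sequences. The counts are our reading of the table above; the function and variable names are placeholders for illustration, not part of the original analysis.

```python
# Sequence counts per (lineage, sub-lineage), as read from the table above.
counts = {
    ("A", "A.1"): 21,
    ("A", "None"): 4,
    ("B", "B.1"): 100,
    ("B", "B.2"): 3,
}


def lineage_shares(counts):
    """Return each group's fraction of all sampled sequences."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


shares = lineage_shares(counts)

# Print groups from most to least common.
for (lineage, sub), share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{lineage} / {sub}: {share:.1%}")
```

These shares describe the samples only. As the caveats further down explain, a high share in the samples does not by itself mean a lineage is more transmissible.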

In addition, we found that throughout March and early/mid-April, more and more viruses in our samples fell within the ‘B.1’ viral lineage, which was first introduced to New York (and is thus labeled NY-clade). Within the 95 sequences we sampled within this lineage, there appeared to be a number of separate introductions followed by sustained transmission clusters later in March and in early April.

What’s new

We added 5 new sequences from the end of April, which are the first sequences sampled after the peak. These can provide insight into how SARS-CoV-2 transmission was occurring under tight social distancing measures. We found that all 5 sequences are most closely related to other sequences sampled in NY and CT, but we are not able to say conclusively whether they are the result of community transmission or new introductions. All 5 sequences fell into the larger NY clade, which could indicate that the NY clade has become the predominant one in CT; however, all 5 were also sequenced from the New Haven area, so it is very possible that there are other lineages circulating elsewhere.

The Bottom Line & Caveats

We showed in previous updates that outbreaks in CT were most likely seeded by domestic introductions from the West Coast and New York, at varying frequencies throughout March and early April. We show here that cases in Southern CT in late April, after the peak, were likely more related to transmission within CT and from NY than to new introductions from international or domestic sources outside the Northeast.

It’s important to note that the fact that the NY-clade has become predominant in South-Central CT does not necessarily indicate this clade is “more fit” or more transmissible than the others. A number of factors are much more likely to account for this, such as earlier or larger numbers of introductions of viruses from one lineage (founder effect), chance introduction of this clade into a population with high levels of transmission due to social factors (e.g. essential workers, individuals without stable housing, or individuals living in a long-term care facility), and even sampling bias (though we try to sample wherever there are cases, acquiring samples from throughout the state and from asymptomatic cases is very difficult).

Policy implications

Our earlier data demonstrated the vital importance of inter-municipality and inter-state cooperation and coordination with regard to testing, contact tracing, and social distancing interventions. Viruses spread without regard to municipal and state boundaries, so mitigation and suppression strategies must cross these boundaries as well.

We still have too few new post-peak genomes to draw definitive policy conclusions, and our sampling is strongly biased towards Southern CT. As we sequence more, we will be looking to see at what level most transmission is occurring (e.g. just within cities, throughout a state, between states, or through new international introductions). We will also begin to use these data in conjunction with case numbers to determine how longer-term mitigation strategies are working, and where there is room for improvement.


We used 121 genomes that we previously sequenced and sequenced 5 new genomes, all using a MinION platform following an ARTIC protocol. For this preliminary analysis, we also downloaded 653 other genomes available on GISAID, from around the world and the US, to uncover patterns of viral spread within and from the Northeastern USA in recent weeks. Sequence alignment and phylogenetic analysis were performed using a Nextstrain pipeline. Geographic information for each sequence was aggregated by zip code areas with more than 50,000 inhabitants, mostly matching existing CT town and county borders.
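The geographic aggregation step can be sketched roughly as follows. This is only one plausible reading of the description above, not the procedure actually used: we assume each zip code maps to a town and a county, and that a sample is labelled by its town when the town has more than 50,000 inhabitants and by its county otherwise. The lookup table, place names, and population figures are all hypothetical placeholders.

```python
from collections import Counter

MIN_POPULATION = 50_000  # threshold stated in the text

# Hypothetical lookup: zip code -> (town, town population, county).
# All names and numbers are placeholders, not real CT data.
ZIP_LOOKUP = {
    "00001": ("Town A", 120_000, "County X"),
    "00002": ("Town A", 120_000, "County X"),
    "00003": ("Town B", 18_000, "County X"),
    "00004": ("Town C", 9_000, "County Y"),
}


def area_label(zip_code, lookup=ZIP_LOOKUP, min_population=MIN_POPULATION):
    """Return the town if it exceeds the threshold, otherwise the county."""
    town, population, county = lookup[zip_code]
    return town if population > min_population else county


def aggregate(sample_zips):
    """Count samples per aggregated area."""
    return Counter(area_label(z) for z in sample_zips)


# Placeholder zip codes for five samples.
samples = ["00001", "00002", "00003", "00004", "00003"]
print(aggregate(samples))
```

Falling back to a larger unit for small towns keeps every displayed area above the population threshold, which limits how precisely any individual sample can be located.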

Data availability

The directories consensus_genomes and metadata in our GitHub repository contain all of our current SARS-CoV-2 genomes and metadata. The directory auspice contains a JSON file that was produced using the Nextstrain pipeline. A list of GISAID accession numbers of genomes used in this report can be downloaded from a link at the bottom of our Nextstrain page.
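For readers who want to explore the JSON file in the auspice directory, here is a minimal sketch of counting the tips of the tree by a geographic attribute. It assumes the file follows the Nextstrain v2 dataset layout, with a top-level "tree" object whose nodes carry "children" and "node_attrs" entries. The file path passed to load_tree and the "division" attribute are assumptions, so check them against the actual file. The tiny tree below is made up so the example runs without any download.

```python
import json
from collections import Counter


def load_tree(path):
    """Load the "tree" object from an auspice JSON file (path supplied by the reader)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)["tree"]


def iter_tips(node):
    """Yield every leaf (tip) below `node`, using a stack to avoid deep recursion."""
    stack = [node]
    while stack:
        current = stack.pop()
        children = current.get("children", [])
        if children:
            stack.extend(children)
        else:
            yield current


def count_tips_by(tree, attr):
    """Count tips by the value of one node attribute, e.g. "division"."""
    counts = Counter()
    for tip in iter_tips(tree):
        value = tip.get("node_attrs", {}).get(attr, {}).get("value", "unknown")
        counts[value] += 1
    return counts


# Made-up three-tip tree in the assumed layout, for illustration only.
example_tree = {
    "name": "root",
    "children": [
        {"name": "tip1", "node_attrs": {"division": {"value": "Connecticut"}}},
        {
            "name": "internal1",
            "children": [
                {"name": "tip2", "node_attrs": {"division": {"value": "New York"}}},
                {"name": "tip3", "node_attrs": {"division": {"value": "Connecticut"}}},
            ],
        },
    ],
}

print(count_tips_by(example_tree, "division"))
```

To run this on the real data, replace example_tree with load_tree called on the path of the downloaded JSON file.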


Mary Petrone, Anne Wyllie, Chantal Vogels, Ed Courchiane, Sarah Prophet, Isabel Ott, and Chaney Kalinich performed the viral RNA extractions. Tara Alpert and Joseph Fauver prepared samples for sequencing and assembled the SARS-CoV-2 genomes. Cole Jensen and Anderson Brito performed the phylogenetic analysis. Cole Jensen, Chaney Kalinich, Mary Petrone, Anderson Brito, and Nathan Grubaugh wrote and reviewed this report. Chaney Kalinich and Peter Neugebauer developed and maintain this COVIDTrackerCT website. Mario Peña-Hernández leads all Spanish translation. Nathan Grubaugh leads the Yale SARS-CoV-2 Genomic Epidemiology Initiative. Finally, we also thank the authors of the genomes in our complementary dataset for making their data freely available to other researchers: a full list of authors is provided at the bottom of our dedicated nextstrain page.

Grubaugh Lab | Yale School of Public Health (YSPH) | https://grubaughlab.com/
