Update 12 | 2020.09.04

Yale SARS-CoV-2 Genomic Surveillance Initiative

Recent outbreaks in Danbury, CT

This week we sequenced 27 new SARS-CoV-2 genomes from samples collected in mid-August during outbreaks reported in Danbury, Connecticut (CT) (Figure 1). By sequencing and analyzing viral genomes using phylogenetic methods, we identified at least 4 transmission chains unrelated to others detected by our group from March to May. One large cluster contains 19 virus genomes, which come from a lineage likely circulating in the Northeast US since March. The other eight virus genomes are grouped in three small clusters, containing 2 or 3 genomes each, likely of domestic (South USA) or international (European) origins.

Figure 1. Origins and time of sampling of SARS-CoV-2 genomes sequenced from Connecticut. The red circle represents the 27 new genomes sequenced from Danbury, CT. Circle sizes are proportional to the number of sequenced virus genomes (not the number of COVID-19 cases).

⚠️ WARNING: These results should be considered preliminary, as they may change in light of new data.

Main findings

On Thursday, August 27th, our laboratory received 30 samples from Quest Diagnostics via the Connecticut Department of Health to perform genomic epidemiological analysis of a recent SARS-CoV-2 outbreak in Danbury, CT. Of the 30 samples, we were able to obtain 27 new SARS-CoV-2 genomes using our rapid sequencing approach. Combining these new virus genomes with other SARS-CoV-2 genomes sequenced by our group (234 from CT and 10 from NY), and with 5857 genomes from all over the world, we used phylogeographic analysis to uncover the possible origins of the viruses causing the recent outbreaks in Danbury. We found that this recent outbreak of COVID-19 was caused by viruses from at least four distinct transmission chains (Figure 2A and B). The Danbury genomes belong to viral lineages that are not directly related to lineages previously detected by our group since March 2020.

Figure 2. Phylogenies of 27 genomes sequenced from samples collected in Danbury. (A) The four long lines leading to pink circles (clusters of genomes) suggest that at least four transmission chains (viral lineages) caused the recent outbreak. (B) Same phylogeny as shown in (A), with colours highlighting genomes sequenced by our group from March to August. The recent cases seem not to be directly related to viral lineages previously detected in Connecticut.

Our findings show that one large cluster of 19 viral genomes (cluster #1, Figure 3A) belongs to a viral lineage that was introduced into New York state back in February/March and is now found in all regions of the United States, especially in the Northeast. The other 8 Danbury genomes are grouped in three distinct clusters, containing 2 or 3 genomes each (Figure 3B, C and D).

Figure 3. Subtrees highlighting the four SARS-CoV-2 lineages causing outbreaks in Danbury. (A) In the largest cluster (#1) there are 19 genomes sequenced from viruses likely circulating in Northeast US since February/March. (B) Cluster #2 shows two virus genomes, introduced in Connecticut from the South, most likely from Florida. Finally, (C) and (D) show small clusters of three genomes from Danbury (#3 and #4), which derive from lineages that once circulated in Europe early in the pandemic (March).

The second cluster (#2) has two virus genomes, and is grouped among genomes that were sampled in Florida in June and July (Figure 3B). This suggests the South as the possible origin of that SARS-CoV-2 lineage. The third cluster (#3) shows three virus genomes closely related to genomes collected in the United Kingdom (Figure 3C). These European genomes date back to April, and may indicate an early origin in Europe. However, the data cannot tell us about more recent origins of those three CT virus genomes. Finally, a fourth cluster (#4), containing three virus genomes, also belongs to a lineage with an early origin in Europe, dating back to March 2020 (Figure 3D). Due to the sparse sampling over the past four months, we cannot rule out that clusters #3 and #4 have been circulating locally in CT since March or April. Genomes from the previous four months, once sequenced, could give us a better picture of the dynamics of viral spread in Connecticut. These 27 newly sequenced viruses seem to belong to new lineages, unrelated to the ones our group sequenced from March to May in the state. This could indicate new introductions into Connecticut, or lineages that have been circulating cryptically in our community, only to be detected once a large outbreak takes place.

Public Health Significance

Policy Implications

By sequencing viral genomes from recent outbreaks, we detected new transmission chains not identified in previous months. The coronavirus SARS-CoV-2 can circulate cryptically for weeks or months before it is identified in a major outbreak. The virus does not acknowledge geographic borders, and continues to spread from state to state within the US, via short- or long-range travel (e.g. flights). Our phylogenetic results confirm this pattern and highlight the vital importance of inter-municipality and inter-state cooperation to curb the viral spread, via testing, contact tracing, and social distancing interventions.

The sequencing of SARS-CoV-2 genomes is particularly beneficial when sampling is done continuously (e.g. weekly). With viruses being sequenced regularly, we would be able to fill the gaps and get a better understanding of the geographic level at which most spread occurs (e.g. within cities, a single state, a few states, nationwide, or new international introductions). With samples and analyses being released in near-real time, as done in March and April, genomic epidemiology can provide valuable information for planning direct interventions to prevent further spread.


We analyzed 6101 previously sequenced SARS-CoV-2 genomes and added 27 newly sequenced genomes from Danbury, CT. The 27 new genomes were generated using a Nanopore MinION platform. We used a targeted amplicon sequencing approach following an adapted ARTIC Network protocol. For this preliminary analysis, we included genomes from around the world and the US that are available on GISAID and NCBI, to uncover recent patterns of viral spread in Connecticut. Sequence alignment and phylogenetic analysis were performed using a nextstrain pipeline. Geographic information for each sequence from CT was aggregated into zip code areas with more than 50,000 inhabitants, mostly matching existing CT towns and county borders.
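The dataset sizes quoted in this report can be cross-checked with a few lines of Python. The sketch below uses only counts stated in the text (234 CT and 10 NY genomes from our group, 5857 genomes from the rest of the world, and the 27 new Danbury genomes). The variable names and dictionary labels are illustrative assumptions, not part of our analysis pipeline.

```python
# Genome counts as stated in this report.
# The labels below are illustrative, not identifiers used in the actual pipeline.
background = {
    "Connecticut (our group)": 234,
    "New York (our group)": 10,
    "Rest of the world": 5857,
}
new_danbury = 27  # newly sequenced genomes from Danbury, CT

# Size of the comparison dataset, and of the full dataset once Danbury is added.
background_total = sum(background.values())
full_total = background_total + new_danbury

print(f"Background genomes: {background_total}")
print(f"Including new Danbury genomes: {full_total}")
```

The background total matches the 6101 genomes mentioned above.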

Data availability

The directories consensus_genomes and metadata in our GitHub repository contain all of our current SARS-CoV-2 genomes and metadata. The directory auspice contains a JSON file that was produced using the nextstrain pipeline. A list of GISAID/NCBI accession numbers of genomes used in this report can be downloaded from a link at the bottom of our nextstrain page. The GISAID accession numbers for genomes included in this update are EPI_ISL_527738, EPI_ISL_527761-EPI_ISL_527782.
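For readers who want to look up the accession numbers individually, the range notation above can be expanded into one ID per genome. The following is a minimal sketch, not part of our pipeline; the helper name `expand_accessions` is hypothetical, and it assumes each ID is a non-numeric prefix followed by a number, with ranges written as `first-last`.

```python
import re

def expand_accessions(spec):
    """Expand a comma-separated list of IDs and ID ranges into single IDs.

    Hypothetical helper: assumes IDs look like '<prefix><number>' and that a
    range is written as '<first ID>-<last ID>' with the same prefix.
    """
    ids = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            # Split each endpoint into its non-digit prefix and numeric suffix.
            prefix, first = re.fullmatch(r"(\D+)(\d+)", start.strip()).groups()
            last = re.fullmatch(r"(\D+)(\d+)", end.strip()).group(2)
            ids.extend(f"{prefix}{n}" for n in range(int(first), int(last) + 1))
        else:
            ids.append(part)
    return ids

# Accession string exactly as given in this update.
accessions = expand_accessions("EPI_ISL_527738, EPI_ISL_527761-EPI_ISL_527782")
for accession in accessions:
    print(accession)
```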


Joseph Fauver, Mary Petrone, and Tara Alpert processed and sequenced the samples. Joseph Fauver processed the sequencing data, and Anderson Brito performed the phylogenetic analysis. Anderson Brito, Joseph Fauver and Nathan Grubaugh wrote and reviewed this report. Chaney Kalinich and Peter Neugebauer developed and maintain this COVIDTrackerCT website. Mario Peña-Hernández leads all Spanish translations. Nathan Grubaugh leads the Yale SARS-CoV-2 Genomic Epidemiology Initiative. Finally, we also thank the authors of the genomes in our complementary dataset for making their data freely available to other researchers: a full list of authors is provided at the bottom of our dedicated nextstrain page.
