Yale SARS-CoV-2 Genomic Surveillance Initiative
This report presents preliminary results for 30 new SARS-CoV-2 genomes from clinical samples collected in Connecticut between March 11 and April 8, 2020. These samples were sequenced by Joseph Fauver and Tara Alpert using the MinION platform and an ARTIC protocol. Phylogenetic analyses were performed and results were interpreted by Anderson Brito. The results reported in this update were obtained using a `nextstrain` pipeline and can also be visualized here.
The directories `consensus_genomes` and `metadata` on the main page of this repository contain all of our current SARS-CoV-2 genomes and metadata.
⚠️ WARNING: These results should be considered preliminary data: they may change in light of new evidence.
We combined our new dataset of 30 SARS-CoV-2 genomes with the 40 genomes previously reported by our team in Update 2. Figure 1 below shows where all genomes released by the Yale SARS-CoV-2 Genomic Surveillance Initiative were collected. For this preliminary analysis, we also obtained 410 additional genomes from around the world and the US, available on GISAID, to uncover recent patterns of viral spread within and from the Northeastern USA in the past weeks. We thank the authors of the genomes in this complementary dataset: a list of authors is provided at the bottom of our dedicated nextstrain page.
The nomenclature system for SARS-CoV-2
A nomenclature system for the main SARS-CoV-2 lineages was proposed by leading virologists and evolutionary biologists on April 8 (a full description and rationale can be found here). In our reports, we will start referring to viral lineages using this system. Viruses sequenced by our initiative are found in both major SARS-CoV-2 lineages, A and B, which in turn are divided into nested sub-lineages. So far, our genomes in lineage A cluster within lineage A1 and its sub-lineage A1.2. In lineage B, the largest SARS-CoV-2 lineage, our sequenced genomes cluster within B1 (sub-lineages B1.3 and B1.11) and B2 (see Figure 1).
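Because sub-lineages are nested, each label implies a chain of ancestors (for example, B1.11 sits inside B1, which sits inside B). The sketch below illustrates that nesting for labels written the way they appear in this report (a letter, then a number, then dot-separated numbers). It is a hypothetical helper, not part of the nomenclature proposal or of any published tool, and the label format is an assumption based only on the names used here.

```python
import re
from typing import List, Optional

# Assumed label format (as written in this report): a major lineage letter,
# optionally followed by a number and dot-separated numbers, e.g. "A1.2", "B1.11".
LABEL = re.compile(r"^[A-Z](\d+(\.\d+)*)?$")


def parent_lineage(label: str) -> Optional[str]:
    """Return the immediate parent lineage, or None for a major lineage (A or B)."""
    if not LABEL.match(label):
        raise ValueError(f"unexpected lineage label: {label!r}")
    if "." in label:
        return label.rsplit(".", 1)[0]  # "B1.11" -> "B1"
    if len(label) > 1:
        return label[0]                 # "B1" -> "B"
    return None                         # "A" or "B" has no parent


def ancestry(label: str) -> List[str]:
    """Return the label followed by all of its ancestors, innermost first."""
    chain = [label]
    parent = parent_lineage(label)
    while parent is not None:
        chain.append(parent)
        parent = parent_lineage(parent)
    return chain


def is_within(label: str, ancestor: str) -> bool:
    """True if `label` equals `ancestor` or is nested inside it."""
    return ancestor in ancestry(label)


if __name__ == "__main__":
    print(ancestry("B1.11"))           # ['B1.11', 'B1', 'B']
    print(is_within("A1.2", "A"))      # True
    print(is_within("B1.11", "B1.1"))  # False
```

Walking the parent chain, rather than comparing string prefixes, avoids treating B1.11 as if it were nested inside B1.1.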
Coast-to-coast transmission within lineage A1
Lineage A1 is so far mostly composed of genomes collected in the US and Canada. Canadian genomes cluster in a basal group that also includes a previously sequenced Connecticut genome (Yale-005, green circle at the base of A1, Figure 2) and viruses from British Columbia, one of the earliest places where lineage A1 was introduced. With the data available so far, genomes from Connecticut remain closely related to genomes sequenced from Washington State samples, which suggests coast-to-coast transmission, as shown in a previous manuscript released by our group (submitted for peer review). This transmission scenario is supported mainly by genomic data generated by our group and by publicly available genomes generated by Harrigan et al and Roychoudhury et al (see here). As more data become available, the patterns presented here may change, potentially revealing intermediate steps of transmission involving other states or regions in North America.
Lineage B2 in Connecticut: international or domestic introductions?
One of our newly sequenced genomes (Yale-055) groups with Yale-028 in lineage B2. The genetic relatedness between these viruses reveals a new transmission chain in CT, which was likely seeded either by an early international introduction from Europe or by viruses circulating in the Northeastern US, as suggested by genomes sequenced by Aguero-Rosenfeld et al, Gonzalez-Reiche et al, and Marti-Carreras et al (see here and Figure 3). As the number of genomes grows, this pattern will be better resolved.
Most of the sequences from outbreaks in NY and CT belong to lineage B1
Most genomes from NY State belong to sub-lineages of B1. Of the 30 CT genomes sequenced and included in this update, 24 cluster in this lineage; they were collected between March 11 and April 8. Lineage B1 was most likely introduced into the Northeastern US from Western Europe in multiple events. Genomes from NY and CT are closely related (see Figure 4) and, as shown in our previous update, this pattern suggests that viruses are repeatedly being introduced from New York into Connecticut, as the data released by Aguero-Rosenfeld et al and Gonzalez-Reiche et al also suggest (see here).
Sub-lineage B1.11 is mainly found in Connecticut
Six genomes sequenced by our team, including Yale-068, belong to sub-lineage B1.11, which can be distinguished by a mutation in ORF1b (A88V) (see Figure 5). This sub-lineage was most likely introduced from Northern Europe into CT or NY, a pattern supported both by our data and by data generated by Gonzalez-Reiche et al (see here). Genomic data show that sub-lineage B1.11 is now mainly circulating in Connecticut, at least in Beacon Falls, Clinton and surrounding areas.
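In standard protein notation, A88V means that the alanine (A) at position 88 of the reference is replaced by a valine (V) in the sample. The sketch below shows how such a lineage-defining change could be parsed and used to pick out genomes that carry it. It assumes per-sample amino acid calls are already available as sets; the input format, the helper names and the sample names are hypothetical and do not come from our pipeline.

```python
import re
from typing import Dict, List, NamedTuple, Set

# One-letter amino acid codes; "*" denotes a stop codon.
SUBSTITUTION = re.compile(r"^([A-Z*])(\d+)([A-Z*])$")


class AminoAcidChange(NamedTuple):
    gene: str
    ref: str       # amino acid in the reference genome
    position: int  # 1-based codon position within the gene
    alt: str       # amino acid observed in the sample


def parse_change(gene: str, notation: str) -> AminoAcidChange:
    """Parse a substitution such as "A88V" (ref, position, alt) in `gene`."""
    match = SUBSTITUTION.match(notation)
    if match is None:
        raise ValueError(f"not a substitution: {notation!r}")
    ref, pos, alt = match.groups()
    return AminoAcidChange(gene, ref, int(pos), alt)


def samples_with(change: AminoAcidChange,
                 calls: Dict[str, Set[AminoAcidChange]]) -> List[str]:
    """Return, sorted, the names of samples whose calls include `change`."""
    return sorted(name for name, changes in calls.items() if change in changes)


if __name__ == "__main__":
    marker = parse_change("ORF1b", "A88V")
    # Hypothetical calls for two illustrative samples.
    calls = {"sample_1": {marker}, "sample_2": set()}
    print(samples_with(marker, calls))  # ['sample_1']
```

A single marker is only a convenient label for a clade; lineage assignment in the report rests on the full phylogeny, not on one mutation.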
The bottom line
Genomic epidemiology of the SARS-CoV-2 genomes sequenced from Connecticut (Figure 6) shows multiple independent coronavirus lineages circulating in the state, especially in counties along the coast toward New York, such as Fairfield, New Haven and Middlesex, although genomes were also sequenced from patients residing in Litchfield and Hartford counties (see map here).
With a peak in cases projected for late April in Connecticut, it is still advisable to maintain social distancing to avoid exposure to SARS-CoV-2. This applies to young and adult populations (who can be silent, asymptomatic spreaders of the virus), and especially to the elderly and people with underlying health conditions, who are most at risk of severe COVID-19.
On our GitHub, the directory `auspice` contains a JSON file that was produced using `augur`, the bioinformatics toolkit of the `nextstrain` pipeline.
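For readers who want to explore that file outside the web viewer, here is a minimal sketch that lists the tips (sampled genomes) of the tree. It assumes the Auspice v2 JSON layout, in which the tree sits under a top-level `tree` key and each node has a `name` and, if internal, a `children` list; the file in this repository may follow a different schema version, and the file path in the comment is a placeholder.

```python
import json
from typing import Any, Dict, Iterator, List


def iter_nodes(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every node of the tree depth-first, parents before children."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def tip_names(auspice: Dict[str, Any]) -> List[str]:
    """Return the names of all tips (nodes without children).

    Assumes the Auspice v2 layout with the tree under the "tree" key.
    """
    return [n["name"] for n in iter_nodes(auspice["tree"]) if not n.get("children")]


def load_auspice(path: str) -> Dict[str, Any]:
    """Read an Auspice JSON file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Illustrative in-memory tree; with the real file, use
    # load_auspice("auspice/<file>.json") where <file> is the actual name.
    example = {
        "tree": {
            "name": "root",
            "children": [
                {"name": "tip_A"},
                {"name": "NODE_1", "children": [{"name": "tip_B"}, {"name": "tip_C"}]},
            ],
        }
    }
    print(tip_names(example))  # ['tip_A', 'tip_B', 'tip_C']
```

The recursive generator keeps the traversal separate from the filtering, so the same walk can be reused to collect other node attributes.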
Grubaugh Lab | Yale School of Public Health (YSPH) | https://grubaughlab.com/