Scientists present a highly accurate new framework for genomic surveillance of SARS-CoV-2

In a recent study posted to the medRxiv* preprint server, researchers presented a novel framework for the genomic surveillance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), tested on case studies from New York (NY), the United States (US), and the United Kingdom (UK).

Study: Rapid threat detection in SARS-CoV-2.

The continuous evolution of the SARS-CoV-2 spike (S) protein, which mediates binding to human angiotensin-converting enzyme 2 (hACE2) and thereby governs how efficiently the virus invades host cells, has produced variants with markedly higher transmissibility. S mutations not only affect transmission but also increase the risk of reinfection, raising concerns about the efficacy of vaccines against coronavirus disease 2019 (COVID-19).

About the research

In the present study, the researchers developed a framework for the genomic surveillance of SARS-CoV-2 using case studies from New York, the United Kingdom, and the United States, with sequence data obtained from the Global Initiative on Sharing All Influenza Data (GISAID) database.

Rather than treating whole genomic sequences as the units of analysis, the framework uses co-evolving genomic sites as its building blocks and models the relationships between the columns of a multiple sequence alignment (MSA), where each column represents one genetic locus or site. The MSA is treated as irreducible: it forms a complex of motifs that captures the co-evolutionary relationships between different sites. If several sites are linked, mutations occur concomitantly at all of them, so the link between the sites is preserved.
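The idea that linked sites mutate together can be illustrated with a generic measure that the article does not attribute to the authors: pairwise mutual information between MSA columns. Two columns whose bases always change together carry the same information, so their mutual information is high. The sketch below is a minimal illustration of that concept only, not the preprint's motif-based method; the function names and the 0.5-bit threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(symbols):
    """Shannon entropy (bits) of a list of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(col_a, col_b):
    """I(A;B) = H(A) + H(B) - H(A,B) for two aligned MSA columns."""
    return entropy(col_a) + entropy(col_b) - entropy(list(zip(col_a, col_b)))


def linked_sites(msa, threshold=0.5):
    """Return (i, j, MI) for column pairs whose mutual information >= threshold.

    msa: equal-length aligned sequences, one string per sample.
    threshold: hypothetical cutoff in bits, not a value from the preprint.
    """
    n_cols = len(msa[0])
    cols = [[seq[j] for seq in msa] for j in range(n_cols)]
    pairs = []
    for i, j in combinations(range(n_cols), 2):
        mi = mutual_information(cols[i], cols[j])
        if mi >= threshold:
            pairs.append((i, j, round(mi, 3)))
    return pairs


# Toy alignment: columns 0 and 2 always change together (A/G versus T/A).
msa = ["ACGT", "ACGT", "TCAT", "TCAT"]
print(linked_sites(msa))  # [(0, 2, 1.0)]
```

Invariant columns (1 and 3 in the toy alignment) have zero entropy and therefore cannot be linked to anything, which mirrors the intuition that only sites that actually vary can co-evolve.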

When the framework triggered an alert in the first week of December 2021, the researchers compared the motif-based (M) definition of Omicron (OmicronM), including its BA.1M mutations, with the phylogeny-based (P) definition of Omicron (OmicronP), including its BA.1P mutations in SARS-CoV-2 S. They also evaluated the linked sites in DeltaM, BA.2M, BA.4M, and BA.5M at the time their respective motif-based signals were issued.

The researchers analyzed differentials (D) of the motif complex to better understand how the relational structure of the MSA evolves. Warnings are issued only when D is sufficiently large and critical clusters are present, that is, persistent clusters whose entropy increases by more than 0.35. A variant is considered a key variant if it makes up more than 50% of the population at a particular locus.
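The two decision rules above (warn only when D is large and a critical cluster exists; call a variant "key" above a 50% share) can be sketched as simple threshold checks. This is a hedged illustration under several assumptions: the article does not say how D is computed, so it is passed in as a number; the cluster entropy is assumed here to be the Shannon entropy of the joint base pattern across the cluster's columns; and the persistence rule (`persist` consecutive windows above the first window's entropy), `d_min`, and all function names are hypothetical. Only the 0.35 and 50% thresholds come from the text.

```python
from collections import Counter
from math import log2

ENTROPY_JUMP = 0.35      # entropy increase reported in the article
KEY_VARIANT_SHARE = 0.5  # "more than 50% of the population"


def shannon_entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def cluster_entropy(window, cluster_cols):
    """Hypothetical: entropy of the joint base pattern across a cluster's MSA columns."""
    patterns = ["".join(seq[j] for j in cluster_cols) for seq in window]
    return shannon_entropy(patterns)


def is_critical(windows, cluster_cols, persist=2):
    """Hypothetical persistence rule: entropy must exceed its first-window value
    by more than ENTROPY_JUMP in each of the last `persist` windows."""
    values = [cluster_entropy(w, cluster_cols) for w in windows]
    if len(values) <= persist:
        return False
    baseline = values[0]
    return all(v - baseline > ENTROPY_JUMP for v in values[-persist:])


def should_warn(d_value, d_min, windows, cluster_cols):
    """Warn only if D is large enough AND the cluster is critical.
    D's computation is not described, so it is taken as an input."""
    return d_value >= d_min and is_critical(windows, cluster_cols)


def key_variants(variant_labels):
    """Variants making up more than half of the sampled population at one locus."""
    n = len(variant_labels)
    return [v for v, c in Counter(variant_labels).items() if c / n > KEY_VARIANT_SHARE]


windows = [
    ["AC", "AC", "AC", "AC"],  # entropy 0.0
    ["AC", "AC", "GT", "GT"],  # entropy 1.0
    ["AC", "GT", "GT", "GT"],  # entropy ~0.811
]
print(should_warn(0.9, 0.5, windows, [0, 1]))  # True
print(key_variants(["BA.5", "BA.5", "BA.4"]))  # ['BA.5']
```

Requiring both conditions, rather than either one, is what keeps a noisy entropy spike or a large but transient D value from triggering an alert on its own.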

The surveillance framework was applied both retrospectively and prospectively. The retrospective analysis used SARS-CoV-2 sequence data from the UK, covering the emergence of the Alpha and Delta variants, and from the US, covering the emergence of Omicron and Omicron BA.2. Because the spread of these variants was already known, the threats could be mapped against the framework's warnings.

The prospective analysis used New York data covering the emergence of Omicron BA.2.12/BA.2.12.1 and Omicron BA.4/BA.5. Here, the SARS-CoV-2 threats could not be mapped in advance and were therefore considered unknown. The surveillance was validated on SARS-CoV-2 populations at several spatial scales (city, state, and country) and temporal scales (three-day, weekly, and monthly).
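Evaluating the same signal at several spatial and temporal scales amounts to re-grouping the sequences before each run. The sketch below assumes a hypothetical sample schema of `(collection_date, (country, state, city))` tuples and counts sequences per place and time window; the schema, the fixed start date, and the function names are illustrative assumptions, not details from the preprint.

```python
from collections import Counter
from datetime import date

# Hypothetical location tuples: (country, state, city)
SPATIAL_DEPTH = {"country": 1, "state": 2, "city": 3}


def window_key(d, start, temporal):
    """Map a collection date to a time window at the requested temporal scale."""
    if temporal == "three-day":
        return (d - start).days // 3
    if temporal == "weekly":
        return (d - start).days // 7
    if temporal == "monthly":
        return (d.year, d.month)
    raise ValueError(f"unknown temporal scale: {temporal}")


def group_samples(samples, start, spatial, temporal):
    """Count sequences per (place, window) for one spatial/temporal scale pair."""
    depth = SPATIAL_DEPTH[spatial]
    return dict(Counter((loc[:depth], window_key(d, start, temporal)) for d, loc in samples))


start = date(2022, 4, 1)
samples = [  # toy data
    (date(2022, 4, 1), ("US", "NY", "New York City")),
    (date(2022, 4, 2), ("US", "NY", "Buffalo")),
    (date(2022, 4, 9), ("US", "NY", "New York City")),
]
print(group_samples(samples, start, "state", "weekly"))
# {(('US', 'NY'), 0): 2, (('US', 'NY'), 1): 1}
```

Truncating the location tuple to the requested depth lets one dataset serve all three spatial scales without re-labeling the samples.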

Results

On 16 May 2022, based on its analysis of GISAID data, the framework issued a warning related to a cluster of seven co-evolving genomic sites mapped to Omicron BA.5. Of these sites, one encodes the D3N mutation in the SARS-CoV-2 membrane (M) protein, three encode the ORF6:D61L mutation, and three carry the nucleotide mutations A27259C, C27889T, and C26858T.

As more sequence data accumulated, the cluster split into two mutually exclusive blocks: one containing nuc:C27889T and M:D3N, and the other containing co-evolving sites associated with reverse mutations such as ORF6:D61L, nuc:A27259C, and nuc:C26858T. The framework issued timely warnings of the emergence and disappearance of SARS-CoV-2 variants, with accuracies of 99%, 89%, and 100% for the New York, UK, and US case studies, respectively, and more than 85% overall.

Across the case studies, the team observed that the co-evolving sites in critical clusters almost always carried either reverse mutations or exclusive mutations. OmicronM corresponded to a single critical cluster of 55 co-evolving sites, comprising OmicronP mutations (n=30), BA.1P mutations (n=13), and reverse mutations at DeltaP sites (n=13). Over the following week, the cluster expanded to 68 co-evolving sites, including all sites carrying BA.1P mutations.
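One simple way to label a reverse mutation, assumed here for illustration rather than taken from the preprint, is to compare a new lineage with an earlier one at a site where the earlier lineage carries a substitution: if the new lineage shows the reference (wild-type) base again, the site has reverted. The sketch parses nucleotide substitutions written in the same `C27889T` style used above; the function names and the three labels are hypothetical.

```python
import re

NUC_PATTERN = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_nuc(mutation):
    """'C27889T' -> ('C', 27889, 'T'): reference base, position, alternative base."""
    m = NUC_PATTERN.match(mutation)
    if m is None:
        raise ValueError(f"not a nucleotide substitution: {mutation}")
    ref, pos, alt = m.groups()
    return ref, int(pos), alt


def classify_site(earlier_mutation, new_base):
    """Classify one site of a new lineage against an earlier lineage's substitution.

    earlier_mutation -- the earlier lineage's substitution at this site, e.g. "C27889T"
    new_base         -- the base observed in the new lineage at the same position
    """
    ref, _, alt = parse_nuc(earlier_mutation)
    if new_base == ref:
        return "reversion"  # back to the wild-type (reference) base
    if new_base == alt:
        return "shared"     # the earlier substitution is retained
    return "other"          # a different substitution at the same site


print(classify_site("C27889T", "C"))  # reversion
print(parse_nuc("A27259C"))           # ('A', 27259, 'C')
```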

With the exception of BA.2.12, every SARS-CoV-2 variant that triggered an alert showed associated reverse mutations in its major critical cluster. Furthermore, although BA.5 did not differ from BA.4 in its S mutations, BA.5 showed a critical cluster that was independent of all S mutations and included a distinct M protein mutation. The signals issued by the surveillance system were specific yet consistent across several geographic regions, and they were robust to different parameter choices.

Overall, the study results highlighted the accuracy of the new SARS-CoV-2 surveillance framework in issuing real-time, reasoned warnings about the emergence and disappearance of key SARS-CoV-2 variants. Critical clusters captured variant mutations, and the variants were characterized both by linked mutations diverging from the wild-type (WT) SARS-CoV-2 strain and by reverse mutations.

*Important notice

medRxiv publishes preliminary scientific reports that are not peer-reviewed and therefore should not be regarded as conclusive, used to guide clinical practice/health-related behavior, or treated as established information.