New machine learning tool may shed light on chronic symptoms of COVID

Long COVID has emerged as a pandemic within a pandemic. As scientists work to unravel the many unanswered questions about how the initial infection affects the body, they must now also study why some people develop debilitating, chronic symptoms that last months or even years longer.

A new machine learning tool is here to help.

Developed by a team of researchers from institutions across the country, led by Justin Reese of Berkeley Lab and Peter Robinson of the Jackson Laboratory, the software analyzes electronic health records (EHRs) to find symptoms shared among people who have been diagnosed with long COVID and to identify subtypes of the condition. The research, which is described in a new paper in eBioMedicine, also identified strong correlations between different long COVID subtypes and pre-existing conditions such as diabetes and hypertension.

According to Reese, a computational biosciences scientist at Berkeley Lab, this research will help improve our understanding of how and why some people develop long-lasting symptoms of COVID, and could enable more effective treatments by helping clinicians develop therapies tailored to each subtype. For example, the best treatment for patients experiencing nausea and abdominal pain may be quite different from the treatment for those suffering from persistent cough and other pulmonary symptoms.

The team developed and validated their software using a database of EHR information from 6,469 patients diagnosed with long COVID after confirmed COVID-19 infections.

“Basically, we found long COVID features in the EHR data for each patient with long COVID and then assessed patient-to-patient similarity using semantic similarity, which essentially allows for ‘fuzzy matching’ between features. For example, ‘cough’ is not the same as ‘shortness of breath,’ but they are similar in that they both involve lung problems. We compare all symptoms for the pair of patients in this way and get a score on how similar the two patients with long COVID are. We can then perform unsupervised machine learning on these results to find different subtypes of long COVID.”

Justin Reese of Berkeley Lab
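
The "fuzzy matching" idea in the quote can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny symptom hierarchy is hypothetical and is not taken from any real ontology, and the scoring (Jaccard overlap of ancestor sets, then a symmetric best-match average) is just one simple way to measure semantic similarity, not necessarily the measure the researchers used.

```python
# Toy "is-a" hierarchy. Hypothetical, for illustration only;
# it is NOT real ontology content.
PARENT = {
    "cough": "respiratory symptom",
    "shortness of breath": "respiratory symptom",
    "nausea": "digestive symptom",
    "abdominal pain": "digestive symptom",
    "respiratory symptom": "symptom",
    "digestive symptom": "symptom",
}


def ancestors(term):
    """Return the term itself plus every ancestor up to the root."""
    seen = {term}
    while term in PARENT:
        term = PARENT[term]
        seen.add(term)
    return seen


def term_similarity(a, b):
    """Jaccard overlap of ancestor sets (1.0 for identical terms)."""
    sa, sb = ancestors(a), ancestors(b)
    return len(sa & sb) / len(sa | sb)


def patient_similarity(p, q):
    """Symmetric best-match average over two non-empty symptom lists."""
    def one_way(xs, ys):
        # For each symptom in xs, take its closest match in ys, then average.
        return sum(max(term_similarity(x, y) for y in ys) for x in xs) / len(xs)
    return 0.5 * (one_way(p, q) + one_way(q, p))


# Different terms, but both are lung-related, so they partially match.
print(term_similarity("cough", "shortness of breath"))  # 0.5
# A lung symptom and a digestive symptom share only the root.
print(term_similarity("cough", "nausea"))  # 0.2
```

Under these assumptions, two patients whose symptoms differ in name but sit close together in the hierarchy still receive a fairly high similarity score, which is the behavior the quote describes.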

The researchers apply machine learning to these patient-to-patient similarity scores to cluster patients into groups, which are then characterized by analyzing associations between symptoms and pre-existing conditions, as well as demographic characteristics such as age, gender, or race.
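
As a rough sketch of the grouping step, the Python below clusters patients directly from a matrix of pairwise similarity scores. The article does not say which unsupervised algorithm the team used; average-linkage agglomerative clustering is swapped in here only because it is short and works on precomputed similarities. The function name `cluster_patients` and the four-patient score matrix are hypothetical.

```python
def cluster_patients(sim, n_clusters):
    """Average-linkage agglomerative clustering on a similarity matrix.

    sim is a symmetric list of lists of pairwise scores; the result is a
    list of clusters, each a sorted list of patient indices.
    """
    clusters = [[i] for i in range(len(sim))]

    def linkage(a, b):
        # Mean pairwise similarity between the members of two clusters.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the most similar pair of clusters and merge them.
        x, y = max(
            ((x, y)
             for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]),
        )
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


# Hypothetical scores for four patients: 0 and 1 look alike, as do 2 and 3.
SIM = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.9],
    [0.1, 0.2, 0.9, 1.0],
]

print(cluster_patients(SIM, 2))  # [[0, 1], [2, 3]]
```

Each resulting group could then be compared against pre-existing conditions and demographic fields to see which characteristics are over-represented in it.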

Reese and his colleagues note that the tool will be convenient for researchers because the machine learning approach essentially adapts itself to different EHR systems, allowing researchers to draw on data from a wide variety of medical settings.

This research builds on previous work to develop the Human Phenotype Ontology, an open-access database and research tool that provides a standardized dictionary of symptoms and characteristics found in all human diseases.

Source:

Lawrence Berkeley National Laboratory