Characterization of long-term patient-reported symptoms of COVID-19: an analysis of social media data
Banda JM., Adderley N., Ahmed W-U-R., AlGhoul H., Alser O., Alser M., Areia C., Cogenur M., Fišter K., Gombar S., Huser V., Jonnagaddala J., Lai LYH., Leis A., Mateu L., Mayer MA., Minty E., Morales D., Natarajan K., Paredes R., Periyakoil VS., Prats-Uribe A., Ross EG., Singh G., Subbian V., Vivekanantham A., Prieto-Alhambra D.
As COVID-19, the disease caused by the SARS-CoV-2 virus, continues to affect people across the globe, the long-term implications for infected patients remain poorly understood [1–3]. Although some of these patients have documented follow-ups in clinical records or participate in longitudinal surveys, these datasets are usually designed by clinicians and are not granular enough to capture the natural history of ‘long COVID’ or patients' experiences of it. A complete picture requires patient-generated data that can track the long-term impact of COVID-19 on recovered patients in real time. Such data can meticulously characterize these patients' experiences, from infection to months after infection, at a level of detail that clinician narratives do not provide. In this work, we present a longitudinal characterization of post-COVID-19 symptoms using social media data from Twitter. Using a combination of machine learning, natural language processing techniques, and clinician review, we mined 296,154 tweets to characterize the post-acute course of the disease, creating detailed timelines of symptoms and conditions and analyzing the reported symptomatology over a period of more than 150 days.