A Large-Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research—An International Collaboration

Banda J., Tekumalla R., Wang G., Yu J., Liu T., Ding Y., Artemova E., Tutubalina E., Chowell G.

doi:10.3390/epidemiologia2030024

Epidemiologia. 2021. Vol. 2. No. 3. P. 315–324.

As the COVID-19 pandemic continues to spread worldwide, an unprecedented amount of open data is being generated for medical, genetic, and epidemiological research. The unparalleled rate at which research groups around the world are releasing data and publications on the ongoing pandemic allows other scientists to learn from local experiences and from data generated on the front lines of the pandemic. However, additional data sources are needed that map and measure the role of social dynamics of such a unique worldwide event in biomedical, biological, and epidemiological analyses. For this purpose, we present a large-scale curated dataset of over 1.12 billion tweets related to COVID-19 chatter, generated from 1 January 2020 to 27 June 2021 (at the time of writing) and growing daily. This dataset gives researchers worldwide a freely available additional data source for a wide and diverse range of research projects, such as epidemiological analyses, studies of emotional and mental responses to social distancing measures, identification of sources of misinformation, and stratified measurement of sentiment towards the pandemic in near real time, among many others.
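The abstract mentions near-real-time measurement over a collection that grows daily. A common first step is counting tweets per calendar day. The following is a minimal Python sketch of that step. It assumes, hypothetically, that records are distributed as tab-separated text with a header row containing `tweet_id` and `date` (YYYY-MM-DD) columns. The function name, column names, and sample rows are illustrative and not taken from the paper.

```python
import csv
import io
from collections import Counter


def daily_counts(lines):
    """Count tweets per calendar day.

    `lines` is any iterable of text lines (e.g. an open file). Assumes a
    hypothetical tab-separated layout whose header includes `tweet_id` and
    `date` (YYYY-MM-DD); adjust the column names to the actual release.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    counts = Counter(row["date"] for row in reader)
    # ISO dates sort lexicographically, so sorting keys gives chronological order.
    return dict(sorted(counts.items()))


# Illustrative sample rows (not real data).
sample = (
    "tweet_id\tdate\ttime\n"
    "1001\t2020-03-11\t10:00:00\n"
    "1002\t2020-03-11\t11:30:00\n"
    "1003\t2020-03-12\t09:15:00\n"
)
print(daily_counts(io.StringIO(sample)))  # {'2020-03-11': 2, '2020-03-12': 1}
```

Because `daily_counts` consumes an iterable of lines rather than loading everything into memory, a large compressed file could be streamed through it, for example with `gzip.open(path, "rt", newline="")`. Only the per-day counter is held in memory.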

Research target: Computer Science

Keywords: social media, public datasets, open science, data sources, COVID-19

Publication based on the results of:

Development of Mathematical Models and Methods for Recommender Systems and Natural Language Processing (2020)

COVID-19 Quarantine Measures Efficiency Evaluation by Best Tube Interval Data Envelopment Analysis

Demin S., Operations Research Forum 2023 No. 4 Article 21

All countries have responded with a wide range of measures to stop the propagation of coronavirus. We apply best tube interval data envelopment analysis, in order to evaluate efficiency of quarantine measures using imprecise data. Using the Oxford COVID-19 Government Response Tracker’s (OxCGRT) data and given method, we construct time series of efficiency assessment of ...

Added: November 2, 2023

Analyzing COVID-19 Medical Papers Using Artificial Intelligence: Insights for Researchers and Medical Professionals

Soshnikov D., Petrova T., Soshnikova V. et al., Big Data and Cognitive Computing 2022 Vol. 6 No. 1 Article 4

Since the beginning of the COVID-19 pandemic almost two years ago, there have been more than 700,000 scientific papers published on the subject. An individual researcher cannot possibly get acquainted with such a huge text corpus and, therefore, some help from artificial intelligence (AI) is highly needed. We propose the AI-based tool to help researchers ...

Added: February 22, 2022

Genomic epidemiology of the early stages of SARS-CoV-2 outbreak in Russia

Komissarov A. B., Safina K. R., Garushyants S. K. et al., / Series 005140 "Medrxiv". 2020.

The ongoing pandemic of SARS-CoV-2 presents novel challenges and opportunities for the use of phylogenetics to understand and control its spread. Here, we analyze the emergence of SARS-CoV-2 in Russia in March and April 2020. Combining phylogeographic analysis with travel history data, we estimate that the sampled viral diversity has originated from 67 closely timed ...

Added: July 17, 2020

The rise and spread of the SARS-CoV-2 AY.122 lineage in Russia

Klink G. V., Safina K. R., Nabieva E. et al., / Series 005140 "Medrxiv". 2021.

Background: Delta has outcompeted most preexisting variants of SARS-CoV-2, becoming the globally predominant lineage by mid-2021. Its subsequent evolution has led to the emergence of multiple sublineages, many of which are well-mixed between countries. Aim: Here, we aim to study the emergence and spread of the Delta lineage in Russia. Methods: We use a phylogeographic approach to infer imports of Delta ...

Added: December 6, 2021

RuSentiTweet: a sentiment analysis dataset of general domain tweets in Russian

Smetanin S., PeerJ Computer Science 2022 No. 8 Article e1039

The Russian language is still not as well resourced as English, especially in the field of sentiment analysis of Twitter content. Though several sentiment analysis datasets of tweets in Russian exist, they are all either automatically annotated or manually annotated by one annotator. Thus, there is no inter-annotator agreement, or annotation may be focused on ...

Added: June 29, 2022

Multimodal model with text and drug embeddings for adverse drug reaction classification

Sakhovskiy A., Tutubalina E., Journal of Biomedical Informatics 2022 Vol. 135 Article 104182

In this paper, we focus on the classification of tweets as sources of potential signals for adverse drug effects (ADEs) or drug reactions (ADRs). Following the intuition that text and drug structure representations are complementary, we introduce a multimodal model with two components. These components are state-of-the-art BERT-based models for language understanding and molecular property ...

Added: April 10, 2023

Information Security and Online Education During the COVID-19 Pandemic

Pronchev G., Goncharova I. V., Lyubimov A. et al., Journal of Higher Education Theory and Practice 2023 Vol. 23 No. 2 P. 218–232

Relevance of the problem in question is associated with academics facing information threats in the virtual educational environment during the COVID-19 pandemic and the short-term transition of the education process to distance learning. We aim to analyze information threats to individuals in the Runet virtual educational environment and suggest measures to neutralize them. Research into ...

Added: January 28, 2024

SARS-CoV-2 Omicron Outbreak in a Dormitory in Saint-Petersburg, Russia

Bazykin G., Danilenko D., Komissarov A. et al., / Series ResearchSquare "Research Square". 2022.

Added: January 13, 2022

Using Text Analytics for Health to Get Meaningful Insights from a Corpus of COVID Scientific Papers

Soshnikov D. V., Soshnikova V., / Series Computer Science "arxiv.org". 2021.

Since the beginning of the COVID pandemic, around 700,000 scientific papers have been published on the subject. A human researcher cannot possibly get acquainted with such a huge text corpus, and therefore AI-based tools that help navigate this corpus and derive useful insights from it are highly needed. In this paper, we ...

Added: February 22, 2022

Potential role of cellular miRNAs in coronavirus-host interplay

Nersisyan S., Engibaryan N., Gorbonos A. et al., PeerJ 2020 Vol. 8 Article e9994

Host miRNAs are known as important regulators of virus replication and pathogenesis. They can interact with various viruses through several possible mechanisms including direct binding of viral RNA. Identification of human miRNAs involved in coronavirus-host interplay becomes important due to the ongoing COVID-19 pandemic. In this article we performed computational prediction of high-confidence direct interactions ...

Added: September 17, 2020

COVID-19 reproduction number estimated from SEIR model: association with people's mobility in 2020

Soshnikov D., Petrova T., Grunin A., / Series Computer Science "arxiv.org". 2021.

This paper is an exploratory study of two epidemiological questions on a worldwide basis. How fast is the disease spreading? Do the restrictions on people (especially mobility restrictions) bring the expected effect? To answer the first question, we propose a tool for estimating the reproduction number of the epidemic (the number of secondary infections, Rt) based on ...

Added: October 7, 2021

Inclusion, Diversity Or Disparity In Telehealth During The Covid-19 Pandemic

Dvoryashina M. M., Tarasenko E. A., IFAC-PapersOnLine 2021 Vol. 54 No. 13 P. 323–326

The COVID-19 pandemic has led to explosive growth in telemedicine solutions. On the one hand, this is an important moment for the development of telehealth, which promotes inclusion and diversity, and reduces inequalities in health care. On the other hand, telehealth may be less accessible to minorities such as people with cancer, heart disease, ...

Added: November 16, 2021

Sustainable Social Systems: Innovative Service Implications in the Restaurant Business in the Post-COVID Era with Digital Transformation Strategies

Fainshtein E., Chkoniya V., Serova E. et al., Sustainability 2023 Vol. 15 No. 19 Article 14539

The COVID-19 pandemic led to changes in and modifications of the role of information and communication technologies in the digitalization of service provision. This paper aims to identify and summarize these changes in business operations, in the context of strategic management in the restaurant industry, triggered by COVID-19. Based on in-depth interviews with 16 key ...

Added: October 8, 2023

Proceedings of 6th International Conference on Recent Trends in Computer Science and Electronics January 5-7, 2021, University of Hawaii, USA

Kharlamov A. A., Raskhodchikov A., Pilgun M., University of Hawaii Press, 2021.

The goal of the study is analysis of residents' perception of environmental problems caused by urban development projects. Data: the material for the study was the data of social media, blogs, messengers, forums, reviews, videos about construction of SouthEast highway in Moscow. Methods: the study involved a cross-disciplinary approach using neural network technologies. Findings: data analysis made it ...

Added: November 9, 2021

Computational identification of disulfiram and neratinib as putative SARS-CoV-2 main protease inhibitors

Svitanko I., Stroylov V., Mendeleev Communications 2020 Vol. 30 No. 4 P. 419–420

Identification of disulfiram and neratinib as putative covalent inhibitors of SARS-CoV-2 virus main protease Mpro by a combination of 'on-top docking' procedure, expert evaluation of potential hits and molecular dynamics is reported herein. This finding shows the importance of further development of virtual screening add-ons. ...

Added: August 31, 2020

European Scientific Conference: Collection of Papers of the 20th International Scientific and Practical Conference

Penza: International Center for Scientific Cooperation "Nauka i Prosveshchenie" (Science and Education), 2020.

This collection is based on the proceedings of the 20th International Scientific and Practical Conference "European Scientific Conference", held on 17 May 2020 in Penza. The papers in the collection address current problems of science and the practical application of research results. The collection is intended for researchers, lecturers, doctoral and master's students, and undergraduates for use in research and teaching. Responsibility for the authenticity ...

Added: August 17, 2020

Frequency, time, and spatial EEG-changes after COVID-19 during a simple speech task.

Vorontsova D. V., Zubov A. I., Isaeva M. V. et al., Computer Research and Modeling 2023

Using data analysis and indirect application of neural networks in our work, we identified patterns of brain electrical activity that characterize COVID-19. We were interested in frequency, temporal, and spatial domain patterns of electrical activity in people who have undergone COVID-19. We found a predominance of α-rhythm patterns in the left hemisphere in healthy people compared ...

Added: April 26, 2023

Trends in Biomathematics: Modeling Epidemiological, Neuronal, and Social Dynamics.

Springer, 2023.

This volume gathers together selected peer-reviewed works presented at the BIOMAT 2022 International Symposium, which was virtually held on November 7-11, 2022, with an organization staff based in Rio de Janeiro, Brazil. Topics touched on in this volume include infection spread in a population described by an agent-based approach; the study of gene essentiality via network-based ...

Added: November 7, 2023

VGsim: scalable viral genealogy simulator for global pandemic

Shchur V., Spirin V., Pokrovskiy V. et al., / Series 005140 "Medrxiv". 2021.

As an effort to help contain the COVID-19 pandemic, large numbers of SARS-CoV-2 genomes have been sequenced from all continents. More than one million viral sequences are publicly available as of April 2021. Many studies estimate viral genealogies from these sequences, as these can provide valuable information about the spread of the pandemic across time ...

Added: April 27, 2021

Assessment of the Workload of Firefighting Crews and the Risks of Their Late Arrival at the Protected Facility

Litvin Y. V., Abramov I. V., Technologies of Technosphere Safety 2016 No. 66

An advanced approach to assessing the random arrival time of firefighting crews at the protected facility, the time they are engaged, and the duration of free burning. Some quantitative assessments are given, together with a review of analytical and simulation methods ...

Added: August 27, 2016

A Parallel Algorithm to Detect Structural Breaks in Time Series

Furmanov K. K., Nikol'skii I. M., Computational Mathematics and Modeling 2016 Vol. 27 No. 2 P. 247–253

Added: December 22, 2016

Operating Systems: Textbook and Practicum

Gostev I. M., Moscow: Yurait, 2016.

Computer science is currently developing rapidly. New versions of operating systems appear every year and a half to two years, so it was decided to include in this book material that will not become outdated. The textbook covers some of the most general principles of operating system design, which were developed more than 50 years ago and have remained virtually unchanged since then. ...

Added: October 13, 2009

Pre-experiments on Annotation of Russian Coreference Corpus

Toldova S., Azerkovich I., Grishina Y. et al., / NRU HSE. Series WP BRP "Linguistics". 2015.

Building benchmark corpora in the domain of coreference and anaphora resolution is an important task for developing and evaluating NLP systems and models. Our study is aimed at assessing the feasibility of enhancing corpora with information about coreference relations. The annotation procedure includes identification of text segments that are subject to annotation (markables), marking their ...

Added: December 15, 2015

Proceedings of the NI Academic Days 2017 Conference, Moscow, 13–14 April 2017

Moscow: National Instruments Russia, 2017.

The collection consists of previously unpublished papers presenting the results of original research and technical solutions. We hope that it will be useful to specialists working in various fields of science and engineering, to a wide range of university lecturers, graduate students, and students, as well as to teachers at secondary schools and technical colleges. ...

Added: May 10, 2017