During an epidemic, such as the current coronavirus disease 2019 (COVID-19) pandemic, reliable information is required to support decision making at many levels, including individual treatment options and social distancing measures.[1] In many cases, however, reliable evidence is lacking, for example, on how many people have been or are becoming infected with a pathogen. In this context, accurate information is needed to guide decisions and their implementation and to monitor the impact of the epidemic.[2] Citizen science (also known as “crowdsourcing”) offers many possibilities for aiding global problem solving, and epidemiologists have slowly started to use it to bolster disease prevention and enhance public health surveillance.[3]

Tackling aspects of the new COVID-19 pandemic is a complex task. However, crowdsourcing and collective intelligence can be used to aid informed responses. At its simplest, collective intelligence is the enhanced capacity created when distributed groups of people work together, often with the help of technology, to mobilize more information, ideas, and insights to solve a problem.[4] In recent years, advances in digital technologies have transformed what can be achieved through collective intelligence, connecting ever more people and helping us to generate new insights from novel sources of data. As such, these approaches are considered particularly suited to addressing fast-evolving, complex global problems such as disease outbreaks.

In the case of COVID-19, it is increasingly clear that the surveillance and epidemiological data collected so far are of variable quality across different locations.[5–7] This creates uncertainty and ongoing debate about the reported statistics, such as the reported case fatality rates.[8] Because health systems have variable testing capacity, which is especially limited in resource-restricted settings, the inherent selection bias in the reported data might increase in the near future. Thus, real-time analyses of epidemiological data are needed to increase situational awareness and inform interventions. Coordinated high-quality data collection between providers and across settings can potentially prevent onward transmission and allow more efficient use of resources by informing targeted nonpharmaceutical interventions while reducing the overall societal burden.
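The effect of limited testing on a reported case fatality rate can be illustrated with a short calculation. The sketch below is not drawn from CoronaBio or from any cited dataset: the function names, the counts, and the assumed ascertainment fraction are all hypothetical, and the adjustment deliberately ignores reporting delays and under-counted deaths.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Return deaths divided by cases, as a fraction."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


def adjusted_cfr(deaths: int, confirmed_cases: int, ascertainment: float) -> float:
    """Rescale confirmed cases by the assumed fraction of infections detected."""
    if not 0 < ascertainment <= 1:
        raise ValueError("ascertainment must be in (0, 1]")
    estimated_cases = confirmed_cases / ascertainment
    return deaths / estimated_cases


# Hypothetical figures, for illustration only.
naive = case_fatality_rate(deaths=50, cases=1000)
# Assumption: only one in four infections is detected by testing.
adjusted = adjusted_cfr(deaths=50, confirmed_cases=1000, ascertainment=0.25)
print(f"naive CFR: {naive:.2%}, adjusted: {adjusted:.2%}")
```

With these invented numbers, the rate computed from confirmed cases alone is four times the rate obtained once undetected infections are taken into account, which is the kind of selection bias the paragraph above describes.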

In previous recent epidemics and pandemics, such as severe acute respiratory syndrome (SARS), the 2009 influenza pandemic, and Ebola, real-time analyses shed light on the transmissibility, severity, and natural history of emerging pathogens.[9–11] Evidence suggests that well-connected technology infrastructure can help manage infectious disease outbreaks more effectively, facilitating communication pathways, improving patient safety, and reducing medical errors.[12,13] Data analytics lies at the base of these decision-making pathways supporting coordinated epidemic care systems.

Therefore, the collection and analysis of detailed data are particularly useful for symptom monitoring and for inferring key epidemiological parameters, such as the incubation and infectious periods of the pathogen and the times between infection and detection, isolation, and reporting of COVID-19 cases. As no treatment yet exists for the disease, the importance of patient-related data has increased dramatically. According to the European Centre for Disease Prevention and Control (ECDC), although case finding based on the surveillance case definition is beneficial, in areas with ongoing community transmission, limited resources for testing mean that it cannot be comprehensive. Therefore, such surveillance methods, when used exclusively, are unlikely to provide a full picture of COVID-19 epidemiology.[14]
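As a minimal sketch of how one such parameter might be summarized from self-reported dates, the example below computes exposure-to-onset intervals and their median. The records are invented for illustration, the helper `incubation_days` is hypothetical, and a real analysis would need to handle uncertain exposure windows and censoring, which this sketch does not.

```python
from datetime import date
from statistics import median

# Hypothetical self-reports: (date of presumed exposure, date of symptom onset).
reports = [
    (date(2020, 3, 1), date(2020, 3, 6)),
    (date(2020, 3, 2), date(2020, 3, 5)),
    (date(2020, 3, 4), date(2020, 3, 11)),
    (date(2020, 3, 5), date(2020, 3, 9)),
]


def incubation_days(pairs):
    """Days from exposure to onset, skipping records where onset precedes exposure."""
    return [(onset - exposure).days for exposure, onset in pairs if onset >= exposure]


days = incubation_days(reports)
print("intervals (days):", days)
print("median (days):", median(days))
```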

Hence, new ways of accumulating and analyzing quality data need to be considered to complement existing legacy systems currently in use. The current paper presents such a new approach, namely, a tool called “CoronaBio.”

CoronaBio (https://coronab.io) is a novel digital system that uses crowdsourcing methodologies to collect real-world data from people with COVID-19 and members of the wider public for downstream use in artificial intelligence (AI) technologies. It aims to inform healthcare systems and public health intervention strategies (including potential vaccination) against the current pandemic and any subsequent waves, thereby enabling digitization of the healthcare system's response to an outbreak. The CoronaBio system is AI-agnostic but provides the foundation upon which AI-based decision support systems (DSS) can be deployed, linking participant-derived data and clinical data available from European healthcare centers and clinical databases.

To date, no solution provides a pan-European, General Data Protection Regulation (GDPR)-compliant platform (with a patent-pending IT architecture and framework that follows GDPR legal requirements and the code of conduct for biomedical research) on which non-hospitalized SARS-CoV-2–positive patients, as well as those who have recovered and been discharged from clinical systems, can record their real-world data. CoronaBio's technology architecture (Figure 1) is designed and implemented so that data are collected, stored, processed, and accessed in line with all required GDPR standards for the elimination of internal and external threats, with extensions to third-party access as well. The indicative set of parameters to be recorded, for the purposes of personal well-being and as basic information enabling community-based surveillance, was drawn from a collection of 36 manuscripts published within the first two months of the pandemic; these publications are provided as supplementary material, available online.

Figure 1

CoronaBio's technology architecture.

The application, service, and data layers of the digital system are supported by technologies and standards such as openEHR and a HAPI FHIR (Fast Healthcare Interoperability Resources) server. The crowd-based platform accommodates the needs of the user community, with all extensions in full compliance with the FHIR framework (v4.0.1). CoronaBio's data lake includes both quantitative and qualitative descriptions as well as the appropriate findability descriptions. Attributes and properties of patients' health-related data follow automated workflows, for consistency and increased usability.
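To make the FHIR-based data layer more concrete, the sketch below builds a FHIR R4 Observation resource for a self-reported body temperature as a plain Python dictionary. It is an illustration only and is not taken from CoronaBio's code: the function name and the patient identifier are hypothetical, while the element names, the LOINC code 8310-5 (body temperature), and the UCUM unit code follow the public FHIR R4 specification.

```python
import json
from datetime import datetime, timezone


def temperature_observation(patient_id: str, celsius: float, when: datetime) -> dict:
    """Build a FHIR R4 Observation for a self-reported body temperature."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "8310-5",
                "display": "Body temperature",
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": when.isoformat(),
        "valueQuantity": {
            "value": celsius,
            "unit": "Cel",
            "system": "http://unitsofmeasure.org",  # UCUM
            "code": "Cel",
        },
    }


# "example-001" is a hypothetical identifier, not a real participant.
obs = temperature_observation(
    "example-001", 38.2, datetime(2020, 4, 1, 8, 30, tzinfo=timezone.utc)
)
print(json.dumps(obs, indent=2))
```

A resource of this shape could be sent to any FHIR R4 server, such as a HAPI FHIR instance, through the standard REST interface; how CoronaBio actually populates its server is not described in the text.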

It is not yet known whether healthy but previously infected individuals will remain healthy upon repeated exposures (meta-infection), suffer from virus-related pathologies, or be prone to other diseases in the future. A systematic and standardized platform for health status data collection and management will provide the industry not only with a tool for exploratory research on therapy but also with a patient cohort pool for clinical trial recruitment and a means of identifying vaccination hotspots.

CoronaBio is a patient-reported data solution with a patient-oriented user interface/user experience (UI/UX) design, consisting of a dedicated database and platform that link patient real-world data with AI, analytics, and network-based software tools. As a self-reporting digital tool, CoronaBio provides a UI (Figure 2) with pre-fixed options for all data attributes, thus increasing data harmonization and quality by offering embedded, balanced, simple, and explanatory health status description options that have been scientifically validated (supplementary material). CoronaBio's UI service manager offers separate, clearly distinct sections, with visual configurations (Figure 3) for stable yet flexible recording modules covering all of the users' health-related parameters, such as symptomatology, treatments, and observations on vital functions. By covering the whole spectrum of symptoms reported in the existing scientific literature, both common and less common, the tool incorporates the majority of expected phenotypic variations of non-hospitalized COVID-19–positive individuals (supplementary material).
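The harmonizing effect of pre-fixed options can be sketched with a small validation routine that accepts only values from a closed vocabulary. The symptom list, the severity scale, and the function `validate_entry` are hypothetical and stand in for whatever option sets the actual UI offers.

```python
from enum import Enum
from typing import Tuple


class Severity(Enum):
    """Hypothetical closed severity scale offered to the user."""
    NONE = "none"
    MILD = "mild"
    MODERATE = "moderate"
    SEVERE = "severe"


# Hypothetical closed list of symptom keys.
ALLOWED_SYMPTOMS = {"fever", "cough", "shortness_of_breath", "loss_of_smell"}


def validate_entry(symptom: str, severity: str) -> Tuple[str, Severity]:
    """Normalize one entry; raise ValueError for anything outside the fixed options."""
    key = symptom.strip().lower()
    if key not in ALLOWED_SYMPTOMS:
        raise ValueError(f"unknown symptom: {symptom!r}")
    # Enum lookup by value raises ValueError for an unknown severity.
    return key, Severity(severity.strip().lower())


print(validate_entry(" Fever ", "Mild"))
```

Because free text never reaches the database in this design, every stored record uses the same keys and scale, which is what makes later pooling across participants straightforward.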

Figure 2

CoronaBio's user interface and pre-fixed options for data elements.

Figure 3

User interface service manager workflow.

Because CoronaBio is a self-report tool, data accuracy is patient-dependent; however, it can be directly comparable to that of electronic medical record reports, and accuracy is fair relative to previous longitudinal cohorts for descriptive infectious disease symptomatology.[16] The tool provides added value when combined with professional medical diagnoses, within a one-to-one analysis or a cohort-based analysis. Furthermore, as these data are structured according to established harmonized standards, they can potentially be related to data from several certified European biobanks, clinical databases, and diagnostic centers. CoronaBio aims to assist the interconnection of studies and to stratify the interdependence of potential phenotypes with complex, chronic conditions.

More specifically, the digital platform allows participants to complete the e-consent form[17,18] and then record demographic, health status, lifestyle, and psychological data. The tool collects and stores data following the international HL7[19,20] and openEHR[21] standards for clinical data; it thus forms a more complete solution, with potential attached functionalities, than a typical application. As a result, the data have a much higher degree of accuracy and value for research.[22] To minimize errors, CoronaBio's terminology for patient data uses Logical Observation Identifiers Names and Codes (LOINC) and Systematized Nomenclature of Medicine–Clinical Terms (SNOMED CT).[23,24]
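The role of a controlled terminology can be sketched as a lookup from an internal symptom key to a coded concept. The mapping table and the function `to_codeable_concept` are hypothetical; the SNOMED CT identifiers shown are given for illustration and should be checked against a current SNOMED CT release before any real use.

```python
SNOMED_SYSTEM = "http://snomed.info/sct"

# Illustrative mapping; verify codes against a current SNOMED CT release.
SYMPTOM_CODES = {
    "fever": ("386661006", "Fever"),
    "cough": ("49727002", "Cough"),
    "dyspnea": ("267036007", "Dyspnea"),
}


def to_codeable_concept(term: str) -> dict:
    """Return a FHIR-style CodeableConcept; raise KeyError for unmapped terms."""
    code, display = SYMPTOM_CODES[term.strip().lower()]
    return {
        "coding": [{"system": SNOMED_SYSTEM, "code": code, "display": display}],
        "text": term.strip(),
    }


print(to_codeable_concept("Cough"))
```

Raising an error for unmapped terms, rather than storing the free text, keeps uncoded values out of the dataset; that is a design choice of this sketch, not a documented behavior of the platform.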

Participants interact with CoronaBio by joining the community directly via the web-based platform and recording a spectrum of data valuable for future research. Data can be made available in real time to researchers, physicians, and the industry and can be used in infectious disease modeling to generate and compare epidemiological estimates relevant to interventions.

CoronaBio's added value will emerge when it is associated with healthcare providers' data and used as an off-site monitoring tool at time points both before and after public health interventions, including the potential implementation of vaccinations. As such, harmonized multisource data for the same cohort, collected longitudinally via notification prompts or entered voluntarily, augment the effect of CoronaBio. The tool can potentially be deployed for cloud-based, open online use, but it can also be deployed within local servers in healthcare institutions as a white box system or embedded within network-based servers.

Users can monitor their symptomatology and observe data analysis on a series of data closely related to the clinical course of the coronavirus disease (supplementary material). Researchers are provided with a data-usability environment where they can conduct hypothesis-driven data exploration. Data transferring and export are available and satisfy European Union–based security protocols and consent statuses.
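One way to picture an export that respects consent status is a filter applied before any data leave the platform. The record structure, the field names, and the function `export_consented` below are assumptions made for this sketch; the text does not describe how CoronaBio implements its export.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Record:
    """Hypothetical minimal participant record."""
    participant_id: str
    symptom: str
    research_consent: bool


def export_consented(records: Iterable[Record]) -> str:
    """Write only records with research consent to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["participant_id", "symptom"], lineterminator="\n"
    )
    writer.writeheader()
    for rec in records:
        if rec.research_consent:
            writer.writerow(
                {"participant_id": rec.participant_id, "symptom": rec.symptom}
            )
    return buf.getvalue()


# Invented records, for illustration only.
sample = [
    Record("p1", "fever", True),
    Record("p2", "cough", False),
]
print(export_consented(sample))
```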

CoronaBio offers a solution to the general public as well as to researchers and clinical professionals. Users can register, state their current health status, and track and share their symptoms with medical research teams. Patients with confirmed diagnoses can record their symptoms daily, along with several health-related factors, such as diet and mood. The platform is secure, using the highest industry-standard software, to capture and harmonize patient data. The entire chain for data processing, along with the digital system, is GDPR-compliant. All patient information collected on the CoronaBio platform is offered for free to public health organizations, public institutions, or public health authorities dealing with COVID-19.

In the efforts to control the coronavirus pandemic, solidarity, open data, and cooperation are the keys to achieving better results faster, as well as setting up the mechanisms for the prevention of future worldwide outbreaks. CoronaBio's utility has the potential to be further enhanced through the setting up of a network of biobanks, healthcare providers, and research institutions that will process the available data, as well as use them to implement solutions using digital tools for risk assessment, biomarker identification, and patient management.

Supplemental Material

Supplemental material is available with the article online.

References

1. Cuan-Baltazar JY, Muñoz-Perez MJ, Robledo-Vega C, Pérez-Zepeda MF, Soto-Vega E. Misinformation of COVID-19 on the Internet: Infodemiology Study. JMIR Public Health Surveill 2020;6:e18444.
2. Moscovitch B, Halamka JD, Grannis S. Better patient identification could help fight the coronavirus. NPJ Digit Med 2020;3:83.
3. Katapally TR. A global digital citizen science policy to tackle pandemics like COVID-19. J Med Internet Res 2020;22:e19357.
4. Suran S, Pattanaik V, Draheim D. Frameworks for collective intelligence: a systematic literature review. ACM Comput Surv 2020;53:1–36.
5. Idrovo AJ, Manrique-Hernández EF. Data quality of Chinese surveillance of COVID-19: objective analysis based on WHO's situation reports. Asia Pac J Public Health 2020;32:165–167.
6. Lau H, Khosrawipour V, Kocbach P, et al. Internationally lost COVID-19 cases. J Microbiol Immunol Infect 2020;53:454–458.
7. Roda WC, Varughese MB, Han D, Li MY. Why is it difficult to accurately predict the COVID-19 epidemic? Infect Dis Model 2020;5:271–281.
8. Ioannidis JP. A fiasco in the making? As the coronavirus pandemic takes hold, we are making decisions without reliable data. Stat 2020, 17.
9. Chowell G, Bertozzi SM, Colchero MA, et al. Severe respiratory disease concurrent with the circulation of H1N1 influenza. N Engl J Med 2009;361:674–679.
10. Fraser C, Donnelly CA, Cauchemez S, et al. Pandemic potential of a strain of influenza A (H1N1): early findings. Science 2009;324:1557–1561.
11. Lipsitch M, Cohen T, Cooper B, et al. Transmission dynamics and control of severe acute respiratory syndrome. Science 2003;300:1966–1970.
12. Wang CJ, Ng CY, Brook RH. Response to COVID-19 in Taiwan: big data analytics, new technology, and proactive testing. JAMA 2020;323:1341–1342.
13. Carrillo D, Nardelli PH, Pournaras E, et al. Containing future epidemics with trustworthy federated systems for ubiquitous warning and response. IEEE Eng Manag Rev 2020.
14. European Centre for Disease Prevention and Control. Strategies for the surveillance of COVID-19. 2020.
15. Shah P, Kendall F, Khozin S, et al. Artificial intelligence and machine learning in clinical development: a translational perspective. NPJ Digit Med 2019;2:69.
16. Fragaszy EB, Warren-Gash C, Copas A, et al. Cohort profile: the Flu Watch Study. Int J Epidemiol 2017;46:e18.
17. Dankar FK, Gergely M, Dankar SK. Informed consent in biomedical research. Comput Struct Biotechnol J 2019;17:463–474.
18. Dankar FK, Gergely M, Malin B, et al. Dynamic-informed consent: a potential solution for ethical dilemmas in population sequencing initiatives. Comput Struct Biotechnol J 2020;18:913–921.
19. Dolin RH, Alschuler L, Beebe C, et al. The HL7 Clinical Document Architecture. J Am Med Inform Assoc 2001;8:552–569.
20. Olivero MA, Domínguez-Mayo FJ, Parra-Calderón CL, Escalona MJ, Martínez-García A. Facilitating the design of HL7 domain models through a model-driven solution. BMC Med Inform Decis Mak 2020;20:96.
21. Tarenskeen D, Van de Wetering R, Bakker R, Brinkkemper S. The contribution of conceptual independence to IT infrastructure flexibility: the case of openEHR. Health Policy Technol 2020;9:235–246.
22. Kiourtis A, Mavrogiorgou A, Menychtas A, Maglogiannis I, Kyriazis D. Structurally mapping healthcare data to HL7 FHIR through ontology alignment. J Med Syst 2019;43:62.
23. Saripalle R, Runyan C, Russell M. Using HL7 FHIR to achieve interoperability in patient health record. J Biomed Inform 2019;94:103188.
24. Alahmar AD, Benlamri R. SNOMED CT-based standardized e-clinical pathways for enabling big data analytics in healthcare. IEEE Access 2020;8:92765–92775.

Competing Interests

Source of Support: Metabio. Conflict of Interest: None.

Disclaimer: Where authors are identified as personnel of the International Agency for Research on Cancer/WHO, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the International Agency for Research on Cancer/WHO.
