A well-designed cohort study can provide powerful results. Over the past decade, population cohort approaches, which are used in public health to better understand factors correlated with disease risk or outcome, have become increasingly available in China. The Shanghai Birth Cohort and the Early Life Plan cohort are studies developed using different strategies at Shanghai Xinhua Hospital, which is affiliated with the School of Medicine at Shanghai Jiaotong University in Shanghai, China. As part of these cohorts, banking biological samples and banking their associated data are two essential and considerable tasks.[1,2] The term biobank commonly refers to a large, organized collection of well-characterized tissue samples, such as surgical biopsy specimens (fresh frozen or in paraffin sections), blood and serum samples, different cell types, and DNA, all carefully collected for research purposes together with their associated research and/or clinical data.

In longitudinal cohorts with active sampling, the efficiency of biobanking activities is usually reviewed through routine, predefined reports that focus on the increasing number of samples and/or cases collected. Thus, the increase in storage becomes a hallmark of development status, confirming the ongoing sampling and growth of the cohort. Of note, however, such reports often omit information on the usability of the data alongside its growth; we do not know whether the collection can address the scientific question it was designed for. We therefore need different and innovative ways to review and evaluate updates of such longitudinal population-level data collections, using true indicators of impact that directly inform the research question the cohort was originally designed to answer. The method briefly presented below not only gives us an accurate picture of the collection, but also helps us identify issues and solve them in time.

To address this key question, we developed a concept similar to taking inventory. We built a strategic method based on the requirements associated with the Shanghai Birth Cohort and Early Life Plan, and named it “appraisal of sample integrity”; it determines and assigns values to the cohort data. In this example, we defined and evaluated sample integrity according to the following three key factors: (1) completeness of the key elements associated with samples, (2) accuracy of relevant elements, and (3) consistency of related elements. As has been evidenced, sample integrity is the basis for increasing sample utilization to address scientific questions.[3] The key steps involved in this strategy are as follows:

  1. Define the research question that the collection was designed to answer.

  2. Identify the associated data elements (ie, sample metadata) that are critical and that determine sample usability.

  3. Determine the logical relationships among the elements (ie, expressed using the three basic Boolean operators, AND, OR, and NOT).

  4. Search the cohort to retrieve samples that match the requirements defined above.
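The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used for the Shanghai Birth Cohort or Early Life Plan: the `Sample` fields, the usability rule, and the example records are all hypothetical and stand in for whatever elements a given research question actually requires.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Sample:
    # Hypothetical metadata elements (step 2); a real cohort defines its own.
    sample_id: str
    sample_type: Optional[str] = None
    collection_date: Optional[str] = None
    consent_signed: bool = False
    questionnaire_done: bool = False
    clinical_record_linked: bool = False
    withdrawn: bool = False


Rule = Callable[[Sample], bool]


def has(field: str) -> Rule:
    """True when the named element is present (not None, empty, or False)."""
    return lambda s: bool(getattr(s, field))


def all_of(*rules: Rule) -> Rule:  # Boolean AND
    return lambda s: all(r(s) for r in rules)


def any_of(*rules: Rule) -> Rule:  # Boolean OR
    return lambda s: any(r(s) for r in rules)


def negate(rule: Rule) -> Rule:  # Boolean NOT
    return lambda s: not rule(s)


# Step 3: an assumed logical relationship among the elements.
usable = all_of(
    has("sample_type"),
    has("collection_date"),
    has("consent_signed"),
    any_of(has("questionnaire_done"), has("clinical_record_linked")),
    negate(has("withdrawn")),
)


def search(cohort: List[Sample], rule: Rule) -> List[Sample]:
    """Step 4: retrieve the samples that satisfy the rule."""
    return [s for s in cohort if rule(s)]


# Made-up records for illustration only.
cohort = [
    Sample("S001", "cord blood", "2019-03-02", True, True, False),
    Sample("S002", "cord blood", None, True, True, True),       # no collection date
    Sample("S003", "serum", "2019-05-11", True, False, False),  # no linked data
    Sample("S004", "serum", "2019-06-20", True, False, True, withdrawn=True),
    Sample("S005", "DNA", "2020-01-15", True, False, True),
]

usable_ids = [s.sample_id for s in search(cohort, usable)]
print(f"{len(usable_ids)} of {len(cohort)} stored samples meet the usability rule: {usable_ids}")
# prints: 2 of 5 stored samples meet the usability rule: ['S001', 'S005']
```

In this toy example the storage count is five, but only two samples could actually be used for the assumed research question, which is the distinction between growth and usability that the appraisal is meant to expose.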

By this relational appraisal approach, we identified some critical issues in our cohort collections, as it became easy to highlight potential gaps in data completeness that might otherwise have remained unaddressed. In doing so, we believe cohort collections can become more usable, resulting in increased use of samples and data to support clinical research. Furthermore, because long-term biobank operation is very costly,[4] it is critical to identify any issues that could reduce sample integrity and address them early in the process. Currently, we are expanding this assessment model to consortium member hospitals that are aligned with our operations. Doing so is also expected to increase interoperability and eventually reduce heterogeneity in the sharing of samples and data.
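As a rough illustration of how completeness gaps might be surfaced early, the sketch below counts, for each required element, how many records lack it. The element names in `REQUIRED_ELEMENTS` and the example records are assumptions made for this illustration, not the actual data dictionary of either cohort.

```python
from typing import Dict, Iterable, Mapping, Sequence

# Hypothetical list of elements considered critical for usability.
REQUIRED_ELEMENTS = ("sample_type", "collection_date", "consent_signed")


def gap_report(
    records: Iterable[Mapping[str, object]],
    required: Sequence[str] = REQUIRED_ELEMENTS,
) -> Dict[str, int]:
    """Count, per required element, the records where it is absent or empty."""
    gaps = {name: 0 for name in required}
    for rec in records:
        for name in required:
            if not rec.get(name):  # missing key, None, "", or False
                gaps[name] += 1
    return gaps


# Made-up records for illustration only.
records = [
    {"sample_id": "S001", "sample_type": "serum", "collection_date": "2019-03-02", "consent_signed": True},
    {"sample_id": "S002", "sample_type": "serum", "collection_date": "", "consent_signed": True},
    {"sample_id": "S003", "sample_type": None, "collection_date": "2019-05-11", "consent_signed": False},
    {"sample_id": "S004", "sample_type": "DNA", "consent_signed": True},
]

gaps = gap_report(records)
# List the elements with the most gaps first, so they can be addressed first.
for name, missing in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: missing in {missing} of {len(records)} records")
```

A report of this kind, run at each routine review, would point to the element that most limits usability rather than only to the total number of stored samples.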

In summary, one should move away from single measures of attainment (such as collection size), as these might be misleading, especially in large cohort studies. Biological samples and data are meaningful only when they are of high quality, which requires a holistic approach to establishing and appraising cohort data collections.

1. Zhang J, Tian Y, Wang W, et al. Cohort profile: the Shanghai Birth Cohort. Int J Epidemiol. 2019;48:21–21g.

2. Ma R, Yang K, Chen C, et al. Early-life exposure to aluminum and fine motor performance in infants: a longitudinal study. J Exp Sci Environ Epidemiol. 2021;31:248–256.

3. Betsou F. Quality assurance and quality control in biobanking. In: Hainaut P, Vaught J, Zatloukal K, Pasterk M, Eds. Biobanking of Human Biospecimens. Springer; 2017:23–49.

4. Rao A, Vaught J, Tulskie B, et al. Critical financial challenges for biobanking: report of a National Cancer Institute study. Biopreserv Biobank. 2019;17:129–138.

Competing Interests

Sources of Support: This study was partly funded by the Collaborative Innovation Program of Shanghai Municipal Health Commission (2020CXJQ01), “Establishing a Platform for Clinical Research Data Sharing to Facilitate Multicenter Collaboration by China-Canada Joint Effort” (No. 2014DFG31460), and the Early Life Plan Project, Shanghai Xinhua Hospital. Conflict of Interest: None.