Systematic Reviews: Data Extraction

Data extraction is the process of collecting the relevant pieces of information from the studies included in your review and organizing that information in a way that will help you synthesize the studies and draw conclusions.

For both quantitative and qualitative syntheses, the extracted data will often become "Table 1" of the published manuscript, or "Characteristics of included studies." For quantitative synthesis, this is also where the team will collect the data needed to carry out a meta-analysis.

The data points that will be extracted from each study should be predefined in the protocol.

Li T, Higgins JPT, Deeks JJ (editors). Chapter 5: Collecting data. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.3 (updated February 2022). Cochrane, 2022. Available from www.training.cochrane.org/handbook.

How should I perform data extraction?

Much like the study selection process, data extraction should be performed in duplicate. Having more than one person extracting data from every included report helps to minimize errors and reduce bias introduced by the review authors.
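As a rough illustration of why duplicate extraction helps, the sketch below compares two extractors' records for the same report and lists the fields where they disagree, so the team can check those fields against the report. The function name, field names, and example values are hypothetical and not taken from any particular tool.

```python
from typing import Any


def find_discrepancies(
    extraction_a: dict[str, Any], extraction_b: dict[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Return fields whose values differ between two extractors' records.

    A field missing from one record is reported with None for that extractor.
    """
    discrepancies = {}
    for field_name in sorted(set(extraction_a) | set(extraction_b)):
        value_a = extraction_a.get(field_name)
        value_b = extraction_b.get(field_name)
        if value_a != value_b:
            discrepancies[field_name] = (value_a, value_b)
    return discrepancies


# Hypothetical records for one report, entered independently by two reviewers.
reviewer_1 = {"study_design": "parallel RCT", "n_randomized": 120, "country": "Canada"}
reviewer_2 = {"study_design": "parallel RCT", "n_randomized": 102, "country": "Canada"}

for field_name, (a, b) in find_discrepancies(reviewer_1, reviewer_2).items():
    print(f"{field_name}: reviewer 1 = {a!r}, reviewer 2 = {b!r}")
```

Here only `n_randomized` is flagged (120 versus 102), which is the kind of transcription error a second extractor is meant to catch.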

Data collection for systematic reviews should be performed using structured data collection forms. These can be paper forms, electronic forms (Google Forms, Excel, REDCap), or commercially or custom-built data systems (Covidence, EPPI-Reviewer, Systematic Review Data Repository (SRDR)) that allow online form building, data entry by several users, data sharing, and efficient data management. Whichever means of data collection you choose, you will need a data collection form.

Review authors often have different backgrounds and levels of systematic review experience. Using a data collection form helps keep the data extraction process consistent across extractors, and is necessary for comparing data extracted in duplicate. Piloting the form within the review team is recommended.
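For teams building a custom electronic form, one possible way to keep records consistent is to define the form as a fixed set of typed fields. The sketch below is a minimal, hypothetical form with only a handful of items; the class name, field names, and example values are assumptions for illustration, and a real form would follow the items predefined in the protocol.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class ExtractionRecord:
    """One extractor's record for one report (hypothetical, minimal form)."""

    extractor_name: str
    extraction_date: date
    report_id: str  # identification features, e.g. first author and year
    study_design: Optional[str] = None
    n_centres: Optional[int] = None
    country: Optional[str] = None
    notes: str = ""

    def missing_fields(self) -> list[str]:
        """Names of fields still set to None, so gaps are easy to spot."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical entry made while piloting the form on a single report.
record = ExtractionRecord(
    extractor_name="Reviewer A",
    extraction_date=date(2022, 2, 1),
    report_id="Example 2019",
    study_design="cluster randomized trial",
)
print(record.missing_fields())  # ['n_centres', 'country']
```

Listing unfilled fields during piloting is one way to see which items extractors cannot find in the reports or interpret differently.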

Covidence has produced a helpful guide to the data extraction process, available as a free download: https://www.covidence.org/resource/data-extraction-for-intervention-systematic-reviews/

What data should be extracted?

The type of data extracted will depend on the clinical question informing the review. However, extracted data will often include both study characteristics and outcome data.

Items to consider in data collection

Not all of the following points will be relevant for all reviews.

Information about data extraction from reports

Name of data extractors
Date of data extraction
Identification features of each report from which data are being extracted (first author, year)

Study methods

Study design (parallel, factorial, crossover, cluster aspects of design for randomized trials, and/or study design features for non-randomized studies)
Single or multicentre study; if multicentre, number of recruiting centres
Recruitment and sampling procedures used (including at the level of individual participants and clusters/sites if relevant)
Enrollment start and end dates; length of participant follow-up
Statistical analysis
- Unit of analysis (individual participant, clinic, village, body part)
- Statistical methods used

Participants

Setting
Region(s) and country/countries from which study participants were recruited
Study eligibility criteria, including diagnostic criteria
Characteristics of participants at the beginning (or baseline) of the study (e.g. age, sex, comorbidity, socio-economic status)

Intervention

Components, routes of delivery, doses, timing, frequency, intervention protocols, length of intervention
Factors relevant to implementation (e.g. staff qualifications, equipment requirements)
Integrity of interventions (i.e. the degree to which specified procedures or components of the intervention were implemented as planned)
Description of co-interventions
Definition of ‘control’ groups (e.g. no intervention, placebo, minimally active comparator, or components of usual care)
For observational studies: description of how intervention status was assessed; length of exposure, cumulative exposure

Outcomes

Measurement tool or instrument (including definition of clinical outcomes or endpoints); for a scale, name of the scale (e.g. the Hamilton Anxiety Rating Scale), upper and lower limits, and whether a high or low score is favourable; definitions of any thresholds if appropriate
Specific metric (e.g. post-intervention anxiety, or change in anxiety from baseline to a post-intervention time point, or post-intervention presence of anxiety (yes/no))
Method of aggregation (e.g. mean and standard deviation of anxiety scores in each group, or proportion of people with anxiety)
Timing of outcome measurements (e.g. assessments at end of eight-week intervention period, events occurring during the eight-week intervention period)
Adverse outcomes need special attention depending on whether they are collected systematically or non-systematically (e.g. by voluntary report)

Results

For each group, and for each outcome at each time point: number of participants randomly assigned and included in the analysis; and number of participants who withdrew, were lost to follow-up or were excluded (with reasons for each)
Summary data for each group (e.g. 2×2 table for dichotomous data; means and standard deviations for continuous data)
Between-group estimates that quantify the effect of the intervention on the outcome, and their precision (e.g. risk ratio, odds ratio, mean difference)
If subgroup analysis is planned, the same information would need to be extracted for each participant subgroup

Miscellaneous

Key conclusions of the study authors
Reference to other relevant studies
Correspondence required
Miscellaneous comments from the study authors or by the review authors
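
The summary data and between-group estimates listed under Results are closely related: when a report gives only the per-group summary data, the effect estimates can usually be computed from them. The sketch below shows the standard formulas for a risk ratio, an odds ratio, and a mean difference, with an approximate 95% confidence interval for the risk ratio. The function names and the example numbers are hypothetical, and the code assumes no zero cells in the 2×2 table; it is an illustration rather than a replacement for meta-analysis software.

```python
import math


def risk_ratio(events_t: int, n_t: int, events_c: int, n_c: int) -> float:
    """Risk in the intervention group divided by risk in the control group."""
    return (events_t / n_t) / (events_c / n_c)


def odds_ratio(events_t: int, n_t: int, events_c: int, n_c: int) -> float:
    """Odds of the event in the intervention group divided by odds in control."""
    return (events_t * (n_c - events_c)) / (events_c * (n_t - events_t))


def risk_ratio_ci(
    events_t: int, n_t: int, events_c: int, n_c: int, z: float = 1.96
) -> tuple[float, float]:
    """Approximate confidence interval for the risk ratio, built on the log scale."""
    log_rr = math.log(risk_ratio(events_t, n_t, events_c, n_c))
    # Standard error of log(RR); undefined if either event count is zero.
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return math.exp(log_rr - z * se), math.exp(log_rr + z * se)


def mean_difference(
    mean_t: float, sd_t: float, n_t: int, mean_c: float, sd_c: float, n_c: int
) -> tuple[float, float]:
    """Mean difference (intervention minus control) and its standard error."""
    md = mean_t - mean_c
    se = math.sqrt(sd_t**2 / n_t + sd_c**2 / n_c)
    return md, se


# Hypothetical 2x2 table: 10/100 events with intervention, 20/100 with control.
print("RR:", risk_ratio(10, 100, 20, 100))
print("OR:", round(odds_ratio(10, 100, 20, 100), 3))
print("RR 95% CI:", risk_ratio_ci(10, 100, 20, 100))

# Hypothetical continuous outcome: mean (SD) score of 12.0 (4.0) vs 15.0 (5.0), n = 50 each.
print("MD, SE:", mean_difference(12.0, 4.0, 50, 15.0, 5.0, 50))
```

Extracting the raw counts, means, and standard deviations for each group, and not only the reported effect estimate, keeps calculations like these possible later in the review.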