Data Interoperability

Increasing the productivity of reliable Real World Evidence generation requires standardising and automating the analytical processes. This cannot be done without a standard representation of the data. The data need to be fully interoperable with respect to both structure (syntactic interoperability) and coding systems (semantic interoperability), which is achieved by using a Common Data Model (CDM).

OMOP Common Data Model

DARWIN EU is using the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), which is maintained by the Observational Health Data Sciences and Informatics (OHDSI) global community. The OHDSI community also has an active European Chapter, led by the Erasmus MC.

The OMOP CDM provides both syntactic and semantic interoperability. Syntactically, the OMOP CDM is a relational database consisting of tables with demographics and clinical events ('Standardised Clinical Data'), information about the care setting ('Standardised Health System'), health economics ('Standardised health economics'), records derived from other clinical tables ('Standardized derived elements') and 'Standardized metadata'. The semantic layer is provided by the 'Standardized vocabularies', which have to be populated with the OMOP standard concepts and their relationships. The standardized vocabularies are centrally maintained and distributed. Semantic interoperability is especially important in Europe, with its myriad terminology systems and languages.
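To make the semantic layer concrete, the following is a minimal sketch of how a source code can be translated to a standard concept through the vocabulary tables. It assumes a heavily reduced version of the `concept` and `concept_relationship` tables held in an in-memory SQLite database; the `concept_id` values and the helper `to_standard` are made up for illustration and are not part of any OHDSI tool.

```python
import sqlite3

# Reduced stand-ins for two OMOP vocabulary tables (only the columns used here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id       INTEGER PRIMARY KEY,
    concept_name     TEXT,
    vocabulary_id    TEXT,
    concept_code     TEXT,
    standard_concept TEXT      -- 'S' marks a standard concept
);
CREATE TABLE concept_relationship (
    concept_id_1    INTEGER,
    concept_id_2    INTEGER,
    relationship_id TEXT
);
""")

# Illustrative rows: the concept_id values are invented for this sketch.
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?, ?)",
    [
        (1001, "Essential (primary) hypertension", "ICD10CM", "I10", None),
        (2001, "Essential hypertension", "SNOMED", "59621000", "S"),
    ],
)
conn.execute("INSERT INTO concept_relationship VALUES (1001, 2001, 'Maps to')")


def to_standard(conn, vocabulary_id, concept_code):
    """Return (concept_id, concept_name) of the standard concepts a source code maps to."""
    sql = """
        SELECT std.concept_id, std.concept_name
        FROM concept AS src
        JOIN concept_relationship AS cr
          ON cr.concept_id_1 = src.concept_id
         AND cr.relationship_id = 'Maps to'
        JOIN concept AS std
          ON std.concept_id = cr.concept_id_2
         AND std.standard_concept = 'S'
        WHERE src.vocabulary_id = ? AND src.concept_code = ?
        ORDER BY std.concept_id
    """
    return conn.execute(sql, (vocabulary_id, concept_code)).fetchall()


hypertension = to_standard(conn, "ICD10CM", "I10")
unknown = to_standard(conn, "ICD10CM", "NOT-A-CODE")
```

A code without a 'Maps to' relationship yields an empty list, which corresponds to a source code that has not yet been mapped to the standard vocabularies.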

Tools are available for data standardisation, data quality, and data analysis, both commercially and in the public domain (see, for example, https://www.ohdsi.org/software-tools). The OMOP CDM is designed for federated querying and analytics, whereby applications are run locally at the data partner's site and only aggregated results are shared. This privacy-by-design approach is compliant with data protection requirements.
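The federated pattern can be sketched as follows: each site reduces its patient-level records to counts, and only those counts leave the site. This is an illustrative sketch, not the DARWIN EU implementation. The function names, the record layout and the minimum cell count of 5 are all assumptions; the suppression of small counts in particular is a common practice that the text above does not specify.

```python
from collections import Counter

# Assumed threshold: counts below this value are withheld by the site.
MIN_CELL_COUNT = 5


def local_aggregate(records, key):
    """Run at the data partner's site: reduce patient-level rows to counts.

    Counts below MIN_CELL_COUNT are replaced by None so they are never shared.
    """
    counts = Counter(record[key] for record in records)
    return {k: (n if n >= MIN_CELL_COUNT else None) for k, n in counts.items()}


def pool(site_results):
    """Run centrally: combine the aggregated results received from all sites."""
    total = Counter()
    for result in site_results:
        for k, n in result.items():
            if n is not None:  # suppressed cells carry no count
                total[k] += n
    return dict(total)


# Hypothetical patient-level data that stays at each site.
site_a = [{"gender": "F"}] * 6 + [{"gender": "M"}] * 2
site_b = [{"gender": "F"}] * 5 + [{"gender": "M"}] * 7

shared_a = local_aggregate(site_a, "gender")
shared_b = local_aggregate(site_b, "gender")
pooled = pool([shared_a, shared_b])
```

Only `shared_a` and `shared_b` would cross the site boundary in this sketch; the lists of patient records are never passed to `pool`.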

The OMOP CDM is a proven model for large-scale observational health studies. It has been used in many observational studies, including studies that informed regulatory decision-making, and a large number of European databases are already available in OMOP CDM format. For example, the European Health Data and Evidence Network (EHDEN) project is investing €17M of public/private funding, through the Innovative Medicines Initiative, in standardising health data to the OMOP CDM (Federated Network of EHDEN data partners). A growing number of European-funded projects have adopted the OMOP CDM, and National OHDSI/OMOP Nodes have been established.

Quality Control Mechanisms

As part of Data Partner (DP) onboarding, the DARWIN EU Coordination Centre (CC) assesses the quality of the data. The DARWIN EU data quality package will be aligned with the Data Quality Framework of EMA. The benefit of an interoperable data model is that we can use standardised tooling to extract objective data quality metrics. For this purpose we are currently using two packages:

CDM Onboarding

CdmOnboarding is an R package to support the onboarding process of a Data Partner (DP) into the DARWIN EU® Data Network. It extracts statistics from the DP's OMOP CDM instance and produces a Word document. The goal of this onboarding report is to provide insight into the completeness, transparency and quality of the performed Extract, Transform, and Load (ETL) process and the readiness of the data partner to be onboarded in the DARWIN EU® data network and participate in DARWIN EU® studies.

The onboarding report consists of three sections. The Clinical Data section reports on data table counts, data density, follow-up period length and date ranges. The Vocabulary Mapping section is especially important for data quality, as it shows the concept mapping coverage per domain and the top mapped/unmapped codes. Finally, the Technical Infrastructure section gives insight into the readiness of the DP to execute studies, with overviews of the query timings, installed packages and system information.
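As an illustration of what the Vocabulary Mapping section measures, the sketch below computes mapping coverage and the most frequent unmapped source codes. It relies on the OMOP convention that an unmapped record carries concept ID 0. The function names and sample data are hypothetical, and this is not how CdmOnboarding itself is implemented.

```python
from collections import Counter


def mapping_coverage(concept_ids):
    """Fraction of records mapped to a concept (concept ID 0 means unmapped)."""
    total = len(concept_ids)
    if total == 0:
        return None  # coverage is undefined for an empty domain
    mapped = sum(1 for concept_id in concept_ids if concept_id != 0)
    return mapped / total


def top_unmapped(records, n):
    """Most frequent source values among unmapped records.

    `records` is an iterable of (source_value, concept_id) pairs.
    """
    counts = Counter(source for source, concept_id in records if concept_id == 0)
    return counts.most_common(n)


# Hypothetical records per domain: (source_value, concept_id); IDs are made up.
domains = {
    "Condition": [("I10", 2001), ("LOCAL-7", 0), ("I10", 2001), ("LOCAL-7", 0)],
    "Drug": [("N02BE01", 3001), ("N02BE01", 3001), ("X-99", 0), ("N02BE01", 3001)],
}

coverage = {
    domain: mapping_coverage([concept_id for _, concept_id in rows])
    for domain, rows in domains.items()
}
worst_condition_codes = top_unmapped(domains["Condition"], 1)
```

A low coverage value for a domain, together with the list of its most frequent unmapped codes, points directly at the source codes whose mapping would improve the ETL most.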

CdmOnboarding is run on-site by the DP. It extracts data directly from the OMOP CDM and from tables pre-calculated by Achilles (an OHDSI R package for OMOP CDM characterisation). The resulting Word document is required as an annex to the main Onboarding Document, to be delivered upon first onboarding. In addition, CdmOnboarding must be run on every CDM refresh, and the results shared with the CC for inspection.

Data Quality Dashboard

The DataQualityDashboard (DQD) is an R package maintained by the OHDSI community. It performs a set of over 3000 standardised checks on a populated OMOP CDM instance. The goal is to evaluate observational data quality in a systematic and transparent way.

The quality checks are organized according to the Kahn Framework, which uses a system of categories and contexts that represent strategies for assessing data quality. DQD contains 24 check types defined within this framework. Each check type can be systematically executed against all relevant tables and fields in the OMOP CDM, which is how a small number of check types results in thousands of individual checks.

Examples of checks executed by DQD are:

  • does the table/field exist, is it populated and does it have the right data type (cdmField, cdmDataType, isRequired)
  • does the field contain standard, valid concepts, as required for semantic interoperability (isStandardValidConcept)
  • are gender-specific diagnoses/procedures associated with the correct person gender (gender)
  • do measurement values stay within plausible low and high limits (valueLow, valueHigh)
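The general shape of such a check can be sketched as follows: count the rows that violate a rule, express them as a percentage of all rows examined, and compare that percentage with a failure threshold. This is a simplified sketch and not DQD's actual code; the names `CheckResult` and `run_check`, the plausible limits of 0 and 300, and the 5% threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    num_violated: int
    num_rows: int
    pct_violated: float
    passed: bool


def run_check(name, rows, is_violation, threshold_pct):
    """Apply one rule to all rows and pass if the violation rate is within the threshold."""
    num_rows = len(rows)
    num_violated = sum(1 for row in rows if is_violation(row))
    pct_violated = 100.0 * num_violated / num_rows if num_rows else 0.0
    return CheckResult(name, num_violated, num_rows, pct_violated,
                       pct_violated <= threshold_pct)


# Hypothetical measurement rows and assumed plausible limits for one measurement.
measurements = [{"value": 120.0}, {"value": -5.0}, {"value": 80.0}, {"value": 95.0}]
PLAUSIBLE_LOW, PLAUSIBLE_HIGH = 0.0, 300.0

value_low = run_check(
    "valueLow", measurements,
    lambda row: row["value"] < PLAUSIBLE_LOW,
    threshold_pct=5.0,
)
value_high = run_check(
    "valueHigh", measurements,
    lambda row: row["value"] > PLAUSIBLE_HIGH,
    threshold_pct=5.0,
)
```

Because the rule is passed in as a function, the same `run_check` can be reused for every table and field a check type applies to, mirroring how one check type expands into many individual checks.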

Clair Blacketer, Frank J Defalco, Patrick B Ryan, Peter R Rijnbeek, Increasing trust in real-world evidence through evaluation of observational data quality, Journal of the American Medical Informatics Association, Volume 28, Issue 10, October 2021, Pages 2251–2257, https://doi.org/10.1093/jamia/ocab132