Data Provenance (DPROV) for Health IT

October 13, 2015


Data Provenance (DPROV) in the health Information Technology (IT) context refers to the creation of health IT data and the tracking of its changes throughout its life cycle. As the demand for data exchange increases, so does the need for confidence in the “authenticity, trustworthiness and reliability” of the data being shared. That confidence is essential to health information exchange that is robust and enhances privacy, safety and security. [Federal Register, ONC Certification Criteria]

Synopsis of the Trend

Health Level Seven International (HL7) and the Office of the National Coordinator for Health Information Technology (ONC) use the term “Data Provenance” to refer to the “evidence and attributes” that describe the origin of electronic health information as it is captured, modified and exchanged throughout its lifespan. [Federal Register, NPR] DPROV refers specifically to the lifecycle and chain of trust of a record entry in a health information system, whether the information is owned by the health care provider or, in the case of Patient-Generated Health Data (PGHD), by the patient. DPROV as described in the draft HL7 Data Provenance Implementation Guide [HL7 DPROV IG (DSTU)] is based on HL7 Clinical Document Architecture (CDA). It is designed for use with CDA and with CDA's probable successor as the standard for Electronic Health Record (EHR) data content and exchange, Fast Healthcare Interoperability Resources (FHIR). FHIR is a ground-up architecture for health IT data definition and exchange that is currently being developed and piloted. [FHIR, Project Scope Statement]

Why it Matters

Before DPROV, there was no way to ensure that vendors and organizations handled the provenance of health IT data in a well-defined, consistent way. Issues of authenticity, veracity and quality invariably arose when health information, especially Personally Identifiable Information (PII), was created, exchanged and integrated across multiple organizations, parties and systems. As the sheer volume of health information has exploded over the past decade, the demand to track the provenance of that data has grown with it. There was no way to know whether information had been tampered with (unauthorized use) or corrupted with malicious intent (medical identity theft). Even where provenance systems were implemented, there was no guarantee they could communicate with one another. Provenance data could not be interoperable because there was no minimum set of provenance data elements, metadata and vocabulary to route the data correctly and keep it from falling into the wrong (unauthenticated) hands.

“Confidence in the authenticity, trustworthiness and reliability of the data being shared is fundamental to robust, privacy, safety, and security-enhanced health information exchange.” [Data Provenance Charter] The DPROV CDA Implementation Guide (IG) presents a standardized way to capture, retain and exchange the provenance (metadata) of health data, using the existing CDA standard (and, in the future, FHIR) as the vehicle for provenance data content and exchange. Ultimately, DPROV will support use cases in clinical care, interventions, analysis, decision making and clinical research, as well as business and legal requirements such as “chain of trust” and “chain of custody.” Evidentiary support and clinical decision support are two examples.

Standardization Initiatives

Data Provenance is an emerging healthcare standard. Work is underway in several international standards organizations, including HL7, ISO and W3C, to standardize DPROV, expand its scope and applicability, and create a common reference model. Last year HL7 balloted the DPROV CDA Implementation Guide as a Draft Standard for Trial Use (DSTU). The IG is structured as an overlapping set of templates that lets prospective users pick and choose, “cafeteria style,” the functionality and outcomes they need for their business requirements. As a result, the IG can be applied to any CDA-based EHR, irrespective of the use case or model. ISO has issued two technical specifications, ISO/TS 8000-110 and ISO/TS 8000-120; the latter, issued in 2009, includes a Unified Modeling Language (UML) conceptual model for provenance data. The World Wide Web Consortium (W3C) has published four PROV Recommendations: PROV Data Model (PROV-DM), PROV Ontology (PROV-O), PROV Constraints and PROV Notation (PROV-N). [Soho] Currently, these models have not been reconciled.
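To make the W3C PROV Data Model more concrete, here is a minimal Python sketch that uses three of its core concepts (entity, activity and agent) and the relations wasGeneratedBy, wasAssociatedWith, wasAttributedTo and wasDerivedFrom to record the chain of custody of a single record entry. The class names, identifiers and the `lineage` helper are hypothetical illustrations only; this is not the HL7 DPROV CDA template set, nor the serialization defined by PROV-N or PROV-O.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agent:
    """Something bearing responsibility, e.g. a clinician, patient or system (PROV agent)."""
    id: str


@dataclass(frozen=True)
class Activity:
    """Something that occurs over time and acts on data (PROV activity)."""
    id: str
    associated_with: Agent  # PROV wasAssociatedWith


@dataclass(frozen=True)
class Entity:
    """A piece of data, here a health record entry (PROV entity)."""
    id: str
    generated_by: Activity                   # PROV wasGeneratedBy
    attributed_to: Agent                     # PROV wasAttributedTo
    derived_from: Optional["Entity"] = None  # PROV wasDerivedFrom


def lineage(entity: Entity) -> list[str]:
    """Follow wasDerivedFrom links back to the original entry, newest first."""
    chain = []
    current: Optional[Entity] = entity
    while current is not None:
        chain.append(
            f"{current.id} <- {current.generated_by.id} by {current.attributed_to.id}"
        )
        current = current.derived_from
    return chain


# Hypothetical example: a lab result created by a lab, then imported into a clinic EHR.
lab = Agent("agent:acme-lab")
ehr = Agent("agent:clinic-ehr")
original = Entity("entry:glucose-v1", Activity("act:lab-analysis", lab), lab)
imported = Entity(
    "entry:glucose-v2", Activity("act:cda-import", ehr), ehr, derived_from=original
)

for step in lineage(imported):
    print(step)
# entry:glucose-v2 <- act:cda-import by agent:clinic-ehr
# entry:glucose-v1 <- act:lab-analysis by agent:acme-lab
```

The entities are immutable (`frozen=True`), so each change to a record entry produces a new entity linked to its predecessor rather than overwriting it; that is what lets the full chain back to the origin be recovered.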


DPROV refers to the tracking of evidence and attributes that describe the origin of EHR data encoded in CDA, supported throughout the lifespan of the data as it is created, modified and exchanged. Its legitimacy and maturity are indicated by its recent inclusion (March 20, 2015) in the proposed 2015 Edition Health IT Certification Criteria, which ONC released alongside the Centers for Medicare & Medicaid Services (CMS) Notice of Proposed Rulemaking for Meaningful Use Stage 3 (MU3).

Although DPROV was not adopted in the final rule this time, its inclusion in the proposed Certification Criteria attests to its maturity and its growing acceptance among clinicians, researchers, vendors, and security and privacy experts, as well as within the U.S. government, notably at CMS, ONC, and the Federal Health Architecture (FHA) and its Federal Health Information Model (FHIM). The ONC Standards & Interoperability (S&I) Framework continues to work on Data Provenance.

Contributed by: Judy Fincher

Key References