Provenance, Lineage, Pedigree: Are they the Same?

April 13, 2020

by Diana Proud-Madruga

Healthcare information can come from many sources: multiple EHRs, patients themselves, wearable medical devices, and any number of healthcare Internet of Things devices. But can that information be trusted? The answer may lie in the data’s provenance, lineage, and pedigree.

What do these terms mean?

Provenance: The origin of data and the process by which it arrived at the database. This term has been around for centuries and is still common in art when determining a piece’s authenticity. 

Lineage: The line of descent. In geographic information systems (GIS), the lineage of a derived product is the information that describes the materials and transformations used to derive the data.

Pedigree: The history or provenance of a person or thing, especially as conferring distinction. The word “pedigree” seems to include the idea of quality in addition to origin and line of descent.

What’s the difference between these terms?

Research has found that provenance and lineage are used interchangeably when referring to data. Both terms cover the entire history of the data, including where the data came from, who saw it, and where it has been.

Pedigree, however, differs from the other two terms in one important way: it includes not only the history of the data but also its quality. Knowing the accuracy, correctness, completeness, and timeliness of a data element, and its compliance with established standards, helps the owner judge how trustworthy the data is.
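To make the distinction concrete, here is a minimal sketch of how a data element might carry provenance alone versus provenance plus pedigree. Every name in it (`ProvenanceEvent`, `QualityAssessment`, `DataElement`, `trust_score`), the 0.0 to 1.0 scoring scale, and the "weakest dimension" trust rule are assumptions made for illustration; none of them comes from a standard or from the sources listed below.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProvenanceEvent:
    """One step in the data's history (hypothetical structure)."""
    actor: str   # who or what touched the data
    action: str  # e.g. "recorded", "converted", "viewed"
    system: str  # where the data was at that step


@dataclass
class QualityAssessment:
    """Quality dimensions named in the text, scored 0.0 to 1.0 (assumed scale)."""
    accuracy: float
    correctness: float
    completeness: float
    timeliness: float
    standards_compliant: bool


@dataclass
class DataElement:
    value: str
    origin: str
    # Provenance / lineage: where the data came from and what happened to it.
    history: List[ProvenanceEvent] = field(default_factory=list)
    # Pedigree: the same history plus a statement about quality.
    quality: Optional[QualityAssessment] = None

    def has_pedigree(self) -> bool:
        return self.quality is not None

    def trust_score(self) -> Optional[float]:
        """Illustrative rule only: the weakest quality dimension,
        or 0.0 if the element does not comply with standards."""
        if self.quality is None:
            return None  # history alone says nothing about quality
        q = self.quality
        if not q.standards_compliant:
            return 0.0
        return min(q.accuracy, q.correctness, q.completeness, q.timeliness)


heart_rate = DataElement(
    value="72 bpm",
    origin="wearable device",
    history=[
        ProvenanceEvent("wearable device", "recorded", "device firmware"),
        ProvenanceEvent("sync service", "converted", "patient portal"),
        ProvenanceEvent("clinician", "viewed", "EHR"),
    ],
)
print(heart_rate.has_pedigree())  # False: provenance only

heart_rate.quality = QualityAssessment(0.9, 0.95, 1.0, 0.8, True)
print(heart_rate.trust_score())   # 0.8: limited by timeliness
```

With only `history` populated, a consumer can trace the reading but cannot judge it. Once `quality` is attached, the same record supports a trust decision, which is the extra step that pedigree adds.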

Why do these terms matter in healthcare?

Patient safety relies on healthcare providers having access to reliable, trusted patient health information. Defining these terms clearly matters because it ensures that everyone uses them consistently. Initial research has concluded that provenance and lineage are used interchangeably and refer only to the origins and transformations of data. The term pedigree is becoming increasingly important because it covers not only the origin and path of data but also its quality.

The idea of data quality is rapidly gaining momentum and becoming a critical part of any provenance system. The OMG Data Provenance and Pedigree Working Group is working to advance standards that include this concept of “pedigree” when describing the provenance of data.

Diana Proud-Madruga, CISSP, is a Senior Security Analyst with Electrosoft.

Sources:

  • Tan, W. 2004. Research Problems in Data Provenance. IEEE Data Engineering Bulletin, 27(4):45−52.
  • Buneman, P., S. Khanna, et al. 2000. Data Provenance: Some Basic Issues. FSTTCS, New Delhi, India, 87−93.
  • Lineage | Origin and Meaning of Lineage by Online Etymology Dictionary. https://www.etymonline.com/word/lineage. Accessed 14 Oct. 2019.
  • Lanter, D. P. 1991. Design of a Lineage-Based Meta-Data Base for GIS. Cartography and Geographic Information Systems, 18:255−261.
  • Teplitzky, P. 2017. Data Lineage and Pedigree – One and the Same? Fintech Today, 20 Dec. 2017. https://www.financialtechnologytoday.com/data-lineage-pedigree-one/#.XXqqIS5Kipo.