Characterizing database granularity using SNOMED-CT hierarchy.
Ostropolets A., Reich C., Ryan P., Weng C., Molinaro A., DeFalco F., Jonnagaddala J., Liaw S-T., Jeon H., Park RW., Spotnitz ME., Natarajan K., Argyriou G., Kostka K., Miller R., Williams A., Minty E., Posada J., Hripcsak G.
Multi-center observational studies require recognition and reconciliation of differences in patient representations arising from underlying populations, disparate coding practices and specifics of data capture. These differences lead to varying granularity, or level of detail, of the concepts representing clinical facts. For researchers studying particular populations of interest, it is important to ensure that these populations are defined using concepts at the right level of granularity. We studied the granularity of concepts within 22 data sources in the OHDSI network and calculated a composite granularity score for each data source. Three alternative SNOMED-based approaches to such a score were consistent in classifying data sources into three levels of granularity (low, moderate and high), which correlated with the provenance of data and country of origin. However, they performed unsatisfactorily in ordering data sources within these groups and showed inconsistency for small data sources. Further studies of approaches to characterizing data source granularity are needed.