Data Mining as an Essential Informatics Skill Set
Clinical Integrated Data Repositories have now become common at academic medical centers. With tools like i2b2 and RemedyMD, plus a broad range of analytic tools, access to large volumes of clinical data for research and population management is coming to maturity. The opportunities for using this data to enable clinical trials and accelerate research are promising. Quality and patient safety can also be enhanced through use of electronic medical records; a recent New England Journal of Medicine article by Dean Sittig details how to “Use EHRs to Monitor and Improve Patient Safety”: “Organizations must leverage EHRs to facilitate rapid detection of common errors (including EHR-related errors), to monitor the occurrence of high-priority safety events, and to more reliably track trends over time.”
To maximize these opportunities, physicians and other health professionals must develop skills in understanding and using this data. Medical informatics has been successful in developing tools for data mining, but translating raw data into research questions and disease trends requires training medical professionals in new ways of thinking. Understanding clinical workflow in an EMR does not directly translate into this type of research. One must understand how the data is organized and coded to create disease cohorts for analysis. Informaticists are key in training a new generation of physicians in this skill. Because of the complexity of clinical data, there are three approaches to data mining and analysis:
- Self-service data mining enabled by cohort definition tools, both vendor-developed and open source
- Analyst-provided data – skilled data analysts can pull relevant data sets based on their understanding of the research question and the data. However, there are limits on the number of experienced data analysts any organization can afford to meet the coming demand
- Predictive analytics – this is the realm of biostatisticians, who will be key consumers of large data sets to create predictive models for use in clinical practice. This is also a limited resource, so prioritizing predictive modeling projects with major impact is key
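To make the idea of creating a disease cohort from coded data more concrete, here is a minimal sketch in Python. It is not how i2b2 or any particular tool works. The encounter rows, the patient ids, and the `define_cohort` function are hypothetical and invented for illustration, and the rule of "at least two encounters carrying a code in the ICD-10 category E11 (type 2 diabetes)" is only one assumed cohort definition among many.

```python
from collections import defaultdict

# Hypothetical, made-up encounter rows: (patient_id, icd10_code).
# A real repository would hold far more fields (dates, providers, labs).
ENCOUNTERS = [
    ("p1", "E11.9"),
    ("p1", "E11.65"),
    ("p2", "I10"),
    ("p3", "E11.9"),
    ("p3", "I10"),
    ("p4", "E11.21"),
    ("p4", "E11.9"),
    ("p4", "E10.9"),  # type 1 diabetes: a different category, not counted
]


def define_cohort(encounters, code_prefix, min_encounters=2):
    """Return sorted patient ids with at least `min_encounters` encounters
    whose diagnosis code starts with `code_prefix`."""
    counts = defaultdict(int)
    for patient_id, code in encounters:
        if code.startswith(code_prefix):
            counts[patient_id] += 1
    return sorted(p for p, n in counts.items() if n >= min_encounters)


# Assumed rule: two or more encounters coded under E11.
cohort = define_cohort(ENCOUNTERS, "E11")
print(cohort)  # ['p1', 'p4']
```

Even in this toy version, the result depends on knowing how diagnoses are coded: patient p3 has a single E11 encounter and is excluded by the two-encounter rule, and the E10.9 row for p4 is ignored because it falls in a different code category.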
Data mining and analytics should be taught in medical schools for the next generation of providers. Data visualization will help in exploring these large, complex data sets. More on this in a future post.
His interests include social media in healthcare and clinical research informatics, including secondary use of EMR data. He presents on healthcare social media nationally and internationally. He is also an adjunct professor in health informatics at Kent State University in Ohio. He serves on the advisory boards of several health IT startups.
The opinions expressed are my own and not those of my ...