Machine learning for healthcare just got a whole lot easier

The software is designed to streamline healthcare machine learning. It does this by including functionality specific to healthcare and by simplifying the workflow of creating and deploying models.

Learn more about machine learning with the community: read and subscribe to our weekly blogs, watch our weekly YouTube live broadcasts, and bring your questions to our data science team by email or during live events.

Next Live Broadcast
Hosted by Mike Mastanduno and Mike Levy

ML #28 – 2.0

Most Recent Broadcast Replay

ML #1 – Getting Started in R and RStudio

Everything you need to get started

What has the software been used to do?

  • Drive more than $1M in annual savings by eliminating an outsourced service line reporting solution.
  • Achieve 50% reduction in central line-associated blood stream infection (CLABSI) rates at a large academic medical center.
  • Improve self-pay collections using intelligent workflows across more than 150K patients per month.
  • Produce literature-beating models across readmissions, infection, and finance, helping clinicians and operations teams prioritize resources.

What can I do with the software?

  • Create and compare models based on your data.
  • Save and deploy a model.
  • Perform risk-adjusted comparisons.
  • Perform trend analysis using Nelson rules.
  • Improve sparse data via longitudinal imputation.
  • Fill in missing data via imputation.
  • Deploy a model to produce daily predictions.
  • Write predictions back to a database.
  • Learn what factors drive each prediction.
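Nelson rules flag non-random patterns in a series plotted on a control chart. The minimal sketch below shows the idea behind trend analysis with these rules and checks only two of the eight. Rule 1 flags a point more than 3 standard deviations from the mean. Rule 2 flags nine or more consecutive points on the same side of the mean. This is plain Python, not the packages' API. The function name `nelson_rule_flags` is hypothetical. Estimating the mean and standard deviation from the series itself, rather than from a separate baseline period, is also an assumption.

```python
from statistics import mean, stdev


def nelson_rule_flags(values):
    """Return indices flagged by Nelson rule 1 and rule 2 (hypothetical helper).

    Rule 1: a point lies more than 3 standard deviations from the mean.
    Rule 2: nine or more consecutive points fall on the same side of the mean.
    Assumption: mean and standard deviation come from `values` itself; control
    limits are often set from a baseline period instead.
    """
    mu = mean(values)
    sigma = stdev(values)

    # Rule 1: points beyond the 3-sigma control limits.
    rule1 = [i for i, v in enumerate(values) if abs(v - mu) > 3 * sigma]

    # Rule 2: track the length of the current run on one side of the mean.
    rule2 = []
    run_side, run_len = 0, 0
    for i, v in enumerate(values):
        side = (v > mu) - (v < mu)  # +1 above, -1 below, 0 exactly on the mean
        if side != 0 and side == run_side:
            run_len += 1
        else:
            # A point on the mean breaks the run without starting a new one.
            run_side, run_len = side, (1 if side != 0 else 0)
        if run_len >= 9:
            rule2.append(i)  # flag every point from the ninth onward in the run
    return {"rule1": rule1, "rule2": rule2}


# Example: a flat series followed by one spike.
print(nelson_rule_flags([10] * 19 + [30]))
```

The other six rules can be added in the same way by scanning windows of the series against the mean and standard deviation. One example is six points in a row steadily increasing or decreasing.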

How is it tailored to healthcare?

  • Longitudinal machine learning via mixed models.
  • Longitudinal imputation.
  • Risk-adjusted comparisons.
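Longitudinal imputation fills gaps in repeated measurements by using the same patient's earlier values instead of population-wide statistics. The text doesn't say which method the packages use. This minimal sketch substitutes last observation carried forward (LOCF), a common and simple longitudinal approach. The function name `locf_impute` and the `(patient_id, day, value)` record layout are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def locf_impute(records):
    """Fill missing values per patient by carrying the last observation forward.

    `records` is a list of (patient_id, day, value) tuples (hypothetical layout),
    with value None when missing. Returns a new list sorted by patient and day.
    Values before a patient's first observation stay None, and values are never
    carried from one patient to another.
    """
    # Sort by patient, then by day, so each patient's history is in time order.
    ordered = sorted(records, key=itemgetter(0, 1))
    filled = []
    for patient_id, rows in groupby(ordered, key=itemgetter(0)):
        last_seen = None  # reset for every patient
        for _, day, value in rows:
            if value is not None:
                last_seen = value
            filled.append((patient_id, day, value if value is not None else last_seen))
    return filled


records = [("A", 1, 7.1), ("A", 2, None), ("B", 1, None),
           ("A", 3, 6.8), ("B", 2, 5.5), ("B", 3, None)]
print(locf_impute(records))
```

Grouping by patient before filling is what makes this longitudinal. A naive forward fill over the whole table would leak one patient's last value into the next patient's first row.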


Our goal with this project is to speed the adoption of ML in healthcare by building pragmatic, world-class tools that help anyone with access to healthcare data.

You can help in many ways:

  • Try out the packages and let us know what needs improvement!
  • Check out our GitHub repos.

How do I get started?

The software is available as packages for both R and Python, two of the most common languages used by data scientists. If you don’t have previous experience with either language, we recommend the R package: it currently has more features, and R is more newbie-friendly.

Let's do this!

Access documentation, installation instructions, feature references, and hints and tips.

How do the packages focus on healthcare?

Both packages differ from other machine learning packages in that they focus on data issues specific to healthcare. We pay attention to longitudinal questions, offer an easy way to do risk-adjusted comparisons, and provide easy database connections and deployment.
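A common way to make a risk-adjusted comparison is the observed-to-expected (O/E) ratio. This divides the number of events a group actually had by the number a risk model predicts for that group's patients. The text doesn't describe how the packages compute their comparisons, so treat this as a minimal, illustrative sketch of the O/E approach. The function name `observed_to_expected`, the tuple layout and the unit names are hypothetical.

```python
from collections import defaultdict


def observed_to_expected(rows):
    """Compute an observed-to-expected (O/E) ratio per group (hypothetical helper).

    `rows` holds (group, outcome, expected_risk) tuples, where outcome is 0 or 1
    and expected_risk is a patient-level predicted probability from a risk model
    fit on the whole population. O/E above 1 means more events than the group's
    case mix predicts; below 1 means fewer.
    """
    observed = defaultdict(int)
    expected = defaultdict(float)
    for group, outcome, risk in rows:
        observed[group] += outcome
        expected[group] += risk  # expected events = sum of predicted risks
    # Skip groups with zero expected events to avoid dividing by zero.
    return {g: observed[g] / expected[g] for g in observed if expected[g] > 0}


rows = [
    ("North", 1, 0.5), ("North", 1, 0.5), ("North", 0, 0.5), ("North", 0, 0.5),
    ("South", 1, 0.125), ("South", 0, 0.125), ("South", 0, 0.125), ("South", 0, 0.125),
]
print(observed_to_expected(rows))
```

In this toy data, both units have one event per two expected or fewer. However, South's patients are lower risk, so its single event gives an O/E of 2.0 while North's two events give 1.0. Comparing raw event counts would hide that difference.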

Who are the packages designed for?

While data scientists in healthcare will likely find these packages valuable, the primary audience is analysts, BI developers, and SQL developers who want to create appropriate and accurate models with healthcare data.

Learn about machine learning in healthcare

Learn from our team of data scientists

Levi Thatcher VP, Data Science
Mike Mastanduno Data Scientist, Health Catalyst
Taylor Miller Data Scientist, Health Catalyst
Taylor Larsen Data Science Engineer, Health Catalyst
Mike Levy Data Scientist, Health Catalyst
YouTube Live

Hands-On Healthcare Machine Learning Weekly Broadcasts

Join Levi Thatcher and the data science team as they discuss machine learning topics with open Q&A every Thursday at 3:00 PM EST

ML #1 – Getting Started in R and RStudio

ML #5 – Open Source Tools for Data Science

ML #6 – for Predicting Extended Length of Stay

ML #8 – Open Healthcare Datasets

ML #9 – From Zero to Your First Open Source Contribution: It Happens Today!

ML #11 – How Do You Evaluate Model Performance?

ML #12 – Deep Dive into Heart Failure Readmissions with Joe Smith

ML #13 – Basic Feature Engineering in Healthcare

ML #14 – A Day In the Life of A Data Scientist

ML #15 – Multiclass Machine Learning in Using XGBoost

ML #16 – Data Science at an Academic Medical Center with Risa Myers

ML #17 – Healthcare Text Analytics and NLP with Mike Dow

ML #18 – Healthcare Analytics and Open Source with Josh O’Rourke

ML #20 – Exploratory Data Analysis in R

ML #21 – Central Line Infection Prevention at IU Health, with Kristen Kelley

ML #22 – Machine Learning 101

ML #23 – A Survey of the Opioid Epidemic

ML #24 – Training for Healthcare Machine Learning with Rick Wolf of Insight Data Science

ML #25 – The How and Why of R for Data Work, with Xiao Liu

ML #26 – Start-up Healthcare Machine Learning with Eric Carlson

ML #27 – Contributing to Open Source Projects Using Github

Read the latest from our Data Science Blog

View weekly blogs for tips and advice on machine learning in healthcare.
Subscribe to receive posts via email.

Levi Thatcher April 23, 2018

We started the project in late 2016 to bring machine learning (ML) to the healthcare masses. As we release version 2.0 of the software (on April 20th), it’s worth stepping back to understand why we invest in this open-source project, which is freely available to all. Why would a for-profit firm spend time on this public good? Since the 2009 HITECH Act incentivized EHR adoption, data has become much more ubiquitous in healthcare. Despite all that’s gone wrong in US healthcare, the fact that healthcare data is…

Levi Thatcher March 28, 2018

Many vendors deliver machine learning models for different applications in healthcare. But not all of them deliver models that are accurate, easy to implement, targeted to a specific use case, connected to actionable interventions, and backed by a machine learning community and support team with extensive, exclusive healthcare experience. These qualities are possible only when a model comes from a vendor with a unique set of capabilities. Five differentiators set effective machine learning models and vendors apart:
  1. Vendor’s expertise and exclusive focus on healthcare.
  2. Machine learning model’s access to extensive data sources.
  3. Machine learning model’s ease of implementation.
  4. Machine learning model’s interpretability and buy-in.
  5. Machine learning model’s conformance with privacy standards.
These five factors separate high-value vendors and models from the crowd, so that healthcare systems can quickly implement machine learning and start seeing improvements.

Why did Health Catalyst open-source this?

We believe that everyone benefits when healthcare becomes more efficient and outcomes improve. Surprisingly, machine learning is still fairly new to healthcare, and we want to move the industry quickly down the path to adoption. Making helpful, simple tools widely available is one small way to help healthcare organizations turn their data into actionable insights that improve outcomes.