Machine learning for healthcare just got a whole lot easier

The packages are designed to streamline healthcare machine learning. They do this by including functionality specific to healthcare, as well as simplifying the workflow of creating and deploying models.

Learn more about machine learning from the community: read and subscribe to our weekly blog posts, watch our weekly YouTube live broadcasts, and send questions to our data science team by email or during live events.

Most Recent Broadcast Replay
Hosted by Levi Thatcher and Mike Mastanduno

ML #27 – Contributing to Open Source Projects Using Github

Everything you need to get started

What have the packages been used to do?

  • Drive more than $1M in annual savings by eliminating an outsourced service line reporting solution.
  • Achieve a 50% reduction in central line-associated bloodstream infection (CLABSI) rates at a large academic medical center.
  • Improve self-pay collections using intelligent workflows across more than 150K patients per month.
  • Produce literature-beating models across readmissions, infection, and finance, helping clinicians and operations prioritize resources.

What can I do with the packages?

  • Create and compare models based on your data.
  • Save and deploy a model.
  • Perform risk-adjusted comparisons.
  • Do trend analysis following Nelson rules.
  • Improve sparse data via longitudinal imputation.
  • Fill in missing data via imputation.
  • Deploy a model to produce daily predictions.
  • Write predictions back to a database.
  • Learn what factors drive each prediction.

How is it tailored to healthcare?

  • Longitudinal machine learning via mixed models.
  • Longitudinal imputation.
  • Risk-adjusted comparisons.


Our goal with this project is to expedite adoption of ML in healthcare by building pragmatic, world-class tools to help anyone with access to healthcare data.

You can help in many ways:

  • Try out the packages and let us know what needs improvement!
  • Check out our GitHub repos.

How do I get started?

The software is available in packages for both R and Python, two of the most common languages used by data scientists. If you don’t have previous experience with either language, we recommend the R package, as it currently has more features and R is more newbie-friendly.

Let's do this!

Access documentation, installation instructions, feature references, as well as hints and tips.

How do the packages focus on healthcare?
Both packages differ from other machine learning packages in that they focus on data issues specific to healthcare. This means that we pay attention to longitudinal questions, offer an easy way to do risk-adjusted comparisons, and provide easy connections and deployment to databases.
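
One common way to frame a risk-adjusted comparison is an observed-to-expected (O/E) ratio: count the outcomes that actually occurred in each group and divide by the number a risk model predicted for that group's patients. The Python sketch below shows the arithmetic only; it is an assumption about the general technique, not the packages' method, and the function name and data are hypothetical.

```python
from collections import defaultdict


def observed_to_expected(records):
    """Compute an observed/expected outcome ratio per group.

    `records` is an iterable of (group, outcome, predicted_risk) tuples,
    where outcome is 0 or 1 and predicted_risk is a model probability.
    """
    observed = defaultdict(float)
    expected = defaultdict(float)
    for group, outcome, risk in records:
        observed[group] += outcome
        expected[group] += risk
    # Skip groups whose expected count is zero to avoid division by zero.
    return {g: observed[g] / expected[g] for g in observed if expected[g] > 0}


# Hypothetical units: (unit, had_outcome, model-predicted risk).
encounters = [
    ("Unit A", 1, 0.5),
    ("Unit A", 0, 0.5),
    ("Unit B", 1, 0.25),
    ("Unit B", 1, 0.25),
]
ratios = observed_to_expected(encounters)
```

A ratio near 1 means a group performed about as the model expected given its patient mix; a ratio well above 1 means more outcomes occurred than the patients' risk alone would explain.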
Who are the packages designed for?
While data scientists in healthcare will likely find these packages valuable, the target audience is analysts, BI developers, and SQL developers who would love to create appropriate and accurate models with healthcare data.

Learn about machine learning in healthcare

Learn from our team of data scientists

Levi Thatcher, VP, Data Science
Mike Mastanduno, Data Scientist, Health Catalyst
Taylor Miller, Data Scientist, Health Catalyst
Taylor Larsen, Data Science Engineer, Health Catalyst
Mike Levy, Data Scientist, Health Catalyst
YouTube Live

Hands-On Healthcare Machine Learning Weekly Broadcasts

Join Levi Thatcher and the data science team as they discuss machine learning topics, with open Q&A, every Thursday at 3:00 PM EST.

ML #27 – Contributing to Open Source Projects Using Github

ML #26 – Start-up Healthcare Machine Learning with Eric Carlson

ML #25 – The How and Why of R for Data Work, with Xiao Liu

ML #24 – Training for Healthcare Machine Learning with Rick Wolf of Insight Data Science

ML #23 – A Survey of the Opioid Epidemic

ML #22 – Machine Learning 101

ML #21 – Central Line Infection Prevention at IU Health, with Kristen Kelley

ML #20 – Exploratory Data Analysis in R

ML #18 – Healthcare Analytics and Open Source with Josh O’Rourke

ML #17 – Healthcare Text Analytics and NLP with Mike Dow

ML #16 – Data Science at an Academic Medical Center with Risa Myers

ML #15 – Multiclass Machine Learning Using XGBoost

ML #14 – A Day In the Life of A Data Scientist

ML #13 – Basic Feature Engineering in Healthcare

ML #12 – Deep Dive into Heart Failure Readmissions with Joe Smith

ML #11 – How Do You Evaluate Model Performance?

ML #9 – From Zero to Your First Open Source Contribution: It Happens Today!

ML #8 – Open Healthcare Datasets

ML #6 – for Predicting Extended Length of Stay

ML #5 – Open Source Tools for Data Science

ML #1 – Getting Started in R and RStudio

Read the latest from our Data Science Blog

View weekly blogs for tips and advice on machine learning in healthcare.
Subscribe to receive posts via email.

Yannick Van Huele January 25, 2018

Last summer we discussed the simplified interface of the 1.0 CRAN release, and we’re now thrilled to demo new features related to clinician guidance in the 1.2 version. We’re calling this Patient Impact Predictor (PIP).

Patients like this should be treated like this

This week we’d like to highlight new functionality that allows one to go a step beyond surfacing predictions to also surface targeted interventions. Risk scores are a great first step, but prescriptive guidance is where the results of machine learning (ML) may actually…

Mike Mastanduno October 10, 2017

A good data scientist will have command of a large breadth of knowledge, from machine learning and statistics to business instinct or software engineering. Part of what makes this job exciting is the possibility of driving insights or improvements from any one of those skills. A data scientist may or may not know all the skills ahead of time, but they are able to step back, understand where there might be a high return on investment, and learn the skills necessary to take advantage. Recently, our team announced the release…

Why did Health Catalyst open source this?

We believe that everyone benefits when healthcare is made more efficient and outcomes are improved. Machine learning is surprisingly still fairly new to healthcare and we want to quickly take healthcare down the machine learning adoption path. We believe that making helpful, simple tools widely available is one small way to help healthcare organizations transform their data into actionable insight that can be used to improve outcomes.