Machine learning for healthcare just got a whole lot easier

The software is designed to streamline healthcare machine learning. It does this by including functionality specific to healthcare and by simplifying the workflow of creating and deploying models.

Learn more about machine learning from the community: read and subscribe to our weekly blog, watch our weekly YouTube Live broadcasts, and put your questions to our data science team by email or during live events.

Most Recent Broadcast Replay
Hosted by Mike Mastanduno

ML #28 – 2.0

Everything you need to get started

What has the software been used to do?

  • Drive more than $1M in annual savings by eliminating an outsourced service line reporting solution.
  • Achieve a 50% reduction in central line-associated bloodstream infection (CLABSI) rates at a large academic medical center.
  • Improve self-pay collections using intelligent workflows across more than 150K patients per month.
  • Produce models that beat published results for readmissions, infection, and finance, helping clinicians and operations teams prioritize resources.

What can I do with the software?

  • Create and compare models based on your data.
  • Save and deploy a model.
  • Perform risk-adjusted comparisons.
  • Analyze trends using the Nelson rules.
  • Improve sparse data via longitudinal imputation.
  • Fill in missing data via imputation.
  • Deploy a model to produce daily predictions.
  • Write predictions back to a database.
  • Learn what factors drive each prediction.
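The Nelson rules are a standard set of tests for spotting non-random patterns on a control chart. To give a rough idea of what trend analysis with them involves, here is a minimal, standalone Python sketch of the first three rules. It is not the packages' API: the function names are hypothetical, and by default the sketch estimates the mean and standard deviation from the series itself, whereas in practice you would usually take them from a stable baseline period.

```python
from statistics import mean, stdev


def nelson_rule_1(values, mu, sigma):
    """Rule 1: one point more than 3 standard deviations from the mean."""
    return [i for i, v in enumerate(values) if abs(v - mu) > 3 * sigma]


def nelson_rule_2(values, mu, run_length=9):
    """Rule 2: nine or more points in a row on the same side of the mean.

    Returns the index of every point that completes or extends such a run.
    """
    flagged = []
    run, side = 0, 0
    for i, v in enumerate(values):
        current = (v > mu) - (v < mu)  # +1 above, -1 below, 0 exactly on the mean
        if current != 0 and current == side:
            run += 1
        else:
            # A switch of side (or a point on the mean) starts a new run.
            side = current
            run = 1 if current != 0 else 0
        if run >= run_length:
            flagged.append(i)
    return flagged


def nelson_rule_3(values, run_length=6):
    """Rule 3: six or more points in a row steadily increasing or decreasing.

    Returns the index of every point that completes or extends such a run;
    equal consecutive values break the run.
    """
    flagged = []
    run, direction = 1, 0
    for i in range(1, len(values)):
        step = (values[i] > values[i - 1]) - (values[i] < values[i - 1])
        if step != 0 and step == direction:
            run += 1
        else:
            # The pair (i - 1, i) starts a new run of two points,
            # or the run resets to a single point on a tie.
            direction = step
            run = 2 if step != 0 else 1
        if run >= run_length:
            flagged.append(i)
    return flagged


def check_nelson_rules(values, mu=None, sigma=None):
    """Apply rules 1-3; mean and SD default to the series' own statistics."""
    mu = mean(values) if mu is None else mu
    sigma = stdev(values) if sigma is None else sigma
    return {
        "rule_1": nelson_rule_1(values, mu, sigma),
        "rule_2": nelson_rule_2(values, mu),
        "rule_3": nelson_rule_3(values),
    }
```

For example, `check_nelson_rules([5] * 10, mu=4, sigma=1)` flags indices 8 and 9 under rule 2, because those points complete runs of nine or more values above the mean. The remaining Nelson rules (5 through 8) follow the same run-counting pattern.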

How is it tailored to healthcare?

  • Longitudinal machine learning via mixed models.
  • Longitudinal imputation.
  • Risk-adjusted comparisons.
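Longitudinal imputation uses a patient's own history to fill gaps rather than population-wide averages. One of the simplest variants is last observation carried forward (LOCF), sketched below in plain Python. This is an illustrative assumption, not necessarily the method the packages use, and the function name and the field names in the example (`patient_id`, `visit`, `a1c`) are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def locf_impute(records, id_key, time_key, value_key):
    """Fill missing values (None) with the same patient's most recent earlier value.

    Values are never borrowed across patients, and gaps before a patient's
    first observation are left as None. Input records are not modified.
    """
    # Sort by patient, then time, so each patient's rows are contiguous and ordered.
    ordered = sorted(records, key=itemgetter(id_key, time_key))
    filled = []
    for _, patient_rows in groupby(ordered, key=itemgetter(id_key)):
        last_seen = None  # reset at the start of each patient's history
        for row in patient_rows:
            value = row[value_key]
            if value is None:
                value = last_seen
            else:
                last_seen = value
            filled.append({**row, value_key: value})
    return filled


records = [
    {"patient_id": "A", "visit": 1, "a1c": 7.2},
    {"patient_id": "A", "visit": 2, "a1c": None},
    {"patient_id": "B", "visit": 1, "a1c": None},
    {"patient_id": "B", "visit": 2, "a1c": 6.1},
    {"patient_id": "A", "visit": 3, "a1c": 8.0},
]
imputed = locf_impute(records, "patient_id", "visit", "a1c")
```

Here patient A's missing second visit takes the value 7.2 from visit 1, while patient B's first visit stays missing because B has no earlier observation to carry forward.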


Our goal with this project is to speed the adoption of machine learning in healthcare by building pragmatic, world-class tools for anyone with access to healthcare data.

You can help in many ways:

  • Try out the packages and let us know what needs improvement!
  • Check out our GitHub repositories.

How do I get started?

The software is available as packages for both R and Python, two of the most common languages used by data scientists. If you don't have previous experience with either language, we recommend the R package, as it currently has more features and R is friendlier to newcomers.

Let's do this!

Access documentation, installation instructions, feature references, and hints and tips.

How does the software focus on healthcare?

Both packages differ from other machine learning packages in that they focus on data issues specific to healthcare. They pay attention to longitudinal questions, offer an easy way to do risk-adjusted comparisons, and make it easy to connect to databases and deploy models to them.
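A common way to make a risk-adjusted comparison is the observed-to-expected (O/E) ratio: sum each group's predicted risks from a patient-level model to get its expected event count, then divide the observed count by it. The sketch below assumes those predicted probabilities already exist; the function name, input layout, and unit names are hypothetical and not taken from either package.

```python
from collections import defaultdict


def observed_to_expected(rows):
    """Compute an observed-to-expected (O/E) ratio per group.

    Each row is (group, outcome, expected_probability), where outcome is 0 or 1
    and expected_probability comes from a risk model fit on patient characteristics.
    Groups whose expected count is zero are skipped to avoid dividing by zero.
    """
    observed = defaultdict(float)
    expected = defaultdict(float)
    for group, outcome, prob in rows:
        observed[group] += outcome  # actual events in the group
        expected[group] += prob     # events the model predicts for this patient mix
    return {g: observed[g] / expected[g] for g in observed if expected[g] > 0}


rows = [
    ("unit_a", 1, 0.5), ("unit_a", 0, 0.5), ("unit_a", 1, 0.5), ("unit_a", 0, 0.5),
    ("unit_b", 1, 0.25), ("unit_b", 0, 0.25),
]
ratios = observed_to_expected(rows)
```

A ratio above 1 means a group had more events than its patient mix predicts, and below 1 means fewer; in this toy data `unit_a` comes out at 1.0 and `unit_b` at 2.0. In practice you would also attach a confidence interval before ranking groups.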

Who is the software designed for?

While data scientists in healthcare will likely find these packages valuable, the primary audience is analysts, BI developers, and SQL developers who want to build appropriate, accurate models with healthcare data.

Learn about machine learning in healthcare

Learn from our team of data scientists

Jason Jones, PhD, Chief Data Scientist, Health Catalyst
Mike Mastanduno, Data Scientist, Health Catalyst
Taylor Larsen, Data Scientist, Health Catalyst
Daniel Barlow
Yannick Van Huele, Data Science Intern, Health Catalyst

YouTube Live

Hands-On Healthcare Machine Learning Weekly Broadcasts

Join Levi Thatcher and the data science team every Thursday at 3:00 PM EST as they discuss machine learning topics and answer questions in an open Q&A.

ML #28 – 2.0

ML #27 – Contributing to Open Source Projects Using Github

ML #26 – Start-up Healthcare Machine Learning with Eric Carlson

ML #25 – The How and Why of R for Data Work, with Xiao Liu

ML #24 – Training for Healthcare Machine Learning with Rick Wolf of Insight Data Science

ML #23 – A Survey of the Opioid Epidemic

ML #22 – Machine Learning 101

ML #21 – Central Line Infection Prevention at IU Health, with Kristen Kelley

ML #20 – Exploratory Data Analysis in R

ML #18 – Healthcare Analytics and Open Source with Josh O’Rourke

ML #17 – Healthcare Text Analytics and NLP with Mike Dow

ML #16 – Data Science at an Academic Medical Center with Risa Myers

ML #15 – Multiclass Machine Learning in Using XGBoost

ML #14 – A Day In the Life of A Data Scientist

ML #13 – Basic Feature Engineering in Healthcare

ML #12 – Deep Dive into Heart Failure Readmissions with Joe Smith

ML #11 – How Do You Evaluate Model Performance?

ML #9 – From Zero to Your First Open Source Contribution: It Happens Today!

ML #8 – Open Healthcare Datasets

ML #6 – for Predicting Extended Length of Stay

ML #5 – Open Source Tools for Data Science

ML #1 – Getting Started in R and RStudio

Read the latest from our Data Science Blog

View weekly blogs for tips and advice on machine learning in healthcare.
Subscribe to receive posts via email.

Why did Health Catalyst open source this?

We believe that everyone benefits when healthcare becomes more efficient and outcomes improve. Machine learning is, surprisingly, still fairly new to healthcare, and we want to move the industry quickly along the path to adoption. Making helpful, simple tools widely available is one small way to help healthcare organizations turn their data into actionable insight that improves outcomes.