Open source software
Streamlining healthcare machine learning

What is it?

The packages are designed to streamline healthcare machine learning. They do this by including functionality specific to healthcare and by simplifying the workflow of creating and deploying models. We believe that machine learning is too helpful and important to be handled solely by full-time data scientists. These packages are a humble attempt to democratize machine learning in the realm that needs it most: healthcare.

What does it do?

Both packages provide an easy way to create models on your data. This includes linear and random forest models, ways to handle missing data, guidance on feature selection, proper performance metrics, and easy database connections.
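As a rough illustration of two of the steps named above (handling missing data and checking a performance metric), here is a minimal sketch in plain Python. It does not use the packages' own API. The function names `impute_mean` and `auc`, the toy columns, and the hand-picked scoring weights are all assumptions made for this example.

```python
from statistics import mean


def impute_mean(rows):
    """Replace None in each numeric column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed))
    return [
        [col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


def auc(labels, scores):
    """Chance that a random positive case outscores a random negative one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: [age, prior_admissions]; None marks a missing value.
rows = [[70, 3], [55, None], [None, 0], [80, 4]]
readmitted = [1, 0, 0, 1]

filled = impute_mean(rows)

# Stand-in for a model: a hand-picked linear score, not a fitted one.
scores = [0.01 * age + 0.2 * prior for age, prior in filled]

print(filled)
print(auc(readmitted, scores))
```

Mean imputation is only one simple option. A real workflow would fit a linear or random forest model instead of using fixed weights, and would evaluate it on held-out data.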

How does it focus on healthcare?

The two packages differ from other machine learning packages in that they focus on data issues specific to healthcare. This means that we pay attention to longitudinal questions, offer an easy way to do risk-adjusted comparisons, and provide easy connections and deployment to databases.
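To make the idea of a risk-adjusted comparison concrete, here is a small sketch of one common approach, the observed-to-expected (O/E) ratio. This is not necessarily how the packages implement it. The function name, the two units, and the predicted risks are hypothetical.

```python
def observed_to_expected(outcomes, predicted_risks):
    """Ratio of observed events to the number expected from per-patient predicted risk."""
    observed = sum(outcomes)
    expected = sum(predicted_risks)
    return observed / expected


# Hypothetical units: 1 = adverse outcome; risks come from some prior risk model.
unit_a_outcomes = [1, 0, 0, 1]
unit_a_risks = [0.5, 0.25, 0.25, 0.5]       # sicker patients, 1.5 events expected

unit_b_outcomes = [1, 0, 0, 0]
unit_b_risks = [0.125, 0.125, 0.125, 0.125]  # healthier patients, 0.5 events expected

oe_a = observed_to_expected(unit_a_outcomes, unit_a_risks)
oe_b = observed_to_expected(unit_b_outcomes, unit_b_risks)

print(f"Unit A raw rate {sum(unit_a_outcomes) / 4:.2f}, O/E {oe_a:.2f}")
print(f"Unit B raw rate {sum(unit_b_outcomes) / 4:.2f}, O/E {oe_b:.2f}")
```

In this toy data, unit A has the higher raw event rate, but unit B has the higher O/E ratio once patient risk is taken into account. Reversals like this are why raw rates alone can mislead.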

Who is it designed for?

While data scientists in healthcare will likely find these packages valuable, the target audience is BI developers, data architects, and SQL developers who would love to create appropriate and accurate models with healthcare data. Existing machine learning packages are certainly irreplaceable, but we think there is a set of data problems specific to healthcare that warrants new tools.

Why did Health Catalyst open source this?

We believe that everyone benefits when healthcare is made more efficient and outcomes are improved. Machine learning is still surprisingly new to healthcare, and we want to help healthcare move quickly down the machine learning adoption path. We believe that making helpful, simple tools widely available is one small way to help healthcare organizations transform their data into actionable insight that can be used to improve outcomes.