Yannick Van Huele June 23, 2017

In a previous blog post, Which Algorithms are in healthcare.ai, we gave a broad overview of the various machine learning algorithms available in the healthcareai package.  This week, we’ll delve into the details of one of these models: the lasso. The lasso is an elegant generalization of classical linear models such as linear regression and logistic regression which reduces overfitting and automatically performs feature selection. In particular, the lasso provides a way to fit a linear model to data when there are more variables than data points (for example,…
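As a rough illustration of how the lasso performs feature selection, here is a minimal pure-Python sketch of coordinate descent with soft-thresholding. It is not the healthcareai implementation; the function names (`soft_threshold`, `lasso_coordinate_descent`), the penalty value `lam=0.5`, and the toy data are all assumptions made up for this example.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows and y a list of targets (hypothetical helper, illustration only).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # average of feature j times the residual that ignores feature j
            z = 0.0    # average squared value of feature j
            for i in range(n):
                partial = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            b[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return b


# Toy data (made up): y depends strongly on feature 0 and only weakly on feature 1.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.1, -1.9, 1.9, -2.1]
coefs = lasso_coordinate_descent(X, y, lam=0.5)
print(coefs)
```

In this toy case the penalty shrinks the strong feature's coefficient and sets the weak feature's coefficient to exactly zero, which is the sense in which the lasso selects features automatically.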

Taylor Larsen June 02, 2017

In previous posts, we’ve touched on feature engineering and the importance of understanding the use case your machine learning (ML) model intends to accommodate; however, we haven’t touched a lot on defining and choosing the right outcome (dependent) variable. There is definitely a critical thinking component and a technical component (both are important), but if you spend enough time working through the complexities and nuances behind the outcome variable that you are considering, the technical piece will seem more clear. Investing time up front can save significant time…

Mike Mastanduno May 30, 2017

New features in R! One of the most fun things that we do is introduce new features into healthcare.ai. This kind of development satisfies our desire to build things, and when we use the functionality on client projects, it gives us that warm, fuzzy feeling of having made something useful. Today’s post will outline the use case that led us down the road of developing support for XGBoost and go into the process of how that is done. Along the way, we’ll describe some practical things to know about…

Taylor Miller May 12, 2017

Background – Why Profile? Often models are trained on retrospective data, which is typically highly available and clean. When models are deployed, real-time production data is never as clean or available. The Feature Availability Profiler can expose some of these problems. More Details Let’s say that I developed and trained a predictive machine learning model on retrospective data from an inpatient unit. I may be trying to identify patients at risk for a certain outcome during their stay in the hospital. Let’s presume that the model is performing well with…

Mike Mastanduno April 07, 2017

Ahhh, science. The pinnacle of controlled experiments and the attempt to understand the relationships between different conditions. Does a new drug really have an impact on the treatment of a specific disease? Two identical groups of patients can be set up to either receive or not receive the drug. After the evaluation period, the study can say with statistical certainty whether the drug influenced the disease. We have published other posts (see here and here) on evaluating performance of machine learning models created with healthcare.ai, but lately, we…

Taylor Larsen April 05, 2017

During our regular work building and deploying healthcare machine learning (ML) models, we find ourselves asking interesting questions and investing some serious thought in answering them. When the question seems to be pervasive across use-cases and projects, we figure the discussion might be useful to others in the healthcare ML community. Let’s explore one such topic: Each time we run a model using healthcare.ai, a prediction is output for each specified row. Let’s say that we’re predicting whether Joe (a hypothetical patient) will be readmitted to the hospital within…

Taylor Larsen March 10, 2017

Since we launched healthcare.ai back in December, our development work on the package has kept pace, and feedback and collaboration continue to build momentum. While we can’t deny that healthcare.ai had some great functionality to begin with… we are so motivated to continually improve. With that said, we’re excited to announce some recent improvements to the healthcare.ai package that we hope you will find useful, especially considering that several were direct suggestions from you all. What’s new? We’re official! The healthcare.ai package for R is now up on CRAN…

Mike Mastanduno March 02, 2017

Data science is a hot new field. The basic job function seems to have solidified as “use large datasets to surface actionable insights.” That could mean combing through Facebook’s giant SQL tables to see which users are engaging with the site the most. Or it could mean editing the machine learning code to predict which Spotify songs should show up in a user’s discover weekly playlist. The Data Scientist role is hugely open-ended and varies greatly across companies and industries. Like the role, the path to get there is…

Daniel Barlow February 27, 2017

Guest Post! Thanks so much to Daniel Barlow, MD for contributing from a clinician’s perspective. The healthcare industry generates large amounts of data with patient notes, lab data, time stamps for when events occurred, medication information, radiology images, scheduling data, billing data, and so forth. With the US healthcare system rapidly adopting electronic records, the amount of data being collected is growing, and more data is available for analytics. The rich, growing data sources provide opportunities and problems. A single physician has so much data available at their fingertips to…

Taylor Miller February 14, 2017

Imagine a tool that can read in columnar data, manipulate, transpose, derive, query, describe, analyze, visualize and more. That’s Python’s pandas library! In our healthcare.ai Python package, we use pandas extensively under the hood since it is robust, fast and proven in data science. It seemed apropos to introduce you to it. This post is also written as a Jupyter notebook hosted on our GitHub site. You can follow along there or download it and run it yourself if you prefer. What is Pandas? Pandas is…

Levi Thatcher February 10, 2017

In designing healthcare.ai, we are excited to provide practical machine learning (ML) guidance to health data folks, and feature importance guidance is near the top of the practicality list. When combining algorithms and data to create models, often you don’t want to keep all of your initial features (i.e., input columns) in your final model. You might be wondering why–wouldn’t the model be smart enough to adjust for non-helpful features, such that each feature is used appropriately? Certainly! But, it’s often the case that production environments are resource-constrained by frequent…

Mike Mastanduno February 10, 2017

Note: this post follows a Jupyter Notebook. We have had a lot to say about R on the blog lately! R is a great statistical language for data manipulation, cleaning, machine learning, and visualization. One reason we really like it on our team is how neatly packaged the tools are. However, there is a significant portion of the machine learning community that uses Python. For our purposes, the benefits of Python mainly relate to speed, deep learning, and the ease of working with massive datasets. Until one…

Levi Thatcher January 31, 2017

Whenever we use machine learning (ML) for prediction, the question of how to evaluate the model is of the utmost importance. Of course, different metrics are appropriate for different kinds of models and business questions. If we were to predict patient length of stay (LOS), a numeric value, we’d use a different model evaluation metric than if we were evaluating a model that predicts 30-day readmissions, which is a binary column (i.e., Y or N). It’s these types of decisions that healthcare.ai streamlines. In healthcare, we’re most commonly focused on binary predictions: think…
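To make the numeric-versus-binary distinction concrete, here is a small pure-Python sketch with one metric for each kind of prediction. The excerpt does not say which metrics healthcare.ai reports, so mean absolute error (for a numeric outcome such as LOS) and AUROC (for a binary outcome such as readmission) are assumptions picked as common examples, and all of the data below are made up.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted numeric values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def auroc(labels, scores):
    """Chance that a random positive case is scored above a random negative case.

    Ties count as half a win. labels are 1 (positive) or 0 (negative).
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Numeric outcome: length of stay in days (made-up values).
los_actual = [3, 5, 2, 8]
los_predicted = [4, 5, 1, 6]
mae = mean_absolute_error(los_actual, los_predicted)

# Binary outcome: readmitted within 30 days, 1 = yes (made-up values).
readmit_actual = [1, 0, 1, 0]
readmit_score = [0.9, 0.2, 0.4, 0.6]
auc = auroc(readmit_actual, readmit_score)

print(mae, auc)
```

The pairwise AUROC loop is quadratic in the number of rows, which is fine for a sketch but not for a large dataset.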

Mike Mastanduno January 26, 2017

At the most basic level, Machine Learning (ML) is a category of algorithms that learn from historical data and generalize to future data. Having good data is becoming more and more important to successful organizations today. It’s almost becoming a form of currency. But having access to big data is only the first step. Using it effectively is another matter entirely. Healthcare organizations have collected data for years, but data will always be a finite resource. This post seeks to help you decide how to best allocate your data…
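The excerpt is cut off before it describes a specific allocation scheme, so the following is only a sketch of one common approach, a shuffled holdout split. The function name `train_test_split`, the 80/20 ratio, and the seed are assumptions for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and hold out a fraction of them for testing."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for ten patient records
train, test = train_test_split(rows, test_fraction=0.2, seed=42)
print(len(train), len(test))
```

Keeping the held-out rows untouched during training is what lets them stand in for future data when the model is evaluated.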

Taylor Larsen January 24, 2017

In previous blog posts, we’ve discussed specific applications of machine learning (ML) in healthcare and the available algorithms in healthcare.ai. As you build an ML model, creating and selecting the right features can be just as foundationally important as matching the right algorithm with the right use case. In this post, we will discuss how domain knowledge of healthcare data can be used to create features that make your ML models more accurate and useful. This process is known as feature engineering. Using an ML model off…

Mike Mastanduno January 19, 2017

This blog has been talking a lot about Machine Learning (ML) with regard to tabular data. That makes sense because predictive algorithms based on tabular data are often easy to implement and have a lot of potential to improve outcomes. Also, we have access to a lot of tabular data from the EHR. However, ML is capable of doing a lot more than predicting probabilities on tabular data, and there are incredible opportunities in other areas of healthcare. One in particular is in Radiology and Pathology departments. These departments generate…

Levi Thatcher January 17, 2017

When working with data in healthcare, business intelligence (BI) folks often turn to tools like Excel, SSMS, Tableau, and Qlik. Typically, multiple tools will be used when analyzing a dataset. Sometimes the analyst will use Excel to look at the data, get a sense for how the columns are distributed, perhaps make a histogram or scatterplot. Often, analysts will later turn to Qlik and/or Tableau to provide an interactive app, often hosted on a dedicated server so folks in other departments can explore the same data. In this same process,…

Mike Mastanduno January 12, 2017

The purpose of this post is to help you become familiar with Git, an essential part of contributing to healthcare.ai. Git is essentially a collaboration tool for software developers, and GitHub is the accompanying online storage platform. If you have been reading about healthcare.ai, you probably know that it is an open source software package. Open source means that we aren’t hiding anything from our users. They can use the package, view the contents, and modify the package for their particular needs. We chose to make healthcare.ai…

Taylor Larsen January 11, 2017

As time goes on, we will not only discuss healthcare machine learning (ML) and health in the US at a high level, but also specific ways ML might help drive outcomes improvements. Many health systems are working on reducing their readmission rate—which is often considered a measure of quality of care and can be tied to penalties. For hospital systems progressing toward ML for readmissions—or any measure—the first step is to identify your most important business questions; the next step is creating a suitable dataset to create the model. There…

Levi Thatcher January 08, 2017

While our previous posts have focused on healthcare machine learning, we’re also excited to post analyses of health data using R and Python. We do this to hopefully elevate the national discussion around health data, enhance the community’s understanding of health in the United States (US), and provide guidance as to how communities and health systems might increase the quality and length of people’s lives. Health Catalyst is an outcomes improvement company, and we realize that the inpatient setting is only one of several venues that affect a person’s health…

Taylor Larsen January 06, 2017

To leverage lessons learned during our model building engagements here at Health Catalyst, let’s explore the subject of data leakage. Data leakage occurs when a predictive model is trained using information that is available in training data but not actually available for predicting outcomes in production. Models with data leakage tend to be very accurate in development, but perform poorly in production, where they are ultimately used. More specifically, leakage in the context of healthcare machine learning occurs when: A feature is used to train the model that would…
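One simple way to screen for the kind of leakage defined above is to compare when each feature becomes available against the moment the prediction is made. The sketch below assumes you can attach a timestamp to each feature; the function name `find_leaky_features`, the feature names, and the dates are hypothetical.

```python
from datetime import datetime


def find_leaky_features(feature_recorded_at, prediction_time):
    """Return names of features recorded after the prediction would be made."""
    return sorted(
        name
        for name, recorded_at in feature_recorded_at.items()
        if recorded_at > prediction_time
    )


# Hypothetical inpatient example: the prediction is made the morning after admission.
prediction_time = datetime(2017, 1, 6, 8, 0)
feature_recorded_at = {
    "admit_diagnosis": datetime(2017, 1, 5, 14, 30),
    "first_lab_result": datetime(2017, 1, 5, 18, 0),
    "discharge_disposition": datetime(2017, 1, 9, 11, 0),  # known only at discharge
}

leaky = find_leaky_features(feature_recorded_at, prediction_time)
print(leaky)
```

A timestamp check like this catches only features that arrive too late; it does not catch features that encode the outcome in some other way.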

Levi Thatcher December 21, 2016

Machine learning has been around for decades and has been used to solve lots of problems. Some of these include spam filtering for email, suggestions on Netflix, optimized playlists on Spotify, custom recommendations on Amazon, facial recognition on Facebook, voice recognition on your phone, language translation on demand, image search in your photo app, and many more. While reading that long and varied list, you may be wondering where healthcare stands by comparison. Even though machine learning can solve many problems…

Mike Mastanduno December 22, 2016

Now that we have been through some of the applications of machine learning (ML) in mainstream technology, we thought it would be nice to give a broader overview of some of the different types of ML and how they might be applied to improve patient care. We explored the algorithms that currently make up healthcare.ai, and alluded to the fact that there is lots of room for expansion. We’ll take this post as an opportunity to speculate on where healthcare ML could go in the near and distant future.

Mike Mastanduno December 15, 2016

Before a new technique in healthcare can be introduced to patient use, it must pass a rigorous set of quality standards. Then, to actually be adopted and see widespread use, a technique must be trusted and accepted by physicians and other front line care workers. For example, new drugs are evaluated in several steps before making it into human trials, and then still have several hurdles to clear before they can be accepted as standard of care. Machine learning is poised to make a significant impact in clinical care in the…

Levi Thatcher December 12, 2016

After reading a few articles on healthcare.ai, some of you may be saying, well, that’s great–but what has Health Catalyst actually used it for? Since Health Catalyst has been open with sharing the tool set, it only makes sense that they’d also be willing to share details of its use. As the Director of Data Science at Health Catalyst and founder of healthcare.ai, I oversee all client predictive engagements, and will make a point of frequently updating the community on our work. If you have questions, comments, or criticism, please…

Levi Thatcher December 09, 2016

Many of you might be wondering how your organization could benefit from healthcare.ai. Even though you’ve read the broad statements on the home page, you might be asking yourself, “how does healthcare.ai enable my team of analysts or data scientists? And how can it finally bring accurate, informative models to my health system for the first time?” While those looking to get into healthcare machine learning (ML) can certainly use R’s caret package or Python’s scikit-learn package to create models, we believe that’s not the most…

Levi Thatcher December 05, 2016

If you read much about technology, you have likely heard about machine learning, but may be wondering how it would work in healthcare. Where’s the low-hanging fruit? And how could it help my clinical team? Throughout healthcare, and many other industries, there are heuristics and established best practices that help people make decisions. A popular example in healthcare is the LACE index, which estimates a patient’s risk of 30-day readmission. You might have also heard of similar tools like the SOFA Score, Apgar Score, PRISM…

Levi Thatcher December 02, 2016

As time goes on, this blog will touch on many of the technical choices made at Health Catalyst. It will mostly focus on data science. If there’s a particular topic that interests you, contact us! Some posts will be short, while others will be in-depth. The tone will be informal, with a focus on content and frequent posts (twice per week) rather than polish. When we talk about doing things with data, we’ll post the code, so you can follow along. When doing healthcare machine learning, why’d we choose R and…

Levi Thatcher December 01, 2016

Health Catalyst’s data science team is excited to present healthcare.ai. This ambitious new project offers healthcare-specific machine learning packages, as well as analysis, commentary, and advice on leveraging machine learning within any health system, regardless of size. While companies like Google, Microsoft, and IBM are doing machine learning on the outskirts of healthcare, we work from the center. We bring practical, accurate predictive models to health systems interested in improving their operational, financial, and clinical efficiencies. First and foremost, we want to improve patient outcomes. Machine…