Data Science Blog
healthcareai-R 1.0: more features, simpler interface
A good data scientist will have command of a large breadth of knowledge, from machine learning and statistics to business instinct or software engineering. Part of what makes this job exciting is the possibility of driving insights or improvements from any one of those skills. A data scientist may or may not know all the skills ahead of time, but they are able to step back, understand where there might be a high return on investment, and learn the skills necessary to take advantage.
Recently, our team announced the release of healthcare.ai Python version 1.0 (https://healthcare.ai/announcing-1-0-release-healthcareai-python/). Many of the pain points addressed in that update were also present in the R version. Rather than undertake a huge refactor, we decided to address only the most pressing needs. Now that version 1.0 of the R package is up on CRAN (https://cran.r-project.org/web/packages/healthcareai/index.html), we’d like to announce it here.
Goals of release 1.0
healthcare.ai is intended to serve a wide range of users, from the least technical to the most technical. In using the R package to deploy machine learning models at health systems all across the country, we discovered 3 things:
1. The package was very good at making the process of development and deployment simple for non-experts.
2. It was hard to customize usage of the model building pipeline.
3. Client requests were outpacing the existing functionality in a few key areas.
Basic users get simpler tools with higher performance
The main goal of healthcare.ai is to make it easy for non-experts to build and deploy models. The existing tools accomplished that goal well but struggled with datasets of several hundred thousand rows.
We addressed this by removing the testWindowCol parameter, which differentiated training data from test data in model deployment. Many users found this parameter confusing and are enjoying the simplified workflow. Model training is now done only once, in development, and model deployment only requires the new data to be loaded into memory instead of the entire dataset. This change reduces the prediction load on servers and makes real-time machine learning possible.
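The train-once, deploy-on-new-data pattern described above can be sketched in a few lines. This is a minimal illustration in base R, not the healthcareai API: the column names, the simulated data, and the file name are all assumptions made up for the example.

```r
# Hypothetical illustration in base R; this is not the healthcareai API.
set.seed(42)

# --- Development: train once on historical data ---
n <- 200
history <- data.frame(
  age = round(runif(n, 20, 90)),
  prior_visits = rpois(n, 2)
)
# Simulated outcome, loosely driven by both features (made-up data)
history$readmitted <- rbinom(
  n, 1, plogis(-4 + 0.04 * history$age + 0.3 * history$prior_visits)
)

model <- glm(readmitted ~ age + prior_visits, data = history,
             family = binomial())

# Persist the fitted model so deployment never has to retrain
model_path <- file.path(tempdir(), "readmission_model.rds")
saveRDS(model, model_path)

# --- Deployment: load the saved model and score only the new rows ---
new_patients <- data.frame(age = c(35, 72, 88), prior_visits = c(0, 3, 6))
deployed <- readRDS(model_path)
new_patients$risk <- predict(deployed, newdata = new_patients,
                             type = "response")
print(new_patients)
```

In this sketch, the deployment step touches only the three new rows and the saved model file; the historical table is not reloaded or refit.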
While most users were able to get everything that they wanted out of healthcare.ai, we found that some model builders were left wanting more. The trade-off between abstracting away obscure details and exposing them is always difficult, and we decided to shift the tools towards customization in a few places.
– All of the SQL functionality has been decoupled from the deployment process. This allows users to write to flat .csv files or any SQL engine they prefer.
– Additionally, we gave users the ability to name their models to simplify model management and versioning.
– Finally, we updated documentation to thoroughly describe all the methods of each class, making advanced usage more transparent.
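As a rough sketch of what decoupled output and named models can look like in practice, the example below writes predictions to a flat .csv file tagged with a model name. It uses base R only; the model name, column names, and prediction values are hypothetical and are not taken from the healthcareai package.

```r
# Hypothetical names and made-up values; base R only.
model_name <- "readmission_v1"   # assumed naming scheme, for illustration

predictions <- data.frame(
  patient_id = c(101, 102, 103),
  predicted_risk = c(0.12, 0.57, 0.83),
  model_name = model_name,
  scored_at = format(Sys.time(), "%Y-%m-%d %H:%M:%S")
)

# Write to a flat file named after the model
out_path <- file.path(tempdir(), paste0(model_name, "_predictions.csv"))
write.csv(predictions, out_path, row.names = FALSE)

# Reading the file back confirms the round trip
round_trip <- read.csv(out_path, stringsAsFactors = FALSE)
print(round_trip)

# With a database, the same data frame could instead be passed to
# DBI::dbWriteTable(con, "predictions", predictions, append = TRUE),
# where `con` is a connection from whichever DBI backend is in use.
```

Carrying the model name in every output row is one simple way to keep predictions from different model versions distinguishable after they land in the same table or folder.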
New features to get more out of the same data
Improving healthcare outcomes is the core of healthcare.ai’s usage. We felt compelled to add new features to address more machine learning use cases and drive outcome improvements in more ways. Three examples are:
- Multiclass Classification: A client wanted to assign each patient’s service line (50 possible) for a reporting initiative. They used the multiclass functionality to save over $1 million annually.
- Finding Variation: This functionality automates the task of looking across groups, showing the subsets with the highest variance and the biggest differences from the mean. A client used it to show outliers (both good and bad) in physicians’ use of high-cost imaging methods, reducing unnecessary exams and saving costs.
- Kmeans Clustering: This classic clustering method has been applied to find similar groups of patients, challenge incorrect diagnoses, find missing diagnoses, and even group conference attendees by networking preferences.
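To make two of these ideas concrete, here is a small sketch of finding variation across groups and of k-means clustering. It uses only base R (aggregate and stats::kmeans) on simulated data, so it illustrates the underlying techniques rather than the package's own functions; every column name and number is invented for the example. Multiclass classification is not covered here.

```r
set.seed(1)

# Made-up data: imaging orders per 100 visits for six physicians
orders <- data.frame(
  physician = rep(paste0("dr_", 1:6), each = 20),
  imaging_rate = c(rnorm(100, mean = 10, sd = 2),  # five typical physicians
                   rnorm(20, mean = 18, sd = 2))   # one high-use physician
)

# Finding variation: per-group mean and spread vs. the overall mean
overall_mean <- mean(orders$imaging_rate)
by_group <- aggregate(imaging_rate ~ physician, data = orders,
                      FUN = function(x) c(mean = mean(x), sd = sd(x)))
by_group <- do.call(data.frame, by_group)  # flatten the matrix column
names(by_group) <- c("physician", "mean_rate", "sd_rate")
by_group$diff_from_overall <- by_group$mean_rate - overall_mean

# Largest absolute difference from the overall mean first
by_group <- by_group[order(-abs(by_group$diff_from_overall)), ]
print(by_group)

# K-means: group made-up patients by age and visit frequency
patients <- data.frame(
  age = c(rnorm(30, 30, 4), rnorm(30, 70, 4)),
  visits_per_year = c(rnorm(30, 2, 0.5), rnorm(30, 9, 1))
)
# Scale first so both features contribute comparably to distances
fit <- kmeans(scale(patients), centers = 2, nstart = 10)
patients$cluster <- fit$cluster
print(table(patients$cluster))
```

Sorting by absolute difference surfaces both unusually high and unusually low groups, which matches the "both good and bad" outliers mentioned above.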
The bright future of healthcare.ai
With this new set of changes and features behind us, we are even more excited to help you develop custom machine learning models on your data, for your challenges.
As this process is ongoing, we have already begun talking about the next round of features and simplifications. Next up we hope to deepen our telemetry offerings and refactor some of the code’s guts to be more consistent with R code in general to better connect with the open-source community.
We invite all of you to install the latest version and try out the new features.
Get it within R using install.packages("healthcareai"), or download the bleeding edge from GitHub (https://github.com/HealthCatalyst/healthcareai-r).
Please join our Slack channel (https://healthcare.ai/slack/) to get help if you are stuck and to discuss ideas for using healthcare.ai to improve healthcare outcomes!