ML #10 - Create Actionable Machine Learning Visualizations that Clinicians Can Trust

Hosted by Taylor Larsen

April 27, 2017 - 20min

Building a predictive model is one thing, but effectively delivering guidance to clinicians can be a project in itself. In this episode we'll discuss:

- How to create an actionable user interface
- How to balance simplicity and transparency
- How to integrate with

Full Broadcast Transcript

Levi: What’s on the top of the docket here today? What have we got going on? Mailbag seems a decent place to start.

Taylor: Yeah, for sure.

Levi: So first question. It’s a good one actually. It popped up in the Slack Chat this week. It’s, “When I’m doing machine learning, how do I handle my binary columns?”

So let’s say you have a gender column, or a yes-no column, which would be a more appropriate example. So, Taylor, you’ve dealt with this a lot. Any pointers or instructions as to the best way to go in the machine learning world?

Taylor: For handling binary columns, I mean, pulling them in as either a dummy variable which will help you do as well or bringing them in just normally as, say, male – female for your gender or a 1 – 0. But you want to be careful not to make it look like there is some sort of linear relationship between those values.

Levi: Yeah, that’s the problem with 1 – 0’s sometimes.

Taylor: Yes. Yup.

Levi: So it’s kind of hard. We have to assume certain things in, in the packages or in the software. And so it’s actually helpful if you code yes or no as Y or N instead of the 0 or 1 because then the software knows like, “Okay, well, this isn’t actually a linear thing, it’s more of a binary type thing.”

Taylor: Yup.

Levi: So we should make that more clear in the docs because that did pop up in the Slack Channel. But please fire away in the live chat or also on the Slack Channel if you have more in-depth questions. We’d love to engage you and learn more about these things together.

Taylor: Oh, and the software will pick up on the yes-no for like your outcome variable if you’re doing a classification problem. But, yeah, definitely in your regular binary predictive columns, you want to be sure and make it explicit—

Levi: Yeah. Takes away some of the ambiguity, I guess, is a way to put it.

Taylor: Yeah. 
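
A minimal sketch of the explicit encoding discussed above, in plain Python. The column name `on_prednisone` and the helper `encode_yes_no` are hypothetical and are not taken from any package mentioned in the broadcast. The only point is that a Y/N column is mapped deliberately, and that unexpected values are rejected rather than guessed at.

```python
# Hypothetical rows with a yes/no predictor stored as "Y"/"N".
rows = [
    {"patient_id": 1, "on_prednisone": "Y"},
    {"patient_id": 2, "on_prednisone": "N"},
    {"patient_id": 3, "on_prednisone": "Y"},
]


def encode_yes_no(value):
    """Map an explicit Y/N label to a 1/0 dummy variable.

    Raising on anything else avoids silently treating an unexpected
    value (for example a blank or a 2) as a valid category.
    """
    mapping = {"Y": 1, "N": 0}
    if value not in mapping:
        raise ValueError(f"unexpected binary value: {value!r}")
    return mapping[value]


# Add the dummy variable alongside the original column.
encoded = [
    {**row, "on_prednisone_Y": encode_yes_no(row["on_prednisone"])}
    for row in rows
]
```

Keeping the original Y/N column next to the dummy variable makes it easy to check the mapping by eye.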

Levi: So the next question is, “Why can’t we just use SQL for machine learning?” So this is something that we’ve come across in our engagements with analysts in healthcare. And a lot of folks say, “Well, yeah, SQL does a lot of stuff.” And you’ve been using SQL for years.

Taylor: Yeah. 

Levi: How long ago did you get into SQL? How did that—

Taylor: Yeah. Just kind of out of grad school – a means to an end. So yeah, that’s how you access the data to then use it in SAS or in other areas. And then, of course, here at Health Catalyst there are so many things that we’re doing with SQL that that’s a lot of folks’ comfort zone. And so, it’s easy to want to revert to that. But there are some limitations, even while there are some great things about it. We’re encouraging folks to pull the data with SQL if that’s the mode you choose but there are definitely some limitations.

Levi: Yeah, yeah. Definitely. So, it feels like more of a rule-based type system where you’re saying, “Okay, all the times you’ll have these case statements where you’re saying if this, then that.”

Taylor: Mm-hmm. Yeah, so you’re definitely hard coding specific things like almost kind of creating your own coefficients where if a certain feature value is above this or below this, then categorize it as such or give it a point value of something. There are also many common cases of creating a regression of some sort and then hard coding coefficients into SQL. And that just keeps things a lot more static and doesn’t allow for some of the dynamic things that inherently come up in the data.

Levi: Yeah. That’s great, the dynamic part. So with machine learning and R and Python, it gets you away from those hard coded rules and really makes your job a lot easier. And then it sets the rules for you automatically.

Taylor: Yeah. And with the rules, you’re definitely relying on either your own expertise or someone else’s expertise, or—

Levi: In SQL, right?

Taylor: In SQL, yeah. Or some sort of published results where, with machine learning and, you’re definitely able to find out the features that are important that you might not have thought were important before when you’re kind of creating your rule-based system in SQL so.

Levi: That’s a good point.

Taylor: Yeah.
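
To make the contrast above concrete, here is a small sketch under stated assumptions. The data, the cutoff of 5.0, and the function names are all hypothetical. The "learned" side is a deliberately simple one-feature threshold search (a decision stump), swapped in for illustration; it is not the algorithm used by any package discussed in the broadcast.

```python
# Hypothetical training data: (lab_value, had_outcome) pairs.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]


def hard_coded_rule(lab_value):
    """Rule in the style of a SQL CASE statement: the cutoff is fixed by hand."""
    return 1 if lab_value > 5.0 else 0


def learn_threshold(pairs):
    """Pick the cutoff that misclassifies the fewest training rows.

    Candidate cutoffs are the midpoints between consecutive distinct values.
    """
    values = sorted({value for value, _ in pairs})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(cutoff):
        return sum((1 if value > cutoff else 0) != label for value, label in pairs)

    return min(candidates, key=errors)


def accuracy(predict, pairs):
    """Fraction of rows where the prediction matches the label."""
    return sum(predict(value) == label for value, label in pairs) / len(pairs)


learned_cutoff = learn_threshold(data)
rule_accuracy = accuracy(hard_coded_rule, data)
learned_accuracy = accuracy(lambda v: 1 if v > learned_cutoff else 0, data)
```

If the data shifts, rerunning `learn_threshold` moves the cutoff with it, while the hand-written rule stays where it was typed.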

Levi: Awesome. So we love the questions. Keep them coming in chat, in Slack. Reach out because we want this, not just to be us talking but we want to learn together. That’s the whole point of It’s that it’s not people working in silos and figuring out these problems on their own but really like we’re helping each other and kind of all rising together. And that’s kind of what differentiates us from a lot of other machine learning education as well is that we’re learning from you and you’re learning from us. It’s sort of a two-way street here. So we’re really excited to see what you guys [inaudible 00:04:31] on your projects.

Taylor: Yeah. The community aspect is pretty awesome when folks reach out and say, “What about this?” And occasionally, when it’s like we’ve really got to consider, “what about that?” We haven’t thought of everything. We’d love to think of everything though with your help, so we appreciate it.

Levi: Yeah, for sure.

So mailbag. That’s it for mailbag this week. But keep them coming.

Should we dive into visualizations?

Taylor: Sure.

Levi: It feels like something we haven’t talked about at all. So is great in delivering a probability, a risk of something, but how do we start to think about how to present this to a clinician? It’s such a broad topic.

Taylor: Yeah. And so, we’ve kind of thought about that as far as how to present some of these topics. And some of the ways we do that are through work lists and through some charts and things. We’ll get to some of that in a little bit.

And I also wanted to keep it kind of high enough level that we chat more about concepts than just our specific use cases which we’ve gone over in previous broadcasts. But just for those of you that are either new to or haven’t gotten all the way through the steps to have some output, I threw up on the screen kind of what our output includes so that you kind of know what you have to visualize.

So the output for is a grain ID which would be something like a visit ID or a patient ID. The last load date, since a lot of times we’re appending yesterday’s predictions to today’s predictions. And then tomorrow’s predictions will get appended to those. And then also, the predicted probability. So the probability that someone is going to be re-admitted, the probability that they’re going to have an extended length of stay or pay their bill – whatever the problem is that you’re solving with

And then another handy feature is also the three most contributing factors to that high risk. And so, in the little example I threw on the screen, you can see that there are a couple of rows for each grain ID over two different dates. And you can see that the predicted probability can change over time. This maybe isn’t the best way to visualize it, but this is the output of the model.
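
As a rough illustration of the output shape just described, the sketch below models one prediction row and the way each day's run is appended to the previous ones. The class name, field names, factor names, and values are all hypothetical; the actual column names in any given output table may differ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Prediction:
    grain_id: int                 # e.g. a visit ID or patient ID
    last_load_date: date          # when this prediction was produced
    predicted_probability: float  # risk between 0 and 1
    top_factors: tuple            # the three most contributing factors


prediction_table = []  # stands in for the database table or CSV file


def append_run(table, run):
    """Append one day's predictions; earlier rows are kept, not overwritten."""
    table.extend(run)


append_run(prediction_table, [
    Prediction(101, date(2017, 4, 26), 0.42, ("factor_a", "factor_b", "factor_c")),
    Prediction(102, date(2017, 4, 26), 0.80, ("factor_b", "factor_d", "factor_a")),
])
append_run(prediction_table, [
    Prediction(101, date(2017, 4, 27), 0.63, ("factor_a", "factor_c", "factor_d")),
    Prediction(102, date(2017, 4, 27), 0.35, ("factor_d", "factor_b", "factor_c")),
])

# The same grain ID now has one row per load date, so risk can change over time.
scores_for_101 = [p.predicted_probability for p in prediction_table if p.grain_id == 101]
```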

Levi: Yeah. That’s a great point.

So let’s imagine that you’ve created your model. You’ve deployed predictions to SQL Server or your other database. If you don’t mind just going back real quick, so this grain ID, I wasn’t familiar with grain before, but you use that. So you’re a [inaudible 00:07:04] for a while?

Taylor: Yeah.

Levi: Was that used there?

Taylor: Yes. It’s kind of a common— you kind of always need a unique identifier. And so, that’s kind of the purpose the grain ID serves. But it’s not always unique inherently. But a visit ID is an example, a patient ID, maybe even the combination of a visit ID and a day if, say, you’re predicting things like infections on a daily level or something.

Levi: Oh, that’s a good point.

Taylor: But yeah, it kind of just depends on the thing you’re looking at. If you were predicting whether or not a provider would do some specific thing, maybe the provider’s ID would be your grain ID.

Levi: Oh yeah.

Taylor: It kind of just depends on how deep into the data you’re going. Yeah.

Levi: That’s a great point. Awesome.

So let’s see. We have the results in a database—

Taylor: Yeah. Sorry [inaudible 00:07:55] back and forth.

Levi: No, no.

No. That’s good because we have the results in the database. And then kind of where do we head from there?

Taylor: Yeah.

So going from the output to some sort of visualization– like I mentioned, as we were kind of opening, sorting lists, visualizing risk over time, and visualizing the risk distribution of a population. And now that you saw kind of the output, you can kind of picture kind of these use cases or the ways you do it.

So sorting lists, an example of that is on our COPD visit list here from one of our demos where the visit list is actually sorted in a ranked order from highest risk to lowest risk. And so, that’s just using that predicted probability column. And you’re maybe doing some sort of a selection to say, “I want to look at the patient’s most recent risk score.” So then you might also need to use that last load date. And this just kind of takes like a visit list which we have in a lot of our applications, or I feel like it’s a pretty common thing just visualizing a list of values, and adding some sort of context or meaning to it. So, all of a sudden, bringing those highest-risk patients to the top adds a little bit more value versus like a list that maybe is sorted by alphabetical order or in some sort of random order. This provides a little bit more context.
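
The two steps described above (pick each patient's most recent score using the last load date, then rank by predicted probability) can be sketched in a few lines. The row layout and values are hypothetical, and a BI tool or a SQL query would normally do this work; this is only meant to show the logic.

```python
from datetime import date

# Hypothetical prediction rows, several load dates per grain ID.
predictions = [
    {"grain_id": 101, "last_load_date": date(2017, 4, 26), "predicted_probability": 0.42},
    {"grain_id": 101, "last_load_date": date(2017, 4, 27), "predicted_probability": 0.63},
    {"grain_id": 102, "last_load_date": date(2017, 4, 26), "predicted_probability": 0.80},
    {"grain_id": 102, "last_load_date": date(2017, 4, 27), "predicted_probability": 0.35},
    {"grain_id": 103, "last_load_date": date(2017, 4, 27), "predicted_probability": 0.51},
]


def latest_per_grain(rows):
    """Keep only the most recently loaded row for each grain ID."""
    latest = {}
    for row in rows:
        current = latest.get(row["grain_id"])
        if current is None or row["last_load_date"] > current["last_load_date"]:
            latest[row["grain_id"]] = row
    return list(latest.values())


# Highest current risk first, so those patients float to the top of the list.
work_list = sorted(
    latest_per_grain(predictions),
    key=lambda row: row["predicted_probability"],
    reverse=True,
)
```

Note that grain ID 102 had the highest score on the earlier date but drops to the bottom once only the latest load is considered.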

Levi: Yeah. That happens a lot in healthcare. So a lot of times, these lists aren’t ordered in a helpful way. So this is fantastic.

So we had a question that came in. Someone asks, “How would you go about doing predictions on infections on a daily basis?” So can this handle— what kind of frequency of predictions does— you know, is it flexible?

Taylor: Yeah.

I mean, it’s as flexible as your data is. So if you had frequent enough data, you could run—I mean, you can run live predictions, and run predictions as often as the data is refreshed. But if you were to predict infections on a daily basis then maybe you’re just considering any new values for your variables each day.

Levi: Yeah.

So maybe you have like your bathing rate changes throughout the day. You’re getting more baths or you’re getting different lines in for like CLABSI, if you’re talking central line infections. Or with sepsis, it seems like a lot of variables might change with sepsis [inaudible 00:10:17].

Taylor: Yeah. Sepsis, you know, your risk could change instantly as your temperature changes as you maybe get a specific intervention or a certain lab value finally comes back from the lab, you would maybe see that change.

Levi: It’s a great point.

So how it’d work is that makes new prediction based on that new attribute. It goes to the csv file or the database. And then you’ll be able to see it right here, with this framework.

Taylor: Yeah.

So then, a list like this would just update the same thing also as often as your data is updating. And if, all of a sudden, two people kind of switched order in who is at most risk, someone new might float to the top of that list which would kind of bring them to the attention of a clinician or someone that could do something.

Levi: Yeah. That’s fantastic.

It’s such a new concept – decision support tools. So across healthcare, the idea of using data to drive an optimized decision making is so brand new. Like, if you had–

You work with clients from time to time, so how does the clinician receive this? Or like what’s their thought on this sort of dashboard compared to what was used in the past?

Taylor: Yeah.

Well, we can touch on some of these concepts as we go but keeping things simple. Like in this example, having the sorting done for them so that they don’t need to decide like, “Okay, I want to sort by risk”. Or having the population already filtered down. So maybe going into things like, just an exacerbated population or just folks that are in the hospital, makes it a lot cleaner versus a lot of interaction just to get down to the list that they want. So having those conversations upfront, asking what things the clinician would like to see, makes it a lot more helpful.

And in this example, I apologize. The screenshot is really small on this slide.

But also doing things like translating a field like this where you can see, on this one, prednisone or pulmonary rehab referral. Things like that are in kind of their original variable format there. They’re not in much of a sentence format. And so, maybe creating a reference table so that a clinician is not seeing something like a ‘prednisone flag.y’ that’s coming straight from the model output. And instead, maybe running it through a reference table to translate it into something like, “The patient is at higher risk because they were on prednisone,” to make it a little bit more user friendly, a little bit easier to read. And keeping in mind that there are certain conventions that you might want to follow that should be consistent with other ways they’ve interacted with data in the past or with work lists in the past. And you might have a really great shiny new idea. Sometimes, the best thing is to keep things very simple and to kind of know your audience.
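
A reference table like the one described above can be as simple as a lookup from raw factor names to readable sentences. In this sketch the raw names, the second sentence, and the fallback wording are all hypothetical; only the prednisone sentence follows the example given in the broadcast.

```python
# Hypothetical reference table: raw model output name -> clinician-facing text.
FACTOR_DESCRIPTIONS = {
    "prednisone_flag.y": "The patient is at higher risk because they were on prednisone.",
    "pulmonary_rehab_referral.n": "The patient has no pulmonary rehab referral on record.",
}


def describe_factor(raw_name):
    """Translate a raw factor name, falling back to a tidied version of the name."""
    if raw_name in FACTOR_DESCRIPTIONS:
        return FACTOR_DESCRIPTIONS[raw_name]
    return "Contributing factor: " + raw_name.replace("_", " ")


readable = [describe_factor(name) for name in ("prednisone_flag.y", "age_group")]
```

The fallback keeps the list usable when a new variable appears in the model before the reference table has been updated.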

Another interesting visualization with that same type of data, after we’ve kind of looked at the visit list for a little bit, is the distribution of risk for your whole population. So if you’re kind of trying to figure out how many folks are in a high-risk category or a low-risk category? How’s my population distributed based on this model? You’re able to just do a simple graph like this.
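
Behind a distribution chart like this is just a count of patients per risk bin. The sketch below bins a hypothetical list of predicted probabilities into ten equal-width bins and prints a text-only bar chart; a real dashboard would hand the same counts to a charting tool.

```python
# Hypothetical predicted probabilities for a small population.
probabilities = [0.05, 0.12, 0.18, 0.22, 0.25, 0.31, 0.47, 0.52, 0.63, 0.91]


def bin_counts(values, n_bins=10):
    """Count how many probabilities fall into each equal-width bin on [0, 1]."""
    counts = [0] * n_bins
    for p in values:
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        counts[index] += 1
    return counts


def text_histogram(counts):
    """Render the bin counts as one line of '#' marks per bin."""
    n_bins = len(counts)
    lines = []
    for i, count in enumerate(counts):
        low, high = i / n_bins, (i + 1) / n_bins
        lines.append(f"{low:.1f}-{high:.1f} | " + "#" * count)
    return "\n".join(lines)


counts = bin_counts(probabilities)
print(text_histogram(counts))
```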

We’re not trying to get into any too fancy a visualization or anything today but at least have some things to show of what have been useful in the past for when we present something to clinicians and they want to know how things look on a broader perspective as well as on a really detailed perspective. The work list goes straight from very, very detailed to a much higher and much more aggregate level.

And then maybe to the point of, say, the daily infection risk. This is just a very basic graph of one patient’s risk over time. And I think this is really telling if you’re producing predictions on a daily basis or even an hourly basis, watching for these blips in risk that maybe lead to this increase in risk. Well, of course, you also have the three most contributing factors on hand so that might be a drill down for this risk score. So clicking into this specific prediction, you’d be able to know a bit more about that.

If it’s something like an infection risk, you would hope to see that risk go down over time, maybe towards the end of the hospital visit. If it’s something like a patient’s likelihood to acquire a certain disease like diabetes, you would maybe, as a clinician, hope that their risk is staying stable or was high and is coming back down. And these types of things help illustrate the point to a clinician but they also help the clinician deliver this type of conversation to a patient, I think.
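
One way to think about "watching for blips" in a single patient's risk over time is to compare each prediction with the one before it. This is a sketch only: the history, the function name, and the 0.15 jump size are hypothetical, and a sensible jump size would have to be agreed with the clinicians using the chart.

```python
from datetime import date

# Hypothetical daily predictions for one patient: (load date, predicted probability).
history = [
    (date(2017, 4, 24), 0.20),
    (date(2017, 4, 25), 0.22),
    (date(2017, 4, 26), 0.45),
    (date(2017, 4, 27), 0.40),
]


def flag_jumps(series, min_increase=0.15):
    """Return the dates where risk rose by at least min_increase since the prior point."""
    ordered = sorted(series)  # sorts by date, the first element of each pair
    flagged = []
    for (_, previous), (day, current) in zip(ordered, ordered[1:]):
        if current - previous >= min_increase:
            flagged.append(day)
    return flagged


jump_dates = flag_jumps(history)
```

A flagged date is a natural place to offer the drill down into that prediction's three contributing factors.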

Levi: Yeah, that’s kind of a two-step process.

Taylor: Yup.

Levi: Well, that’s a great point. So it’s kind of two different dimensions. You could think of it as so first, stratifying people today and then also showing a person’s risk over time.

Related to that, we had a question from Jonas who asked, “Would you include a probability value with the classification prediction?”

Taylor: Yeah.

That one is kind of tough. We’ve gotten feedback with the probability value. The hard part is having the context for it: is a 63% likelihood of something high, or is that low? So then, maybe pairing that with a percentile like if you have a probability of 63%, maybe also visualize that that’s in the 90th percentile or—

Levi: Some coloring.

Taylor: Exactly. Yeah, so the same with the shading. And then other things would be if they’re willing to categorize to a low, moderate, or high so.

Levi: It’s a great question.

I guess, to step one higher level – one level up from that is some people wonder if actually produces yes-no’s or probability. So we do offer the probability since, depending on your use case, you may want to work with your thresholds and say, “Anybody above a 0.5, let’s put as a red. Anybody above 0.4—“ so it gives you some flexibility, I guess, I would say.

Taylor: Yup.

And so, in the use case where we had this kind of dark blue down to light blue, it wasn’t so much about a high or low. It was relative to other folks on the list based on the things that you have filtered. And then same thing here, showing the probability might be helpful. But also, the percentile is in there as well.
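
The options discussed above (a raw probability, a percentile relative to the rest of the list, and a low/moderate/high label) can sit side by side. In this sketch the population, the function names, and the label names are hypothetical, and the 0.4 and 0.5 cutoffs are only illustrative defaults echoing the thresholds mentioned in passing; real cutoffs depend on the use case.

```python
# Hypothetical predicted probabilities for everyone on the filtered list.
population = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.45, 0.63, 0.80]


def percentile_rank(value, scores):
    """Percentage of the population with a score at or below value."""
    return 100.0 * sum(score <= value for score in scores) / len(scores)


def risk_category(probability, moderate_cutoff=0.4, high_cutoff=0.5):
    """Bucket a probability into low, moderate, or high using adjustable cutoffs."""
    if probability > high_cutoff:
        return "high"
    if probability > moderate_cutoff:
        return "moderate"
    return "low"


# Pair the raw probability with context a reader can act on.
summary = {
    "probability": 0.63,
    "percentile": percentile_rank(0.63, population),
    "category": risk_category(0.63),
}
```

Because the percentile is computed against whatever list is passed in, it changes as the list is filtered, which matches the "relative to other folks on the list" shading described above.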

Levi: Yeah. It’s a hard tradeoff but keeping it super simple seems like a good way to go.

Taylor: Yup.

The simplicity seems to go quite a long ways. And then just having the conversation of how things are going to be used, what they’re going to be used for – all those questions, so.

Yup, so some of the kind of design techniques that Health Catalyst uses and that uses, as we try and visualize some of these things, are the structure and hierarchy, so keeping all of that in mind, the clarity – making sure that the main point of your visualization is clear. So like that, I thought that the distribution one is really clear. You’re just showing based on the different risk levels, who falls into which category, and kind of how that shape fits in. And then also the simplicity – not trying to make everything so complicated. Not trying to give the user everything in the world. Just giving them the things that they need which usually is found through things like user testing.

So kind of a reminder, I saw this quote out there from Albert Einstein, “Make things as simple as possible but not simpler.” And that’s a fine balance.

Levi: Yeah. Yeah. You can only dumb things down so much, I guess, we can put it that way. You still have to deliver something new and novel or else what’s the insight or the addition? 

Taylor: Yup. So, [inaudible 00:18:32] things like a simple arrow to show that someone’s risk level is trending up versus trending down. If that is going to answer the question, then don’t make it more complicated than a color-coded arrow or something but—

Levi: For sure.

Taylor: Yeah.
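
The trend arrow mentioned above needs very little logic behind it. A minimal sketch, assuming two consecutive risk scores and a hypothetical tolerance of 0.02 below which a change is treated as flat (the function name and tolerance are not from the broadcast):

```python
def trend_direction(previous, current, tolerance=0.02):
    """Summarize the change between two risk scores as 'up', 'down', or 'flat'."""
    change = current - previous
    if change > tolerance:
        return "up"
    if change < -tolerance:
        return "down"
    return "flat"


# A display layer could map these labels to a color-coded arrow.
examples = [
    trend_direction(0.40, 0.63),
    trend_direction(0.63, 0.40),
    trend_direction(0.50, 0.51),
]
```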

So kind of some of our typical design process steps are talk and listen. Those are the conversations with the clinicians. So you can get their questions answered. They can understand.

On the top part they can understand what’s possible, technically. And you can hear what they need, clinically. And then sketching out, not spending too much time building a finished app or a finished visualization if you don’t really know what they want. So sketching things out is great, then prototyping, and user testing are really great.

So I also threw up some questions on how to have some of those conversations for that talk and listen part. So, “What am I trying to prove or learn? What does the person need?” Those are some of the key questions to ask if you really want to get something that is going to be trusted by the end users and it’s going to be useful versus like something that looks really awesome and you’re really proud, from a technical perspective, but is maybe not as useful as you’d hope.

Levi: Yeah. I feel like that happens a lot with technology. It’s like [inaudible 00:19:52] the tech geeks like us, we’re super excited about machine learning and visualization, but you have to know your business questions and your end user. It’s like a lot of times we lose focus of that. And it’s great you brought that up.

Taylor: Yeah. Yeah. And so, then just a few reminders on when people are reading charts, visually your eyes don’t always go in order. You’re going to go to the thing that is most emphasized on the chart so make sure that’s your main point of the chart. Make sure it’s not some sidebar, that just provides some context or some additional information, but the main point really stands out.

And keep in mind that the mind can only process so many things at once, so trying to keep the cognitive load as low as possible and do as much of the work for the user as possible with the visualization so that they’re not having to store things in their memory. Go back and compare two things if that’s the point of the chart, so keeping that in mind. As I mentioned earlier, relying on conventions and metaphors so that people can be comfortable with what they know and what they’re used to.

Yeah, here are some kind of great ways to get started, I threw these out there. There are QlikView and Tableau personal versions. We love Shiny with R for visualizing straight from R and within kind of that same realm. And so, you can kind of go out and try some of these visualizations – oops, clicked away from there, and end up with the output from And one of these visualization tools – any visualization tool will really get you started.

Levi: That’s an awesome point. So we’re excited to promote free tools and help people get started on their laptops today.

So in terms of the workflow, if we just think through for a second. So you have some data. Put it in That creates a prediction that you put either into a database or a csv file. And then these can read from that database or csv file, right?

Taylor: Yeah.

Levi: You can use Tableau and Qlik, I would imagine?

Taylor: Yeah. Yeah, you can load however you need to.

Levi: Awesome. Yeah, so [inaudible 00:22:00] files or databases, whatever is simplest for you, and we have instructions on about using either of those. And then you can dive into your tool of choice here.
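
The file-based hand-off described above can be sketched with Python's standard `csv` module. The column names and values are hypothetical, and an in-memory buffer stands in for a real file on disk so that the example is self-contained; a visualization tool would read the same kind of file through its own import dialog.

```python
import csv
import io

# Hypothetical prediction rows to hand off to a visualization tool.
rows = [
    {"grain_id": 101, "last_load_date": "2017-04-27", "predicted_probability": 0.63},
    {"grain_id": 102, "last_load_date": "2017-04-27", "predicted_probability": 0.35},
]

# Write the predictions as CSV (an in-memory buffer stands in for a file).
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["grain_id", "last_load_date", "predicted_probability"]
)
writer.writeheader()
writer.writerows(rows)

# Read them back the way a downstream consumer would.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))

# CSV carries no types, so numeric columns come back as text and need converting.
probabilities = [float(row["predicted_probability"]) for row in loaded]
```

Keeping the grain ID in every row is what lets the downstream tool filter and join at the same level of detail as the model output.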

Did you use Qlik or Tableau more? How—what’s [inaudible 00:22:14]?

Taylor: So like that earlier visualization with the work list, that’s a pretty standard Tableau visualization that we use pretty regularly. QlikView can do quite similar things. And it’s kind of just another one of those and it sort of depends on your comfort level.

Levi: For sure. And it’s a little bit different. So, in R or Python you can very easily create 2D images and plots but this is more of something that’s interactive, that’s dynamic where you can sort, filter. Those are kind of the most common.

Taylor: Yeah. Sort and filter. And click from one spot and be able to filter down on a different tab or a different chart is really nice. And so, you’re still keeping everything at that same grain ID as you feed into here so that you have a ton of flexibility on what you want to filter and visualize.

Levi: Yeah, interactivity is really fantastic with those.

Taylor: Yup.

Levi: Great stuff, Taylor. Really excited about all of this.

Taylor: Yeah. And just a couple of additional resources, some user experience reading lists, just some general books that we recommend for good charts, usability and then some good design guidelines. And so these are pretty high level. Some of them have some good humor, some good suggestions. And they don’t necessarily apply 100% to machine learning but applying those same concepts, when you’re visualizing the data from your models is great to keep in mind.

Levi: Yeah, fantastic resources. I just had a comment from [inaudible 00:23:43] saying that, “As a storyteller in Tableau, it’s always difficult to decide where the fine line is between overwhelming someone and not giving them enough.” I guess it’s just an iterative process?

Taylor: Yeah, iterative. Yeah, it’s definitely related to that conversation piece where you’re really trying to understand the user. And then I would say– like I said, the prototype and then getting the user, you know, turning over the application or the visualization really quickly to let them start interacting with it and having kind of a small user group that, all of a sudden, they’ll do something that you didn’t expect them to do or they’ll ask for something that’s not there. And so, kind of let that evolve a little bit more organically than trying to guess and throw everything in the world at them because that is a lot of times what a user thinks they do want.

Levi: Yeah, yeah.

Taylor: They think they want everything. They may only need certain things. And they may actually be missing something that you should have had on there so.

Levi: That’s a good point. And it kind of ties into the agile methodology of software development so bring the user in early and iterate with small little changes. That’s beautiful.

Taylor, we’d love to have you back soon. Thanks for coming in.

Taylor: Okay, yeah.

Levi: Thanks for joining us. We’ll talk to you next week. We appreciate you being with us.

Taylor: Yeah. Thank you.

Levi: Thank you for joining us today. Remember to like, share and subscribe. Comment below and click the links below to join our Slack Channel and our Community.

What topic or projects should we feature?

Let us know what you think would make it great.