Ethan Taft, Intern, Data Science

Collaborative Filtering: Making Recommendations


Most of us have shopped at an online retailer like Amazon, and most of us have probably used an online movie service like Netflix. Have you noticed the recommended items to purchase or the recommended movies to watch? Amazon and Netflix build these recommendations using similarity scores. One popular method for building and offering recommendations is called collaborative filtering.

What is collaborative filtering and how does it work?

Here is the logic behind collaborative filtering: If Joe and Jill each buy the book “Machine Learning with R,” they now have similar purchase histories. If Jill then buys a second book, “Learning Bayesian Models with R,” and Joe has not, the system will recommend Jill’s second book to Joe. Think of all the other books there are to buy! Yet the system chose this one book based on Joe’s preferences in collaboration with Jill’s preferences. That is why it is called “collaborative”: it draws on the behavior of users who are similar to you. A few other recommendation models, such as knowledge-based and content-based recommenders, differ in implementation from collaborative filtering.

Item-based and user-based collaborative filtering

There are two main types of collaborative filtering: item-based (IBCF) and user-based (UBCF). Both can operate on the same data.

IBCF works like this: First, it calculates the similarity between all item pairs. Second, it generates a recommendation. For example, if Joe has already bought an item, the recommendation step takes the items most similar to what Joe has already bought and recommends those items to him.
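The two IBCF steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any production recommender is implemented: the purchase data, the third book title, and the function names `item_similarity` and `recommend_items` are all hypothetical, and cosine similarity over sets of buyers is just one possible choice of similarity function.

```python
import math

# Hypothetical purchase histories (user -> set of items bought).
purchases = {
    "Joe": {"Machine Learning with R"},
    "Jill": {"Machine Learning with R", "Learning Bayesian Models with R"},
    "Ann": {"Machine Learning with R", "Learning Bayesian Models with R",
            "A Hypothetical Cookbook"},
    "Bob": {"A Hypothetical Cookbook"},
}

def item_similarity(purchases):
    """Step 1: cosine similarity between every pair of items,
    treating each item as the set of users who bought it."""
    buyers = {}
    for user, items in purchases.items():
        for item in items:
            buyers.setdefault(item, set()).add(user)
    sims = {}
    for a in buyers:
        for b in buyers:
            if a != b:
                overlap = len(buyers[a] & buyers[b])
                sims[(a, b)] = overlap / math.sqrt(len(buyers[a]) * len(buyers[b]))
    return sims

def recommend_items(user, purchases, top_n=1):
    """Step 2: score each item the user has not bought by its total
    similarity to the items the user already owns, then rank."""
    sims = item_similarity(purchases)
    owned = purchases[user]
    all_items = set().union(*purchases.values())
    scores = {
        candidate: sum(sims[(candidate, item)] for item in owned)
        for candidate in all_items - owned
    }
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return ranked[:top_n]

print(recommend_items("Joe", purchases))
```

In this toy data, two of the three buyers of “Machine Learning with R” also bought “Learning Bayesian Models with R,” so that book scores highest for Joe.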

UBCF works differently: First, it calculates how similar each user is to Joe. Second, it identifies which of those users are most similar to Joe. Third, it scores every item purchased by those most similar users, using the average rating each item received among them. Fourth, it picks the top-rated items and recommends them to Joe.
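The four UBCF steps can be sketched as follows. Again, this is a rough illustration rather than a definitive implementation: the ratings, the book labels, the function names `user_similarity` and `recommend_for_user`, and the parameters `k` (number of neighbors) and `top_n` are assumptions made up for the example, and cosine similarity over co-rated items is only one of several reasonable choices.

```python
import math

# Hypothetical ratings on a 1-5 scale (user -> {item: rating}).
ratings = {
    "Joe":  {"Book A": 5, "Book B": 1},
    "Jill": {"Book A": 5, "Book B": 1, "Book C": 5, "Book D": 2},
    "Ann":  {"Book A": 4, "Book B": 2, "Book C": 4},
    "Bob":  {"Book A": 1, "Book B": 5, "Book D": 5},
}

def user_similarity(a, b):
    """Cosine similarity computed over the items both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def recommend_for_user(target, ratings, k=2, top_n=1):
    # Step 1: similarity between the target and every other user.
    sims = {
        other: user_similarity(ratings[target], ratings[other])
        for other in ratings if other != target
    }
    # Step 2: keep the k most similar users.
    neighbors = sorted(sims, key=lambda u: (-sims[u], u))[:k]
    # Step 3: average the neighbors' ratings for items the target lacks.
    collected = {}
    for neighbor in neighbors:
        for item, rating in ratings[neighbor].items():
            if item not in ratings[target]:
                collected.setdefault(item, []).append(rating)
    averages = {item: sum(v) / len(v) for item, v in collected.items()}
    # Step 4: recommend the items with the highest average rating.
    return sorted(averages, key=lambda i: (-averages[i], i))[:top_n]

print(recommend_for_user("Joe", ratings))
```

Here Jill and Ann rate the shared books much as Joe does, so they become his neighbors, and the book they both rated highly is recommended ahead of the one Jill rated poorly.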

Note: Similarity scores in IBCF and UBCF are calculated using measures such as cosine similarity, Euclidean distance, and Pearson correlation.
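To make the three measures in the note concrete, here is a small sketch using only the Python standard library. The two rating vectors are invented for illustration, and the function names are our own. Note that cosine is computed here as a similarity (subtracting it from one gives the corresponding distance), and that for Euclidean distance a smaller value means the two users are more alike.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two rating vectors (0.0 = identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_correlation(a, b):
    """Linear correlation of two rating vectors, from -1.0 to 1.0."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

# Hypothetical ratings that Jill and Joe gave to the same four items.
jill = [5, 4, 1, 2]
joe = [4, 5, 2, 1]

print(round(cosine_similarity(jill, joe), 3))
print(euclidean_distance(jill, joe))
print(round(pearson_correlation(jill, joe), 3))
```

Pearson correlation subtracts each user’s average rating first, which helps when one user rates everything generously and another rates everything harshly.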

Collaborative filtering in healthcare

There are many use cases for collaborative filtering that go beyond recommending books and movies to users. For example, in healthcare, think of a system that recommends providers and clinics to patients based on reviews for particular types of care. Say Jill and Joe each give their primary care physician, Dr. Tom at IHC Downtown, a rating. If Jill and Joe give similar ratings to Dr. Tom, we can assume they may have similar preferences. So if Jill gave a good rating to Dr. Marilyn in oncology at IHC Midtown, the system would also recommend Dr. Marilyn to Joe if he needs that type of care in the future. Collaborative filtering would provide a way for Jill and Joe to get their preferred type of care with the best provider or clinic.


With all the available statistical and machine learning models trying to predict healthcare outcomes, it’s important to remember the driving force behind healthcare: the patient. Recommender models are another way to increase patient-specific care and patient satisfaction. Simply put, recommenders enable patients to make choices based on their preferences. Care should be taken, however, not to limit choices, but rather to provide meaningful recommendations along with context and transparency. More specifically, users should be told why they are being recommended a certain doctor or hospital (because of their preferences) and what data the recommender used to determine this.

We’d love to hear about the great work you’re doing in the world of healthcare data science and machine learning, so please reach out to us or chat with the community on Slack.