Collaborative Filtering (CF) is a recommendation-system approach powered by the interactions past users have had with items. By identifying users who tend to like and dislike the same items, CF systems recommend items that similar users have enjoyed. This approach therefore rests on the assumption that similar people tend to like similar items.
In our approach, we make use of item-item CF. We train a K-Nearest Neighbors (KNN) model on upwards of 270,000 user-recipe interactions, covering 7,000+ unique users and 16,000+ unique recipes. Every row in our training set represents a unique recipe and every column represents a unique user.
A user’s list of liked and disliked recipes is taken as input and passed to the model. The model selects the K recipes with the smallest cosine distance to each of the liked recipes and returns a dictionary of recipe-distance pairs.
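The item-item lookup above can be sketched as follows. This is a minimal illustration with a toy interaction matrix (the real one is far larger and sparse); the matrix values and the `recommend` helper are assumptions for demonstration, not our production code.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy recipe-user interaction matrix (rows: recipes, columns: users).
# 1 = liked, -1 = disliked, 0 = no interaction. Values are illustrative.
interactions = np.array([
    [1,  1, 0, -1],
    [1,  1, 0,  0],
    [0, -1, 1,  1],
    [0,  0, 1,  1],
    [1,  0, 0, -1],
])

# Fit KNN over recipe rows using cosine distance.
knn = NearestNeighbors(metric="cosine", n_neighbors=3)
knn.fit(interactions)

def recommend(liked_recipe_ids, k=3):
    """Return a {recipe_id: distance} dict of neighbors of liked recipes."""
    recs = {}
    for rid in liked_recipe_ids:
        distances, indices = knn.kneighbors(interactions[rid:rid + 1], n_neighbors=k)
        for dist, idx in zip(distances[0], indices[0]):
            if idx in liked_recipe_ids:
                continue  # skip recipes the user already rated
            # keep the smallest distance seen for each candidate
            recs[idx] = min(dist, recs.get(idx, np.inf))
    return recs

print(recommend([0]))
```

With this toy matrix, recipes 1 and 4 share the most raters with recipe 0, so they come back with the smallest cosine distances.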
This approach is similar to CF, but instead of calculating cosine similarity over user-recipe interactions, it does so over item features. Items with similar features will, therefore, have a smaller distance than items with completely different features. At recommendation time, a user’s items are taken as input and items with similar features are returned. However, recommendations with similar features are not always quality recommendations.
In the case of recipes, the extracted features at our disposal are ingredients. As above, we train a KNN model, this time on a larger dataset of 150,000+ recipes. On the input side, each row in our dataset represents a recipe and each column represents an ingredient. Then, for every recipe in the user’s list of liked recipes, we retrieve the K most similar recipes in terms of ingredients. The output is a dictionary of recipe-distance pairs.
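The ingredient-based variant differs only in the matrix the KNN is fit on. A minimal sketch, assuming a binary recipe-ingredient matrix (the real dataset would use a large sparse matrix; the recipes and helper name here are made up for illustration):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy recipe-ingredient matrix (rows: recipes, columns: ingredients).
# 1 = recipe contains the ingredient. Values are illustrative.
recipe_features = np.array([
    # flour sugar egg tomato basil
    [1, 1, 1, 0, 0],   # 0: cake
    [1, 1, 1, 0, 0],   # 1: cookies
    [0, 0, 0, 1, 1],   # 2: tomato salad
    [1, 0, 1, 1, 1],   # 3: savory tart
])

content_knn = NearestNeighbors(metric="cosine", n_neighbors=2).fit(recipe_features)

def similar_by_ingredients(liked_recipe_ids, k=2):
    """Return {recipe_id: cosine distance} based on ingredient overlap."""
    recs = {}
    for rid in liked_recipe_ids:
        dists, idxs = content_knn.kneighbors(recipe_features[rid:rid + 1], n_neighbors=k)
        for d, i in zip(dists[0], idxs[0]):
            if i not in liked_recipe_ids:
                recs[i] = min(d, recs.get(i, np.inf))
    return recs

print(similar_by_ingredients([0]))
```

Here recipe 1 shares all its ingredients with recipe 0, so it is returned at distance (near) zero, which also illustrates the caveat above: identical ingredients do not guarantee a quality recommendation.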
Although our collaborative filtering approach finds similar users on the basis of the recipes they have liked, it does not capture fine-grained information on the food ingredients that users tend to like. To address this, we bridge the two aforementioned models by creating a third model, the Taste Model. This is done by feature-engineering taste profiles for every user, resulting in a matrix where every row represents a unique user and each column an ingredient. The value for user i at ingredient k represents their “score” for the ingredient. This is calculated by adding one point each time ingredient k is found in a recipe that they like and, conversely, subtracting one point each time it is found in a recipe they dislike.
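The taste-profile construction can be sketched directly from that scoring rule. The interaction log, recipe ingredient lists, and variable names below are illustrative assumptions, not our actual data:

```python
import numpy as np

# Hypothetical interaction log: (user, recipe, +1 like / -1 dislike).
interactions = [
    ("alice", "cake", 1), ("alice", "salad", -1),
    ("bob",   "cake", 1), ("bob",   "tart",  1),
]

# Ingredient lists per recipe (illustrative).
recipe_ingredients = {
    "cake":  ["flour", "sugar", "egg"],
    "salad": ["tomato", "basil"],
    "tart":  ["flour", "egg", "tomato"],
}

users = sorted({u for u, _, _ in interactions})
ingredients = sorted({ing for ings in recipe_ingredients.values() for ing in ings})
u_idx = {u: i for i, u in enumerate(users)}
ing_idx = {ing: j for j, ing in enumerate(ingredients)}

# Taste matrix: rows = users, columns = ingredients.
taste = np.zeros((len(users), len(ingredients)))
for user, recipe, sign in interactions:
    for ing in recipe_ingredients[recipe]:
        taste[u_idx[user], ing_idx[ing]] += sign  # +1 per like, -1 per dislike

print(dict(zip(ingredients, taste[u_idx["alice"]])))
```

Each user row is then a dense summary of their ingredient preferences, ready to be compared with other users.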
Using a KNN approach, we find the K users with the most similar ingredient preferences to our input user. Then, for each similar user, we calculate the minimum distance between each of their liked recipes and each of the input user’s liked recipes. The collection of recipes liked by similar users is returned as a recipe-distance dictionary.
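The two steps, finding taste-neighbors and then scoring their liked recipes by minimum distance, can be combined as below. The taste matrix, recipe vectors, and `liked_by` mapping are toy assumptions for illustration:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_distances
from sklearn.neighbors import NearestNeighbors

# Toy taste profiles (rows: users, columns: ingredient scores).
taste = np.array([
    [ 2,  1, -1],   # user 0 (input user)
    [ 2,  2, -1],   # user 1 (similar taste)
    [-2, -1,  2],   # user 2 (opposite taste)
])
# Toy recipe-ingredient vectors, and each user's liked recipes.
recipe_vecs = np.array([
    [1, 1, 0],  # recipe 0
    [1, 0, 0],  # recipe 1
    [0, 0, 1],  # recipe 2
])
liked_by = {0: [0], 1: [0, 1], 2: [2]}

def taste_recommend(user, k=1):
    """Recommend recipes liked by the k most taste-similar users."""
    knn = NearestNeighbors(metric="cosine", n_neighbors=k + 1).fit(taste)
    _, idxs = knn.kneighbors(taste[user:user + 1])
    recs = {}
    for neighbor in idxs[0]:
        if neighbor == user:
            continue
        for rid in liked_by[neighbor]:
            if rid in liked_by[user]:
                continue
            # minimum distance from this recipe to any of the user's likes
            d = cosine_distances(recipe_vecs[rid:rid + 1],
                                 recipe_vecs[liked_by[user]]).min()
            recs[rid] = min(d, recs.get(rid, np.inf))
    return recs

print(taste_recommend(0))
```

User 1 has the closest taste profile to user 0, so user 1’s liked recipe 1 is returned, scored by its minimum distance to user 0’s own likes.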
We recognize that our users may have allergies or disliked ingredients that cannot all be captured in a single predefined list. For this reason, we let users type ingredients they cannot eat into a text box. We then use a fuzzy-matching algorithm to remove recipes containing those ingredients from our recommendations. Our filtering function also takes into account a collection of dietary preferences, such as vegetarian or vegan, that the user can select when signing up for their account.
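A fuzzy exclusion filter of this kind might look like the sketch below. It uses the standard library’s `difflib` as a stand-in for whichever fuzzy-matching library the product actually uses; the threshold, helper names, and sample data are all assumptions:

```python
from difflib import SequenceMatcher

def fuzzy_match(a, b, threshold=0.8):
    """True when two ingredient strings are close enough to be the same."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def filter_recipes(recommendations, recipe_ingredients, excluded):
    """Drop recommended recipes containing any excluded ingredient."""
    return {
        rid: dist for rid, dist in recommendations.items()
        if not any(fuzzy_match(ing, ex)
                   for ing in recipe_ingredients[rid]
                   for ex in excluded)
    }

recs = {"peanut stew": 0.1, "tomato soup": 0.2}
ingredients = {
    "peanut stew": ["peanuts", "onion", "tomato"],
    "tomato soup": ["tomato", "basil", "cream"],
}
# A typo like "peanutss" should still match "peanuts" and exclude the stew.
print(filter_recipes(recs, ingredients, ["peanutss"]))
```

The fuzzy threshold lets misspelled user input still catch the intended ingredient without silently dropping unrelated recipes.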
Note: The product website runs on a lighter version of our dataset, which includes ~16,000 recipes.