Recommendation System for Grocery Store

Oct 23, 2019

Introduction

Recommender systems are one of the most successful and widespread applications of machine learning technologies in business. You can find large-scale recommender systems in retail, video on demand, or music streaming services. In order to develop and maintain such systems, a company typically needs a group of experienced data scientists and engineers.

Building a recommendation system for a grocery store that predicts which product a customer is going to purchase is quite complicated. The same customer can buy a different basket of products on each visit, so we cannot simply feed the data into a machine learning model and expect useful recommendations. In this respect, the system differs from typical recommendation systems.

Data processing

4000 customers who usually visit every month were selected. The more often a customer visits, the more baskets are available for that customer. The data used for the recommendation were the unique products in the basket of each transaction. There are more than 90 unique sections in the overall data. Association rule mining is applied to identify the next section likely to appear in the basket. The main reason for choosing this algorithm over other machine learning algorithms is that it gives priority to the customer's own choices, which can be identified from past transactions and baskets. If it fails to identify the next section from the given combination of two sections, 5 similar customers are selected and the next section is identified from their baskets and transactions. The K-Nearest Neighbour algorithm is used to select the similar customers.
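The basket-building step described above can be sketched as follows. This is only an illustration: the row layout (customer_id, transaction_id, section), the sample rows, and the name build_baskets are assumptions, not the original pipeline.

```python
from collections import defaultdict

# Hypothetical transaction rows: (customer_id, transaction_id, section).
rows = [
    ("c1", "t1", "Beverage"),
    ("c1", "t1", "Beverage"),        # repeated line item in the same basket
    ("c1", "t1", "Noodles & Soup"),
    ("c1", "t2", "Dairy"),
    ("c2", "t3", "Snacks"),
]

def build_baskets(rows):
    """Group rows into {customer_id: {transaction_id: set of sections}}."""
    baskets = defaultdict(lambda: defaultdict(set))
    for customer_id, transaction_id, section in rows:
        # A set keeps each section only once per transaction.
        baskets[customer_id][transaction_id].add(section)
    return baskets

baskets = build_baskets(rows)
```

Storing each basket as a set keeps only the unique sections per transaction, which is the input the association rule step works on.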

Association Rule:

Association rule mining is a technique to identify underlying relations between different items. Take the example of a supermarket where customers can buy a variety of items. Usually, there is a pattern in what the customers buy. For instance, parents with babies buy baby products such as milk and diapers, young women may buy makeup items, and bachelors may buy beer and chips. In short, transactions follow patterns. More profit can be generated if the relationships between the items purchased in different transactions can be identified.

For instance, if items A and B are frequently bought together, several steps can be taken to increase profit. For example:

A and B can be placed together so that a customer who buys one of the products doesn't have to go far to buy the other.
People who buy one of the products can be targeted through an advertisement campaign to buy the other.
Collective discounts can be offered on these products if the customer buys both of them.
Both A and B can be packaged together.

The process of identifying an association between products is called association rule mining.
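The strength of a rule "A implies B" is usually measured with support and confidence. The sketch below computes both over a handful of made-up baskets; the function names and the sample data are illustrative assumptions, not code from the original system.

```python
def support(baskets, itemset):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(antecedent <= basket for basket in baskets)
    if n_antecedent == 0:
        return 0.0  # the rule never applies in this data
    return sum(both <= basket for basket in baskets) / n_antecedent

# Made-up baskets for illustration only.
example_baskets = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}]

support_ab = support(example_baskets, {"A", "B"})              # 2 of 4 baskets
confidence_a_to_b = confidence(example_baskets, {"A"}, {"B"})  # 2 of the 3 baskets with A
```

A high confidence for "A implies B" means that customers who buy A usually also buy B, which is the signal behind the placement and discount ideas listed above.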

K-Nearest Neighbour Algorithm:

In pattern recognition, the k-nearest neighbor algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
In k-NN regression, the output is the property value for the object. This value is the average of the values of k nearest neighbors.
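For selecting similar customers, only the neighbour search part of k-NN is needed. A minimal sketch follows, assuming each customer is represented by a vector of purchase counts per section and that similarity is Euclidean distance. The article does not state which features or distance were used, so both are assumptions, as are the names and sample data.

```python
import math

def nearest_customers(profiles, target_id, k=5):
    """Return the ids of the k customers closest to target_id.

    profiles maps customer id -> list of purchase counts per section
    (an assumed representation). Ties are broken by customer id.
    """
    target = profiles[target_id]
    distances = [
        (math.dist(target, vector), customer_id)
        for customer_id, vector in profiles.items()
        if customer_id != target_id
    ]
    distances.sort()
    return [customer_id for _, customer_id in distances[:k]]

# Made-up profiles over three sections.
profiles = {
    "c1": [3, 0, 1],
    "c2": [3, 1, 1],
    "c3": [0, 5, 0],
    "c4": [2, 0, 1],
}
similar = nearest_customers(profiles, "c1", k=2)
```

With these sample profiles, c2 and c4 are both at distance 1 from c1, while c3 is much further away, so c2 and c4 are returned.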

Flowchart

A combination of two sections is given for a specific customer.

For example: {Noodles & Soup, Beverage}

All the baskets of that customer are extracted.
Using association rules, the two sections with the highest confidence are found.
If such sections are found, products from those sections are recommended. Otherwise, 5 similar customers are selected, and 2 sections are suggested using the association rule confidence computed over their baskets.
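The steps above can be sketched as a small function with a fallback. This is an illustration under assumptions: "found" is taken to mean that the customer's own baskets yield two candidate sections, and the function names and sample baskets are hypothetical.

```python
from collections import Counter

def top_sections(baskets, given, n=2):
    """Rank the other sections by confidence of the rule: given -> section."""
    given = set(given)
    matching = [basket for basket in baskets if given <= basket]
    if not matching:
        return []
    # Every candidate shares the same denominator (baskets containing `given`),
    # so ranking by co-occurrence count is the same as ranking by confidence.
    counts = Counter(s for basket in matching for s in basket - given)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [section for section, _ in ranked[:n]]

def recommend(own_baskets, neighbour_baskets, given, n=2):
    """Use the customer's own baskets first, then fall back to similar customers."""
    sections = top_sections(own_baskets, given, n)
    if len(sections) == n:
        return sections
    return top_sections(neighbour_baskets, given, n)

given = {"Noodles & Soup", "Beverage"}
own = [
    {"Noodles & Soup", "Beverage", "Dairy"},
    {"Noodles & Soup", "Beverage", "Dairy", "Snacks"},
    {"Dairy"},
]
neighbours = [{"Noodles & Soup", "Beverage", "Bakery", "Fruit"}]

from_own = recommend(own, neighbours, given)
from_neighbours = recommend([{"Dairy"}], neighbours, given)
```

In the first call the customer's own history is enough. In the second call the given pair never appears in the customer's baskets, so the baskets of the similar customers are used instead.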

Output

Result

Among 3000 customers, almost 2000 received a score greater than 0, where the score means:

Score 0: No section matched

Score 1: 1 section matched

Score 3: 3 sections matched
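One way to compute such a score is to count how many recommended sections appear in the customer's actual basket. The sketch below assumes that definition, which the article does not spell out; the names and sample data are hypothetical.

```python
def match_score(recommended, actual_basket):
    """Number of recommended sections that appear in the actual basket."""
    return len(set(recommended) & set(actual_basket))

def share_with_match(scores):
    """Fraction of customers whose score is greater than 0."""
    return sum(score > 0 for score in scores) / len(scores)

# Made-up recommendations and actual baskets for three customers.
scores = [
    match_score(["Dairy", "Snacks"], {"Dairy", "Bakery"}),  # one match
    match_score(["Dairy", "Snacks"], {"Fruit"}),            # no match
    match_score(["Dairy", "Snacks"], {"Dairy", "Snacks"}),  # two matches
]
matched_share = share_with_match(scores)
```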

Conclusion

The result of this recommendation system is encouraging, as it correctly guessed 70% of the sections in the basket. It can be further optimized through deeper analysis of section-level recommendations.