Association Rule Learning — Apriori Algorithm!

This blog aims to explain the concept of the Apriori Algorithm in detail with examples & a project!

Harshit Dawar
5 min read · Feb 20, 2022

This article assumes you already know the basics of Association Rule Learning. If you aren’t familiar with that topic, you can refer to the link mentioned below:

The Apriori Algorithm is a very well-known Association Rule Learning algorithm. Any product-based business can use it to find out which items its customers tend to buy together, and grow its sales with that knowledge.

That is enough suspense for the algorithm; let’s discuss it.

Apriori Algorithm!

This algorithm is based on 3 different entities which, when combined, produce an insight that businesses can act on.

The name Apriori comes from the algorithm’s use of prior knowledge: what it already knows about how often items are sold guides the insights it produces for the business.

The entities on which this algorithm depends are:

  1. Support
  2. Confidence
  3. Lift

To understand these entities in detail, let’s take a real-world example of a grocery store. We will apply each of them (and so the Apriori Algorithm) to the data collected by this store and then draw out the insights.

Scenario: A grocery store has collected its sales data for a month. Let’s say the data contains a total of 1 lakh (0.1 million) transactions. The store wants to find, for any item “a”, the other item “b” that customers most often purchase along with it. In other words, if a customer is buying item “a”, which other item “b” is that customer likely to buy in the same transaction?

Logically, this analysis should not be run on every item in the grocery store. It should be limited to the items that are actually sold ≥ some threshold number of times (for example, per day). If a pair of items was purchased together only once in the whole month, those items are barely selling, so that transaction tells us nothing useful and can be ignored. The decisions that follow from this logic are discussed with the entities explained below.

Support

Support is a simple entity that signifies how popular an item is overall. In business terms, it is the share of all transactions in a given interval of time in which the item is sold.

Its formula is:

“Number of transactions in which an item “a” is sold / Total number of transactions”

Based on the scenario explained above, let’s say item “a” is sold in 500 transactions during the period in which the data was collected.

Therefore, its support will be = 500 / 1 lakh = 500 / 100,000 = 0.005
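To make the support calculation concrete, here is a minimal Python sketch. The `support` helper and the five toy transactions are hypothetical and only for illustration; the last line simply repeats the arithmetic from the scenario above.

```python
# Toy transactions (hypothetical data, not the grocery store's real records).
transactions = [
    {"pasta", "white sauce"},
    {"pasta", "red sauce", "bread"},
    {"bread", "milk"},
    {"pasta"},
    {"milk"},
]


def support(item, transactions):
    """Fraction of all transactions that contain the given item."""
    containing = sum(1 for t in transactions if item in t)
    return containing / len(transactions)


# "pasta" appears in 3 of the 5 toy transactions.
pasta_support = support("pasta", transactions)

# The numbers from the scenario: 500 sales out of 1 lakh (100,000) transactions.
article_support = 500 / 100_000
```

With the toy data, pasta has a support of 3 / 5, while the scenario’s item “a” has a much smaller support because it appears in only 500 of 100,000 transactions.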

Confidence

Confidence is an entity that brings the other item “b” into the picture along with item “a”. It signifies how strongly item “b” is associated with item “a”: of the customers who buy “a”, what fraction also buy “b”.

Its formula is:

“Number of transactions in which the items “a” & “b” are both sold / Total number of transactions in which item “a” is sold”

Based on the scenario explained above, let’s say the number of transactions in which items “a” & “b” are both sold is 300. The number of transactions in which item “a” is sold is 500 (already mentioned in the ‘Support’ section above).

Therefore, its confidence will be = 300 / 500 = 0.6
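The same calculation can be sketched in Python. The `confidence` helper and the toy transactions below are hypothetical; the final line repeats the 300 / 500 arithmetic from the scenario.

```python
# Toy transactions (hypothetical data for illustration only).
transactions = [
    {"pasta", "white sauce"},
    {"pasta", "red sauce", "bread"},
    {"pasta", "white sauce", "milk"},
    {"pasta"},
    {"bread", "milk"},
]


def confidence(a, b, transactions):
    """Of the transactions that contain a, the fraction that also contain b."""
    with_a = [t for t in transactions if a in t]
    if not with_a:
        return 0.0  # a was never sold, so there is nothing to measure
    with_both = sum(1 for t in with_a if b in t)
    return with_both / len(with_a)


# 4 toy transactions contain pasta, and 2 of those also contain white sauce.
pasta_to_sauce = confidence("pasta", "white sauce", transactions)

# The numbers from the scenario: 300 joint sales out of 500 sales of item "a".
article_confidence = 300 / 500
```

Note that confidence depends on direction: `confidence("white sauce", "pasta", transactions)` divides by the number of sauce transactions instead, so it generally gives a different value.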

Lift

This entity is the most important one because it gives the actual insight (how it does so is explained below).

Its formula is:

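Confidence of items “a” & “b” / Support of item “b”

Note that the denominator is the support of item “b”, not of item “a”: lift compares how often “b” is bought by customers who buy “a” with how often “b” is bought overall. The scenario so far does not say how often item “b” is sold, so let’s additionally assume that item “b” is also sold in 500 transactions, which gives it a support of 500 / 1 lakh = 0.005.

Based on the above calculations, the lift will be = 0.6 / 0.005 = 120. This is a huge lift. In general, you will not see a value this large; it appears here only because this is a hypothetical example.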
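Here is a small Python sketch of the lift calculation. It uses the standard definition of lift, which divides the confidence of “a” leading to “b” by the support of item “b”. The `lift` helper is hypothetical, and the figure of 500 transactions for item “b” is an assumption made only so that the example can be computed.

```python
def lift(count_a, count_b, count_ab, total):
    """Lift of the rule a -> b, computed from raw transaction counts.

    count_a  : transactions containing item a
    count_b  : transactions containing item b
    count_ab : transactions containing both a and b
    total    : total number of transactions
    """
    confidence_ab = count_ab / count_a   # how often b is bought given a
    support_b = count_b / total          # how often b is bought overall
    return confidence_ab / support_b


# Scenario numbers, plus the ASSUMED 500 transactions for item "b".
article_lift = lift(count_a=500, count_b=500, count_ab=300, total=100_000)

# Baseline: when a and b are unrelated, lift comes out at about 1.
independent_lift = lift(count_a=100, count_b=200, count_ab=20, total=1_000)
```

A lift of about 1 means buying “a” tells us nothing about “b”, a lift above 1 means the two are bought together more often than chance would suggest, and a lift below 1 means they are bought together less often.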

Explanation of the insight from lift: Let’s say item “a” is “Pasta” & item “b” is “white sauce” or “red sauce”. From the grocery store data, we find the support for “Pasta”, which tells us how likely any customer is to buy pasta. We also find, among the people who are buying any of the sauces, how many are actually buying pasta with it, which is a confidence value. If we simply recommend pasta to everyone, some of it gets sold, and that is fine. But if we recommend pasta to a person who is buying sauce, how much more beneficial will our recommendation be? (Obviously, the shopkeeper has to price the combo deal in such a way that it stays profitable.) This is what the entity ‘lift’ tells us. A lift above 1 means the two items are bought together more often than chance would predict, and the higher the lift, the better the chance that clubbing the items together is right & beneficial. Lift is symmetric, so it gives the same value whether we start from the pasta or from the sauce.

Here, the algorithm already has prior knowledge of how often an item sells on its own & together with other item(s), which is why it is known as the Apriori Algorithm.

Important Note:

This algorithm is not limited to finding the best pairing of just two items. It can be used for any number of items: the algorithm forms the groups of items automatically & then provides the result.
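To show how groups of more than two items can be formed automatically, here is a minimal pure-Python sketch of the level-wise search that the Apriori Algorithm is generally described as using: frequent groups of size k are combined into candidate groups of size k + 1, and a candidate is dropped as soon as any smaller group inside it is not frequent. The function name, the toy data & the threshold are hypothetical, and this is an illustration rather than the implementation used by any particular library.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: single items that meet the support threshold.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}

    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        k += 1
        # Join step: merge pairs of frequent itemsets into size-k candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step: every (k - 1)-item subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return result


# Toy transactions (hypothetical data for illustration only).
transactions = [
    {"pasta", "sauce", "cheese"},
    {"pasta", "sauce"},
    {"pasta", "sauce", "cheese"},
    {"bread", "milk"},
    {"pasta", "cheese"},
]
found = frequent_itemsets(transactions, min_support=0.6)
```

With this toy data, the groups (pasta, sauce) & (pasta, cheese) meet the threshold, but (pasta, sauce, cheese) is pruned without being counted because (sauce, cheese) appears in only 2 of the 5 transactions.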

This algorithm is not available in the “sklearn” (scikit-learn) module in Python. One option is to install a library known as “apyori”, which provides a ready-to-use implementation of the “apriori” algorithm.

Many businesses have already benefitted from the insights gained from this algorithm, & you can also leverage its power in any way you want.

Project Link

If you are interested in seeing the algorithm implemented in a project, do check out the link given below to my GitHub repository, which contains the project’s code.

I hope my article explains everything related to the topic with detailed concepts and explanations. Thank you so much for investing your time in reading my blog & boosting your knowledge. If you like my work, then I request you to applaud this blog & follow me on Medium, GitHub, & LinkedIn for more amazing content on multiple technologies and their integration!

Also, subscribe to me on Medium to get updates on all my blogs!
