Concepts behind Association Rule Mining

Aug. 3, 2017, 8:35 p.m. By: Vishakha Jha

Association Rule

Machine learning offers a large number of algorithms; some are used for data mining, while others are heavily mathematical. Association rule learning is well suited to non-numeric, categorical data. It is one of the best-known methods for uncovering associations between variables in large databases. It looks for items that frequently occur together across a collection of transactions, and it does not take the order of items into account, either within a transaction or across transactions.

To select the interesting rules from all the possible ones, constraints are applied. The key concepts in association rule learning are support, confidence and lift. The best-known constraints are minimum thresholds on support and confidence.

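  1. Support estimates how often an itemset appears in the dataset: it is the fraction of transactions that contain the itemset.

  2. Confidence indicates how often a rule X ⇒ Y has been found to be true: it is the fraction of transactions containing X that also contain Y.

  3. The lift of a rule X ⇒ Y is the ratio of the observed support of X and Y together to the support that would be expected if X and Y were independent. A lift above 1 suggests that X and Y occur together more often than chance alone would predict.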
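The three measures above can be sketched in a few lines of Python. This is only a minimal illustration: the toy transactions and the function names `support`, `confidence` and `lift` are assumptions made up for this example, and the code assumes that the itemsets passed in occur at least once (otherwise it would divide by zero).

```python
# Hypothetical toy transactions, invented purely for illustration.
TRANSACTIONS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """Of the transactions containing X, the fraction that also contain Y."""
    both = frozenset(x) | frozenset(y)
    return support(both, transactions) / support(x, transactions)


def lift(x, y, transactions):
    """Observed support of X and Y together over the support expected under independence."""
    return confidence(x, y, transactions) / support(y, transactions)


if __name__ == "__main__":
    x, y = {"bread"}, {"milk"}
    print("support   :", support(x | y, TRANSACTIONS))
    print("confidence:", confidence(x, y, TRANSACTIONS))
    print("lift      :", lift(x, y, TRANSACTIONS))
```

For the rule bread ⇒ milk in this toy data the lift comes out below 1, which suggests that the two items appear together slightly less often than independence would predict.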

To obtain results efficiently we need to reduce the number of candidate itemsets. For this we apply the Apriori algorithm, whose principle states that if an itemset is infrequent, then all of its supersets must also be infrequent. Equivalently, every subset of a frequent itemset is itself frequent.
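The pruning idea behind Apriori can be sketched as follows: a candidate itemset of size k is only counted if every one of its subsets of size k - 1 is already known to be frequent. This is a simplified sketch rather than a tuned implementation; the function name `apriori`, the `min_support` parameter and the toy transactions are assumptions chosen for this example.

```python
from itertools import combinations

# Hypothetical toy transactions, invented purely for illustration.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def apriori(transactions, min_support):
    """Return a dict mapping each frequent itemset to its support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_only(candidates):
        # Count each candidate and keep those that reach the threshold.
        kept = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                kept[c] = s
        return kept

    # Level 1: every single item is a candidate.
    singletons = {frozenset([item]) for t in transactions for item in t}
    current = frequent_only(singletons)
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop any candidate that has an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = frequent_only(candidates)
        frequent.update(current)
        k += 1
    return frequent


frequent_itemsets = apriori(TRANSACTIONS, min_support=0.5)
```

In this toy data {milk, butter} falls below the threshold, so the three-item candidate {bread, milk, butter} is pruned without ever being counted against the transactions.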

Association rule mining has a large number of applications. MapReduce turns out to be an efficient way to run the algorithm and can also lead to new insights. The technique is applied in a wide variety of fields, including bioinformatics, Web usage mining and intrusion detection.

It is a preferred technique for basket data analysis, which examines the items purchased together in a single basket. It is also considered crucial for cross-marketing and catalogue design. Cross-marketing refers to working with an organisation that could complement your business and help to expand it, whereas catalogue design refers to selecting items in such a manner that the purchase of one item could lead to the purchase of another.

Association rules in data mining are useful for examining and predicting customer behaviour. Programmers also use association rules to build programs that are capable of machine learning. In a similar manner, the technique has been applied to a number of other scenarios and has proved beneficial.

Example: Association rule for market basket analysis

Video Source: Vamsidhar Ambatipudi