Apriori Algorithm - Classical algorithm for data mining

June 27, 2017, 9:56 a.m. By: Vishakha Jha

Apriori Algorithm

Apriori is an unsupervised algorithm used for frequent itemset mining. It generates association rules from a given data set and uses a 'bottom-up' approach: frequent itemsets are extended one item at a time, and the algorithm terminates when no further extension can be made.
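
The bottom-up search can be sketched in Python. This is a minimal illustration, not the author's code. The function name `apriori`, the pairwise-union way of building candidates, and the basket data are assumptions made for the example. Each loop iteration makes one pass over the data to count candidates of the current length. It then extends the surviving itemsets by one item and stops when no candidates remain.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise (bottom-up) search for frequent itemsets.

    transactions: iterable of item collections.
    min_support: minimum fraction of transactions an itemset must appear in.
    Returns a dict mapping each frequent itemset (frozenset) to its support count.
    """
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)

    # Level 1: every single item is a candidate.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One full pass over the data set to count this level's candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)

        # Extend frequent k-itemsets by one item to form (k+1)-item candidates,
        # keeping only candidates whose k-item subsets are all frequent.
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(baskets, min_support=0.6)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a 60% threshold (3 of 5 baskets), four single items and four pairs survive. The only three-item candidate, {bread, milk, diapers}, is counted but appears in just two baskets, so the search ends there.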

The Apriori algorithm rests on two basic principles. First, if an itemset occurs frequently, then all of its subsets also occur frequently. Second, if an itemset occurs infrequently, then all of its supersets also occur infrequently.
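
Both principles follow from one fact: adding items to an itemset can never increase the number of transactions that contain it. The sketch below checks that fact on a tiny, made-up data set. It then uses the second principle to discard a candidate without counting it. The helper names `support` and `has_infrequent_subset` are hypothetical, chosen for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def has_infrequent_subset(candidate, frequent_smaller):
    """True if some (k-1)-item subset of a k-item candidate is not frequent.

    By the second principle such a candidate cannot be frequent, so it can be
    discarded without scanning the data to count it.
    """
    k = len(candidate)
    return any(frozenset(s) not in frequent_smaller for s in combinations(candidate, k - 1))


# Hypothetical transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Support never increases when items are added to an itemset.
for sub, sup in [({"bread"}, {"bread", "milk"}), ({"milk"}, {"bread", "milk", "butter"})]:
    assert support(sup, transactions) <= support(sub, transactions)

# Frequent pairs at a 50% minimum support threshold.
items = sorted({i for t in transactions for i in t})
frequent_pairs = {
    frozenset(p) for p in combinations(items, 2) if support(p, transactions) >= 0.5
}
print(sorted(sorted(p) for p in frequent_pairs))  # [['bread', 'butter'], ['bread', 'milk']]

# {milk, butter} is infrequent, so its superset {bread, milk, butter} is pruned.
print(has_infrequent_subset(frozenset({"bread", "milk", "butter"}), frequent_pairs))  # True
```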

These principles reduce the number of candidate itemsets that must be examined, and the only inputs the algorithm needs are the data set and a minimum support threshold. Apriori is one of the easiest association-mining algorithms to implement, and it can be parallelized easily. It exploits the large (frequent) itemset property, but it also suffers from several inefficiencies that have led to the development of other algorithms. The algorithm must rescan the data set each time the length of the frequent itemsets increases, which slows it down. It is also expensive to compute, because each scan examines the entire database.
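
One common way to parallelize the expensive counting scan is to split the transactions into partitions. Each worker counts candidate occurrences in its own partition, and the partial counts are summed. The sketch below shows that partition-and-merge structure. It is an assumption about how parallelization might be done, not something the article specifies, and the function names are hypothetical. It uses threads for simplicity. On CPython the global interpreter lock prevents a real CPU speedup, so a practical version would use `ProcessPoolExecutor` or a distributed framework instead.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_partition(partition, candidates):
    """Count how many transactions in one partition contain each candidate."""
    counts = Counter()
    for t in partition:
        t = set(t)
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def parallel_support_counts(transactions, candidates, n_parts=2):
    """Split the data into n_parts partitions, count each one, then sum the counts."""
    parts = [transactions[i::n_parts] for i in range(n_parts)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        for partial in pool.map(count_partition, parts, [candidates] * n_parts):
            total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


# Hypothetical transactions and candidate itemsets, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
candidates = [frozenset({"bread", "milk"}), frozenset({"milk", "butter"})]
counts = parallel_support_counts(transactions, candidates, n_parts=2)
print(counts[frozenset({"bread", "milk"})])   # 3
print(counts[frozenset({"milk", "butter"})])  # 1
```

Each partition is still scanned in full for every candidate length. Parallelism spreads out the cost of a scan but does not remove the repeated scans.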

The algorithm is well suited to market basket analysis, where it can help increase sales by suggesting related items to customers while they shop. It is also used in health care to detect adverse drug reactions. There it produces association rules that link patient characteristics and medications to adverse drug effects.
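
Once a frequent itemset is known, association rules are read off it. Each rule X -> Y splits the itemset into two parts, and its confidence is the share of transactions containing X that also contain Y. The sketch below derives rules from one itemset and keeps those above a confidence threshold. The function names, the threshold, and the basket data are hypothetical, chosen for illustration.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= set(t))


def rules_from_itemset(itemset, transactions, min_confidence):
    """Return (lhs, rhs, confidence) for rules lhs -> rhs where lhs and rhs
    partition itemset and confidence = count(itemset) / count(lhs)."""
    itemset = frozenset(itemset)
    whole = support_count(itemset, transactions)
    if whole == 0:
        return []
    rules = []
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            confidence = whole / support_count(frozenset(lhs), transactions)
            if confidence >= min_confidence:
                rhs = tuple(sorted(itemset - set(lhs)))
                rules.append((lhs, rhs, confidence))
    return rules


# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
rules = rules_from_itemset({"diapers", "beer"}, baskets, min_confidence=0.7)
for lhs, rhs, conf in rules:
    print(f"{', '.join(lhs)} -> {', '.join(rhs)}: confidence {conf:.2f}")
# beer -> diapers: confidence 1.00
# diapers -> beer: confidence 0.75
```

A store could use a rule like "beer -> diapers" to suggest diapers to a customer who has just added beer to the basket.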

Another popular application is Google Autocomplete, in which the search engine suggests words associated with the ones you have typed. It is also used in Amazon's recommendation system. In Python, Apriori implementations are available as packages on PyPI; in R, Apriori is provided by the arules package.

More Information: A beginner's tutorial on the apriori algorithm in data mining with R implementation