Abstract: Frequent itemset mining is one of the core tasks in data mining. It aims to discover patterns that occur frequently in a database, and these patterns play an important role in tasks such as association rule discovery, classification, and anomaly detection. Because the number of possible itemsets grows exponentially with itemset size, computational complexity rises sharply, and researchers have long worked to develop efficient algorithms for the problem. This study surveys the algorithms, compact representations, and cutting-edge applications of frequent itemset mining, examining in depth the working principles, advantages, and limitations of the different techniques in order to summarize the state of research in the field comprehensively. Finally, it discusses emerging trends and points out potential directions for future research, including computational efficiency, constraint-based frequent itemset mining, the interpretability of patterns, and innovative applications of the algorithms in different domains.
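To make the task concrete, the following is a minimal sketch of a level-wise search in the style of Apriori; it is an illustration only and not the method of any particular paper. The function name `frequent_itemsets`, the `min_support` parameter (an absolute transaction count), and the toy basket data are all assumptions introduced here. With n distinct items there are 2^n - 1 non-empty candidate itemsets, so the sketch prunes any candidate that has an infrequent subset instead of enumerating them all.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support count}.

    Hypothetical helper for illustration: min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {}
    for item in items:
        single = frozenset([item])
        count = support(single)
        if count >= min_support:
            level[single] = count

    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: keep only candidates that meet the threshold.
        level = {}
        for c in candidates:
            count = support(c)
            if count >= min_support:
                level[c] = count
        result.update(level)
        k += 1
    return result


if __name__ == "__main__":
    # Made-up toy data, not taken from the paper.
    baskets = [
        {"bread", "milk"},
        {"bread", "diapers", "beer", "eggs"},
        {"milk", "diapers", "beer", "cola"},
        {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "cola"},
    ]
    found = frequent_itemsets(baskets, min_support=3)
    ordered = sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0])))
    for itemset, count in ordered:
        print(sorted(itemset), count)
```

The sketch rescans every transaction for each candidate, which is the cost that more efficient algorithms and compact representations aim to reduce.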
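Basic information:
CLC number: TP311.13
Citation information:
[1] Zhang Qing, Tan Xu, Lü Xin. Research frontiers and prospects of frequent itemset mining[J]. Journal of Shenzhen Institute of Information Technology, 2024, 22(01):1-14.
Funding information:
Innovation Team and Characteristic Innovation Projects of Guangdong Provincial Universities (project nos. 2020KCXTD040, 2020KTSCX302)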
2024-02-15