R U G
Transcription
R U G
Journal of Research in Electrical and Electronics Engineering (ISTP-JREEE) REDUCTION OF UNWANTED RULES BY GENETIC ALGORITHM USING DISTANCE WEIGHT VECTOR Sanjay Tiwari, Mahainder Kumar Rao, HOD, Department of Information Technology, AIET,Jaipur,,India, M.Tech, Department of Computer Science and Engineering,AIET,Jaipur,India, Abstract Three different kinds of the novel enhanced genetic algorithm procedures including the hybrid genetic algorithm, interval genetic algorithm and hybrid interval genetic algorithm are respectively presented. As the results of the proven systems show, the hybrid genetic algorithm can determines the better optimum design than the traditional optimization algorithms and genetic algorithm. The classical association rule mining generate rule with several problem such as pruning pass of transaction database, negative rule generation and supremacy of rule set. Time to time various researchers modified classical association rule mining with different approach. But in current scenario association rule mining suffered from supremacy rule generation. In this dissertation we proposed a new algorithm distance weight optimization of association rule mining. In this method we find the near distance of rule set using equalize distance formula and generate two class higher class and lower class .the validate of class check by distance weight vector. Basically distance weight vector maintain a threshold value of rule item set. In whole process we used genetic algorithm for optimization of rule set. Keywords Data Mining, KNN, GAPO-ARM, Genetic Algorithm, MORA. I. Introduction Optimization is the process of making something better. In engineering, optimization algorithms have been extensively developed and well used in all respects for a long time. An engineer or a scientist conjures up a new idea and optimization improves on that idea. Optimization consists in trying variations on an initial concept and using the information gained to improve on the idea. Many optimization problems from the industrial engineering world, in particular the manufacturing systems, are very complex in nature and quite hard to solve by conventional optimization techniques. Genetic Algorithm (GA) is one of the optimization algorithms, which is invented to mimic some of the processes observed in natural evolution. The Genetic Algorithm is stochastic search techniques based on the mechanism of natural selection and natural genetics. That is a general one, capable of being applied to an extremely wide range of problems. The GA, differing from conventional search techniques, start with an nitial set of random solutions called population. Each individual in the population is called a chromosome, some measures of fitness. To create the next generation, new chromosomes, called offspring, are form by either merging two chromosomes form current generation using a crossover operator or modifying a chromosome using a mutation operator. A new generation is form by selecting, according to the fitness values, some of the parents and offspring; and rejecting others so as to keep the population size constant. Fitter chromosomes have higher probabilities of being selected. After several generations, the algorithms converge to the best chromosome, which hopefully represents the optimum or suboptimal solution to the problem. The GA has received considerable attention regarding their potential as a novel optimization technique. There are three major advantages when applying the GA to optimization problems. ● The GA do not have much mathematical requirements about the optimization problems. Due to their evolutionary nature, the GA will search for solutions without regard to the specific inner workings of the problem. ● The evolution operators make GA effective at performing global search. The traditional approaches perform local search by convergent stepwise procedure, which compares the values of nearby points and moves to the relative optimal points. Global optimum can be found only if the problem possesses certain convexity properties that essentially guarantee that any local optimum is a global one. ● GA provide a great flexibility to hybridize with domain dependent heuristics to make an efficient implementation for a specific problem. In the above statement indicate, the GA have much advantages. Even though the GA can locate the solution in the whole domain, it does not solve complexes constraint problems easily, especially for exact constraints. And huge evaluations for generations and populations sometime are time-consuming. To account some of the defects and employ the advantages of the GA, the enhanced GA is proposed and applied for the optimization design. II. Proposed Method We proposed a novel algorithm for optimization representing a solution to problem at hand. The chromosomes association rule mining, the proposed algorithm resolve evolve through successive iterations, called generations. problem of negative rule generation and also optimized During each generation, the chromosomes are evaluated, using ISSN: 2321-2667 Volume 3, Issue 2, March 2014 of the the 52 Journal of Research in Electrical and Electronics Engineering (ISTP-JREEE) process of supremacy of rules[2]. Supremacy of association rule mining is a great challenge for large dataset. In the generation of supremacy of rules association existing algorithm or method generate a series of negative rules, which generated rule affected a performance of association rule mining. In the process of rule generation various multi objective association rule mining algorithm are proposed but all these are not solve supremacy problem of association rule mining. In this algorithm we used second odder quadratic equation and nearest neighbour classification technique for the selection of set of candidate of supremacy of key for generation of rules. In the generation of rule selection of support value of transaction data set is play a important role , for this role we used heuristic search algorithm for better searching of support value for generation of optimized association rule. III. Genetic Algorithm Genetic Algorithm (GA), first introduced by John Holland in the early seventies, is the powerful stochastic algorithm based on the principles of natural selection and natural genetics, which has been quite successfully, applied in machine learning and optimization problems. To solve a Problem, a GA maintains a population of individuals (also called strings or chromosomes) and probabilistically modifies the population by some genetic, operators such as selection, crossover and mutation, with the intent of seeking a near optimal solution to the problem. Coding to Strings in GA[5,6], each individual in a population is usually coded as coded as a fixed-length binary string. The length of the string depends on the domain of the parameters and the required precision. For example, if the domain of the parameter x is [2,5] and the precision requirement is six places after the decimal point, then the domain [2,5] should be divided into 7,000,000 equal size ranges. The implies that the length of the string requires to be 23, for the reason that 4194304=222<7000000<223=8388608 the decoding from a binary string <b22b21…b0> into a real number is straightforward and is completed in two steps. (1). Convert the binary string <b22b21…b0> from the base 10 by The initial process is quite simple. We create a population of individuals, where individual in a population is a binary string with a fixed-length, and every bit of the binary string is initialized randomly [5]. B. Evaluation In each generation for which the GA is run, each individual in the population is evaluated against the unknown environment. The fitness values are associated with the values of objective function. C. Genetic Operators Genetic operators drive the evolutionary process of a population in GA, after the Darwinian principle of survival of the fittest and naturally occurring genetic operations. The most widely used genetic operators are reproduction, crossover and mutation. To perform genetic operators, one must select individuals in the population to be operated on .The selection strategy is chiefly based on the fitness level of the individuals actually presented in the population [7]. Let us illustrate these three genetic operators. As an individual is selected, reproduction operators only copy it from the current population into the new population (i.e., the new generation) without alternation. The crossover operator starts with two selected individuals and then the crossover point (an integer between 1 and L-1, where L is the length of strings) is selected randomly. Assuming the two parental individuals are x1 and x2, and the crossover point is 5 (L=20). If X1= (01001|101100001000101) X2= (11010|011100000010000) Then the two resulting offspring are X’1= (01001|011100000010000) X’2= (11010|101100001000101) The third genetic operator, mutation, introduces random changes in structures in the population, and it may occasionally have beneficial results: escaping from a local optimum. In our GA, mutation is just to negate every bit of the strings, i.e., changes a 1 to 0 and vice versa, with probability pm. IV. Proposed Algorithm …………………………1 (2). Calculate the corresponding real number x by ……………..2 A. Initial Population ISSN: 2321-2667 GAPO-ARM The proposed algorithm is a combination of support weight value and near distance of superior candidate key. Support weight key is a vector value given by the transaction data set. The support value passes as a vector for finding a near distance between superior candidate key. When we finding a higher candidate key the nearest distance divides into two classes, one class take a higher order value and another class gain lower value for the rule generation process. The process of selection of class also reduces the passes of the data set dividing. After finding a class of lower and higher of giving support value, compare the value of distance weight vector. Volume 3, Issue 2, March 2014 53 Journal of Research in Electrical and Electronics Engineering (ISTP-JREEE) Here distance weight vector work as a fitness function for selection process of genetic algorithm. Here we present the steps of process of algorithm step by step and finally draw a flow chart of complete process. processed to reduce negative rules and used to find the positive rules and insignificant data points from the dataset. IV. Implementation Experimental Results and To investigate the effectiveness of the proposed method implemented on Mat lab 7.8.0.347(R2009a). In the research work, I have measured execution time of the rule generation procedure and no. of rule generation. To evaluate these performance parameters I have used the Wine dataset and the Chess data set datasets from UCI machine learning repository [10]. Experimental result demonstrates that the proposed GAPO-ARM algorithm reduces the execution time of MORA algorithm. It also improves rule generation process and accuracy for all five dataset than MORA. The experimental results also demonstrate that the proposed GAPO-ARM method is best then MORA in association rule mining algorithm. This makes using GAPO-ARM to process large datasets inefficient. The main purpose of the proposed GAPO-ARM algorithm is to reduce negative rule generation of rule generation procedure with positive rule generation and improve the ability of positive rule generation in dealing with various dataset problems. Figure 1: Architecture of our proposed algorithm. The whole process of proposed algorithm is shown in the architecture of the proposed algorithm in Figure 1. In the architecture model ellipse represents input or output and rectangle represent the calculation process. In architecture model, it is shown that as an input datasets from UCI machine learning repository [10] are used, which consists of various attributes and number of instances. The input dataset is pre- ISSN: 2321-2667 Figure 2: shows the comparative value of minimum support and execution time of MORA and GAPO-ARM algorithm for the processor extraction of rule. GAPOARM takes more time in comparison of sample MORA algorithm. GAPO-ARM algorithm can achieves this goal for small as well as large datasets and overcomes the drawback of MORA. Experimental result demonstrates that the proposed GAPO-ARM algorithm increase the execution time as comparison to the MORA algorithm but it reduce the rule generation of rule generation procedure. Volume 3, Issue 2, March 2014 54 Journal of Research in Electrical and Electronics Engineering (ISTP-JREEE) Figure 3: Shows the comparative value of data set and No. of rule generation of MORA and GAPO-ARM algorithm for the processor extraction of rule. GAPO-ARM generates less No. of rules in comparison of sample MORA algorithm. The experimental results demonstrate that the proposed GAPO-ARM method is best then MORA in association rule mining algorithm. Figure 4: Shows the comparative value of minimum support and execution time of MORA and GAPO-ARM algorithm for the processor extraction of rule. GAPOARM takes more time in comparison of sample MORA algorithm. Figure 5: Shows the comparative value of data set and No. of rule generation of MORA and GAPO-ARM algorithm for the processor extraction of rule. GAPO-ARM generates less No. of rules in comparison of sample MORA algorithm. V. Conclusions and Future Work We proposed a novel method for optimization of association rule mining. Our propped algorithm is combination of distance function and genetic algorithm. We have observed that when we modify the distance weight new rules in large numbers are found. This implies that when weight is solely determined through support and confidence, there is a high chance of eliminating interesting rules. With more rules emerging it implies there should be a mechanism for managing their large numbers. The large generated rule is optimized with genetic algorithm. We theoretically proofed a relation between locally large and globally large patterns that is used for local pruning at each site to reduce the searched candidates. We derived a locally large threshold using a globally set minimum recall threshold. Local pruning achieves a reduction in the number of searched candidates and this reduction has a proportional impact on the reduction of exchanged messages. Suggestion for future work, in the process of distance weight calculation large number of rule set generated that rule set take huge amount of time in comparison of MORA algorithm. In future we minimize the time complexity of our method. VI. References [1] By Rakesh Agrawal Tomasz Imielinski Arun Swami Mining Association Rules between Sets of Items in Large Databases ACM SIGMOD Conference Washington DC, USA, May 1993. ISSN: 2321-2667 [2] By Rakesh Agrawal Ramakrishnan Srikant_ Fast Algorithms for Mining Association RulesVLDB ConferenceSantiago, Chile, 1994. [3] By Ramakrishnan Srikant* Rakesh Agrawal Mining Generalized Association Rules VLDB Conference Zurich, Swizerland, 1995. Volume 3, Issue 2, March 2014 55 Journal of Research in Electrical and Electronics Engineering (ISTP-JREEE) [4] By N. Chaiyarataiia and A. M. S. Zalzala Recent Developments in Evolutionary and Genetic Algorithms: Theory and Applications Innovations and Applications, 2-4 September 1997, Conference Publication NO. 4 4 6 ,IEEE , 1997. [5] By Q. C. Meng , T.J. Feng I , 2. Chen I , C.J. Zhou , J.H. Bo2 Genetic Algorithms Encoding Study and A Sufficient Convergence Condition of GAS 0-7803-5731-0/9)sk$l0.00 0 IEEE 1999. [6] By Pengfei Guo Xuezhi Wang Yingshi Han The Enhanced Genetic Algorithms for the Optimization Design 978-1-42446498-2/10/$26.00 © IEEE 2010. [20] By Anirban Mukhopadhyay_,Ujjwal Mauliky,Sanghamitra Bandyopadhyayz and Roland Eils “Mining Association Rules from HIV-Human Protein Interactions “Systems in Medicine and Biology 16-18 December 2010, [7] By Masaya Yoshikawa and Hidekazu Terai A Hybrid Ant Colony Optimization Technique for Job-Shop Scheduling Problems Software Engineering Research, Management and Applications (SERA’06) 0-7695-2656-X/06 $20.00 © 2006. [21] By Peter P. Wakabi ,Waiswa ,Venansius Baryamureeba, Karunakaran and Sarukesi “Optimized Association Rule Mining with Genetic Algorithms” Natural Computation 978-14244-9953-3/11/$26.00 © IEEE 2011. [8] By Chi-Ren Shyu1,2, Matt Klaric1,2, Grant Scott1,2, and Wannapa Kay Mahamaneerat1 Knowledge Discovery by Mining Association Rules and Temporal-Spatial Information from Large-Scale Geospatial Image Databases 0-7803-95107/06/$20.00 © IEEE 2006. [22] By A.B.M.Rezbaul Islam, Tae-Sun Chung ” An Improved Frequent Pattern Tree Based Association Rule Mining Technique” 978-1-4244-9224-4/11/$26.00 ©IEEE 2011. [9] By LI Tong-yan, LI Xing-ming New Criterion for Mining Strong Association Rules in Unbalanced Events Intelligent Information Hiding and Multimedia Signal Processing 978-07695-3278-3/08 $25.00 © IEEE 2008. [18] By R. Uday Kiran and P. Krishna Reddy An Improved Multiple Minimum Support Based Approach to Mine Rare Association Rules 978-1-4244-2765-9/09/$25.00 © IEEE 2009. [19] By By Walter Christian Kammergruber, Maximilian Viermetz, Karsten Ehms ” Using Association Rules for Discovering Tag Bundles in Social Tagging Data” 978-1-4244-78187/10/$26.00_c IEEE 2010. [23] By Li-Min Tsai, Shu-Jing Lin, and Don-Lin Yang “Efficient Mining of Generalized Negative Association Rules” Granular Computing 978-0-7695-4161-7/10 $26.00 © IEEE 2010. [24] By Sandeep Singh Rawat, Lakshmi Rajamani “Probability Apriori based Approach to Mine Rare Association Rules” Data Mining and Optimization (DMO) 28-29 June 2011, Selangor, Malaysia 978-1-61284-212-7/11/$26.00 ©IEEE 2011. [10] By Zhibo Chen, Carlos Ordonez, Kai Zhao Comparing Reliability of Association Rules and OLAP Statistical Tests Data Mining Workshops 978-0-7695-3503-6/08 $25.00 © IEEE 2008. [25] [11] By Lijuan Zhou Linshuang Wang Xuebin Ge Qian Shi A Clustering-Based KNN Improved Algorithm CLKNN for Text Classification Informatics in Control, Automation and Robotics 978-1-4244-5194-4/10/$26.00 ©IEEE 2010 . By Sahar M. Ghanem Mona A. Mohamed Magdy H. Nagi EDP-ORD: Efficient Distributed/Parallel Optimal Rule Discovery 978-1-4577-0681-3/11/$26.00 © IEEE 2011. [26] [12] By XING Xue CHEN Yao WANG Yan-en Study on Mining Theories of Association Rules and Its Application Information Technology and Ocean Engineering 978-0-7695-3942-3/10 $26.00 © 2010 IEEE By R. T. Alves, M. R. Delgado and A. A. Freitas Knowledge Discovery with Artificial Immune Systems for Hierarchical Multi-label Classification of Protein Functions 978-1-4244-8126-2/10/$26.00 © IEEE 2010. [27] By LI Tong-yan, LI Xing-ming New Criterion for Mining Strong Association Rules in Unbalanced Events Intelligent Information Hiding and Multimedia Signal Processing 9780-7695-3278-3/08 $25.00 © IEEE 2008. [28] By Senduru Srinivasulu P.Sakthivel Extracting Spatial Semantics in Association Rules for Weather Forecasting Image 978-1-4244-9008-0/10/$26.00 ©IEEE 2010. [29] By Xinqi Zheng1 Lu Zhao2 Association Rule Analysis of Spatial Data Mining Based on Matlab Knowledge Discovery and Data Mining 0-7695-3090-7/08 $25.00 © IEEE 2008. [30] By Chuan-xiang Ma Ze-ming Fang Rui Chen An Intrusion Detection Method for Ad hoc network Based on Class Association Rules Mining Conference on Frontier of Computer Science and Technology 978-0-7695-3932-4/09 $26.00 © IEEE 2009. [13] By Senduru Srinivasulu P.Sakthivel Extracting Spatial Semantics in Association Rules for Weather Forecasting Image 978-1-4244-9008-0/10/$26.00 © IEEE 2010. [14] By TIAN He, XU Jing, LIAN Kunmei, ZHANG Ying Research on Strong-association Rule Based Web Application Vulnerability Detection 978-1-4244-4520-2/09/$25.00 © IEEE 2009. [15] By Dieferson Luis Alves de Araujo’ , Heitor S. Lopes’, Alex A. Freitas2 A Parallel Genetic Algorithm for Rule Discovery in Large Databases 0-7803-5731-0/99$$10.00109 99 IEEE. [16] By Xiaofeng Yuan, Hualong Xu, and Shuhong Chen “Improvement on the Constrained Association Rule Mining Algorithm of Separate” 1-4244-0682-X/06/$20.00 © IEEE 2006. [17] By WEI Yong-qing1, YANG Ren-hua2, LIU Pei-yu2 An Improved Apriori Algorithm for Association Rules of Mining 978-1-4244-3930-0/09/$25.00 © IEEE 2009. ISSN: 2321-2667 Volume 3, Issue 2, March 2014 56