R U G

Transcription

R U G
Journal of Research in Electrical and Electronics Engineering (ISTP-JREEE)
REDUCTION OF UNWANTED RULES BY GENETIC
ALGORITHM USING DISTANCE WEIGHT VECTOR
Sanjay Tiwari, Mahainder Kumar Rao, HOD, Department of Information Technology, AIET,Jaipur,,India, M.Tech, Department of
Computer Science and Engineering,AIET,Jaipur,India,
Abstract
Three different kinds of the novel enhanced genetic algorithm
procedures including the hybrid genetic algorithm, interval
genetic algorithm and hybrid interval genetic algorithm are
respectively presented. As the results of the proven systems
show, the hybrid genetic algorithm can determines the better
optimum design than the traditional optimization algorithms
and genetic algorithm. The classical association rule mining
generate rule with several problem such as pruning pass of
transaction database, negative rule generation and supremacy
of rule set. Time to time various researchers modified classical
association rule mining with different approach. But in current
scenario association rule mining suffered from supremacy rule
generation. In this dissertation we proposed a new algorithm
distance weight optimization of association rule mining. In
this method we find the near distance of rule set using
equalize distance formula and generate two class higher class
and lower class .the validate of class check by distance weight
vector. Basically distance weight vector maintain a threshold
value of rule item set. In whole process we used genetic
algorithm for optimization of rule set.
Keywords
Data Mining, KNN, GAPO-ARM, Genetic
Algorithm, MORA.
I. Introduction
Optimization is the process of making something better. In
engineering, optimization algorithms have been extensively
developed and well used in all respects for a long time. An
engineer or a scientist conjures up a new idea and
optimization improves on that idea. Optimization consists in
trying variations on an initial concept and using the
information gained to improve on the idea. Many optimization
problems from the industrial engineering world, in particular
the manufacturing systems, are very complex in nature and
quite hard to solve by conventional optimization techniques.
Genetic Algorithm (GA) is one of the optimization algorithms,
which is invented to mimic some of the processes observed in
natural evolution. The Genetic Algorithm is stochastic search
techniques based on the mechanism of natural selection and
natural genetics. That is a general one, capable of being
applied to an extremely wide range of problems. The GA,
differing from conventional search techniques, start with an
nitial set of random solutions called population. Each
individual in the population is called a chromosome,
some measures of fitness. To create the next generation, new
chromosomes, called offspring, are form by either merging
two chromosomes form current generation using a crossover
operator or modifying a chromosome using a mutation
operator. A new generation is form by selecting, according to
the fitness values, some of the parents and offspring; and
rejecting others so as to keep the population size constant.
Fitter chromosomes have higher probabilities of being
selected. After several generations, the algorithms converge to
the best chromosome, which hopefully represents the
optimum or suboptimal solution to the problem. The GA has
received considerable attention regarding their potential as a
novel optimization technique. There are three major
advantages when applying the GA to optimization problems.
● The GA do not have much mathematical requirements
about the optimization problems. Due to their evolutionary
nature, the GA will search for solutions without regard to the
specific inner workings of the problem.
● The evolution operators make GA effective at
performing global search. The traditional approaches
perform local search by convergent stepwise procedure,
which compares the values of nearby points and moves to
the relative optimal points. Global optimum can be found
only if the problem possesses certain convexity properties
that essentially guarantee that any local optimum is a global
one.
● GA provide a great flexibility to hybridize with
domain dependent heuristics to make an efficient
implementation for a specific problem.
In the above statement indicate, the GA have much
advantages. Even though the GA can locate the solution in the
whole domain, it does not solve complexes constraint
problems easily, especially for exact constraints. And huge
evaluations for generations and populations sometime are
time-consuming. To account some of the defects and employ
the advantages of the GA, the enhanced GA is proposed and
applied for the optimization design.
II. Proposed Method
We proposed a novel algorithm for optimization
representing a solution to problem at hand. The chromosomes association rule mining, the proposed algorithm resolve
evolve through successive iterations, called generations. problem of negative rule generation and also optimized
During each generation, the chromosomes are evaluated, using
ISSN: 2321-2667
Volume 3, Issue 2, March 2014
of
the
the
52
Journal of Research in Electrical and Electronics Engineering (ISTP-JREEE)
process of supremacy of rules[2]. Supremacy of association
rule mining is a great challenge for large dataset. In the
generation of supremacy of rules association existing
algorithm or method generate a series of negative rules, which
generated rule affected a performance of association rule
mining. In the process of rule generation various multi
objective association rule mining algorithm are proposed but
all these are not solve supremacy problem of association rule
mining.
In this algorithm we used second odder quadratic equation
and nearest neighbour classification technique for the
selection of set of candidate of supremacy of key for
generation of rules. In the generation of rule selection of
support value of transaction data set is play a important role ,
for this role we used heuristic search algorithm for better
searching of support value for generation of optimized
association rule.
III. Genetic Algorithm
Genetic Algorithm (GA), first introduced by John Holland
in the early seventies, is the powerful stochastic algorithm
based on the principles of natural selection and natural
genetics, which has been quite successfully, applied in
machine learning and optimization problems.
To solve a Problem, a GA maintains a population of
individuals (also called strings or chromosomes) and
probabilistically modifies the population by some genetic,
operators such as selection, crossover and mutation, with the
intent of seeking a near optimal solution to the problem.
Coding to Strings in GA[5,6], each individual in a population
is usually coded as coded as a fixed-length binary string. The
length of the string depends on the domain of the parameters
and the required precision. For example, if the domain of the
parameter x is [2,5] and the precision requirement is six places
after the decimal point, then the domain [2,5] should be
divided into 7,000,000 equal size ranges. The implies that the
length of the string requires to be 23, for the reason that
4194304=222<7000000<223=8388608 the decoding from a
binary string <b22b21…b0> into a real number is
straightforward and is completed in two steps.
(1). Convert the binary string <b22b21…b0> from the base 10
by
The initial process is quite simple. We create a population
of individuals, where individual in a population is a binary
string with a fixed-length, and every bit of the binary string is
initialized randomly [5].
B. Evaluation
In each generation for which the GA is run, each
individual in the population is evaluated against the unknown
environment. The fitness values are associated with the values
of objective function.
C. Genetic Operators
Genetic operators drive the evolutionary process of a
population in GA, after the Darwinian principle of survival of
the fittest and naturally occurring genetic operations. The most
widely used genetic operators are reproduction, crossover and
mutation. To perform genetic operators, one must select
individuals in the population to be operated on .The selection
strategy is chiefly based on the fitness level of the individuals
actually presented in the population [7]. Let us illustrate these
three genetic operators. As an individual is selected,
reproduction operators only copy it from the current
population into the new population (i.e., the new generation)
without alternation. The crossover operator starts with two
selected individuals and then the crossover point (an integer
between 1 and L-1, where L is the length of strings) is selected
randomly. Assuming the two parental individuals are x1 and
x2, and the crossover point is 5 (L=20). If
X1= (01001|101100001000101)
X2= (11010|011100000010000)
Then the two resulting offspring are
X’1= (01001|011100000010000)
X’2= (11010|101100001000101)
The third genetic operator, mutation, introduces random
changes in structures in the population, and it may
occasionally have beneficial results: escaping from a local
optimum. In our GA, mutation is just to negate every bit of the
strings, i.e., changes a 1 to 0 and vice versa, with probability
pm.
IV.
Proposed
Algorithm
…………………………1
(2). Calculate the corresponding real number x by
……………..2
A. Initial Population
ISSN: 2321-2667
GAPO-ARM
The proposed algorithm is a combination of support weight
value and near distance of superior candidate key. Support
weight key is a vector value given by the transaction data set.
The support value passes as a vector for finding a near
distance between superior candidate key. When we finding a
higher candidate key the nearest distance divides into two
classes, one class take a higher order value and another class
gain lower value for the rule generation process. The process
of selection of class also reduces the passes of the data set
dividing. After finding a class of lower and higher of giving
support value, compare the value of distance weight vector.
Volume 3, Issue 2, March 2014
53
Journal of Research in Electrical and Electronics Engineering (ISTP-JREEE)
Here distance weight vector work as a fitness function for
selection process of genetic algorithm. Here we present the
steps of process of algorithm step by step and finally draw a
flow chart of complete process.
processed to reduce negative rules and used to find the
positive rules and insignificant data points from the dataset.
IV.
Implementation
Experimental Results
and
To investigate the effectiveness of the proposed
method implemented on Mat lab 7.8.0.347(R2009a). In the
research work, I have measured execution time of the rule
generation procedure and no. of rule generation. To evaluate
these performance parameters I have used the Wine dataset
and the Chess data set datasets from UCI machine learning
repository [10]. Experimental result demonstrates that the
proposed GAPO-ARM algorithm reduces the execution time
of MORA algorithm. It also improves rule generation process
and accuracy for all five dataset than MORA. The
experimental results also demonstrate that the proposed
GAPO-ARM method is best then MORA in association rule
mining algorithm. This makes using GAPO-ARM to process
large datasets inefficient. The main purpose of the proposed
GAPO-ARM algorithm is to reduce negative rule generation
of rule generation procedure with positive rule generation and
improve the ability of positive rule generation in dealing with
various dataset problems.
Figure 1: Architecture of our proposed algorithm.
The whole process of proposed algorithm is shown in the
architecture of the proposed algorithm in Figure 1. In the
architecture model ellipse represents input or output and
rectangle represent the calculation process. In architecture
model, it is shown that as an input datasets from UCI machine
learning repository [10] are used, which consists of various
attributes and number of instances. The input dataset is pre-
ISSN: 2321-2667
Figure 2: shows the comparative value of minimum
support and execution time of MORA and GAPO-ARM
algorithm for the processor extraction of rule. GAPOARM takes more time in comparison of sample MORA
algorithm.
GAPO-ARM algorithm can achieves this goal for
small as well as large datasets and overcomes the drawback of
MORA. Experimental result demonstrates that the proposed
GAPO-ARM algorithm increase the execution time as
comparison to the MORA algorithm but it reduce the rule
generation of rule generation procedure.
Volume 3, Issue 2, March 2014
54
Journal of Research in Electrical and Electronics Engineering (ISTP-JREEE)
Figure 3: Shows the comparative value of data set and No.
of rule generation of MORA and GAPO-ARM algorithm
for the processor extraction of rule. GAPO-ARM
generates less No. of rules in comparison of sample MORA
algorithm.
The experimental results demonstrate that the
proposed GAPO-ARM method is best then MORA in
association rule mining algorithm.
Figure 4: Shows the comparative value of minimum
support and execution time of MORA and GAPO-ARM
algorithm for the processor extraction of rule. GAPOARM takes more time in comparison of sample MORA
algorithm.
Figure 5: Shows the comparative value of data set and No.
of rule generation of MORA and GAPO-ARM algorithm
for the processor extraction of rule. GAPO-ARM
generates less No. of rules in comparison of sample MORA
algorithm.
V. Conclusions and Future Work
We proposed a novel method for optimization of
association rule mining. Our propped algorithm is
combination of distance function and genetic algorithm. We
have observed that when we modify the distance weight new
rules in large numbers are found. This implies that when
weight is solely determined through support and confidence,
there is a high chance of eliminating interesting rules. With
more rules emerging it implies there should be a mechanism
for managing their large numbers. The large generated rule is
optimized with genetic algorithm.
We theoretically proofed a relation between locally large
and globally large patterns that is used for local pruning at
each site to reduce the searched candidates. We derived a
locally large threshold using a globally set minimum recall
threshold. Local pruning achieves a reduction in the number
of searched candidates and this reduction has a proportional
impact on the reduction of exchanged messages. Suggestion
for future work, in the process of distance weight calculation
large number of rule set generated that rule set take huge
amount of time in comparison of MORA algorithm. In future
we minimize the time complexity of our method.
VI. References
[1] By Rakesh Agrawal Tomasz Imielinski Arun Swami Mining
Association Rules between Sets of Items in Large Databases
ACM SIGMOD Conference Washington DC, USA, May 1993.
ISSN: 2321-2667
[2]
By Rakesh Agrawal Ramakrishnan Srikant_ Fast Algorithms
for Mining Association RulesVLDB ConferenceSantiago,
Chile, 1994.
[3]
By Ramakrishnan Srikant* Rakesh Agrawal Mining
Generalized Association Rules VLDB Conference Zurich,
Swizerland, 1995.
Volume 3, Issue 2, March 2014
55
Journal of Research in Electrical and Electronics Engineering (ISTP-JREEE)
[4]
By N. Chaiyarataiia and A. M. S. Zalzala Recent
Developments in Evolutionary and Genetic Algorithms:
Theory and Applications Innovations and Applications, 2-4
September 1997, Conference Publication NO. 4 4 6 ,IEEE ,
1997.
[5]
By Q. C. Meng , T.J. Feng I , 2. Chen I , C.J. Zhou , J.H. Bo2
Genetic Algorithms Encoding Study and A Sufficient
Convergence Condition of GAS 0-7803-5731-0/9)sk$l0.00 0
IEEE 1999.
[6]
By Pengfei Guo Xuezhi Wang Yingshi Han The Enhanced
Genetic Algorithms for the Optimization Design 978-1-42446498-2/10/$26.00 © IEEE 2010.
[20] By Anirban Mukhopadhyay_,Ujjwal Mauliky,Sanghamitra
Bandyopadhyayz and Roland Eils “Mining Association Rules
from HIV-Human Protein Interactions “Systems in Medicine
and Biology 16-18 December 2010,
[7]
By Masaya Yoshikawa and Hidekazu Terai A Hybrid Ant
Colony Optimization Technique for Job-Shop Scheduling
Problems Software Engineering Research, Management and
Applications (SERA’06) 0-7695-2656-X/06 $20.00 © 2006.
[21] By Peter P. Wakabi ,Waiswa ,Venansius Baryamureeba,
Karunakaran and Sarukesi “Optimized Association Rule
Mining with Genetic Algorithms” Natural Computation 978-14244-9953-3/11/$26.00 © IEEE 2011.
[8]
By Chi-Ren Shyu1,2, Matt Klaric1,2, Grant Scott1,2, and
Wannapa Kay Mahamaneerat1 Knowledge Discovery by
Mining Association Rules and Temporal-Spatial Information
from Large-Scale Geospatial Image Databases 0-7803-95107/06/$20.00 © IEEE 2006.
[22] By A.B.M.Rezbaul Islam, Tae-Sun Chung ” An Improved
Frequent Pattern Tree Based Association Rule Mining
Technique” 978-1-4244-9224-4/11/$26.00 ©IEEE 2011.
[9]
By LI Tong-yan, LI Xing-ming New Criterion for Mining
Strong Association Rules in Unbalanced Events Intelligent
Information Hiding and Multimedia Signal Processing 978-07695-3278-3/08 $25.00 © IEEE 2008.
[18] By R. Uday Kiran and P. Krishna Reddy An Improved
Multiple Minimum Support Based Approach to Mine Rare
Association Rules 978-1-4244-2765-9/09/$25.00 © IEEE
2009.
[19] By By Walter Christian Kammergruber, Maximilian Viermetz,
Karsten Ehms ” Using Association Rules for Discovering Tag
Bundles in Social Tagging Data” 978-1-4244-78187/10/$26.00_c IEEE 2010.
[23]
By Li-Min Tsai, Shu-Jing Lin, and Don-Lin Yang
“Efficient Mining of Generalized Negative Association
Rules” Granular Computing 978-0-7695-4161-7/10 $26.00
© IEEE 2010.
[24]
By Sandeep Singh Rawat, Lakshmi Rajamani “Probability
Apriori based Approach to Mine Rare Association Rules”
Data Mining and Optimization (DMO) 28-29 June 2011,
Selangor, Malaysia 978-1-61284-212-7/11/$26.00 ©IEEE
2011.
[10] By Zhibo Chen, Carlos Ordonez, Kai Zhao Comparing
Reliability of Association Rules and OLAP Statistical Tests
Data Mining Workshops 978-0-7695-3503-6/08 $25.00 ©
IEEE 2008.
[25]
[11] By Lijuan Zhou Linshuang Wang Xuebin Ge Qian Shi A
Clustering-Based KNN Improved Algorithm CLKNN for Text
Classification Informatics in Control, Automation and Robotics
978-1-4244-5194-4/10/$26.00 ©IEEE 2010 .
By Sahar M. Ghanem Mona A. Mohamed Magdy H. Nagi
EDP-ORD: Efficient Distributed/Parallel Optimal Rule
Discovery 978-1-4577-0681-3/11/$26.00 © IEEE 2011.
[26]
[12] By XING Xue CHEN Yao WANG Yan-en Study on Mining
Theories of Association Rules and Its Application Information
Technology and Ocean Engineering 978-0-7695-3942-3/10
$26.00 © 2010 IEEE
By R. T. Alves, M. R. Delgado and A. A. Freitas
Knowledge Discovery with Artificial Immune Systems for
Hierarchical Multi-label Classification of Protein Functions
978-1-4244-8126-2/10/$26.00 © IEEE 2010.
[27]
By LI Tong-yan, LI Xing-ming New Criterion for Mining
Strong Association Rules in Unbalanced Events Intelligent
Information Hiding and Multimedia Signal Processing 9780-7695-3278-3/08 $25.00 © IEEE 2008.
[28]
By Senduru Srinivasulu P.Sakthivel Extracting Spatial
Semantics in Association Rules for Weather Forecasting
Image 978-1-4244-9008-0/10/$26.00 ©IEEE 2010.
[29]
By Xinqi Zheng1 Lu Zhao2 Association Rule Analysis of
Spatial Data Mining Based on Matlab Knowledge
Discovery and Data Mining 0-7695-3090-7/08 $25.00 ©
IEEE 2008.
[30]
By Chuan-xiang Ma Ze-ming Fang Rui Chen An Intrusion
Detection Method for Ad hoc network Based on Class
Association Rules Mining Conference on Frontier of
Computer Science and Technology 978-0-7695-3932-4/09
$26.00 © IEEE 2009.
[13] By Senduru Srinivasulu P.Sakthivel Extracting Spatial
Semantics in Association Rules for Weather Forecasting Image
978-1-4244-9008-0/10/$26.00 © IEEE 2010.
[14] By TIAN He, XU Jing, LIAN Kunmei, ZHANG Ying
Research on Strong-association Rule Based Web Application
Vulnerability Detection 978-1-4244-4520-2/09/$25.00 © IEEE
2009.
[15] By Dieferson Luis Alves de Araujo’ , Heitor S. Lopes’, Alex
A. Freitas2 A Parallel Genetic Algorithm for Rule Discovery in
Large Databases 0-7803-5731-0/99$$10.00109 99 IEEE.
[16] By Xiaofeng Yuan, Hualong Xu, and Shuhong Chen
“Improvement on the Constrained Association Rule Mining
Algorithm of Separate” 1-4244-0682-X/06/$20.00 © IEEE
2006.
[17] By WEI Yong-qing1, YANG Ren-hua2, LIU Pei-yu2 An
Improved Apriori Algorithm for Association Rules of Mining
978-1-4244-3930-0/09/$25.00 © IEEE 2009.
ISSN: 2321-2667
Volume 3, Issue 2, March 2014
56