Christian W. Günther and Wil M.P. van der Aalst
Transcription
Fuzzy Mining - Adaptive Process Simplification
Christian W. Günther and Wil M.P. van der Aalst

Outline
• Introduction
• Less-Structured Processes - the Infamous Spaghetti Affair
• An Adaptive Approach for Process Simplification
• Log-Based Process Metrics
• Adaptive Graph Simplification
• Implementation and Application
• Related Work
• Discussion and Future Work

Introduction
• What is fuzzy mining?
 - Adaptive process simplification based on multi-perspective metrics.
• Why do we need this approach?
 - Traditional process mining shows every detail; the model needs to be simplified before the essential behavior becomes visible.

Less-Structured Processes: the Infamous Spaghetti Affair
• The fundamental idea of process mining is both simple and persuasive.
• The more flexible a mining technique becomes in order to capture less-structured behavior, the more massive the resulting model gets.
• Real life is unpredictable and very flexible, unlike what computer systems assume.
• This makes the mined process model look like spaghetti.
• Most mining algorithms assume that processes are built in a structured, proper way; faithfully rendering the mass of connections in a less-structured model is an infeasible task.

Under its two traditional assumptions, process mining cannot come closer to this reality:
Assumption 1: All logs are reliable and trustworthy.
Assumption 2: There exists an exact process which is reflected in the logs.
To make process mining a useful tool in practical, less-structured settings, these assumptions need to be discarded.

An Adaptive Approach for Process Simplification
Process mining needs to provide a high-level view on the process so that it is suitable for less-structured environments. Like a road map, a simple adaptive view of the process can be derived by emphasizing four concepts: aggregation, abstraction, emphasis, and customization.

To develop such a simple, visual process model, two fundamental metrics support the simplification decisions:
• Significance measures the relative importance of behavior.
• Correlation measures how closely related two events following one another are.

This is the basis of the approach to process simplification: a set of metrics measures significance or correlation from different perspectives, and the user can customize the produced results to a large degree.

Log-Based Process Metrics
To make the framework configurable and extensible, three types of metrics are used.

Metrics framework
- Because the log contains a large number of undesired events, actual causal dependencies may go unrecorded.
- To compensate, relationships of arbitrary length are also measured, e.g. A → B and B → C also contribute evidence for the indirect relation A → C, attenuated with distance (a sketch follows this section).

1. Unary significance metrics: describe the relative importance of an event class.
 - Frequency significance: weights an event class by how frequently it was observed in the log.
 - Routing significance: points where the flow splits or joins are interesting for defining the structure of a process.
2. Binary significance metrics: describe the relative importance of a precedence relation between two event classes.
 - Distance significance: the more the significance of a relation differs from its source and target nodes' significances, the lower its distance significance value.
3. Binary correlation metrics: measure how closely related two events in a precedence relation are.
 Binary correlation is the main driver of the decision between aggregation and abstraction of less-significant behavior.
 - Proximity correlation rates events which occur shortly after one another as highly correlated.
 - Originator correlation is determined from the names of the persons who triggered the events.
 - Endpoint correlation compares the activity names of subsequent events.
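A minimal sketch of this metrics framework, assuming the log is given as a list of traces (each a list of event-class names) and using a simple linear attenuation; the names attenuation, measure_log, and max_distance are illustrative choices, not taken from the paper:

    from collections import defaultdict

    def attenuation(distance, max_distance):
        # Linear attenuation: the farther apart two events occur, the less
        # evidence they contribute (the linear scheme is an assumption).
        return max(0.0, 1.0 - (distance - 1) / max_distance)

    def measure_log(traces, max_distance=4):
        # Accumulates unary frequency significance per event class and binary
        # significance for precedence relations of arbitrary length.
        unary = defaultdict(float)   # sig(A): how often A occurs
        binary = defaultdict(float)  # sig(A, B): A (eventually) followed by B
        for trace in traces:
            for i, a in enumerate(trace):
                unary[a] += 1.0
                # Look ahead up to max_distance events so that A -> B and
                # B -> C also credit the indirect relation A -> C.
                for d in range(1, max_distance + 1):
                    if i + d < len(trace):
                        binary[(a, trace[i + d])] += attenuation(d, max_distance)
        return unary, binary

    # Example: (A, C) is credited directly in the second trace and at
    # distance 2 (attenuated to 0.75) in the first one.
    unary, binary = measure_log([["A", "B", "C"], ["A", "C"]])
    print(unary["A"], binary[("A", "C")])   # 2.0 1.75

In practice the accumulated values would be normalized to [0, 1] before the simplification steps below use them.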
Adaptive Graph Simplification
Most process mining techniques focus on mapping behavior found in the log onto typical process design patterns. This paper instead focuses on a high-level mapping of the behavior found in the log. Three transformation methods simplify the process model: conflict resolution, edge filtering, and node aggregation and abstraction (sketches of each step follow this section).

1. Conflict resolution in binary relations
 A conflict occurs whenever two nodes in the initial process model are connected by edges in both directions. Based on the relative significance of the two edges, the conflict is resolved as a length-2 loop (keep both edges), an exception (keep only the more significant edge), or concurrency (remove both edges).

2. Edge filtering
 Removes remaining edges to isolate the most important behavior. Edge filtering evaluates each edge by its utility util(A, B), a weighted sum of its significance and correlation:
 util(A, B) = ur × sig(A, B) + (1 − ur) × cor(A, B)
 The edge cutoff parameter determines the aggressiveness of the algorithm.

3. Node aggregation and abstraction
 The most effective tool for simplification is removing nodes. How? By preserving highly correlated groups of less-significant nodes as aggregated clusters. Every node whose unary significance is below a threshold becomes a victim.
 The first step builds initial clusters of less-significant behavior:
 - For each victim, find the most highly correlated neighbor.
 - If this neighbor is a cluster node, add the victim to this cluster.
 - Otherwise, create a new cluster node and add the victim as its first element.
 The second step merges clusters to decrease their number:
 - For each cluster, check whether all predecessors or all successors are also clusters.
 - If all predecessor nodes are clusters as well, merge with the most highly correlated one and move on to the next cluster.
 - If all successor nodes are clusters as well, merge with the most highly correlated one.
 - Otherwise, if both the cluster's pre- and post-set contain regular nodes, the cluster is left untouched.
 Abstraction then removes isolated and singular clusters.
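The following sketches illustrate the three transformations in turn. First, conflict resolution: the equal 0.5/0.5 weighting inside relative_significance and the preserve/ratio threshold values are assumptions, and resolve_conflict is a hypothetical helper name:

    def relative_significance(sig, a, b):
        # rel(A, B): significance of A -> B relative to all outgoing
        # relations of A and all incoming relations of B.
        out_a = sum(v for (x, _), v in sig.items() if x == a)
        in_b = sum(v for (_, y), v in sig.items() if y == b)
        ab = sig.get((a, b), 0.0)
        return 0.5 * ab / out_a + 0.5 * ab / in_b

    def resolve_conflict(sig, a, b, preserve=0.6, ratio=0.7):
        # A and B are connected in both directions; decide what the
        # conflict expresses and which of the two edges survive.
        rel_ab = relative_significance(sig, a, b)
        rel_ba = relative_significance(sig, b, a)
        if rel_ab >= preserve and rel_ba >= preserve:
            return "length-2 loop"   # both directions matter: keep both edges
        if abs(rel_ab - rel_ba) >= ratio:
            return "exception"       # keep only the more significant edge
        return "concurrency"         # neither dominates: remove both edges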
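Next, edge filtering. The utility formula is the one given above; normalizing utilities per node (here only over each node's incoming edges, for brevity) and comparing them against the cutoff is a simplified reading of the filtering step:

    from collections import defaultdict

    def utility(sig, cor, a, b, ur=0.5):
        # util(A, B) = ur * sig(A, B) + (1 - ur) * cor(A, B)
        return ur * sig.get((a, b), 0.0) + (1 - ur) * cor.get((a, b), 0.0)

    def filter_edges(edges, sig, cor, ur=0.5, edge_cutoff=0.2):
        # Min-max-normalize the utilities of each node's incoming edges and
        # keep an edge only if its normalized utility reaches the cutoff;
        # a higher cutoff thus removes more edges (more aggressive filtering).
        incoming = defaultdict(list)
        for a, b in edges:
            incoming[b].append((a, b))
        kept = set()
        for node, node_edges in incoming.items():
            utils = {e: utility(sig, cor, *e, ur=ur) for e in node_edges}
            lo, hi = min(utils.values()), max(utils.values())
            for e, u in utils.items():
                norm = 1.0 if hi == lo else (u - lo) / (hi - lo)
                if norm >= edge_cutoff:
                    kept.add(e)
        return kept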
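Finally, the first clustering phase of node aggregation. The neighbor lookup and cluster bookkeeping (build_clusters, numeric cluster ids) are simplified assumptions; the merge phase described above would then operate on the returned clusters in the same spirit:

    from collections import defaultdict

    def build_clusters(nodes, unary_sig, cor, node_cutoff=0.3):
        # Every node whose unary significance is below node_cutoff becomes a
        # victim and is attached to its most highly correlated neighbor.
        cluster_of = {}              # victim -> cluster id
        clusters = defaultdict(set)  # cluster id -> member nodes
        next_id = 0
        for node in nodes:
            if unary_sig.get(node, 0.0) >= node_cutoff:
                continue  # significant nodes stay as regular nodes
            # Most highly correlated neighbor, in either edge direction.
            neighbors = [(c, b if a == node else a)
                         for (a, b), c in cor.items()
                         if node in (a, b) and a != b]
            if not neighbors:
                continue
            _, best = max(neighbors)
            if best in cluster_of:                    # neighbor is clustered:
                cid = cluster_of[best]                # join its cluster
            else:                                     # otherwise open a new
                cid, next_id = next_id, next_id + 1   # cluster for the victim
            clusters[cid].add(node)
            cluster_of[node] = cid
        return clusters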