Hassan Saneifar Professional Page (Transcription)
Expert Systems and Knowledge Engineering
Hassan Saneifar, Ph.D.

Introduction to Artificial Intelligence
• The subfield of computer science concerned with symbolic reasoning and non-algorithmic methods of problem solving.
• How to make computers do things at which people are better.
• Let's discuss it a little more…

What is Intelligence?
• The ability to understand and learn things.
• The ability to think and understand instead of doing things by instinct or automatically.
• The ability to learn and understand, to solve problems, and to make decisions.

What is Artificial Intelligence?
• As a science, AI aims to make machines do things that would require intelligence if done by humans.
• To develop more powerful, versatile programs that can handle problems currently handled efficiently only by the human mind [Balci 1996].

What is Artificial Intelligence?
How do we determine whether a particular computer has demonstrated intelligence?
• From a philosophical perspective, "one considers questions regarding intelligence itself and whether machines can possess actual intelligence or merely simulate its presence."
• From an applied perspective, the question is "how technology can be applied to produce machines that behave in intelligent ways" [Brookshear 1997].

Turing Imitation Game
Alan Turing's questions:
• Is there thought without experience?
• Is there mind without communication?
• Is there language without living?
• Is there intelligence without life?
The Turing imitation game:
• Invented by the British mathematician Alan Turing
• Around 50 years ago

Turing Imitation Game: Phase 1
In the first phase, the interrogator, a man, and a woman are each placed in separate rooms. The interrogator's objective is to work out who is the man and who is the woman by questioning them. The man attempts to deceive the interrogator into believing that he is the woman, while the woman has to convince the interrogator that she is the woman.
Turing Imitation Game: Phase 2
In the second phase of the game, the man is replaced by a computer programmed to deceive the interrogator as the man did. It would even be programmed to make mistakes and provide fuzzy answers in the way a human would. If the computer can fool the interrogator as often as the man did, we may say this computer has passed the intelligent-behavior test.

Turing's Remarks
• By maintaining communication between the human and the machine via terminals, the test gives us an objective standard view of intelligence.
• A program thought intelligent in some narrow area of expertise is evaluated by comparing its performance with the performance of a human expert.
• To build an intelligent computer system, we have to capture, organize, and use human expert knowledge in some narrow area of expertise.

AI Examples
– http://www.generation5.org/jdk/demos.asp
– http://www.aridolan.com/ofiles/eFloys.html
– http://www.aridolan.com/ofiles/iFloys.html
– http://www.arch.usyd.edu.au/~rob/#applets
– http://www.softrise.co.uk/srl/old/caworld.html
– http://people.clarkson.edu/~esazonov/neural_fuzzy/loadsway/LoadSway.htm
– http://www.iit.nrc.ca/IR_public/fuzzy/FuzzyTruck.html
– http://www.pandorabots.com/pandora/talk?botid=f5d922d97e345aa1

Expert Systems
• Expert systems (ES) are computer programs that try to replicate the knowledge and skills of human experts in some area, and then solve problems in that area the way human experts would.
• ES take their roots in cognitive science, the study of the human mind using a combination of AI and psychology.
• Expert systems embody non-algorithmic expertise for solving certain types of problems.
• ES were the first successful applications of AI to real-world problems, solving problems in medicine, chemistry, finance, and even in space (the Space Shuttle, robots on other planets).

Expert Systems Background
1943: E. L. Post
proved that any computable problem can be solved using a set of IF–THEN rules (production systems).
1961: GENERAL PROBLEM SOLVER (GPS) by A. Newell and H. Simon.
1969: DENDRAL (Feigenbaum, Buchanan, Lederberg) was the first system that showed the importance of domain-specific knowledge (expertise): knowledge-based systems.
1972 to 1980: MYCIN: separation of the reasoning method from the knowledge (the expert system shell).

Production Systems
Production systems (or rule-based systems) are programs that use sets of IF–THEN rules (production rules) instead of conventional algorithms. Unlike in algorithms, the order in which these rules should be used is not specified; it is decided by the program itself with respect to the current problem state.
• In 1943, Post proved that any computable problem can be implemented in a production system.
• Cognitive scientists became interested in production systems because they seemed to better represent the way humans think and solve problems.
[Speaker note: introduce Post and Markov systems in production systems; more about conflict resolution: cycles.]

Early Expert Systems: General Problem Solver
• In 1961, A. Newell and H. Simon wrote a program called General Problem Solver (GPS) that could solve many different problems using only a small set of rules.
• GPS used a strategy known as means–ends analysis.
• GPS produced solutions very similar to those people came up with.
• Methods that can be applied to a broad range of problems are called weak methods, because they use weak information about the problem domain. Their performance, however, is also usually weak.

Knowledge-Based Systems
• DENDRAL (Feigenbaum et al., 1969) was a program that used rules to infer molecular structure from spectral information. The challenge was that the number of possible molecules was so large that it was impossible to check all of them using simple rules (a weak method).
• The researchers consulted experts in chemistry and added several more specific rules to their program.
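The production-system cycle described above (match the IF parts against working memory, resolve conflicts, act) can be sketched in a few lines. The two rules below are taken from the "Different Types of Rules" examples later in these slides; the first-match conflict-resolution strategy is an illustrative assumption.

```python
# Toy production system: working memory plus IF-THEN rules. The
# programmer does not fix the firing order; the interpreter picks an
# applicable rule on each cycle (here: simply the first match).

def run(rules, facts):
    memory = set(facts)
    while True:
        # Match phase: rules whose IF part holds and whose THEN part
        # would add something new to working memory.
        applicable = [(cond, act) for cond, act in rules
                      if set(cond) <= memory and act not in memory]
        if not applicable:
            return memory
        # Conflict resolution, then act phase.
        _, action = applicable[0]
        memory.add(action)

rules = [
    (["battery is dead"], "car will not start"),
    (["car will not start"], "take a cab"),
]
memory = run(rules, ["battery is dead"])
```

Note that the chain "battery is dead" → "car will not start" → "take a cab" emerges from the data, not from any ordering written by the programmer.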
The number of combinations the program had to test was reduced dramatically thanks to the knowledge added to the system.
• DENDRAL demonstrated the importance of domain-specific knowledge.
• Today, expert systems are knowledge-based systems.

Basic Concepts of ES
• How to determine who experts are.
• How expertise can be transferred from a person to a computer.
• How the system works.

Basic Concepts of ES
• Expert: a human being who has developed a high level of proficiency in making judgments in a specific, usually narrow, domain.

Basic Concepts of ES
• Expertise:
• A specialized type of knowledge and skill that experts have.
• The implicit knowledge and skills of the expert that must be extracted and made explicit so that they can be encoded in an expert system.

Features of Expert Systems
• Expertise: possesses expertise for expert-level decisions.
• Symbolic reasoning: knowledge is represented symbolically.
• Deep knowledge: complex knowledge not easily available to non-experts.
• Self-knowledge: can examine its own reasoning and provide explanations.

Application of ES
Instances of ES

ES Components
Major components:
• Knowledge base
• Inference engine
• User interface
• Blackboard (working memory)
• Explanation subsystem (justifier)
ES may also contain:
• Knowledge acquisition subsystem
• Knowledge refining system

Major Components of ES
Knowledge base: a collection of facts, rules, and procedures organized into schemas; the assembly of all the information and knowledge about a specific field of interest.
Inference engine: the part of an expert system that actually performs the reasoning function.
User interface: the part of the system that interacts with users, accepting commands and displaying the results generated by the other parts of the system.

Major Components of ES
• Blackboard (working memory): an area of working memory set aside for the description of the current problem (facts) and for recording intermediate results.
• Explanation subsystem (justifier): the component of an expert system that can explain the system's reasoning and justify its conclusions.

Architecture of an Expert System
[Speaker note: explain in more detail how an expert system works: the tasks of working memory, reasoning, etc.]

Knowledge Representation
• A representation is a set of conventions about how to describe a class of things. A description makes use of the conventions of a representation to describe some particular thing [Winston 1992].

Knowledge
Two special types of knowledge: a priori and a posteriori.
A priori knowledge:
• comes before, and is independent of, knowledge from the senses;
• is considered to be universally true and cannot be denied without contradiction;
• examples of a priori knowledge: logic statements, mathematical laws, and the knowledge possessed by teenagers.
A posteriori knowledge:
• is knowledge derived from the senses;
• since sensory experience may not always be reliable, a posteriori knowledge can be denied on the basis of new knowledge, without the necessity of contradiction.

Knowledge Representation
• Knowledge engineer: an AI specialist responsible for the technical side of developing an expert system. The knowledge engineer works closely with the domain expert to capture the expert's knowledge in a knowledge base.
• Knowledge engineering (KE): the engineering discipline in which knowledge is integrated into computer systems to solve complex problems normally requiring a high level of human expertise.

Knowledge Representation Schemes
• Representing the knowledge of humans in a systematic manner.
• This knowledge is represented in a knowledge base such that it can be retrieved for solving problems.
• Some knowledge representation schemes:
– Production rules
– Semantic networks
– Frames
– Logic: propositional logic, first-order logic
– XML / RDF
– …

Semantic Networks
• Concepts as hierarchical networks [R. Quillian (1966, 1968)].
• Amended with some additional psychological assumptions to characterize the structure of human semantic memory.
• A semantic network is a structure for representing knowledge as a pattern of interconnected nodes and arcs:
• Nodes: concepts of entities, attributes, events, values.
• Arcs: relationships that hold between the concepts.
• Used for propositional information; a proposition is a statement that is either true or false.
• A labeled, directed graph.

Semantic Networks
• Semantic networks [Collins and Quillian 1969]:
– Concepts can be represented as hierarchies of interconnected concept nodes (e.g. animal, bird, canary).
– Any concept has a number of associated attributes at a given level (e.g. animal --> has skin; eats; etc.).
– Some concept nodes are superordinates of other nodes (e.g. animal > bird) and some are subordinates (canary < bird).
– Subordinates inherit all the attributes of their superordinate concepts (we will talk about penguins and ostriches!).

General Networks vs. Semantic Networks
[Figures: network relationships; a semantic-network representation of the properties of snow and ice.]

Semantic Networks
Two types of commonly used links:
• IS-A: 'is an instance of'; refers to a specific member of a class (a group of objects).
• A-KIND-OF (AKO): relates one class to another.
• AKO relates generic nodes to generic nodes, while IS-A relates an instance or individual to a generic class.
• The objects in a class have one or more attributes in common.
• Each attribute has a value; the combination of attribute and value is a property.

Semantic Networks Exercises
• Represent the following sentences in appropriate semantic networks:
– is_a(person, mammal)
– instance_of(N. Hejazi, person)
– team(N. Hejazi, Esteghlal)
– score(Tractor, Piroozi, 3-1)
– Ali gave Reza the book
all in one graph.

Solution 1
• is_a(person, mammal); instance_of(N. Hejazi, person); team(N. Hejazi, Esteghlal)
[Graph: person -is_a-> mammal; person -has_part-> head; N. Hejazi -instance_of-> person; N. Hejazi -team-> Esteghlal.]

Solution 2
• score(Tractor, Piroozi, 3-1)
[Graph: Fixture 5 -is_a-> Game; home_team: Piroozi; away_team: Tractor; score: 3-1.]

Solution 3
• Ali gave Reza the ES book
[Graph: Event 1 -is_a-> Gave (an Action); agent: Ali; patient: Reza; object: an instance of ES Book.]

Advantages of Semantic Networks
• Easy to visualize and understand.
• The knowledge engineer can arbitrarily define the relationships.
• Related knowledge is easily categorized.
• Efficient in space requirements; node objects are represented only once.
• Standard definitions of semantic networks have been developed.

Limitations of Semantic Networks
• The limitations of conventional semantic networks were studied extensively by a number of workers in AI.
• Many believe that the basic notion is a powerful one that has to be complemented by, for example, logic to improve its expressive power and robustness.
• Others believe that the notion of semantic networks can be improved by incorporating reasoning used to describe events.

Limitations of Semantic Networks
• Binary relations are usually easy to represent, but some statements are difficult:
– John caused trouble at the party when he left.
• Other problematic statements:
– negation: John does not go fishing;
– disjunction: John eats pizza or fish and chips;
– …
• Quantified statements are very hard for semantic nets:
– Every dog has bitten a postman.
– Every dog has bitten every postman.
• Solution: partitioned semantic networks.

Partitioned Semantic Networks
• To represent the difference between the description of an individual object or process and the description of a set of objects. The set description involves quantification [Hendrix (1976, 1979)].
• Hendrix partitioned semantic networks: a semantic network, loosely speaking, can be divided into one or more networks for the description of an individual.
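The IS-A/AKO inheritance behavior described above can be sketched with plain dictionaries; the canary/bird/animal hierarchy and its attributes follow the Collins and Quillian example, while the data layout itself is an illustrative assumption.

```python
# Semantic network as labeled links; attributes are inherited upward
# along a-kind-of (ako) links, so a canary "has skin" via animal.

ako = {"canary": "bird", "bird": "animal"}   # subordinate -> superordinate
attributes = {
    "animal": ["has skin", "eats"],
    "bird": ["has wings", "can fly"],
    "canary": ["is yellow", "can sing"],
}

def inherited_attributes(concept):
    """Collect the attributes of a concept and all its superordinates."""
    attrs = []
    while concept is not None:
        attrs.extend(attributes.get(concept, []))
        concept = ako.get(concept)           # climb the hierarchy
    return attrs
```

A query such as inherited_attributes("canary") returns both the canary's own attributes and those inherited from bird and animal, which is exactly the mechanism that makes exceptions (penguins, ostriches) awkward for plain semantic nets.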
Partitioned Semantic Networks
• The central idea of partitioning is to allow groups of nodes and arcs to be bundled together into units called spaces: fundamental entities in partitioned networks, on the same level as nodes and arcs (Hendrix 1979:59).
• Every node and every arc of a network belongs to one or more spaces.
• Some spaces are used to encode 'background information' or generic relations; others are used to deal with specifics, called 'scratch' spaces.

Partitioned Semantic Networks
• Suppose that we wish to make a specific statement about a dog, Danny, who has bitten a postman, Peter:
– "Danny the dog bit Peter the postman"
• Hendrix's partitioned network expresses this statement as an ordinary semantic network:
[Graph, space S1: Danny -is_a-> dog; Peter -is_a-> postman; B -is_a-> bite, with agent Danny and patient Peter.]

Partitioned Semantic Networks
• Suppose that we now want to represent the statement:
– "Every dog has bitten a postman"
• Hendrix's partitioned semantic network now comprises two partitions, SA and S1. Node G is an instance of the special class of general statements about the world, comprising a statement link, a form, and one universal quantifier (∀).
[Graph, space SA: G -is_a-> General Statement, with form S1 and ∀ on D; space S1: D -is_a-> dog; P -is_a-> postman; B -is_a-> bite, with agent D and patient P.]

Partitioned Semantic Networks
• Suppose that we now want to represent the statement:
– "Every dog has bitten every postman"
[Graph: as above, but both D and P are universally quantified (∀).]

Partitioned Semantic Networks
• Suppose that we now want to represent the statement:
– "Every dog in town has bitten the postman"
[Graph: town dog -ako-> dog ('ako' = 'a kind of'); D -is_a-> town dog, with ∀ on D; B -is_a-> bite, agent D, patient P; P -is_a-> postman.]

Exercises
• Try to represent the following two sentences in appropriate semantic networks:
– "Ali believes that pizza is tasty"
– "Every student loves to have an exam" !!!
;-)

Solution 1: "Ali believes that pizza is tasty"
[Graph: an event -is_a-> believes, with agent Ali and object a space; inside the space, an object -is_a-> pizza, with the property tasty.]

Frames
• A frame represents related knowledge about a narrow subject that has much default knowledge.
• A frame system would be a good choice for describing a mechanical device, for example a car.
• The frame contrasts with the semantic net, which is generally used for broad knowledge representation.
• Just as with semantic nets, there are no standards for defining frame-based systems.
• A frame is analogous to a record structure; corresponding to the fields and values of a record are the slots and slot fillers of a frame.
• A frame is basically a group of slots and fillers that defines a stereotypical object.
• The car is the object, the slot name is the attribute, and the filler is the value.

Frames
• Frame-based expert systems are very useful for representing causal knowledge because their information is organized by cause and effect.
• The slots may also contain procedures attached to the slots, called procedural attachments:
• The if-needed type is executed when a filler value is needed but none is initially present, or the default value is not suitable.
• The if-added type is executed when a value is to be added to a slot.
• The if-removed type is executed whenever a value is to be removed from a slot.
• Defaults are often used to represent commonsense knowledge.
• Slot fillers may also contain relations, e.g. a-kind-of and is-a relations.

Logic
• Knowledge can also be represented by the symbols of logic, the study of the rules of exact reasoning.
• Logic is of primary importance in expert systems, in which the inference engine reasons from facts to conclusions.
• A descriptive term for logic programming and expert systems is automated reasoning systems.
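The frame structure described above (slots, fillers, defaults, and an if-needed procedural attachment) can be sketched as a small class. The slot names, the hardcoded reference year, and the car example details are illustrative assumptions.

```python
# A frame as a group of slots and fillers: explicit fillers win,
# then defaults, then an if-needed procedure computes the value.

class Frame:
    def __init__(self, name, defaults=None, if_needed=None):
        self.name = name
        self.slots = {}                    # explicit slot fillers
        self.defaults = defaults or {}     # commonsense defaults
        self.if_needed = if_needed or {}   # slot -> procedure

    def get(self, slot):
        if slot in self.slots:             # filler already present
            return self.slots[slot]
        if slot in self.defaults:          # default knowledge
            return self.defaults[slot]
        if slot in self.if_needed:         # if-needed attachment fires
            return self.if_needed[slot](self)
        return None

# Stereotypical car frame; "age" is computed on demand from "year"
# (2024 is an assumed reference year for the illustration).
car = Frame("car",
            defaults={"wheels": 4},
            if_needed={"age": lambda f: 2024 - f.get("year")})
car.slots["year"] = 2020
```

Asking for "wheels" falls through to the default, while asking for "age" triggers the if-needed procedure, mirroring how a frame system fills missing slot values.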
Formal Logic
Formal logic is concerned with the syntax of statements, not their semantics.
• As an example of formal logic, consider the following clauses with the nonsense words squeeg and moof:
Premise: All squeegs are moofs.
Premise: John is a squeeg.
Conclusion: John is a moof.
• The argument is valid no matter what words are used:
Premise: All X are Y.
Premise: Z is an X.
Conclusion: Z is a Y.
(valid no matter what is substituted for X, Y, and Z)
• By separating the form from the semantics, the validity of an argument can be considered objectively, without prejudice caused by the semantics.

Propositional Logic
• Propositional logic is used to assert propositions: statements that are either true or false. It deals only with the truth value of complete statements and does not consider relationships or dependencies between objects.
• Propositional logic is concerned with the subset of declarative sentences that can be classified as either true or false.

Propositional Logic
• A sentence whose truth value can be determined is called a statement or proposition.
• A statement is also called a closed sentence because its truth value is not open to question.
• Statements that cannot be answered absolutely are called open sentences.
• A compound statement is formed by using logical connectives on individual statements.
[Several propositional-logic slides of figures and tables are not captured in the transcription.]

First-Order Logic
• First-order logic (FOL) is an extension and generalization of propositional logic.
• Its formulas contain variables which can be quantified: two common quantifiers are the existential (∃) and universal (∀) quantifiers.
• The variables can range over elements of the universe, or perhaps relations or functions over the universe.

First-Order Logic
Variable symbols: x, y, z, ...
Function symbols: f, g, h, f(x), g(x,y), ...
Predicate symbols: P, Q, R, P(x), Q(x,y), ...
Logic symbols: ¬, ∧, ∨, ∃, ∀, =, →
Punctuation symbols: "(", ")", and "."

First-Order Logic
• ∀x∀y is the same as ∀y∀x
• ∃x∃y is the same as ∃y∃x
• ∃x∀y is not the same as ∀y∃x
• ∃x∀y Loves(x,y): "There is a person who loves everyone in the world."
• ∀y∃x Loves(x,y): "Everyone in the world is loved by at least one person."
• Quantifier duality: each can be expressed using the other:
∀x Likes(x, iceCream) ≡ ¬∃x ¬Likes(x, iceCream)
∃x Likes(x, broccoli) ≡ ¬∀x ¬Likes(x, broccoli)

Rules
• A production rule system emulates human reasoning using a set of 'productions'.
• Productions have two parts:
– Sensory precondition (the "IF" part)
– Action (the "THEN" part)
• When the state of the 'world' matches the IF part, the production is fired, meaning the action is executed.
– The 'world' is the set of data values in the system's working memory.
– For a clinical expert system, this is usually data about a patient, which ideally has come from (and may go back to) an electronic medical record, or it may be entered interactively (or, usually, a little of each).
• So production rules link facts ("IF" parts, also called antecedents) to conclusions ("THEN" parts, also called consequents).

Rules: MYCIN Example
• MYCIN
– Developed at Stanford from 1972 to 1980.
– Helped physicians diagnose a range of infectious blood diseases.
• Separated the method of reasoning on productions (the 'inference engine') from the rules themselves (the 'knowledge base').
– Became the first expert-system shell when distributed as 'empty MYCIN' (EMYCIN).

Rules: MYCIN Example
Example MYCIN rule:
IF the stain of the organism is gram negative
AND the morphology of the organism is rod
AND the aerobicity of the organism is anaerobic
THEN there is strongly suggestive evidence (0.8) that the class of the organism is Enterobacteriaceae.
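The MYCIN rule above can be sketched as data plus a matching function: three Boolean predicates over patient data, and a conclusion asserted with strength 0.8. The dictionary field names and the patient record are illustrative assumptions, not MYCIN's actual representation.

```python
# Sketch of the MYCIN-style rule shown above: every antecedent is an
# equality test on a patient data field; firing produces a new fact
# tagged with the rule's certainty factor.

rule = {
    "if": {
        "stain": "gram negative",
        "morphology": "rod",
        "aerobicity": "anaerobic",
    },
    "then": ("class", "Enterobacteriaceae"),
    "cf": 0.8,
}

def apply_rule(rule, patient):
    """Fire the rule if every antecedent matches the patient data."""
    if all(patient.get(k) == v for k, v in rule["if"].items()):
        field, value = rule["then"]
        return {field: value, "cf": rule["cf"]}   # the new fact
    return None

patient = {"stain": "gram negative", "morphology": "rod",
           "aerobicity": "anaerobic"}
conclusion = apply_rule(rule, patient)
```

This makes the slide's point concrete: the reasoning is symbolic (comparisons of qualitative values), and the output is a new symbolic fact about [class of the organism], not a numeric formula.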
• This rule has three predicates (yes/no, or Boolean, values that determine whether it should fire).
• In this case, each predicate involves the equality of a data field about a patient to a specific qualitative value (e.g., [stain of the organism] = 'gram negative').
• Note that human expertise is still needed, e.g., to decide that the morphology of the organism is 'rod' (not to mention to understand the vocabulary!).
• Notice that it produces a new fact (regarding [class of the organism]).
• Note that this is 'symbolic reasoning': working with concepts rather than numbers (it's not like y = x1 + 4.6 x2).

Different Types of Rules
Relationship rules:
IF the battery is dead THEN the car will not start
Recommendation rules:
IF the car will not start THEN take a cab

Different Types of Rules
Directive rules:
IF the car will not start AND the fuel system is OK THEN check out the electrical system
Strategy rules:
IF the car will not start THEN first check out the fuel system, then check out the electrical system

Different Types of Rules
Heuristic rules:
IF the car will not start AND the car is a 1957 Ford THEN check the float
Meta-rules:
IF the car will not start AND the electrical system is operating normally THEN use rules concerning the fuel system

Reasoning
Deduction
Deductive reasoning (also deductive logic, logical deduction or, informally, "top-down" logic) is the process of reasoning from one or more statements (premises) to reach a logically certain conclusion: reasoning from the general to the specific.
- All men are mortal.
- Socrates is a man.
- Therefore, Socrates is mortal.
Induction
Inductive reasoning is reasoning in which the statements (premises) seek to supply strong evidence for (not absolute proof of) the truth of the conclusion. In other words, it is the process of reasoning to a conclusion about all members of a class from an examination of only a few members of the class: reasoning from the particular to the general.
- 100% of the biological life forms that we know of depend on liquid water to exist.
- Therefore, if we discover a new biological life form, it will probably depend on liquid water to exist.
Abduction
Abductive reasoning is a form of logical inference that goes from an observation to a hypothesis that accounts for the observation, ideally seeking the simplest and most likely explanation. In abductive reasoning, unlike in deductive reasoning, the premises do not guarantee the conclusion.
- The lawn is wet.
- If it rained last night, it would be unsurprising that the lawn is wet.
- Therefore, by abductive reasoning, the possibility that it rained last night is reasonable.

Forward-Chaining Inference
Forward chaining starts with some facts and applies rules to find all possible conclusions.
Steps:
1. Consider the initial facts and store them in working memory.
2. Check the antecedent parts of the rules.
3. If all the conditions of a rule are matched, fire the rule.
4. If only one rule matches, do the following:
A. Perform the necessary actions.
B. Modify working memory and update the facts.
C. Check for new conditions.
5. If more than one rule matches, use a conflict-resolution strategy to select the most appropriate rule, and go to Step 4.
6. Continue as long as an appropriate rule can be found and executed.
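The forward-chaining steps above, and the backward-chaining steps given later in this section, can be sketched together. The rules are the patient rules from the forward-chaining example in these slides, lightly compressed into set form; the recursive formulation of the backward chainer is an assumption made for compactness.

```python
# Forward chaining: start from facts, fire applicable rules, update
# working memory, repeat. Backward chaining: to prove a hypothesis,
# find a rule concluding it and recursively prove its conditions.

rules = [
    ({"sore throat", "suspect bacterial infection"}, "strep throat"),
    ({"temperature > 37"}, "fever"),
    ({"sick over a month", "fever"}, "suspect bacterial infection"),
    ({"fever"}, "can't go out on a date"),
    ({"can't go out on a date"}, "stay home and read a book"),
]

def forward_chain(rules, facts):
    memory = set(facts)                              # Step 1: initial facts
    while True:
        applicable = [(c, h) for c, h in rules       # Steps 2-3: match
                      if c <= memory and h not in memory]
        if not applicable:
            return memory                            # Step 6: nothing fires
        # Step 5, conflict resolution: here, the first applicable rule.
        _, conclusion = applicable[0]
        memory.add(conclusion)                       # Step 4: act, update

def backward_chain(hypothesis, facts, rules):
    if hypothesis in facts:                          # hypothesis is a fact
        return True
    return any(conclusion == hypothesis and          # rule concluding H
               all(backward_chain(c, facts, rules) for c in conditions)
               for conditions, conclusion in rules)

facts = {"temperature > 37", "sore throat", "sick over a month"}
memory = forward_chain(rules, facts)
proved = backward_chain("strep throat", facts, rules)
```

Forward chaining derives everything that follows from the facts (fever, the infection, strep throat, the book advice); backward chaining touches only the rules needed to support the single hypothesis asked about.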
Forward-Chaining Example
Knowledge base rules:
Rule 1: IF the patient has a sore throat AND we suspect a bacterial infection THEN we believe the patient has strep throat
Rule 2: IF the patient's temperature is > 37 THEN the patient has a fever
Rule 3: IF the patient has been sick over a month AND the patient has a fever THEN we suspect a bacterial infection
Rule 4: IF the patient has a fever THEN the patient can't go out on a date
Rule 5: IF the patient can't go out on a date THEN the patient should stay home and read a book

Backward-Chaining Inference
Backward chaining starts with the desired conclusion(s) and works backward to find supporting facts.
Steps:
1. Start with a possible hypothesis, H.
2. Store the hypothesis H in working memory, along with the available facts.
3. If H is among the initial facts, the hypothesis is proven; go to Step 7.
4. If H is not among the initial facts, find a rule R whose consequent (action) part mentions the hypothesis.
5. Store R in working memory.
6. Check the conditions of R and match them against the existing facts.
7. If they match, fire rule R and stop; otherwise, continue from Step 4.

Forward Chaining vs. Backward Chaining
• Forward chaining is reasoning from facts to the conclusions resulting from those facts.
– E.g., if you see that it is raining before leaving home (the fact), then you decide to take an umbrella (the conclusion).
• Backward chaining involves reasoning in reverse from a hypothesis: from a potential conclusion to be proved to the facts that support the hypothesis.
– E.g., if you have not looked outside and someone enters with wet shoes and an umbrella, your hypothesis is that it is raining.
– To support this hypothesis, you could ask the person whether it was, in fact, raining.
– If the response is yes, the hypothesis is proved true and becomes a fact.

Introduction to Uncertainty
Defining Uncertainty
• Uncertainty is defined as the lack of the exact knowledge that would enable us to reach a perfectly reliable conclusion.
• Information can be incomplete, inconsistent, uncertain, or all three. In other words, information is often unsuitable for solving a problem.
• Classical logic permits only exact reasoning: it assumes that perfect knowledge always exists and that we deal with exact facts.

Sources of Uncertain Knowledge
• Ambiguity
• Incompleteness
• Incorrectness
• False positives (Type I errors)
• False negatives (Type II errors)
• Human errors
• Machine errors
• Measurement errors
• Precision
• Accuracy
• Etc.

Dealing with Uncertainty
• Classical probability (Fermat & Pascal, 1654)
• Bayesian probability
• Hartley theory (Hartley, 1928)
• Shannon theory (Shannon, 1948)
• Dempster–Shafer theory (Shafer, 1976)
• Fuzzy theory (Zadeh, 1965)

Probability Theory
• The concept of probability has a long history that goes back thousands of years, when words like "probably", "likely", "maybe", "perhaps" and "possibly" were introduced into spoken languages. However, the mathematical theory of probability was formulated only in the 17th century.
• The probability of an event is the proportion of cases in which the event occurs.
• Probability can also be defined as a scientific measure of chance.

Probability Theory
• Probability can be expressed mathematically as a numerical index with a range from zero (an absolute impossibility) to unity (an absolute certainty).
• Most events have a probability index strictly between 0 and 1, which means that each event has at least two possible outcomes: a favorable outcome (success) or an unfavorable outcome (failure).
P(success) = p = s / (s + f)
P(failure) = q = f / (s + f)
where s is the number of possible successes and f is the number of possible failures.

Conditional Probability
• Let A be an event in the world and B be another event. Suppose that events A and B are not mutually exclusive, but occur conditionally on the occurrence of each other.
• The probability that event A will occur if event B occurs is called the conditional probability, denoted p(A|B): "the conditional probability of event A occurring given that event B has occurred".
p(A|B) = (the number of times A and B can occur) / (the number of times B can occur)

Conditional Probability
• The probability that both A and B will occur is called the joint probability of A and B, written p(A ∩ B). Then:
p(A|B) = p(A ∩ B) / p(B)
• Similarly, the conditional probability of event B occurring given that event A has occurred equals
p(B|A) = p(B ∩ A) / p(A)

Conditional Probability
Hence
p(B ∩ A) = p(B|A) × p(A)
and, since p(A ∩ B) = p(B ∩ A),
p(A ∩ B) = p(B|A) × p(A)
Substituting the last equation into p(A|B) = p(A ∩ B) / p(B) yields the Bayesian rule.

Bayesian Rule
p(A|B) = [p(B|A) × p(A)] / p(B)
where:
p(A|B) is the conditional probability that event A occurs given that event B has occurred;
p(B|A) is the conditional probability of event B occurring given that event A has occurred;
p(A) is the probability of event A occurring;
p(B) is the probability of event B occurring.
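The Bayesian rule above can be sketched numerically, expanding p(B) over A and ¬A by the total-probability identity used in this section. The probabilities passed in are made-up numbers for illustration only.

```python
# Bayes' rule: p(A|B) = p(B|A) * p(A) / p(B), with the denominator
# expanded as p(B|A)*p(A) + p(B|not A)*p(not A).

def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """Posterior p(A|B) from the prior p(A) and the two likelihoods."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)  # total probability
    return p_b_given_a * p_a / p_b

# Illustrative numbers only: prior 0.3, likelihoods 0.9 and 0.2.
posterior = bayes(p_b_given_a=0.9, p_a=0.3, p_b_given_not_a=0.2)
```

Observing B here raises the probability of A from the prior 0.3 to a posterior of 0.27/0.41, roughly 0.66, which is the basic update step an expert system performs.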
The Joint Probability
If B1, B2, ..., Bn are mutually exclusive events, then
Σ (i = 1..n) p(A ∩ Bi) = Σ (i = 1..n) p(A|Bi) × p(Bi)
[Figure: event A overlapping the mutually exclusive events B1, B2, B3 and B4.]

The Joint Probability
• If the occurrence of event A depends on only two mutually exclusive events, B and NOT B, we obtain:
p(A) = p(A|B) × p(B) + p(A|¬B) × p(¬B)
where ¬ is the logical function NOT.
• Similarly,
p(B) = p(B|A) × p(A) + p(B|¬A) × p(¬A)
• Substituting this equation into the Bayesian rule yields:
p(A|B) = [p(B|A) × p(A)] / [p(B|A) × p(A) + p(B|¬A) × p(¬A)]

Bayesian Reasoning
• Suppose all rules in the knowledge base are represented in the following form:
IF E is true THEN H is true {with probability p}
• This rule implies that if event E occurs, then the probability that event H will occur is p.
• In expert systems, H usually represents a hypothesis and E denotes evidence to support this hypothesis.

Bayesian Reasoning
The Bayesian rule expressed in terms of hypotheses and evidence looks like this:
p(H|E) = [p(E|H) × p(H)] / [p(E|H) × p(H) + p(E|¬H) × p(¬H)]
where:
p(H) is the prior probability of hypothesis H being true;
p(E|H) is the probability that hypothesis H being true will result in evidence E;
p(¬H) is the prior probability of hypothesis H being false;
p(E|¬H) is the probability of finding evidence E even when hypothesis H is false.

Bayesian Reasoning
• In expert systems, the probabilities required to solve a problem are provided by experts.
• An expert determines the prior probabilities for the possible hypotheses, p(H) and p(¬H), and also the conditional probabilities for observing evidence E if hypothesis H is true, p(E|H), and if hypothesis H is false, p(E|¬H).
• Users provide information about the evidence observed, and the expert system computes p(H|E) for hypothesis H in light of the user-supplied evidence E.
• Probability p(H|E) is called the posterior probability of hypothesis H upon observing evidence E.

Bayesian Reasoning
• We can take into account both multiple hypotheses H1, H2, ..., Hm and multiple evidences E1, E2, ..., En.
(The hypotheses as well as the evidences must be mutually exclusive and exhaustive.)
• For a single evidence E and multiple hypotheses:
p(Hi|E) = [p(E|Hi) × p(Hi)] / Σ (k = 1..m) p(E|Hk) × p(Hk)
• For multiple evidences and multiple hypotheses:
p(Hi|E1 E2 ... En) = [p(E1 E2 ... En|Hi) × p(Hi)] / Σ (k = 1..m) p(E1 E2 ... En|Hk) × p(Hk)

Bayesian Reasoning
• This requires obtaining the conditional probabilities of all possible combinations of evidences for all hypotheses, which places an enormous burden on the expert.
• Therefore, in expert systems, conditional independence among the different evidences is assumed. Thus, instead of the unworkable equation, we obtain:
p(Hi|E1 E2 ... En) = [p(E1|Hi) × p(E2|Hi) × ... × p(En|Hi) × p(Hi)] / Σ (k = 1..m) p(E1|Hk) × p(E2|Hk) × ... × p(En|Hk) × p(Hk)

Ranking Potentially True Hypotheses
• Let us consider a simple example:
– Suppose an expert, given three conditionally independent evidences E1, E2 and E3, creates three mutually exclusive and exhaustive hypotheses H1, H2 and H3, and provides the prior probabilities for these hypotheses: p(H1), p(H2) and p(H3), respectively. The expert also determines the conditional probabilities of observing each evidence under each possible hypothesis.

The Prior and Conditional Probabilities
Probability    i = 1   i = 2   i = 3
p(Hi)          0.40    0.35    0.25
p(E1|Hi)       0.3     0.8     0.5
p(E2|Hi)       0.9     0.0     0.7
p(E3|Hi)       0.6     0.7     0.9

Assume that we first observe evidence E3. The expert system computes the posterior probabilities for all hypotheses as:
p(Hi|E3) = [p(E3|Hi) × p(Hi)] / Σ (k = 1..3) p(E3|Hk) × p(Hk),  i = 1, 2, 3
thus
p(H1|E3) = (0.6 × 0.40) / (0.6 × 0.40 + 0.7 × 0.35 + 0.9 × 0.25) = 0.34
p(H2|E3) = (0.7 × 0.35) / (0.6 × 0.40 + 0.7 × 0.35 + 0.9 × 0.25) = 0.34
p(H3|E3) = (0.9 × 0.25) / (0.6 × 0.40 + 0.7 × 0.35 + 0.9 × 0.25) = 0.32
After evidence E3 is observed, belief in hypothesis H2 increases and becomes equal to belief in hypothesis H1.
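The posterior computation above can be reproduced directly; the priors and likelihoods are the ones from the table, and the same helper extends to several evidences by multiplying likelihoods under the conditional-independence assumption made in this section.

```python
# Posteriors p(Hi|E) for mutually exclusive, exhaustive hypotheses:
# normalize the products likelihood * prior.

priors = [0.40, 0.35, 0.25]   # p(H1), p(H2), p(H3) from the table
p_e3 = [0.6, 0.7, 0.9]        # p(E3|Hi) for i = 1, 2, 3

def posteriors(likelihoods, priors):
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)                    # the denominator (sum over k)
    return [j / total for j in joint]

post_e3 = posteriors(p_e3, priors)
# post_e3 is approximately [0.338, 0.345, 0.317], i.e. the rounded
# values 0.34, 0.34, 0.32 given on the slide.
```

To incorporate E1 as well, pass the element-wise products p(E1|Hi) × p(E3|Hi) as the likelihoods; the normalization step is unchanged.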
Belief in hypothesis H3 also increases and even nearly reaches the beliefs in hypotheses H1 and H2.

The Prior and Conditional Probabilities
Suppose now that we observe evidence E1. The posterior probabilities are calculated as

p(Hᵢ|E1E3) = p(E1|Hᵢ) × p(E3|Hᵢ) × p(Hᵢ) / Σₖ p(E1|Hₖ) × p(E3|Hₖ) × p(Hₖ),  i = 1, 2, 3;  k = 1, 2, 3

hence

p(H1|E1E3) = 0.3 · 0.6 · 0.40 / (0.3 · 0.6 · 0.40 + 0.8 · 0.7 · 0.35 + 0.5 · 0.9 · 0.25) = 0.19
p(H2|E1E3) = 0.8 · 0.7 · 0.35 / (0.3 · 0.6 · 0.40 + 0.8 · 0.7 · 0.35 + 0.5 · 0.9 · 0.25) = 0.52
p(H3|E1E3) = 0.5 · 0.9 · 0.25 / (0.3 · 0.6 · 0.40 + 0.8 · 0.7 · 0.35 + 0.5 · 0.9 · 0.25) = 0.29

Hypothesis H2 has now become the most likely one.

The Prior and Conditional Probabilities
After observing evidence E2, the final posterior probabilities for all hypotheses are calculated:

p(Hᵢ|E1E2E3) = p(E1|Hᵢ) × p(E2|Hᵢ) × p(E3|Hᵢ) × p(Hᵢ) / Σₖ p(E1|Hₖ) × p(E2|Hₖ) × p(E3|Hₖ) × p(Hₖ),  i = 1, 2, 3;  k = 1, 2, 3

hence

p(H1|E1E2E3) = 0.3 · 0.9 · 0.6 · 0.40 / (0.3 · 0.9 · 0.6 · 0.40 + 0.8 · 0.0 · 0.7 · 0.35 + 0.5 · 0.7 · 0.9 · 0.25) = 0.45
p(H2|E1E2E3) = 0.8 · 0.0 · 0.7 · 0.35 / (0.3 · 0.9 · 0.6 · 0.40 + 0.8 · 0.0 · 0.7 · 0.35 + 0.5 · 0.7 · 0.9 · 0.25) = 0
p(H3|E1E2E3) = 0.5 · 0.7 · 0.9 · 0.25 / (0.3 · 0.9 · 0.6 · 0.40 + 0.8 · 0.0 · 0.7 · 0.35 + 0.5 · 0.7 · 0.9 · 0.25) = 0.55

Although the initial ranking was H1, H2 and H3, only hypotheses H1 and H3 remain under consideration after all evidences (E1, E2 and E3) were observed.

Exercise
• From which bowl is the cookie? To illustrate, suppose there are two bowls full of cookies. Bowl #1 has 10 chocolate chip and 30 plain cookies, while bowl #2 has 20 of each. Our friend Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of bowl #1?
(taken from wikipedia.org)

Solution
• Let H1 correspond to bowl #1, and H2 to bowl #2. It is given that the bowls are identical from Fred's point of view, thus P(H1) = P(H2), and the two must add up to 1, so both are equal to 0.5.
• D is the observation of a plain cookie. From the contents of the bowls, we know that P(D|H1) = 30/40 = 0.75 and P(D|H2) = 20/40 = 0.5. Bayes' formula then yields

P(H1|D) = P(D|H1) × P(H1) / [P(D|H1) × P(H1) + P(D|H2) × P(H2)] = 0.75 × 0.5 / (0.75 × 0.5 + 0.5 × 0.5) = 0.6

• Before observing the cookie, the probability that Fred chose bowl #1 is the prior probability, P(H1), which is 0.5. After observing the cookie, we revise the probability to P(H1|D), which is 0.6.

Certainty Factor
• A certainty factor (cf) is a number used to measure the expert's belief.
• The maximum value of the certainty factor is, say, +1.0 (definitely true) and the minimum is −1.0 (definitely false). For example, if the expert states that some evidence is almost certainly true, a cf value of 0.8 would be assigned to this evidence.
• Certainty factors theory is a popular alternative to Bayesian reasoning.

Uncertain Terms

Term                    Certainty Factor
Definitely not          −1.0
Almost certainly not    −0.8
Probably not            −0.6
Maybe not               −0.4
Unknown                 −0.2 to +0.2
Maybe                   +0.4
Probably                +0.6
Almost certainly        +0.8
Definitely              +1.0

Certainty Factor
• In expert systems with certainty factors, the knowledge base consists of a set of rules that have the following syntax:

IF <evidence> THEN <hypothesis> {cf}

where cf represents belief in the hypothesis, given that the evidence has occurred.

Certainty Factor
• The certainty factors theory is based on two functions:
• Measure of belief MB(H, E)
• Measure of disbelief MD(H, E)

MB(H, E) = 1 if p(H) = 1
MB(H, E) = [max(p(H|E), p(H)) − p(H)] / [1 − p(H)] otherwise

MD(H, E) = 1 if p(H) = 0
MD(H, E) = [min(p(H|E), p(H)) − p(H)] / [0 − p(H)] otherwise

Certainty Factor
• The values of MB(H, E) and MD(H, E) range between 0 and 1.
• The strength of belief or disbelief in hypothesis H depends on the kind of evidence E observed.
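The two measures can be coded directly from their definitions. A sketch; the probability values in the demonstration are illustrative only. The MD denominator 0 − p(H) is rearranged as (p(H) − min(...)) / p(H), which is algebraically the same:

```python
def mb(p_h_given_e, p_h):
    """Measure of belief MB(H, E)."""
    if p_h == 1:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1.0 - p_h)

def md(p_h_given_e, p_h):
    """Measure of disbelief MD(H, E)."""
    if p_h == 0:
        return 1.0
    # (min(p(H|E), p(H)) - p(H)) / (0 - p(H)), rearranged to avoid -0.0
    return (p_h - min(p_h_given_e, p_h)) / p_h

# Evidence that raises p(H) from 0.4 to 0.7 creates belief, no disbelief;
# evidence that lowers p(H) from 0.4 to 0.1 creates disbelief instead.
print(mb(0.7, 0.4), md(0.7, 0.4))
print(mb(0.1, 0.4), md(0.1, 0.4))
```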
• Some facts may increase the strength of belief, but some increase the strength of disbelief.
• The total strength of belief or disbelief in a hypothesis:

cf = [MB(H, E) − MD(H, E)] / (1 − min[MB(H, E), MD(H, E)])

Certainty Factor
Example: Consider a simple rule:

IF A is X THEN B is Y

• An expert may not be absolutely certain that this rule holds.
• Also, suppose it has been observed that in some cases, even when the IF part of the rule is satisfied and object A takes on value X, object B can acquire some different value, like Z:

IF A is X THEN B is Y {cf 0.7}; B is Z {cf 0.2}

Certainty Factor
• The certainty factor assigned by a rule is propagated through the reasoning chain.
• This involves establishing the net certainty of the rule consequent when the evidence in the rule antecedent is uncertain:

cf(H, E) = cf(E) × cf

where cf is the certainty factor of the rule. For example:
IF sky is clear THEN the forecast is sunny {cf 0.8}
and the current certainty factor of sky is clear is 0.5, then
cf(H, E) = 0.5 × 0.8 = 0.4
This result can be interpreted as "It may be sunny".

Certainty Factor
• For conjunctive rules such as:

IF <evidence E1> AND ... AND <evidence En> THEN <hypothesis H> {cf}

the certainty of hypothesis H is established as follows:

cf(H, E1 ∩ E2 ∩ ... ∩ En) = min[cf(E1), cf(E2), ..., cf(En)] × cf

• For example:
IF sky is clear AND the forecast is sunny THEN the action is 'wear sunglasses' {cf 0.8}
and the certainty of sky is clear is 0.9 and the certainty of the forecast is sunny is 0.7, then
cf(H, E1 ∩ E2) = min[0.9, 0.7] × 0.8 = 0.7 × 0.8 = 0.56

Certainty Factor
• For disjunctive rules such as:

IF <evidence E1> OR ... OR <evidence En> THEN <hypothesis H> {cf}

the certainty of hypothesis H is established as follows:

cf(H, E1 ∪ E2 ∪ ...
∪ En) = max[cf(E1), cf(E2), ..., cf(En)] × cf

• For example:
IF sky is overcast OR the forecast is rain THEN the action is 'take an umbrella' {cf 0.9}
and the certainty of sky is overcast is 0.6 and the certainty of the forecast is rain is 0.8, then
cf(H, E1 ∪ E2) = max[0.6, 0.8] × 0.9 = 0.8 × 0.9 = 0.72

Certainty Factor
• When the same consequent is obtained as a result of the execution of two or more rules, the individual certainty factors of these rules must be merged to give a combined certainty factor for a hypothesis.
• Suppose the knowledge base consists of the following rules:

Rule 1: IF A is X THEN C is Z {cf 0.8}
Rule 2: IF B is Y THEN C is Z {cf 0.6}

What certainty should be assigned to object C having value Z if both Rule 1 and Rule 2 are fired?

Certainty Factor
• Common sense suggests that, if we have two pieces of evidence (A is X and B is Y) from different sources (Rule 1 and Rule 2) supporting the same hypothesis (C is Z), then the confidence in this hypothesis should increase and become stronger than if only one piece of evidence had been obtained.

Certainty Factor
• To calculate a combined certainty factor we can use the following equation:

cf(cf1, cf2) = cf1 + cf2 × (1 − cf1)                  if cf1 > 0 and cf2 > 0
cf(cf1, cf2) = (cf1 + cf2) / (1 − min[|cf1|, |cf2|])  if cf1 < 0 or cf2 < 0 (but not both)
cf(cf1, cf2) = cf1 + cf2 × (1 + cf1)                  if cf1 < 0 and cf2 < 0

where:
cf1 is the confidence in hypothesis H established by Rule 1;
cf2 is the confidence in hypothesis H established by Rule 2;
|cf1| and |cf2| are absolute magnitudes of cf1 and cf2, respectively.

Certainty Factor
• The certainty factors theory provides a practical alternative to Bayesian reasoning.
• The heuristic manner of combining certainty factors is different from the manner in which they would be combined if they were probabilities.
• The certainty theory is not "mathematically pure" but does mimic the thinking process of a human expert.
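The propagation and combination formulas of this section can be collected into a small sketch; the numbers in the demonstration are the ones from the worked examples above:

```python
def cf_and(evidence_cfs, rule_cf):
    """Conjunctive antecedent: cf(H, E1 AND ... AND En) = min(cfs) * cf."""
    return min(evidence_cfs) * rule_cf

def cf_or(evidence_cfs, rule_cf):
    """Disjunctive antecedent: cf(H, E1 OR ... OR En) = max(cfs) * cf."""
    return max(evidence_cfs) * rule_cf

def cf_combine(cf1, cf2):
    """Combine the certainty factors of two rules with the same consequent."""
    if cf1 > 0 and cf2 > 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

# Worked examples from the text:
print(cf_and([0.9, 0.7], 0.8))   # 'wear sunglasses': min(0.9, 0.7) * 0.8 = 0.56
print(cf_or([0.6, 0.8], 0.9))    # 'take an umbrella': max(0.6, 0.8) * 0.9 = 0.72
print(cf_combine(0.8, 0.6))      # Rules 1 and 2 both fire: 0.8 + 0.6 * (1 - 0.8) = 0.92
```

Combining 0.8 and 0.6 yields 0.92, confirming that two independent supporting rules strengthen the hypothesis beyond either rule alone.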
Bayesian Reasoning Vs Certainty Factors • Probability theory is the oldest and best-established technique to deal with inexact knowledge and random data. • It works well in such areas as forecasting and planning, where statistical data is usually available and accurate probability statements can be made. • However, in many areas of possible applications of expert systems, reliable statistical information is not available or we cannot assume the conditional independence of evidence. As a result, many researchers have found the Bayesian method unsuitable for their work. This dissatisfaction motivated the development of the certainty factors theory. Bayesian Reasoning Vs Certainty Factors • Although the certainty factors approach lacks the mathematical correctness of the probability theory, it outperforms subjective Bayesian reasoning in such areas as diagnostics. • Certainty factors are used in cases where the probabilities are not known or are too difficult or expensive to obtain. • The certainty factors approach also provides better explanations of the control flow through a rule-based expert system. Bayesian Reasoning Vs Certainty Factors • The Bayesian method is likely to be the most appropriate if reliable statistical data exists, the knowledge engineer is able to lead, and the expert is available for serious decision-analytical conversations. • In the absence of any of the specified conditions, the Bayesian approach might be too arbitrary and even biased to produce meaningful results. • The Bayesian belief propagation is of exponential complexity, and thus is impractical for large knowledge bases. Introduction to Fuzzy Logic Definition • Experts rely on common sense when they solve problems. • How can we represent expert knowledge that uses vague and ambiguous terms in a computer? • Fuzzy logic is not logic that is fuzzy, but logic that is used to describe fuzziness. Fuzzy logic is the theory of fuzzy sets, sets that calibrate vagueness. 
• Fuzzy logic is based on the idea that all things admit of degrees. Temperature, height, speed, distance, beauty – all come on a sliding scale.
– The motor is running really hot.
– Tom is a very tall guy.

Definition
• How is one to represent notions like:
– large profit
– high pressure
– tall man
– moderate temperature
• The principal notion underlying set theory, that an element can (exclusively) either belong to a set or not belong to it, makes it well nigh impossible to represent much of human discourse.

Definition
• People succeed by using knowledge that is imprecise rather than precise in many decision-making and problem-solving tasks that are too complex to be understood quantitatively.
• Fuzzy set theory resembles human reasoning in its use of approximate information and uncertainty to generate decisions.
• Fuzzy set theory was specifically designed to mathematically represent uncertainty and vagueness.

Definition
• Boolean logic uses sharp distinctions. It forces us to draw lines between members of a class and non-members.
• For instance, we may say Tom is tall because his height is 181 cm. If we drew a line at 180 cm, we would find that David, who is 179 cm, is small.
• Is David really a small man, or have we just drawn an arbitrary line in the sand?

Bit of History
• Fuzzy, or multi-valued, logic was introduced in the 1930s by Jan Lukasiewicz, a Polish philosopher. While classical logic operates with only two values, 1 (true) and 0 (false), Lukasiewicz introduced logic that extended the range of truth values to all real numbers in the interval between 0 and 1.
• For example, the possibility that a man 181 cm tall is really tall might be set to a value of 0.86. It is likely that the man is tall. This work led to an inexact reasoning technique often called possibility theory.
• In 1965 Lotfi Zadeh published his famous paper "Fuzzy sets".
Zadeh extended the work on possibility theory into a formal system of mathematical logic, and introduced a new concept for applying natural language terms. This new logic for representing and manipulating fuzzy terms was called fuzzy logic.

The Term "Fuzzy Logic"
• The term fuzzy logic is used in two senses:
– Narrow sense: Fuzzy logic is a branch of fuzzy set theory, which deals (as logical systems do) with the representation and inference from knowledge. Fuzzy logic, unlike other logical systems, deals with imprecise or uncertain knowledge. In this narrow, and perhaps correct, sense, fuzzy logic is just one of the branches of fuzzy set theory.
– Broad sense: fuzzy logic is used synonymously with fuzzy set theory.

Fuzzy Applications
• The theory of fuzzy sets and fuzzy logic has been applied to problems in a variety of fields:
– taxonomy; topology; linguistics; logic; automata theory; game theory; pattern recognition; medicine; law; decision support; information retrieval; etc.
• And more recently fuzzy machines have been developed, including:
– automatic train control; tunnel-digging machinery; washing machines; rice cookers; vacuum cleaners; air conditioners, etc.

Fuzzy Applications
• Extraklasse Washing Machine – 1200 rpm. The Extraklasse machine has a number of features which will make life easier for you.
• Fuzzy Logic detects the type and amount of laundry in the drum and allows only as much water to enter the machine as is really needed for the loaded amount. And less water will heat up quicker – which means less energy consumption.
• Foam detection: Too much foam is compensated by an additional rinse cycle: if Fuzzy Logic detects the formation of too much foam in the rinsing spin cycle, it simply activates an additional rinse cycle. Fantastic!
• Imbalance compensation: In the event of imbalance, Fuzzy Logic immediately calculates the maximum possible speed, sets this speed and starts spinning.
This provides optimum utilization of the spinning time at full speed […]
• Washing without wasting – with automatic water level adjustment

More Definitions
• Fuzzy logic is a set of mathematical principles for knowledge representation based on degrees of membership.
• Unlike two-valued Boolean logic, fuzzy logic is multi-valued. It deals with degrees of membership and degrees of truth.
• Fuzzy logic uses the continuum of logical values between 0 (completely false) and 1 (completely true). Instead of just black and white, it employs the spectrum of colors, accepting that things can be partly true and partly false at the same time.

[Figure: (a) Boolean logic admits only the truth values 0 and 1; (b) multi-valued logic admits values across the whole range, e.g. 0, 0.2, 0.4, 0.6, 0.8, 1.]

Fuzzy Sets
• The concept of a set is fundamental to mathematics.
• However, our own language is also the supreme expression of sets. For example, car indicates the set of cars. When we say a car, we mean one out of the set of cars.
• The classical example in fuzzy sets is tall men. The elements of the fuzzy set "tall men" are all men, but their degrees of membership depend on their height.

Fuzzy Sets

Name     Height, cm   Degree of Membership
                      Crisp   Fuzzy
Chris    208          1       1.00
Mark     205          1       1.00
John     198          1       0.98
Tom      181          1       0.82
David    179          0       0.78
Mike     172          0       0.24
Bob      167          0       0.15
Steven   158          0       0.06
Bill     155          0       0.01
Peter    152          0       0.00

Crisp Vs Fuzzy Sets
The x-axis represents the universe of discourse – the range of all possible values applicable to a chosen variable. In our case, the variable is the man height. According to this representation, the universe of men's heights consists of all tall men.
The y-axis represents the membership value of the fuzzy set. In our case, the fuzzy set of "tall men" maps height values into corresponding membership values.
[Figure: the crisp set "tall men" has a sharp boundary at 180 cm, while the fuzzy set "tall men" rises gradually from 0 to 1 over the height range 150–210 cm.]

A Fuzzy Set has Fuzzy Boundaries
• Let X be the universe of discourse and its elements be denoted as x. In classical set theory, a crisp set A of X is defined by the function fA(x), called the characteristic function of A:

fA(x): X → {0, 1}, where
fA(x) = 1 if x ∈ A
fA(x) = 0 if x ∉ A

• This set maps universe X to a set of two elements.
• For any element x of universe X, characteristic function fA(x) is equal to 1 if x is an element of set A, and is equal to 0 if x is not an element of A.

A Fuzzy Set has Fuzzy Boundaries
• In fuzzy theory, fuzzy set A of universe X is defined by the function µA(x), called the membership function of set A:

µA(x): X → [0, 1], where
µA(x) = 1 if x is totally in A;
µA(x) = 0 if x is not in A;
0 < µA(x) < 1 if x is partly in A.

• This set allows a continuum of possible choices. For any element x of universe X, membership function µA(x) equals the degree to which x is an element of set A.
• This degree, a value between 0 and 1, represents the degree of membership, also called the membership value, of element x in set A.

Fuzzy Set Representation
• First, we determine the membership functions. In our "tall men" example, we can obtain fuzzy sets of tall, short and average men.
• The universe of discourse – the men's heights – consists of three sets: short, average and tall men. As you will see, a man who is 184 cm tall is a member of the average men set with a degree of membership of 0.1, and at the same time, he is also a member of the tall men set with a degree of 0.4.
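The contrast between the characteristic function fA(x) and the membership function µA(x) can be sketched for the "tall men" example. The 180 cm crisp threshold matches the table above; the linear 160–200 cm ramp for the fuzzy set is an illustrative assumption, not the exact curve from the text:

```python
def crisp_tall(height_cm):
    """Characteristic function f_A: X -> {0, 1}, hard cut-off at 180 cm."""
    return 1 if height_cm >= 180 else 0

def fuzzy_tall(height_cm):
    """Membership function mu_A: X -> [0, 1], assumed linear from 160 to 200 cm."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 200:
        return 1.0
    return (height_cm - 160) / 40.0

# David (179 cm) and Tom (181 cm): the crisp set separates them sharply,
# while the fuzzy set gives them nearly equal degrees of membership.
for height in (179, 181):
    print(height, crisp_tall(height), fuzzy_tall(height))
```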
[Figure: crisp and fuzzy representations of the sets short, average and tall men over the height range 150–210 cm.]

Fuzzy Set Representation
• Typical functions that can be used to represent a fuzzy set:
• sigmoid
• gaussian
• pi
• However, these functions increase the time of computation. Therefore, in practice, most applications use linear fit functions.

[Figure: a linear fit function for fuzzy subset A, with regions of fuzziness on either side of the crisp subset A.]

Linguistic Variables and Hedges
• A linguistic variable is a fuzzy variable. For example, the statement "John is tall" implies that the linguistic variable John takes the linguistic value tall.
• In fuzzy expert systems, linguistic variables are used in fuzzy rules. For example:

IF wind is strong THEN sailing is good
IF project_duration is long THEN completion_risk is high
IF speed is slow THEN stopping_distance is short

Linguistic Variables and Hedges
• The range of possible values of a linguistic variable represents the universe of discourse of that variable. For example, the universe of discourse of the linguistic variable speed might have the range between 0 and 220 km/h and may include such fuzzy subsets as very slow, slow, medium, fast, and very fast.
• A linguistic variable carries with it the concept of fuzzy set qualifiers, called hedges.
• Hedges are terms that modify the shape of fuzzy sets. They include adverbs such as very, somewhat, quite, more or less and slightly.
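Hedges are commonly implemented as powers or roots of the membership function, e.g. very as µ², extremely as µ³ and more or less as √µ. A sketch; Tom's membership degree 0.82 is taken from the "tall men" table:

```python
import math

def very(mu):
    """Concentration: 'very A' has membership mu ** 2."""
    return mu ** 2

def extremely(mu):
    """Stronger concentration: mu ** 3."""
    return mu ** 3

def more_or_less(mu):
    """Dilation: 'more or less A' has membership sqrt(mu)."""
    return math.sqrt(mu)

mu_tall = 0.82  # Tom's degree of membership in "tall men"
print(very(mu_tall))          # lower: Tom is "very tall" to a lesser degree
print(more_or_less(mu_tall))  # higher: he is "more or less tall" to a greater degree
```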
Linguistic Variables and Hedges

[Figure: hedged fuzzy sets for height – very short, short, average, tall, very tall and very very tall – over the range 150–210 cm.]

Linguistic Variables and Hedges

Hedge          Mathematical Expression
A little       [µA(x)]^1.3
Slightly       [µA(x)]^1.7
Very           [µA(x)]^2
Extremely      [µA(x)]^3
Very very      [µA(x)]^4
More or less   √µA(x)
Somewhat       √µA(x)
Indeed         2[µA(x)]^2 if 0 ≤ µA ≤ 0.5
               1 − 2[1 − µA(x)]^2 if 0.5 < µA ≤ 1

Characteristics of Fuzzy Sets
• The classical set theory developed in the late 19th century by Georg Cantor describes how crisp sets can interact. These interactions are called operations.
• Fuzzy sets also have well-defined properties.
• These properties and operations are the basis on which fuzzy sets are used to deal with uncertainty on the one hand and to represent knowledge on the other.

Operations

[Figure: diagrams of the four operations – complement (Not A), containment, intersection and union.]

Complement
• Crisp Sets: Who does not belong to the set?
• Fuzzy Sets: How much do elements not belong to the set?
• The complement of a set is an opposite of this set. For example, if we have the set of tall men, its complement is the set of NOT tall men. When we remove the tall men set from the universe of discourse, we obtain the complement.
• If A is the fuzzy set, its complement ¬A can be found as follows:

µ¬A(x) = 1 − µA(x)

Containment
• Crisp Sets: Which sets belong to which other sets?
• Fuzzy Sets: Which sets belong to other sets?
• Similar to a Chinese box, a set can contain other sets. The smaller set is called the subset. For example, the set of tall men contains all tall men; very tall men is a subset of tall men. However, the tall men set is just a subset of the set of men. In crisp sets, all elements of a subset entirely belong to a larger set.
In fuzzy sets, however, each element can belong less to the subset than to the larger set. Elements of the fuzzy subset have smaller memberships in it than in the larger set.

Intersection
• Crisp Sets: Which element belongs to both sets?
• Fuzzy Sets: How much of the element is in both sets?
• In classical set theory, an intersection between two sets contains the elements shared by these sets. For example, the intersection of the set of tall men and the set of fat men is the area where these sets overlap. In fuzzy sets, an element may partly belong to both sets with different memberships.
• A fuzzy intersection is the lower membership in both sets of each element. The fuzzy intersection of two fuzzy sets A and B on universe of discourse X:

µA∩B(x) = min[µA(x), µB(x)] = µA(x) ∩ µB(x), where x ∈ X

Union
• Crisp Sets: Which element belongs to either set?
• Fuzzy Sets: How much of the element is in either set?
• The union of two crisp sets consists of every element that falls into either set. For example, the union of tall men and fat men contains all men who are tall OR fat.
• In fuzzy sets, the union is the reverse of the intersection. That is, the union is the largest membership value of the element in either set. The fuzzy operation for forming the union of two fuzzy sets A and B on universe X can be given as:

µA∪B(x) = max[µA(x), µB(x)] = µA(x) ∪ µB(x), where x ∈ X

Operations of Fuzzy Sets

[Figure: membership-function plots of the four fuzzy set operations – complement (Not A), containment, intersection (A∩B) and union (A∪B).]

Equality
• Fuzzy set A is considered equal to fuzzy set B, IF AND ONLY IF (iff):

µA(x) = µB(x), ∀x ∈ X

A = 0.3/1 + 0.5/2 + 1/3
B = 0.3/1 + 0.5/2 + 1/3
therefore A = B

Inclusion
• Inclusion of one fuzzy set into another fuzzy set.
Fuzzy set A ⊆ X is included in (is a subset of) another fuzzy set, B ⊆ X, if:

µA(x) ≤ µB(x), ∀x ∈ X

Consider X = {1, 2, 3} and sets A and B:
A = 0.3/1 + 0.5/2 + 1/3
B = 0.5/1 + 0.55/2 + 1/3
then A is a subset of B, or A ⊆ B

Cardinality
• Cardinality of a non-fuzzy set, Z, is the number of elements in Z. BUT the cardinality of a fuzzy set A, the so-called SIGMA COUNT, is expressed as a SUM of the values of the membership function of A, µA(x):

cardA = µA(x1) + µA(x2) + … + µA(xn) = ΣµA(xi), for i = 1..n

Consider X = {1, 2, 3} and sets A and B:
A = 0.3/1 + 0.5/2 + 1/3
B = 0.5/1 + 0.55/2 + 1/3
cardA = 1.8
cardB = 2.05

Empty Fuzzy Set
• A fuzzy set A is empty, IF AND ONLY IF:

µA(x) = 0, ∀x ∈ X

Consider X = {1, 2, 3} and set A:
A = 0/1 + 0/2 + 0/3
then A is empty

Fuzzy Rules
• In 1973, Lotfi Zadeh published his second most influential paper. This paper outlined a new approach to the analysis of complex systems, in which Zadeh suggested capturing human knowledge in fuzzy rules.
• A fuzzy rule can be defined as a conditional statement in the form:

IF x is A THEN y is B

where x and y are linguistic variables; and A and B are linguistic values determined by fuzzy sets on the universes of discourse X and Y, respectively.

Classical Vs Fuzzy Rules
• A classical IF-THEN rule uses binary logic, for example,

Rule 1: IF speed is > 100 THEN stopping_distance is long
Rule 2: IF speed is < 40 THEN stopping_distance is short

• The variable speed can have any numerical value between 0 and 220 km/h, but the linguistic variable stopping_distance can take either value long or short. In other words, classical rules are expressed in the black-and-white language of Boolean logic.
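Before moving on, the set properties introduced above (equality, inclusion and the sigma count) can be sketched with fuzzy sets represented as dicts from elements of X to membership degrees, using the sets A and B from the inclusion and cardinality examples:

```python
def is_equal(a, b):
    """A = B iff mu_A(x) = mu_B(x) for every x in X."""
    return all(mu == b[x] for x, mu in a.items())

def is_subset(a, b):
    """A is included in B iff mu_A(x) <= mu_B(x) for every x in X."""
    return all(mu <= b[x] for x, mu in a.items())

def cardinality(a):
    """Sigma count: the sum of the membership values."""
    return sum(a.values())

# X = {1, 2, 3}; A = 0.3/1 + 0.5/2 + 1/3; B = 0.5/1 + 0.55/2 + 1/3
A = {1: 0.3, 2: 0.5, 3: 1.0}
B = {1: 0.5, 2: 0.55, 3: 1.0}

print(is_subset(A, B))   # True: A is a subset of B
print(cardinality(A))    # card A = 1.8
print(cardinality(B))    # card B = 2.05
```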
Classical Vs Fuzzy Rules
• We can also represent the stopping distance rules in a fuzzy form:

Rule 1: IF speed is fast THEN stopping_distance is long
Rule 2: IF speed is slow THEN stopping_distance is short

• In fuzzy rules, the linguistic variable speed also has the range (the universe of discourse) between 0 and 220 km/h, but this range includes fuzzy sets, such as slow, medium and fast. The universe of discourse of the linguistic variable stopping_distance can be between 0 and 300 m and may include such fuzzy sets as short, medium and long.

Classical Vs Fuzzy Rules
• Fuzzy rules relate fuzzy sets.
• In a fuzzy system, all rules fire to some extent, or in other words they fire partially. If the antecedent is true to some degree of membership, then the consequent is also true to that same degree.

Firing Fuzzy Rules
• These fuzzy sets provide the basis for a weight estimation model. The model is based on a relationship between a man's height and his weight:

IF height is tall THEN weight is heavy

[Figure: the fuzzy sets "tall men" (height, 160–200 cm) and "heavy men" (weight, 70–120 kg).]

Firing Fuzzy Rules
• The value of the output or a truth membership grade of the rule consequent can be estimated directly from a corresponding truth membership grade in the antecedent. This form of fuzzy inference uses a method called monotonic selection.
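Partial firing can be sketched directly: the consequent inherits the antecedent's degree of truth, and compound antecedents are typically combined with min for AND and max for OR, matching the fuzzy intersection and union. The membership degrees in the demonstration are illustrative assumptions:

```python
def fire_and(*degrees):
    """Truth of a conjunctive antecedent: the weakest condition (min)."""
    return min(degrees)

def fire_or(*degrees):
    """Truth of a disjunctive antecedent: the strongest condition (max)."""
    return max(degrees)

# IF height is tall THEN weight is heavy:
# by monotonic selection the consequent is true to the same degree
# as the antecedent.
mu_tall = 0.7        # assumed degree for "height is tall"
mu_heavy = mu_tall   # degree carries over to "weight is heavy"
print(mu_heavy)

# A rule with several AND-ed conditions fires to the degree of its weakest one:
print(fire_and(0.7, 0.9, 0.5))   # 0.5
print(fire_or(0.4, 0.8))         # 0.8
```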
[Figure: monotonic selection – the membership grade in "tall men" (height, cm) determines the membership grade in "heavy men" (weight, kg).]

Firing Fuzzy Rules
• A fuzzy rule can have multiple antecedents, for example:

IF project_duration is long
AND project_staffing is large
AND project_funding is inadequate
THEN risk is high

IF service is excellent OR food is delicious THEN tip is generous

• The consequent of a fuzzy rule can also include multiple parts, for instance:

IF temperature is hot THEN hot_water is reduced; cold_water is increased

Fuzzy Sets Example
• Air-conditioning involves the delivery of air which can be warmed or cooled and have its humidity raised or lowered.
• An air-conditioner is an apparatus for controlling, especially lowering, the temperature and humidity of an enclosed space. An air-conditioner typically has a fan which blows/cools/circulates fresh air, and has a cooler which is under thermostatic control. Generally, the amount of air being compressed is proportional to the ambient temperature.
• Consider Johnny's air-conditioner, which has five control switches: COLD, COOL, PLEASANT, WARM and HOT. The corresponding speeds of the motor controlling the fan on the air-conditioner have the graduations: MINIMAL, SLOW, MEDIUM, FAST and BLAST.

Fuzzy Sets Example
• The rules governing the air-conditioner are as follows:

RULE 1: IF TEMP is COLD THEN SPEED is MINIMAL
RULE 2: IF TEMP is COOL THEN SPEED is SLOW
RULE 3: IF TEMP is PLEASANT THEN SPEED is MEDIUM
RULE 4: IF TEMP is WARM THEN SPEED is FAST
RULE 5: IF TEMP is HOT THEN SPEED is BLAST

Fuzzy Sets Example
The temperature graduations are related to Johnny's perception of ambient temperatures, where:
Y : temp value belongs to the set (0 < µA(x) < 1)
Y* : temp value is the ideal member of the set (µA(x) = 1)
N : temp value is not a member of the set (µA(x) = 0)

Temp (°C).
       COLD   COOL   PLEASANT   WARM   HOT
0      Y*     N      N          N      N
5      Y      Y      N          N      N
10     N      Y      N          N      N
12.5   N      Y*     N          N      N
15     N      Y      N          N      N
17.5   N      N      Y*         N      N
20     N      N      N          Y      N
22.5   N      N      N          Y*     N
25     N      N      N          Y      N
27.5   N      N      N          N      Y
30     N      N      N          N      Y*

Fuzzy Sets Example
Johnny's perception of the speed of the motor is as follows, where:
Y : speed value belongs to the set (0 < µA(x) < 1)
Y* : speed value is the ideal member of the set (µA(x) = 1)
N : speed value is not a member of the set (µA(x) = 0)

Rev/sec (RPM)   MINIMAL   SLOW   MEDIUM   FAST   BLAST
0               Y*        N      N        N      N
10              Y         N      N        N      N
20              Y         Y      N        N      N
30              N         Y*     N        N      N
40              N         Y      N        N      N
50              N         N      Y*       N      N
60              N         N      N        Y      N
70              N         N      N        Y*     N
80              N         N      N        Y      Y
90              N         N      N        N      Y
100             N         N      N        N      Y*

Fuzzy Sets Example
• The analytically expressed memberships for the reference fuzzy subsets of the temperature are:
• COLD: µCOLD(t) = −t/10 + 1 for 0 ≤ t ≤ 10
• COOL: µCOOL(t) = t/12.5 for 0 ≤ t ≤ 12.5
        µCOOL(t) = −t/5 + 3.5 for 12.5 ≤ t ≤ 17.5
• etc. – all based on the linear equation y = ax + b

Fuzzy Sets Example

[Figure: temperature fuzzy sets – truth value vs temperature.]

Fuzzy Sets Example
• The analytically expressed memberships for the reference fuzzy subsets of the speed are:
• MINIMAL: µMINIMAL(v) = −v/30 + 1 for 0 ≤ v ≤ 30
• SLOW: µSLOW(v) = v/20 − 0.5 for 10 ≤ v ≤ 30
        µSLOW(v) = −v/20 + 2.5 for 30 ≤ v ≤ 50
• etc. – all based on the linear equation y = ax + b

Fuzzy Sets Example

[Figure: speed fuzzy sets MINIMAL, SLOW, MEDIUM, FAST and BLAST – truth value vs speed (0–100 rev/sec).]

Exercises
For
A = {0.2/a, 0.4/b, 1/c, 0.8/d, 0/e}
B = {0/a, 0.9/b, 0.3/c, 0.2/d, 0.1/e}
Draw the fuzzy graph of A and B. Then, calculate the following:
- Support, Core, Cardinality, and Complement for A and B independently
- Union and Intersection of A and B
- the new set C, if C = A²
- the new set D, if D = 0.5 × B
- the new set E, for an alpha cut at A0.5

Solutions
A = {0.2/a, 0.4/b, 1/c, 0.8/d, 0/e}
B = {0/a, 0.9/b, 0.3/c, 0.2/d, 0.1/e}
Support
Supp(A) = {a, b, c, d}
Supp(B) = {b, c, d, e}
Core
Core(A) = {c}
Core(B) = {}
Cardinality
Card(A) = 0.2 + 0.4 + 1 + 0.8 + 0 = 2.4
Card(B) = 0 + 0.9 + 0.3 + 0.2 +
0.1 = 1.5
Complement
Comp(A) = {0.8/a, 0.6/b, 0/c, 0.2/d, 1/e}
Comp(B) = {1/a, 0.1/b, 0.7/c, 0.8/d, 0.9/e}

Solutions
A = {0.2/a, 0.4/b, 1/c, 0.8/d, 0/e}
B = {0/a, 0.9/b, 0.3/c, 0.2/d, 0.1/e}
Union
A∪B = {0.2/a, 0.9/b, 1/c, 0.8/d, 0.1/e}
Intersection
A∩B = {0/a, 0.4/b, 0.3/c, 0.2/d, 0/e}
C = A²
C = {0.04/a, 0.16/b, 1/c, 0.64/d, 0/e}
D = 0.5 × B
D = {0/a, 0.45/b, 0.15/c, 0.1/d, 0.05/e}
E = A0.5
E = {c, d}
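The exercise solutions above can be checked mechanically with dict-based fuzzy sets (a sketch):

```python
A = {'a': 0.2, 'b': 0.4, 'c': 1.0, 'd': 0.8, 'e': 0.0}
B = {'a': 0.0, 'b': 0.9, 'c': 0.3, 'd': 0.2, 'e': 0.1}

def support(s):
    """Elements with non-zero membership."""
    return {x for x, mu in s.items() if mu > 0}

def core(s):
    """Elements with full membership."""
    return {x for x, mu in s.items() if mu == 1}

def card(s):
    """Sigma count."""
    return sum(s.values())

def complement(s):
    return {x: 1 - mu for x, mu in s.items()}

union = {x: max(A[x], B[x]) for x in A}
inter = {x: min(A[x], B[x]) for x in A}
C = {x: mu ** 2 for x, mu in A.items()}       # C = A^2
D = {x: 0.5 * mu for x, mu in B.items()}      # D = 0.5 x B
E = {x for x, mu in A.items() if mu >= 0.5}   # alpha-cut of A at 0.5

print(support(A), core(A))   # support and core of A
print(E)                     # the alpha-cut
```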