Analyzing Algorithms Growth of Functions
CSE 5350/7350 Introduction to Algorithms
Analyzing Algorithms Growth of Functions (Chapters 1 & 2)
Mihaela Iridon, Ph.D.
mihaela@engr.smu.edu
CSE 5350 - Fall 2007

What does analyzing algorithms mean?
• Predicting the required resources
• What do we measure?
  – Computational time
  – Memory
  – Communication bandwidth
  – Other
• Model constructs:
  – Technology
  – Resources (hardware & software)
  – Associated costs
  – Assumptions (one processor, RAM model: operations execute sequentially)

Tools used for algorithm analysis
• Mathematical tools:
  – Discrete combinatorics
  – Probability theory
  – Ability to single out the predominant operations (the most significant terms in a formula)
• Modeling and simulation tools
• Software utilities, benchmarking programs

Terminology
• Input Size
  – In general, the time to execute a set of operations depends on the size of the input
  – Depends on the problem definition
  – Typically the number of items in the input
  – Could be more than one number (e.g.
a graph: its numbers of vertices and edges)
• Running Time
  – The number of primitive operations (steps) executed
  – A machine-independent measure

Example (1) – Insertion Sort (In Place)
Each statement is annotated with its cost and the number of times it executes. Here n = input.Count, and t_j is the number of times the while-loop test executes for a given value of j.

    using System.Collections.Generic;

    /// <summary>
    /// Sorts the input list of integers by using the Insertion Sort algorithm
    /// (see Cormen textbook, Chapter 1.1)
    /// </summary>
    /// <param name="input">Input list of integers (to be sorted); the list is modified in place</param>
    public static void InsertionSort(List<int> input)
    {
        if (input == null) return;      // null input
        if (input.Count < 2) return;    // zero or one element; nothing to sort
        int i = 0, key = 0;
        for (int j = 1; j < input.Count; j++)       // cost c1, executed n times
        {
            key = input[j];                          // cost c2, n-1 times
            // insert input[j] into the sorted sequence input[0..j-1]
            // (comment: cost 0, n-1 times)
            i = j - 1;                               // cost c4, n-1 times
            while (i > -1 && input[i] > key)         // cost c5, Σ(j=1..n-1) t_j times
            {
                input[i + 1] = input[i];             // cost c6, Σ(j=1..n-1) (t_j - 1) times
                i--;                                 // cost c7, Σ(j=1..n-1) (t_j - 1) times
            }
            input[i + 1] = key;                      // cost c8, n-1 times
        }
    }

Example (1) – Insertion Sort (In Place), continued
• The running time depends on the input values, not only on the input size
  – If the input is already sorted, the body of the while loop never executes (t_j = 1), which gives the best-case running time for insertion sort:
      T(n) = c1 n + c2 (n-1) + c4 (n-1) + c5 (n-1) + c8 (n-1)
           = (c1 + c2 + c4 + c5 + c8) n - (c2 + c4 + c5 + c8)
           = a n + b, a linear function of n
  – If the input is in reverse sorted order (worst-case scenario, t_j = j + 1):
      T(n) = c1 n + c2 (n-1) + c4 (n-1) + c5 (n(n+1)/2 - 1) + c6 (n(n-1)/2) + c7 (n(n-1)/2) + c8 (n-1)
           = a n^2 + b n + c, a quadratic function of n

Worst-case & Average-case Analysis
• Worst-case running time:
  – The longest running time for any input of size n (i.e.
the longest path in the execution)
  – An upper bound on the running time for any input
  – For some algorithms the worst case occurs fairly often
  – The average case is often roughly as bad as the worst case
• Average-case running time:
  – It is difficult to define what an average input is
  – Example for Insertion Sort: on average, half the elements in A[1..j-1] are less than A[j] and half are greater, so t_j is about j/2 and the running time is still quadratic in n

Order/Rate of Growth
• A simplifying abstraction
• Consider only the leading term
• Ignore the leading term's constant coefficient
• The worst-case running time for Insertion Sort is Θ(n^2) (theta of n-squared)
• For large enough inputs, an algorithm with worst-case running time Θ(n^2) runs faster than one with Θ(n^3)

Divide-and-Conquer Algorithms
• Incremental approach (e.g. Insertion Sort)
• Divide-and-conquer approach (e.g. recursive algorithms such as Merge Sort)
  – [Divide] Break the problem into related or similar sub-problems of smaller size
  – [Conquer] Solve the sub-problems
  – [Combine] Combine the solutions

Analyzing divide-and-conquer algorithms
• For a recursive algorithm, use a recurrence equation (recurrence) to describe the running time
• T(n) = running time on a problem of size n
• If n is small enough (n ≤ c), then T(n) = Θ(1)
• Otherwise, divide the problem into a sub-problems, each of size n/b
• D(n) = time to divide
• C(n) = time to combine
• T(n) = a T(n/b) + D(n) + C(n) (when n > c)

Merge Sort (1)
    public static void MergeSort2(List<int> input, int startIx, int endIx)
    {
        if (input == null) return;
        if (startIx >= endIx) return;   // stop condition: at most one element in the range
        int middle = startIx + (endIx - startIx) / 2;   // floor of the midpoint
        MergeSort2(input, startIx, middle);         // sort the left half
        MergeSort2(input, middle + 1, endIx);       // sort the right half
        Combine2(input, startIx, middle, endIx);    // merge the two sorted halves
    }

Merge Sort (2)
    // Merges the sorted ranges input[i1..i2] and input[i2+1..i3].
    public static void Combine2(List<int> input, int i1, int i2, int i3)
    {
        if (input == null || input.Count == 0) return;
        List<int> result = new List<int>(i3 - i1 + 1);   // not 100% in-place: temporary buffer
        int ix1 = i1, ix2 = i2 + 1;
        while (result.Count < i3 - i1 + 1)
        {
            // take from the left range while it holds the smaller (or equal) element;
            // "<=" is required so that equal elements cannot stall both inner loops
            while (ix1 < i2 + 1 && (ix2 == (i3 + 1) || input[ix1] <= input[ix2]))
                result.Add(input[ix1++]);
            // take from the right range while it holds the strictly smaller element
            while (ix2 < i3 + 1 && (ix1 == (i2 + 1) || input[ix1] > input[ix2]))
                result.Add(input[ix2++]);
        }
        // copy the merged result back into the original list
        for (int j = i1; j <= i3; j++)
            input[j] = result[j - i1];
    }

Analyzing Merge Sort
• [Divide]   D(n) = Θ(1)
• [Conquer]  2 T(n/2)
• [Combine]  C(n) = Θ(n)
• $T(n) = \begin{cases} \Theta(1) & \text{if } n = 1 \\ 2T(n/2) + \Theta(n) & \text{if } n > 1 \end{cases}$
• T(n) = Θ(n log2 n), which grows more slowly than the Θ(n^2) worst case of Insertion Sort

Growth of Functions
• Algorithm efficiency
• Compare the relative performance of alternative algorithms
• Analysis for large input sizes, i.e. as n → ∞
• Asymptotic efficiency of algorithms:
  – The input size is large enough that only the order of growth of the running time is relevant
  – How the running time increases with the size of the input in the limit, as the size of the input increases without bound

Asymptotic Notation
• Notations used to describe the asymptotic running time of an algorithm
• Defined in terms of functions whose domain is the set of natural numbers N = {0, 1, 2, …}
• Convenient for describing the worst-case running-time function T(n)
• Notation abused vs.
misused

Θ-notation
• Θ: asymptotically bounds a function from above and below
  $\Theta(g(n)) = \{ f(n) : \exists\, c_1, c_2, n_0 > 0 \text{ such that } 0 \le c_1 g(n) \le f(n) \le c_2 g(n) \ \forall n \ge n_0 \}$
• f(n) = Θ(g(n)) means f(n) ∈ Θ(g(n)); g(n) is an asymptotically tight bound for f(n)

O-notation
• O: asymptotic upper bound
  $O(g(n)) = \{ f(n) : \exists\, c, n_0 > 0 \text{ such that } 0 \le f(n) \le c\, g(n) \ \forall n \ge n_0 \}$
• f(n) = Θ(g(n)) implies f(n) = O(g(n))
• The Θ-notation is stronger than the O-notation
• Example: n = O(n^2)
• E.g.: worst-case for insertion sort = O(n^2)
• Notation abuse: "The running time of insertion sort is O(n^2)."
  – The actual running time depends on the particular input of size n.
  – The statement is true only as a claim about the worst case, i.e. the bound holds no matter what particular input of size n is chosen for each value of n.

Ω-notation
• Ω: asymptotic lower bound
  $\Omega(g(n)) = \{ f(n) : \exists\, c, n_0 > 0 \text{ such that } 0 \le c\, g(n) \le f(n) \ \forall n \ge n_0 \}$
• Theorem: for any two functions f(n) and g(n), f(n) = Θ(g(n)) iff f(n) = O(g(n)) and f(n) = Ω(g(n))
• Used to bound the best-case running time
• E.g.: best-case for insertion sort is Ω(n)
• Worst-case for insertion sort is Ω(n^2)

[Figure: Graphical Comparison]

o-notation
• Upper bound that is not asymptotically tight
• 2n^2 = O(n^2) is asymptotically tight
• 2n = O(n^2) is not asymptotically tight
  $o(g(n)) = \{ f(n) : \forall c > 0 \ \exists\, n_0 > 0 \text{ such that } 0 \le f(n) < c\, g(n) \ \forall n \ge n_0 \}$
• 2n = o(n^2), but 2n^2 ≠ o(n^2)
• $\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0$

ω-notation
• Lower bound that is not asymptotically tight
• f(n) ∈ ω(g(n)) iff g(n) ∈ o(f(n))
  $\omega(g(n)) = \{ f(n) : \forall c > 0 \ \exists\, n_0 > 0 \text{ such that } 0 \le c\, g(n) < f(n) \ \forall n \ge n_0 \}$
• E.g.: n^2/2 = ω(n), but n^2/2 ≠ ω(n^2)
• $\lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty$

Comparison of Functions
• Transitivity (for all five notations): if f(n) = X(g(n)) and g(n) = X(h(n)), then f(n) = X(h(n))
• Reflexivity (for the big-X notations Θ, O, Ω): f(n) = X(f(n))
• Symmetry: f(n) = Θ(g(n)) iff g(n) = Θ(f(n))
• Transpose symmetry:
  – f(n) = O(g(n)) iff g(n) = Ω(f(n))
  – f(n) = o(g(n)) iff g(n) = ω(f(n))

Analyzing Algorithms Addendum
• Finding the largest clique in a graph
• Parsing an object model

Finding the largest clique
• Graph: G(V,E)
• A graph (or undirected graph) G is an ordered pair G = (V, E) that is subject to the following conditions:
  – V is a set, whose elements are called vertices or nodes
  – E is a set of unordered pairs of distinct vertices, called edges or lines
• A clique in an undirected graph G is a set of vertices C ⊆ V such that for every two vertices in C there exists an edge connecting the two (a complete sub-graph)

The Clique Problem
• The decision problem: determining whether a graph contains a clique of at least a given size k
• Verifying that a given set of vertices is a clique is trivial
• The clique problem is in NP (nondeterministic polynomial time)
• It is NP-complete
• The corresponding optimization problem, the maximum clique problem, is to find the largest clique in a graph

Brute force approach to the clique problem
• Examine each sub-graph with at least k vertices and check to see if it forms a clique
• Number of cases to inspect: one per vertex subset of size at least k, i.e. $\sum_{i=k}^{n} \binom{n}{i}$
• A clique C = (v_i1, v_i2, ..., v_in) exists only when its n sub-cliques, each of size n-1, exist
• Event-raising mechanism to increment a counter of sub-cliques using the threshold graph
• Space required to build all cliques until G is completely connected:
  $\binom{n}{1} + \binom{n}{2} + \binom{n}{3} + \binom{n}{4} + \dots + \binom{n}{n} = 2^n - 1$, i.e. on the order of $2^n$

Sorting an object model
• Input:
  – A collection of property paths
• Output:
  – A sorted collection of property paths

Input Sample (excerpt)
• Company.Principals[0].References[0].ResidentialInfos[0].Address.AddressType
• Company.Principals[0].References[0].ResidentialInfos[1].Address.AddressType
• Company.Principals[0].References[1].ResidentialInfos[0].Address.AddressType
• Company.Principals[0].References[1].ResidentialInfos[1].Address.AddressType
• Company.Principals[1].References[0].ResidentialInfos[0].Address.AddressType
• Company.Principals[1].References[0].ResidentialInfos[1].Address.AddressType
• Company.Principals[1].References[1].ResidentialInfos[0].Address.AddressType
• Company.Principals[1].References[1].ResidentialInfos[1].Address.AddressType
• Company.Principals[0].References[0].ResidentialInfos[0].Address.StreetAddressLine1
• Company.Principals[0].References[0].ResidentialInfos[1].Address.StreetAddressLine1
• Company.Principals[0].References[1].ResidentialInfos[0].Address.StreetAddressLine1
• Company.Principals[0].References[1].ResidentialInfos[1].Address.StreetAddressLine1
• Company.Principals[1].References[0].ResidentialInfos[0].Address.StreetAddressLine1
• Company.Principals[1].References[0].ResidentialInfos[1].Address.StreetAddressLine1
• Company.Principals[1].References[1].ResidentialInfos[0].Address.StreetAddressLine1
• Company.Principals[1].References[1].ResidentialInfos[1].Address.StreetAddressLine1
• Company.Principals[0].References[0].ResidentialInfos[0].Address.StreetAddressLine2
• Company.Principals[0].References[0].ResidentialInfos[1].Address.StreetAddressLine2
• Company.Principals[0].References[1].ResidentialInfos[0].Address.StreetAddressLine2
• Company.Principals[0].References[1].ResidentialInfos[1].Address.StreetAddressLine2
• Company.Principals[1].References[0].ResidentialInfos[0].Address.StreetAddressLine2
• Company.Principals[1].References[0].ResidentialInfos[1].Address.StreetAddressLine2
• Company.Principals[1].References[1].ResidentialInfos[0].Address.StreetAddressLine2
• Company.Principals[1].References[1].ResidentialInfos[1].Address.StreetAddressLine2

Object Model
[Figure: class diagram with three classes; association multiplicities shown: 1..*, 1, 1, 1..*]
• Company
  – Principals[], FinancialStatements[], TradeLines[], Contacts[], Declarations[], Documents[], Addresses[]
  – CompanyInfo, BusinessAddress, MailingAddress, RelationshipSummary
• Address
  – AddressType, StreetAddressLine1, StreetAddressLine2, City, County, State, Zipcode, Country
• Principal
  – EmploymentInfos[], IncomeInfos[], TradeLines[], References[], Assets[], IdentificationInfos[], ResidentialInfos[]
  – ContactInfo, PersonalInfo, PrincipalType, EmployedByCompany, YearsAsOwner, IndividualOrJointType, PersonType

Output sample (sorted object model)
• Company.Principals[0].EmploymentInfos[0].EmployerAddress.AddressType
• Company.Principals[0].EmploymentInfos[0].EmployerAddress.StreetAddressLine1
• Company.Principals[0].EmploymentInfos[0].EmployerAddress.StreetAddressLine2
• Company.Principals[0].EmploymentInfos[0].EmployerAddress.City
• Company.Principals[0].EmploymentInfos[0].EmployerAddress.County
• Company.Principals[0].EmploymentInfos[0].EmployerAddress.State
• Company.Principals[0].EmploymentInfos[0].EmployerAddress.ZipCode
• Company.Principals[0].EmploymentInfos[0].EmployerAddress.Country
• Company.Principals[0].EmploymentInfos[0].EmployerName
• Company.Principals[0].EmploymentInfos[0].EmploymentType
• Company.Principals[0].EmploymentInfos[0].OccupationType
• Company.Principals[0].EmploymentInfos[0].YearsOfEmployment
• Company.Principals[0].EmploymentInfos[0].MonthsOfEmployment
• Company.Principals[0].EmploymentInfos[0].Title
• Company.Principals[0].EmploymentInfos[0].Department
• Company.Principals[0].EmploymentInfos[1].EmployerAddress.AddressType
• Company.Principals[0].EmploymentInfos[1].EmployerAddress.StreetAddressLine1
• Company.Principals[0].EmploymentInfos[1].EmployerAddress.StreetAddressLine2
• Company.Principals[0].EmploymentInfos[1].EmployerAddress.City
• Company.Principals[0].EmploymentInfos[1].EmployerAddress.County
• Company.Principals[0].EmploymentInfos[1].EmployerAddress.State
• Company.Principals[0].EmploymentInfos[1].EmployerAddress.ZipCode
• Company.Principals[0].EmploymentInfos[1].EmployerAddress.Country
• Company.Principals[0].EmploymentInfos[1].EmployerName

Data Graph
[Figure: data graph with nodes ROOT, ApplicationNumber, ConversationLogs, Comment, DateTimeStamp, IncludeInUDR, Company, Principals, FinancialStatements, EmploymentInfos, IncomeInfos, TradeLines]

Data Structures
• Node
  – Text : string
  – Count : int
  – CrtIndex : int
  – IsLast : bool
  – NextNode : Node
  – RightNode : Node
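
The Node fields listed on the Data Structures slide can be written down as a C# class. The sketch below is only an illustration. The field names and types come from the slide, but the slides do not say what each field means: reading NextNode as "first child" (the next segment of a property path) and RightNode as "next sibling" is an assumption made for this example, and the helper NodeDemo.Paths is hypothetical.

```csharp
using System.Collections.Generic;

// Sketch of the Node type from the slide. Names and types are taken from the
// slide; the interpretation of the links below is an assumption.
public class Node
{
    public string Text;      // label of the node, e.g. a property name
    public int Count;        // semantics not given on the slide
    public int CrtIndex;     // semantics not given on the slide
    public bool IsLast;      // semantics not given on the slide
    public Node NextNode;    // ASSUMED: first child (next segment of the path)
    public Node RightNode;   // ASSUMED: next sibling at the same level

    public Node(string text) { Text = text; }
}

public static class NodeDemo
{
    // Hypothetical helper: walks the structure depth-first and returns every
    // root-to-leaf path as a dot-separated string, in sibling order.
    public static List<string> Paths(Node node, string prefix = "")
    {
        var result = new List<string>();
        for (Node n = node; n != null; n = n.RightNode)
        {
            string path = prefix.Length == 0 ? n.Text : prefix + "." + n.Text;
            if (n.NextNode == null)
                result.Add(path);                          // leaf: emit the full path
            else
                result.AddRange(Paths(n.NextNode, path));  // descend into the children
        }
        return result;
    }
}
```

Under these assumptions, linking Company to Principals as its first child, Principals to EmploymentInfos as its first child, and FinancialStatements as the right sibling of Principals makes NodeDemo.Paths return Company.Principals.EmploymentInfos followed by Company.FinancialStatements.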