COMP 335: Introduction to Theoretical Computer Science: Fall 2014 Assignment 4
Due November 11, 2014 at midnight

1. Let G be any context-free grammar without any λ-productions or unit productions. Let k be the maximum number of symbols on the right side of any production in P. Show that there is an equivalent grammar in Chomsky Normal Form that has no more than $(k-1)|P| + |T|$ production rules.
Ans. Suppose first that G is already in CNF. If k = 1, every production has the form X → a with a ∈ T, so no variable other than S is reachable. After removing useless productions, all productions are of the form S → a for some a ∈ T, and therefore G has at most $|T| \le (k-1)|P| + |T|$ productions. If instead k ≥ 2, then k − 1 ≥ 1, so the number of productions is $|P| \le (k-1)|P| + |T|$. So we may assume that G is not in CNF. We will convert G to CNF and count the productions of the resulting grammar. By assumption, G has no λ- or unit productions; since G is not in CNF, this means k ≥ 2. Furthermore, removing useless productions only reduces the number of productions. We therefore consider the two remaining procedures of the conversion to CNF and show that the grammar they produce has at most the stated number of productions.
In the first procedure, each terminal a on the right-hand side of a production of length at least 2 is replaced by a new variable $T_a$, and a new production $T_a \to a$ is added. This needs at most |T| new productions.
In the second procedure, every production $A \to A_1 A_2 \cdots A_i$ with i > 2 is replaced by i − 1 productions, namely $A \to A_1 B_1$, $B_1 \to A_2 B_2$, ..., $B_{i-2} \to A_{i-1} A_i$, where $B_1, \ldots, B_{i-2}$ are new variables. Productions of length 1 or 2 are kept unchanged. Hence each original production gives rise to at most k − 1 productions: one if its right side has length 1 or 2, and i − 1 ≤ k − 1 if it has length i > 2. The original productions are therefore replaced by at most $(k-1)|P|$ productions. Together with the |T| productions $T_a \to a$, this gives a total of at most $(k-1)|P| + |T|$ productions.

2. Let G be a context-free grammar in CNF, and let w ∈ L(G) be the yield of a parse tree for w according to the grammar G. Prove using induction that if the length of the longest path in the tree is n, then $|w| \le 2^{n-1}$.
Ans. We use induction on n.
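Before the induction, the bound can be sanity-checked by brute force. In a CNF parse tree every node uses either A → a (one terminal leaf) or A → BC (two subtrees), so the yield length depends only on the shape of the tree. The following is a minimal Python sketch built on that observation. The function names are illustrative only, and path length is counted in edges, as in the basis case where S → a has longest path 1.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def yields_at_most(n: int) -> frozenset:
    """Yield lengths of CNF parse trees whose longest path has length <= n."""
    if n < 1:
        return frozenset()
    sub = yields_at_most(n - 1)
    # Either a single production A -> a, or A -> BC with two shorter subtrees.
    return frozenset({1} | {x + y for x in sub for y in sub})


@lru_cache(maxsize=None)
def yields_exactly(n: int) -> frozenset:
    """Yield lengths of CNF parse trees whose longest path has length exactly n."""
    if n == 1:
        return frozenset({1})
    # Root A -> BC: one subtree has longest path exactly n - 1,
    # the other has longest path at most n - 1.
    return frozenset(x + y for x in yields_exactly(n - 1) for y in yields_at_most(n - 1))


for n in range(1, 9):
    ys = yields_exactly(n)
    print(n, min(ys), max(ys), 2 ** (n - 1))
```

For each n, the largest yield equals $2^{n-1}$. It is attained by the complete binary tree, so the bound being proved is tight.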
Basis: If n = 1, then the tree consists of a single production S → w. Since G is in CNF and $w \in T^*$, it follows that $|w| = 1 = 2^0$, as needed.
Induction step: Assume that every parse tree whose longest path has length at most k has a yield of length at most $2^{k-1}$. This applies to trees with any variable as the root. Now consider a parse tree whose longest path has length k + 1 ≥ 2. Such a tree is not a single production S → a, so its first production must be of the form S → AB. Let the yield of the subtree rooted at A be $w_1$, and let the yield of the subtree rooted at B be $w_2$. The longest paths in both subtrees have length at most k, so $|w_1| \le 2^{k-1}$ and $|w_2| \le 2^{k-1}$. Since $w = w_1 w_2$, we conclude that $|w| \le 2 \cdot 2^{k-1} = 2^k$, as needed.

3. Convert the following grammars to push-down automata using the standard procedure:
(a) S → aABB | aAA, A → aBB | a, B → bBB | A
First we convert the grammar to Greibach Normal Form:
S → aABB | aAA
A → aBB | a
B → bBB | aBB | a
Next we convert it to a PDA. It has states q0 (initial), q1 and qf (final), and initial stack symbol z:
q0 → q1: λ, z → Sz
q1 → q1: a, S → ABB; a, S → AA; a, A → BB; a, A → λ; b, B → BB; a, B → BB; a, B → λ
q1 → qf: λ, z → z
(b) S → aSb | bSa | ab | ba
First we convert the grammar to Greibach Normal Form:
S → aSB | bSA | aB | bA
A → a
B → b
Next we convert it to a PDA with the same states and initial stack symbol:
q0 → q1: λ, z → Sz
q1 → q1: a, S → SB; a, S → B; b, S → SA; b, S → A; a, A → λ; b, B → λ
q1 → qf: λ, z → z

4. Determine whether or not the following languages on Σ = {a, b, c, d} are context-free. Explain your answers.
(a) $L_1 = \{a^n b^j c^k d^l \mid n \le j,\ k \le l\}$
Ans. $L_1$ is context-free. It is generated by the following CFG:
S → AB
A → aAb | Ab | λ
B → cBd | Bd | λ
(b) $L_2 = \{a^n b^j c^k d^l \mid n \le k,\ j \le l\}$
Ans. $L_2$ is not context-free. Suppose it is context-free, and let m be the constant of the pumping lemma. Choose $w = a^m b^m c^m d^m$. Clearly $w \in L_2$ and $|w| \ge m$. Let w = uvxyz with $|vxy| \le m$ and $|vy| \ge 1$. Since $|vxy| \le m$, vy cannot contain both a's and c's, and it cannot contain both b's and d's. If vy contains a's but no c's, we pump up. If vy contains c's but no a's, we pump down. In both situations, we obtain a string with more a's than c's.
Similarly, if vy contains b's but no d's, we pump up, and if vy contains d's but no b's, we pump down. In both situations, we obtain a string with more b's than d's. Since vy is nonempty, at least one of these cases applies. We therefore always obtain a string not in $L_2$, which contradicts the pumping lemma. Therefore $L_2$ is not context-free.
(c) $L_3 = \{w_1 c w_2 \mid w_1, w_2 \in (a+b)^*,\ w_1 \ne w_2\}$
Ans. $L_3$ is a context-free language. We observe that $L_3 = L_A \cup L_B$, where
$L_A = \{w_1 c w_2 \mid |w_1| \ne |w_2|\}$ and
$L_B = \{w_1 c w_2 \mid \text{the } i\text{th symbol of } w_1 \text{ differs from the } i\text{th symbol of } w_2 \text{ for some } i \le \min(|w_1|, |w_2|)\}$,
with $w_1, w_2 \in (a+b)^*$ in both. Since context-free languages are closed under union, it suffices to show that $L_A$ and $L_B$ are context-free.
It is easy to see that $L_A$ is context-free. For instance, the following grammar generates it:
S → XSX | A | B
A → XA | Xc
B → BX | cX
X → a | b
Next we give a grammar for $L_B$. The key idea is to generate strings of length i − 1 before and after the c before generating the two mismatched symbols, as shown in the grammar below:
S → BaD | AbD
B → XBX | bC
A → XAX | aC
C → XC | c
D → XD | λ
X → a | b
We prove that the grammar above generates $L_B$. Observe that:
i. $B \Rightarrow^* X^{i-1} b C X^{i-1}$ for all $i \ge 1$.
ii. $X^i \Rightarrow^* x$ for every $x \in (a+b)^*$ with $|x| = i$, for all $i \ge 1$.
iii. $C \Rightarrow^* yc$ for all $y \in (a+b)^*$.
iv. $D \Rightarrow^* z$ for all $z \in (a+b)^*$.
Therefore, by starting with the production S → BaD, we obtain
$S \Rightarrow^* X^{i-1} b C X^{i-1} a D \Rightarrow^* x_1 b y c x_2 a z$,
with $x_1, x_2, y, z \in (a+b)^*$ and $|x_1| = |x_2| = i - 1$. Here $w_1 = x_1 b y$ and $w_2 = x_2 a z$ differ in their $i$th symbol. Similarly, by starting with the production S → AbD, we conclude that
$S \Rightarrow^* x_1 a y c x_2 b z$,
with $x_1, x_2, y, z \in (a+b)^*$ and $|x_1| = |x_2| = i - 1$. These are the only strings that S derives, and every string in $L_B$ has one of these two forms. Therefore the above grammar generates $L_B$.
We can also give a (nondeterministic) PDA for $L_B$ as follows. The idea is that the PDA "guesses" the value of i above. We push the stack symbol X onto the stack for each input symbol read before the guessed position. On the ith symbol, we go to one of two different states depending on whether that symbol is an a or a b. At this point there are i − 1 Xs on the stack. We keep processing input symbols after this without altering the stack until we get to the c.
After the c, we pop one X for each input symbol we read, until the bottom-of-stack marker z is exposed. This means we have read i − 1 symbols after the c. If the next symbol differs from the ith symbol of the string before the c, we go to a final state. The rest of the input is then consumed there without altering the stack.
(d) $L_4 = \{wcw \mid w \in (a+b)^*\}$
Ans. Not context-free. We use the pumping lemma to prove this. Assume $L_4$ is context-free, let m be the constant of the pumping lemma, and choose the string $w = a^m b^m c a^m b^m$. Clearly $w \in L_4$ and $|w| \ge m$. Let w = uvxyz with $|vxy| \le m$ and $|vy| \ge 1$. We consider the following exhaustive cases:
Case 1: vy contains the symbol c. Then $u v^2 x y^2 z$ has more than one c, and therefore cannot belong to $L_4$.
Case 2: vy lies entirely in the substring before the c. Then $vy = a^i b^j$ with $i + j \ge 1$. Therefore $uxz = a^{m-i} b^{m-j} c a^m b^m \notin L_4$. This holds because $a^{m-i} b^{m-j} \ne a^m b^m$, since $i \ge 1$ or $j \ge 1$ (or both).
Case 3: vy lies entirely in the substring after the c. Then $vy = a^i b^j$ with $i + j \ge 1$. Therefore $uxz = a^m b^m c a^{m-i} b^{m-j} \notin L_4$. This holds because $a^{m-i} b^{m-j} \ne a^m b^m$, since $i \ge 1$ or $j \ge 1$ (or both).
Case 4: vy contains symbols from both the substring before the c and the substring after the c, but does not contain the c. Then v lies before the c and y lies after it, so x contains the c. Since $|vxy| \le m$, it must be that $v = b^i$ and $y = a^j$ with $i, j \ge 1$. Therefore $uxz = a^m b^{m-i} c a^{m-j} b^m \notin L_4$, as $a^m b^{m-i} \ne a^{m-j} b^m$.
In all cases we arrive at a string not in $L_4$, which contradicts the pumping lemma. Therefore $L_4$ cannot be context-free.

5. Consider the language $L = \{a^i b^j c^k \mid i \ne j,\ j \ne k,\ i \ne k\}$; it is not a context-free language. Show that it nevertheless satisfies the conditions of the context-free pumping lemma. That is, show that there exists an m such that for all strings w in L of length at least m, we can write w = uvxyz with $|vxy| \le m$ and $|vy| \ge 1$, such that $u v^t x y^t z \in L$ for all $t \ge 0$.
Ans. Let m = 4, and consider any string $w = a^i b^j c^k \in L$ with $|w| \ge 4$. A smaller constant does not work: aab ∈ L has length 3, but deleting any choice of v and y with $|vy| \ge 1$ from it gives a string outside L. Since $i \ne j$, $j \ne k$ and $i \ne k$, there is a strict ordering between the three.
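The decomposition described in the rest of this answer pumps the block of the most frequent symbol by ℓ ∈ {1, 2, 3} copies. It can be checked by brute force on short strings. The Python sketch below is an illustration, not part of the required proof. The helpers `in_L`, `decompose` and `can_pump` are hypothetical names, pumping is tested only for t = 0, ..., 5, and all orderings of the three counts are handled by pumping whichever symbol is most frequent. The sketch also confirms that aab, of length 3, cannot be pumped at all, so the constant must be at least 4.

```python
from itertools import product


def in_L(w: str) -> bool:
    """Membership in L = {a^i b^j c^k : i, j, k pairwise distinct}."""
    i = len(w) - len(w.lstrip("a"))
    rest = w[i:]
    j = len(rest) - len(rest.lstrip("b"))
    k = len(rest) - j
    if rest[j:] != "c" * k:  # the remainder must consist of c's only
        return False
    return i != j and j != k and i != k


def pumps(u, v, x, y, z, max_t=5):
    """True if u v^t x y^t z is in L for every t = 0..max_t."""
    return all(in_L(u + v * t + x + y * t + z) for t in range(max_t + 1))


def decompose(w: str):
    """Split w as in the answer: y = first ell copies of the most frequent symbol."""
    counts = {s: w.count(s) for s in "abc"}
    big = max(counts, key=counts.get)
    mid, low = sorted((c for s, c in counts.items() if s != big), reverse=True)
    top = counts[big]
    if top == low + 2:      # Case 1
        ell = 3
    elif top == mid + 1:    # Case 3
        ell = 2
    else:                   # Case 2
        ell = 1
    start = w.index(big)
    return w[:start], "", "", w[start:start + ell], w[start + ell:]


def can_pump(w: str, m: int) -> bool:
    """Brute force: does some split with |vxy| <= m, |vy| >= 1 stay in L?"""
    n = len(w)
    for s in range(n):
        for length in range(1, min(m, n - s) + 1):
            for lv in range(length + 1):
                for lx in range(length - lv + 1):
                    if lv + (length - lv - lx) == 0:
                        continue  # vy must be nonempty
                    u, v = w[:s], w[s:s + lv]
                    x = w[s + lv:s + lv + lx]
                    y, z = w[s + lv + lx:s + length], w[s + length:]
                    if pumps(u, v, x, y, z):
                        return True
    return False


print(can_pump("aab", 3))  # aab is in L but has no pumpable split
for i, j, k in product(range(13), repeat=3):
    if len({i, j, k}) == 3 and 4 <= i + j + k <= 12:
        w = "a" * i + "b" * j + "c" * k
        u, v, x, y, z = decompose(w)
        assert u + v + x + y + z == w and 1 <= len(v + y) and len(v + x + y) <= 4
        assert pumps(u, v, x, y, z)
```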
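We will break up w so that vy consists of only one type of symbol, namely the symbol with the most occurrences in w. We claim that all pumped strings are then still in L. Consider the case i > j > k. The other orderings are handled in the same way, by pumping the block of the most frequent symbol. We split w = uvxyz with u = v = x = λ and $y = a^\ell$, where $\ell \in \{1, 2, 3\}$ is defined below, so that $|vxy| = \ell \le m$ and $|vy| \ge 1$. For any $\ell \ge 1$, pumping up only increases the number of a's further and therefore gives strings in L. So we only have to consider the string uxz obtained by pumping down. Since i > j > k, we have $i \ge k + 2$, so the following cases are exhaustive.
Case 1: i = k + 2, so j = k + 1. Choose ℓ = 3. This requires k ≥ 1, which holds whenever |w| ≥ 4, because the only such string with k = 0 is aab. Then $uxz = a^{i-3} b^j c^k = a^{k-1} b^j c^k$ with k − 1 < k < j. Thus uxz ∈ L.
Case 2: i > k + 2 and i > j + 1. Take ℓ = 1. Then $uxz = a^{i-1} b^j c^k$ with i − 1 > j > k. Thus uxz ∈ L.
Case 3: i > k + 2 and i = j + 1. Take ℓ = 2. Then $uxz = a^{j-1} b^j c^k$. Since i > k + 2, we have j − 1 = i − 2 > k, so the three exponents are distinct. Thus uxz ∈ L.
This example shows that the converse of the pumping lemma does not hold. A language can fail to be context-free even though all of its sufficiently long strings can be pumped. (That L is not context-free can be shown using a stronger version of the pumping lemma called Ogden's lemma.)

6. Given a context-free grammar G = (V, T, S, P), show how to construct a grammar G' such that $L(G') = L(G)^R$. Explain your answer.
Ans. Use the same variables, terminals and start symbol, and simply reverse the right-hand side of every production, replacing each A → α by $A \to \alpha^R$. A derivation step $\gamma A \delta \Rightarrow \gamma \alpha \delta$ in G corresponds to the step $\delta^R A \gamma^R \Rightarrow \delta^R \alpha^R \gamma^R$ in G'. By induction on the length of derivations, $S \Rightarrow^* w$ in G if and only if $S \Rightarrow^* w^R$ in G'. Hence $L(G') = L(G)^R$.

7. Give a DPDA for each of the following languages:
(a) $\{a^n b^{2n+3m} c^m \mid n, m \ge 1\}$
The DPDA has states q0 through q5 and qf, where qf is final, and initial stack symbol z:
q0 → q0: a, z → AAz; a, A → AAA (push two A's for each a)
q0 → q1: b, A → λ
q1 → q1: b, A → λ (together with the previous move, match 2n b's against the A's)
q1 → q2: b, z → Bz
q2 → q2: b, B → BB (together with the previous move, push one B for each of the remaining 3m b's)
q2 → q3: c, B → λ
q3 → q4: λ, B → λ
q4 → q5: λ, B → λ (each c pops three B's)
q5 → q3: c, B → λ
q5 → qf: λ, z → z
(b) $\{w w^R \mid w \in (ab)^*\}$, that is, $\{(ab)^n (ba)^n \mid n \ge 0\}$
q0 → q1: a, z → z
q1 → q2: b, z → Az; b, A → AA (count the ab blocks)
q2 → q1: a, A → A
q2 → q3: b, A → A
q3 → q4: a, A → λ (each ba block pops one A)
q4 → q3: b, A → A
q4 → qf: λ, z → z
Here qf is final. q0 must also be final so that λ (the case w = λ) is accepted.

8. Let L be a DCFL over an alphabet Σ. Let $f_1(L) = \{w : wa \in L \text{ for some } a \in \Sigma\}$ and let $f_2(L) = \{w : aw \in L \text{ for some } a \in \Sigma\}$. Only one of $f_1(L)$ and $f_2(L)$ is guaranteed to be a DCFL. Which one? Explain your answer.
Soln. If L is a DCFL, we claim that $f_1(L)$ is a DCFL.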
Given a DPDA M for L, we can convert it to a DPDA for $f_1(L)$ by making final any state q ∈ Q from which M can reach a state $q_f \in F$ while consuming exactly one input symbol. Making previously nonfinal states final does not make the machine nondeterministic, so the resulting machine is a DPDA and accepts $f_1(L)$. Strictly, whether such a move is possible depends on the stack contents as well as on q. A rigorous proof therefore uses the closure of DCFLs under right quotient with a regular language, since $f_1(L) = L/\Sigma$.
We claim that $f_2(L)$ is not necessarily a DCFL, even if L is a DCFL. Let $L = \{c a^n b^n \mid n \ge 1\} \cup \{d a^n b^{2n} \mid n \ge 1\}$. Then L is clearly a DCFL. A DPDA accepting L can be constructed by going from the initial state to q1 or q2 without altering the stack, depending on whether the first input symbol is a c or a d, respectively. Now q1 can be the start state of a DPDA for the language $\{a^n b^n \mid n \ge 1\}$, and q2 the start state of a DPDA for the language $\{a^n b^{2n} \mid n \ge 1\}$. After the first symbol the machine runs exactly one of these two deterministic machines, so it is a DPDA, and it accepts L. But $f_2(L) = \{a^n b^n \mid n \ge 1\} \cup \{a^n b^{2n} \mid n \ge 1\}$, which is a standard example of a context-free language that is not a DCFL.
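The DPDA for L described above can be written out and simulated. The Python sketch below gives one possible concrete machine; it is not taken from the original figure, and the state names s0, p1, p2, r1, r2 and acc are hypothetical. The sketch stores the transition function as a dictionary keyed by (state, input symbol or "" for λ, stack top). This makes determinism easy to check: no (state, stack top) pair may have both a λ-move and an input move.

```python
from itertools import product

# Hypothetical DPDA for L = {c a^n b^n : n >= 1} U {d a^n b^(2n) : n >= 1}.
# Key: (state, input symbol or "" for a lambda-move, stack top).
# Value: (next state, string replacing the top; its leftmost symbol is the new top).
DELTA = {
    ("s0", "c", "z"): ("p1", "z"),    # branch on the first symbol
    ("s0", "d", "z"): ("r1", "z"),
    # a^n b^n: push one A per a, pop one A per b
    ("p1", "a", "z"): ("p1", "Az"),
    ("p1", "a", "A"): ("p1", "AA"),
    ("p1", "b", "A"): ("p2", ""),
    ("p2", "b", "A"): ("p2", ""),
    ("p2", "", "z"): ("acc", "z"),
    # a^n b^(2n): push two A's per a, pop one A per b
    ("r1", "a", "z"): ("r1", "AAz"),
    ("r1", "a", "A"): ("r1", "AAA"),
    ("r1", "b", "A"): ("r2", ""),
    ("r2", "b", "A"): ("r2", ""),
    ("r2", "", "z"): ("acc", "z"),
}
FINAL = {"acc"}


def is_deterministic(delta) -> bool:
    """No (state, top) pair may have both a lambda-move and an input move."""
    lam = {(q, top) for (q, a, top) in delta if a == ""}
    return not any(a != "" and (q, top) in lam for (q, a, top) in delta)


def accepts(w: str) -> bool:
    """Run the DPDA on w; the top of the stack is the last list element."""
    state, stack, i = "s0", ["z"], 0
    while True:
        top = stack[-1] if stack else None
        if (state, "", top) in DELTA:
            state, push = DELTA[(state, "", top)]
        elif i < len(w) and (state, w[i], top) in DELTA:
            state, push = DELTA[(state, w[i], top)]
            i += 1
        else:
            break  # no move possible
        stack.pop()
        stack.extend(reversed(push))
    return i == len(w) and state in FINAL


def in_L(w: str) -> bool:
    """Direct membership test, used to cross-check the machine."""
    if not w or w[0] not in "cd":
        return False
    rest = w[1:]
    n = len(rest) - len(rest.lstrip("a"))
    tail = rest[n:]
    if n < 1 or tail != "b" * len(tail):
        return False
    return len(tail) == (n if w[0] == "c" else 2 * n)


# Cross-check the machine against the definition on all short strings.
for length in range(8):
    for letters in product("abcd", repeat=length):
        word = "".join(letters)
        assert accepts(word) == in_L(word), word
```

Deleting the leading c or d from every string merges the two branches into one input that starts with a's. The machine above relies on that first symbol to choose its branch.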