Assignment
Transcription
Assignment
CMPSCI 445 -- Homework 4 Due April 14, before 11 am Question 1. Output of Join Algorithms Suppose we have two unary (one attribute only) relations, R and S: R S ------------7 8 2 4 9 2 8 1 3 3 9 2 1 7 3 3 6 Show the result of joining R and S using each of the following algorithms. List the results in the order that they would be output by the join algorithm. Note that the result relation contains only one attribute, which is the common attribute between R and S. (1) Sort merge join algorithm. (2) Two-phase hash join algorithm, assuming that there is enough memory to complete the join in two passes. To run the algorithm, assume that there are two hash buckets, numbered 0 and 1. In Phase I, the hash function sends even values to bucket 0 and odd values to bucket 1. In Phase II, use R as the “build” relation and S as the “probe” relation. Assume that bucket 0 is read first and that the contents of a bucket are read in the same order as they were written in the previous phase. Question 2. Costs of Join Algorithms Consider the join R R.a=S.b S, given the following information about the relations to be joined. The cost metric is the number of page I/Os unless otherwise noted, and the cost of writing out the result should be uniformly ignored. Relation R contains 10,000 tuples and has 10 tuples per page. Relation S contains 2,000 tuples and also has 10 tuples per page. Attribute b of relation S is the primary key for S. Both relations are stored as simple heap files. Neither relation has any indexes built on it. 52 buffer pages are available. (1) What is the cost of joining R and S using a block nested loops join? (2) What is the cost of joining R and S using a sort-merge join? (3) What is the cost of joining R and S using a hash join? (4) Assume that you can access as much memory as you want. What would be the lowest possible I/O cost for joining R and S using any join algorithm, and what is the minimum buffer size to achieve this cost? Explain briefly. Question 3. Cost of Group By Consider again the relation R(a, b, c, … ), with 10,000 tuples and 10 tuples per page. Consider the following Group By query: Query 1: Select From Group By min(R.b) R R.a Assume that 52 buffer pages are available. Please briefly describe the most efficient implementation of the Group By query using the sortmerge algorithm and analyze its I/O cost. Question 4: Query Optimization Consider the following schema: Sailors(sid, sname, rating, age) Boats(bid, bname, size) Reserves(sid, bid, day) Reserves.sid is a foreign key to Sailors. Reserves.bid is a foreign key to Boats.bid. We are given the following information about the database: • Sailors contains 50 pages with 20 records per page, so 1000 records in total. • Boats contains 10 pages with 10 records per page, so 100 records in total. • Reserves contains 250 pages with 40 records per page, so 10,000 records in total. • There are 100 values for Reserves.sid. • There are 50 values for Reserves.bid. • There are 1000 values for Reserves.day Consider Query 2: SELECT S.sid, S.sname, B.bname FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid and R.bid = B.bid (1) Draw all left-deep query plans that avoid Cartesian products for this query. (2) Now consider the query plan (SR) B, where SR takes place first and then further joins with B. Assume that we have 22 pages of memory. Block nested loops join and hash join are the only available join methods. Which join algorithm works better for SR given all 22 pages? Explain your answer. Next, we consider the two joins, (SR) B, together. Let us reserve 10 pages to hold relation B in memory, and 1 page to hold the final output. We give the remaining 11 pages to implement SR. Given 11 pages, which join algorithm is more efficient for SR? And what is the total I/O cost for the entire plan, (SR) B?