Document 6518698
Transcription
Document 6518698
Overview Data Structures for Databases II Advanced SQL Real world Model Queries Answers Lena Strömbäck Databases DBMS Processing of queries and updates Access to stored data Physical database What is this about? How to make more efficient kinds of indexes What do you need to learn? Multilevel indexing Index on mutiple keys and in MySQL PLSQL and triggers Part II Advanced SQL for the project Example Assume an ordered data file with 1,000,000 records of size 1000 byte and block size of 4,096 bytes. Assuming an index record size of 32 bytes. On average, how many block accesses need to be performed to find a single record when searching for the key field a) Using no index? The number of blocks for the data file is 250,000 b) Using a primary index? Multilevel Indexes Multilevel indexes ”Index on the index” Multilevel Index Example Assume an ordered datafile with 1,000,000 records of size 1000 byte and block size of 4,096 bytes. Assuming an index record size of 32 bytes. On average, how many block accesses need to be performed to find a single record when searching for the key field Reduce the search space of the index by fitting indexes of the index in fewer blocks until the top level index fits in one block. The reduction is determined by the blocking factor. The value blocking factor is called as fan out (fo). a) Problems with Multilevel Indexes Using a multilevel index Search Tree A search tree is a tree that is used to guide the search for a record. An ordinary search tree of order p consist of nodes that have at most p-1 values and p pointers. <P1, K1, P2, K2, …, Pq-1, Kq-1, Pq> Problems when inserting and deleting data All levels are based on physically ordered files. Use an overflow file and re-create the index during file reorganisation. Use a dynamic multilevel index structure Search Trees where q≤p and Pi is a pointer to a child node (or a null pointer) Search Tree: Example, order p=3 Pi . . 1. Within each node, K1 < K2 < … < Kq-i 2. For all values X in the subtree pointed by Pi: If 1< i < q, Ki-1 < X < Ki If i = 1, X < K1 If i = q, Kq-1 < X Pq B-Tree B-Tree: Example, order p=3 B-tree = Balanced tree. all leaves are on the same level all nodes except the root and leaves have at most p pointers and at least p / 2 pointers. B-tree: Order B-tree: Number of entries Given: B = 4096 bytes, Precord = 16 bytes, Pblock = 8 bytes, K = 64 bytes, fill percentage = 69% One node must fit in one block: Æ p <= 47 B+Precord+K p ⋅ Pblock + ( p − 1) ⋅ ( Precord + K ) ≤ B ⇒ p ≤ Pblock+Precord+K Nodes Pointers 1 0.69*47≈33 33-1=32 Level1 33 33*33=1089 33*32=1056 Level2 1089 Level3 35,937 Root p Pblock Precord K order, number of block pointer entries in a node size of a block pointer size of a record pointer size of a search key field =35,937 332 *32=34,848 334 =1,185,921 333 *32=1,149,984 333 The number of entries hold in the 3 level B-tree: 1,149,984 B+-Tree: Example, order p=3, pleaf=2 B+-tree Entries Order of insertion: 8, 5, 1, 7, 3, 12, 9, 6 A variation of the B-tree Data pointers only stored in leaf nodes. The leaf nodes are usually linked to provide ordered access. Most common dynamic multilevel index implementation 5 3 1 3 5 6 7 8 7 8 8 5 1 7 3 12 9 6 Andersson Hagberg French Silver Daniels Young Zhing Baker 9 12 B+-trees: Internal nodes B+-trees: Internal nodes K1 1. 2. 3. K1 4. Each internal node is of the form <P1, K1, P2, K2, …, Pq-1, Kq-1, Pq> Within each internal node K1 < K2 < … < Kq-i For all search field values X in the subtree pointed at by Pi, we have Ki-1< X ≤ Ki for 1 < i < q for i = 1 X ≤ Ki for i=q Ki-1 < X P1 K1 ... Ki −1 Pi Ki ... 6. P1 K q −1 < X 3. 4. 5. Ki ... Kq−1 Pq K q −1 < X Search: 8 5 Pr1 ... Ki Pri ... K q−1 7 3 1 3 5 6 8 7 8 9 12 Pq Pnext B+-tree Order B+-trees One internal node must fit in one block: ⇒ p ≤ p ⋅ Pblock + ( p − 1) ⋅ K ≤ B B+K Pblock + K Given: B=4096 bytes, Precord=16 bytes, Pblock=8 bytes, K=64bytes, fill percentage=70% Æ p <= 57; pleaf<=51 Nodes One leaf node must fit in one block: p leaf ⋅ ( Precord + K ) + Pblock ≤ B ⇒ p leaf ≤ B p pleaf Pblock K Precord Ki −1 Pi K i −1 < X ≤ K i 3 Each leaf node is of the form <<K1, P1>, <K2, P2>, …, <Kq-1, Pq-1>, Pnext> Within each leaf node K1 < K2 < … < Kq-i Each entry contains a pointer to the record whose search field value corresponds to the entry. Each leaf node has at least p / 2 values. All leaf nodes are at the same level. K1 ... B+-Tree Search 1 2. K1 X ≤ K1 B+-trees: Leaf nodes 1. Each internal node has at most p tree pointers. Each internal node, except the root, has at least p / 2 tree pointers. The root node has at least two tree pointers if it is an internal nodes. An internal node with q pointers (q≤ p), has q-1 search field values. Kq−1 Pq K i −1 < X ≤ K i X ≤ K1 5. B − Pblock Precord + K block size order, number of pointer entries in an internal node number of record pointer entries in a leaf node size of a block pointer size of a search key field size of a record pointer Pointers Entries ≈ 40 40-1=39 Level1 40 40*40=1600 40*39=1560 Level2 1600 403 =64,000 402 *39=62,400 Root Leaf level 1 0.7*57 Record pointers 64,000 64,000*0.7*51=2,284,800 the number of entries hold in the 3-level B-tree: 1,185,920 B+-trees Search B+-trees Insertion and Deletion Very fast searching in the index structure: Insertion and deletion can be expensive. log p⋅ f N N p f number of search values order, number of block pointers per node fill factor, 0≤f≤1 B+-tree: Insertion B+-Tree When a leaf node is full it causes an overflow The first p 2 entries in the node are kept there, the remaining are moved to a new leaf. The search value of new node move up to the parent. If the parent is full, it will overflow. The resulting split can propagate all the way up to the root. Insert: 8 B+-Tree B+-Tree 8 Insert: 5 5 Insert: 1 8 Overflow – create a new level B+-Tree B+-Tree 5 1 5 5 8 1 Overflow - Split Insert: 7 5 7 8 Insert: 3 B+-Tree B+-Tree 3 1 3 5 5 5 7 8 3 8 Overflow - Split Propagates to a new level 1 Insert: 12 3 5 7 B+-Tree 5 3 3 12 Insert: 9 B+-Tree 1 8 5 8 5 7 8 3 9 12 1 3 7 5 Overflow – Split Insert: 6 Resulting B+-tree 6 7 8 8 9 12 B+-tree: Deletion B+-Tree 7 When a leaf node is less than haf full it causes an underflow Redistribute, merge with sibling, The resulting combining can also propagate to internal nodes. 1 6 1 5 9 6 7 8 9 12 Delete: 5 B+-Tree B+-Tree 7 1 1 6 6 7 9 7 1 8 9 12 6 1 8 6 7 8 9 Underflow - redistribute Delete: 12 Delete: 9 B+-Tree B+-Tree 7 1 1 6 6 1 8 7 1 8 Underflow merge with the left propagate reduce the tree levels 6 6 7 8 B+-trees Many variations Indexes on Multiple Keys B-trees B+-trees B*-trees (B+-tree with a fill factor of at least 2/3) Common modifications Change the fillfactor from 0.5 to 1.0 Allow a node to become empty before merging e.g. select * from employee where dept = ‘CS’ and age = ’40’ Possible strategies for processing this query using indices on single attributes: Indexes on Multiple Keys If the set of records that matches each condition is large, but the combination is not, an index on the composite may be useful. use index on dept to find employee with dept = ‘CS’, then test them individually to see if age = ’40’ use index on age to find employee with age = ’40’, then test them individually to see if dept = ‘CS’ use dept index to find pointers to all records of the CS department, and use age index similarly, then take intersection of both sets of pointers Indexes in reality – MySQL InnoDB storage engine ordered index on multiple attributes, treat the composite as a single value Create a clustered index for each table Rows are physically ordered by the primary key B-trees Advanced SQL Functions and procedures • • • Must contain RETURN …; Used for performing series of SQL statements Stored on the server side Reduce the traffic between server and the clients Triggers • Stored procedures and functions Used for triggering controls and other actions Programming • • Flow control Cursors SHOW CREATE { PROCEDURE | FUNCTION } name Triggers Triggers OLD.colname refers to the value before the triggering event NEW.colname refers to the value after the triggering event create trigger newemp before insert on emp for each row begin declare n int; select count(*) into n from emp where id=new.id; if n>0 then begin select max(id) into n from emp; set new.id=n+1; end; end if; end; // SHOW TRIGGERS; Cursors 1st 2nd 3rd Flow control