- Consider the following two scenarios in a k-way merge sort of a file.
_ The file has 20,000 pages and the buffer has 7 available frames.
_ The file has 2,000,000 pages and buffer has 15 available frames.
For each of the above scenario, answer the following questions assuming the best use of the
(a) How many initial runs will be produced?
(b) How many passes would be needed to complete the sorting?
(c) How many page I/Os would be required to sort the file?
(d) If we want to sort the file in 2 passes, how many buffer frames would be required?
2. Consider the join R 1R:a=S:b S of relations R and S with the following information.
R has 10,000 tuples and a blocking factor of 10.
S has 2,000 tuples and a blocking factor of 10.
S.b is the primary key of S.
Both R and S are stored in simple heap files and neither relation has any index.
Up to 52 buffer frames available.
For each of the three join algorithms: (a) Blocked nested-loop join, (b) Sort-merge join, and
(c) Hash join, answer the following questions. Unless stated otherwise, the join cost should
be the number of page I/Os excluding the cost for writing the join result.
(a) Estimate the I/O cost and the number of buffer frames that is needed to obtain that cost.
(b) What is the best cost of the join algorithm if additional buffer is available? How many
buffer frames are required to obtain that cost?
(c) How many blocks does the join result contain?
(d) If R:a is a foreign key referencing S:b, how would the answers to the previous questions
Computer Science 5443 Assignment 3 August 15, 2015
3. Consider relation Executives(ename, title, dname, address), where all attributes have
the same length and ename is a candidate key. The relation contains 12; 000 pages and the
buffer has 10 pages. Assume that 10% tuples satisfy the following query.
Select E.title, Count(*)
From Executives E
Where E.dname > ’W%’
Group by E.title
For each of the following scenarios, describe the most efficient evaluation plan for the query
and estimate its cost in pages.
(a) Only a clustering B+ tree index on title is available.
(b) Only an unclustered B+ tree index on title is available.
(c) Only a clustered B+ tree index on dname is available.
(d) Only a clustering B+ tree index on (dname, title) is available.
(e) Only a clustering B+ tree index on (title, dname) is available.
4. Consider the database schema
emp(eid, did, sal, hobby)
dept(did, dname, floor, phone)
finance(did, budget, sales, expenses)
and the following query.
Select D.dname, F.budget
From emp E, dept D, finance F
Where E.did=D.did and D.did=F.did and D.floor=1
and E.sal>=49000 and E.hobby=’tennis’
(a) Draw the initial query tree and an optimized query tree produced by a decent query
(b) Assume that the optimizer will not consider any evaluation plan involving a cross product,
list join orders that the optimizer will consider.
(c) Consider the following scenario. The attributes that have unclustered B+ tree indexes
are emp.did, emp.sal, dept.floor, dept.did, and finance.did. The data in emp.sal ranges
between 10,000 and 60,000. Employees have 200 different hobbies. The building of
the company has 5 floors. The company has 50,000 employees and 2,500 departments,
each of which has some financial information. The DBMS uses only the indexed nested
loop join methods including possible pipelining. Assume that all B+ tree indexes have
i. For each base relation (i.e., emp, dept, and finance), estimate the number of tuples
that will satisfy all non-join conditions relevant to that relation.
ii. Which join order considered by the optimizer has the lowest estimated cost?
This question belongs to computer science and discusses about join algorithm and drawing query trees.
Word count: NA
Download Full Solution