Problem 1: Consider a database containing information about car accidents between 2000 and 2020, including the cars involved and their owners.
The database has the following tables:
Car(license, year, make, model)
Accident(license, accident_date, damage_amount, zipcode)
// zipcode in Accident is the place where accident took place
// assume that the same car does not get into an accident twice in a day
Owner(SSN, license, name, street, city, zipcode)
// assume each owner has only one licensed car
The statistics and other catalog information for the database are as follows:
NTuples(Car) = 10,000, NTuples(Accident) = 20,000, NTuples(Owner) = 10,000
NPages(Car) = 100, NPages(Accident) = 1000, NPages(Owner) = 500
NDistinct(Car.make) = 50, NDistinct(Owner.city) = 20
Min(Accident.accident_date) = 01/01/2000, Max(Accident.accident_date) = 12/31/2020.
All indexes use Alternative 2 for the data entries.
All B+-trees have 100 keys per node and are 3 levels deep; hash indexes have 100 keys
per bucket, and a hash lookup costs 1.2 I/Os on average.
An unclustered linear hash index exists on Car(make).
A clustered B+-tree index exists on Accident(accident_date).
An unclustered linear hash index exists on Owner(city).
Consider now the following query:
SELECT O.name, A.damage_amount
FROM Car C, Accident A, Owner O
WHERE C.license = A.license AND C.license = O.license
AND C.make = 'Ford'
AND A.accident_date < '06/30/2005' AND O.city = 'Boston';
Answer the following questions; state any additional assumptions you need to make:
1. Estimate the number of tuples that result after applying the selection
operations to the Accident, Car, and Owner relations.
2. Assume that only Block Nested Loop Join (BNLJ), Sort-Merge Join (SMJ), and
Hash Join (HJ) join algorithms are supported. Using the System R query
optimizer, compute the best evaluation plan for this query and provide the
estimated cost in number of I/Os. Assume that the buffer has 5 pages.
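As a sanity check for question 1, the selection sizes can be estimated with the standard System R reduction factors. The sketch below assumes uniform value distributions and reads the range predicate's cutoff as 06/30/2005 (June has 30 days); these are textbook defaults, not part of the problem statement.

```python
# Back-of-the-envelope selection-size estimates using System R
# reduction factors (uniformity assumed).
from datetime import date

ntuples_car, ntuples_accident, ntuples_owner = 10_000, 20_000, 10_000

# Equality on Car.make: reduction factor 1 / NDistinct(Car.make)
car_est = ntuples_car / 50            # 200 tuples

# Range on Accident.accident_date: (cutoff - min) / (max - min)
lo, hi = date(2000, 1, 1), date(2020, 12, 31)
cutoff = date(2005, 6, 30)            # assumed reading of the predicate
rf = (cutoff - lo).days / (hi - lo).days
accident_est = ntuples_accident * rf  # roughly 5,200 tuples

# Equality on Owner.city: reduction factor 1 / NDistinct(Owner.city)
owner_est = ntuples_owner / 20        # 500 tuples

print(car_est, round(accident_est), owner_est)
```

The same reduction factors are what the System R optimizer multiplies together when costing the join orders in question 2.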
Problem 2:
Consider the join R ⋈R.a=S.b S, given the following information about the relations to
be joined. The cost metric is the number of page I/Os unless otherwise noted, and the
cost of writing out the result should be uniformly ignored.
Relation R contains 20,000 tuples and has 10 tuples per page.
Relation S contains 5,000 tuples and also has 10 tuples per page.
Attribute b of relation S is the primary key and every tuple in S matches 4 tuples in R.
There exists an unclustered B+-tree index on R.a with height 3.
There exists a clustered B+-tree index on S.b with height 2.
The main memory buffer can hold 5 pages.
1. What is the cost of the join using a sort-merge join? What is the minimum
number of buffer pages required for this cost to remain unchanged?
2. What is the cost of the join using a hash join? What is the cost of joining R and S
using a hash join if the size of the buffer is 52 pages?
3. What is the cost of the join using indexed nested loop join?
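For questions 1 and 2, the generic textbook cost formulas can be checked mechanically. The sketch below uses the standard external-sort cost (2 passes' worth of I/O per pass over the data) and the Grace hash join cost 3(M + N); note the hash-join formula only applies when the buffer is large enough to hold a partition of the smaller relation (true for B = 52, not for B = 5), and an optimized sort-merge variant may do better than the plain formula shown.

```python
# Generic cost formulas for Problem 2; the exam answer may apply
# refinements (e.g., merging during the final sort pass).
import math

M, N, B = 2000, 500, 5  # pages of R (20,000/10), pages of S (5,000/10), buffers

def external_sort_cost(pages, buf):
    # Pass 0 produces ceil(pages/buf) sorted runs; each later pass does a
    # (buf - 1)-way merge. Every pass reads and writes all pages.
    runs = math.ceil(pages / buf)
    passes = 1 + math.ceil(math.log(runs, buf - 1))
    return 2 * pages * passes

# Plain sort-merge join: sort both inputs, then one merge pass over each.
smj = external_sort_cost(M, B) + external_sort_cost(N, B) + M + N

# Grace hash join (valid only when B is roughly > sqrt(min(M, N))):
ghj = 3 * (M + N)

print(smj, ghj)
```

With B = 5 pages this gives 6 passes to sort R and 5 passes to sort S, so the plain sort-merge total is 2·2000·6 + 2·500·5 + 2500 = 31,500 I/Os; treat the 3(M + N) hash-join figure as applicable only to the 52-page-buffer case.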