Problem 1: Consider a database containing information about car accidents between 2000 and 2020, including the cars involved and their owners.
The database has the following tables:
Car(license, year, make, model)
Accident(license, accident_date, damage_amount, zipcode)
// zipcode in Accident is the place where accident took place
// assume that the same car does not get into an accident twice in a day
Owner(SSN, license, name, street, city, zipcode)
// assume each owner has only one licensed car
The statistics and other catalog information for the database are as follows:
NTuples(Car) = 10,000, NTuples(Accident) = 20,000, NTuples(Owner) = 10,000
NPages(Car) = 100, NPages(Accident) = 1000, NPages(Owner) = 500
NDistinct(Car.make) = 50, NDistinct(Owner.city) = 20
Min(Accident.accident_date) = 01/01/2000, Max(Accident.accident_date) = 12/31/2020.
All indexes use Alternative 2 for the data entries.
All B+-trees have 100 keys per node and are 3 levels deep; hash indexes have 100 keys
per bucket, and a hash lookup costs 1.2 I/Os on average.
An unclustered linear hash index exists on Car(make).
A clustered B+-tree index exists on Accident(accident_date).
An unclustered linear hash index exists on Owner(city).
Consider now the following query:
SELECT O.name, A.damage_amount
FROM Car C, Accident A, Owner O
WHERE C.license = A.license AND C.license = O.license
AND C.make = 'Ford'
AND A.accident_date < '06/30/2005' AND O.city = 'Boston';
Answer the following questions; state any additional assumptions you need to make:
1. Estimate the number of tuples that result after applying the selection
operations to the Accident, Car, and Owner relations.
2. Assume that only Block Nested Loop Join (BNLJ), Sort-Merge Join (SMJ), and
Hash Join (HJ) join algorithms are supported. Using the System R query
optimizer, compute the best evaluation plan for this query and provide the
estimated cost in number of I/Os. Assume that the buffer has 5 pages.
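As a sanity check for question 1, the selection sizes can be estimated with the standard System R reduction factors. The sketch below assumes uniform value distributions and reads the range predicate's cutoff as 06/30/2005 (June has 30 days); these are textbook defaults, not part of the problem statement.

```python
# Back-of-the-envelope selection-size estimates using System R
# reduction factors (uniformity assumed).
from datetime import date

ntuples_car, ntuples_accident, ntuples_owner = 10_000, 20_000, 10_000

# Equality on Car.make: reduction factor 1 / NDistinct(Car.make)
car_est = ntuples_car / 50            # 200 tuples

# Range on Accident.accident_date: (cutoff - min) / (max - min)
lo, hi = date(2000, 1, 1), date(2020, 12, 31)
cutoff = date(2005, 6, 30)            # assumed reading of the predicate
rf = (cutoff - lo).days / (hi - lo).days
accident_est = ntuples_accident * rf  # roughly 5,200 tuples

# Equality on Owner.city: reduction factor 1 / NDistinct(Owner.city)
owner_est = ntuples_owner / 20        # 500 tuples

print(car_est, round(accident_est), owner_est)
```

The same reduction factors are what the System R optimizer multiplies together when costing the join orders in question 2.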
Problem 2:
Consider the join R ⋈R.a=S.b S, given the following information about the relations to
be joined. The cost metric is the number of page I/Os unless otherwise noted, and the
cost of writing out the result should be uniformly ignored.
Relation R contains 20,000 tuples and has 10 tuples per page.
Relation S contains 5,000 tuples and also has 10 tuples per page.
Attribute b of relation S is the primary key and every tuple in S matches 4 tuples in R.
There exists an unclustered B+-tree index on R.a with height 3.
There exists a clustered B+-tree index on S.b with height 2.
The main memory buffer can hold 5 pages.
1. What is the cost of the join using a sort-merge join? What is the minimum
number of buffer pages required for this cost to remain unchanged?
2. What is the cost of the join using a hash join? What is the cost of joining R and S
using a hash join if the size of the buffer is 52 pages?
3. What is the cost of the join using indexed nested loop join?
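For questions 1 and 2, the generic textbook cost formulas can be checked mechanically. The sketch below uses the standard external-sort cost (2 passes' worth of I/O per pass over the data) and the Grace hash join cost 3(M + N); note the hash-join formula only applies when the buffer is large enough to hold a partition of the smaller relation (true for B = 52, not for B = 5), and an optimized sort-merge variant may do better than the plain formula shown.

```python
# Generic cost formulas for Problem 2; the exam answer may apply
# refinements (e.g., merging during the final sort pass).
import math

M, N, B = 2000, 500, 5  # pages of R (20,000/10), pages of S (5,000/10), buffers

def external_sort_cost(pages, buf):
    # Pass 0 produces ceil(pages/buf) sorted runs; each later pass does a
    # (buf - 1)-way merge. Every pass reads and writes all pages.
    runs = math.ceil(pages / buf)
    passes = 1 + math.ceil(math.log(runs, buf - 1))
    return 2 * pages * passes

# Plain sort-merge join: sort both inputs, then one merge pass over each.
smj = external_sort_cost(M, B) + external_sort_cost(N, B) + M + N

# Grace hash join (valid only when B is roughly > sqrt(min(M, N))):
ghj = 3 * (M + N)

print(smj, ghj)
```

With B = 5 pages this gives 6 passes to sort R and 5 passes to sort S, so the plain sort-merge total is 2·2000·6 + 2·500·5 + 2500 = 31,500 I/Os; treat the 3(M + N) hash-join figure as applicable only to the 52-page-buffer case.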