Consider a database that contains the following tables (and possibly many others that
are not related to this problem):
PROFESSOR (pid (4) , p_name (40), did (4), p_age (2) )
COURSE (crscode (4), did (4), crsname (30), descr (62) )
TEACHING (pid (4), crscode (4), semester (2))
For each attribute, we specify how many bytes we need to use to store each value.
Now, consider the following SQL query:
SELECT C.crsname, P.p_name
FROM PROFESSOR P, COURSE C, TEACHING T
WHERE T.semester = “F2020” AND P.did=100 AND P.pid = T.pid AND
T.crscode=C.crscode
Assume the following statistical information about these relations:
- 1000 tuples for the PROFESSOR relation
- 16,000 tuples for the TEACHING relation
- 2000 tuples for the COURSE relation
- each page on the disk can store 1000 bytes
- crscode is the primary key in COURSE and foreign key in TEACHING
- pid is the primary key in PROFESSOR and foreign key in TEACHING
- (crscode, semester) is the primary key in TEACHING
- 5-page main memory buffer
- 20 different values in departments (did values in PROFESSOR and COURSE)
- 16 different values for semester (semester values in TEACHING)
-
Assume uniformity whenever is needed (e.g., all semesters the same number of
courses, all professors the same number of courses, etc.)
1. Provide two alternative query trees (operation trees) that represent an evaluation of
the SQL query above. You do not have to provide estimation details or algorithms. Just
the operation trees.
2. Now, assume that only the Block Nested Loop Join and the Hash Join are
supported by the system. Find the best evaluation plan of the SQL query and estimate
the total cost in number of I/Os. You can make any other assumption that you want here
but you need to state it and use it in your estimation. Specify any pipeline or on-the-fly
operations (if you use some).


0 comments