# Read or write a page takes D time units.

## Read or write a page takes D time units.

Subject(s): Computer Science–Database Development

READ THIS: If we scan 1000 pages using one processing node (PN), the computation time is 1000D, where D is the time to read or write a disk page. If we evenly distribute the 1000 pages among four PNs, then the computation time can be reduced to 1000/4 or 250D (due to parallel scanning). On the other hand, if one of the PN has more data (data skew), 30% of the total data size, the overall computation time increases to 1000×0.3 or 300D.

(25 pts.) We have a shared-nothing system with four processing nodes (PNs). The Grace algorithm is used to join two relations R and S as discussed in class. Each relation has 1,000,000 tuples stored in 10,000 pages. We make the following assumptions:

The two operand relations are initially evenly distributed among the four PNs.

Data skew does not occur (i.e., the workload is balanced throughout the computation).

2-Phase join is used to join the bucket pairs. Probing the hash table to find the matches incurs negligible computation time.

Read or write a page takes D time units.

Communication time is negligible.

Estimate the computation time (i.e., response time) for this join operation.

(25 pts.) Consider the same join operation described in Question 1, but with a data skew problem. The Hash Phase results in PN1 having 40% of the total data from each of the operand relations. Estimate the computation time of this join operation using Grace algorithm.

(25 pts.) We apply TIJ to join two relations R and S, each with 1,000,000 tuples stored in 10,000 pages. Initially, PN1 has 40% of the R and S tuples. PN2, PN3, and PN4 each has 20% of the total tuples. Estimate the computation time of this join operation.

(25 pts.) If we use ABJ algorithm for Question 3, estimate the computation time.