# Question: 1 consider the following dataset bab3 bc01 cc2 cd5 cd3...

###### Question details

1. Consider the following dataset:

[‘bab3’, ‘bc01’, ‘cc2’, ‘cd5’, ‘cd3’, ‘cdx2’, ‘cdx1’, ‘e01’, ‘g02’, ‘ha1’, ‘hb1’, ‘hc8’, ‘hz5’, ‘z00’, ‘z01’, ‘bc01’]

a. Assume a hash function that assigns each string to a bucket based on
its first character. For example, ‘bab3’ goes into
Bucket1 since it starts with ‘b’, and ‘cc1’ goes into Bucket2 since
it starts with ‘c’. (Yes, it is a very crude hash function.)

[a-b] -> Bucket1

[c-d] -> Bucket2

[e-f] -> Bucket3

[g-z] -> Bucket4

Would you consider this a good hash function? Why or why not? Please note that an answer of yes or no without an explanation will not be credited.
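To reason about part (a), it can help to tally how many items land in each bucket. The following is a minimal sketch, assuming the bucket ranges listed above; the names `crude_hash` and `loads` are illustrative and not part of the original question.

```python
from collections import Counter

data = ['bab3', 'bc01', 'cc2', 'cd5', 'cd3', 'cdx2', 'cdx1', 'e01',
        'g02', 'ha1', 'hb1', 'hc8', 'hz5', 'z00', 'z01', 'bc01']

def crude_hash(key):
    """Return the bucket number (1-4) for a string, using only its first character."""
    first = key[0].lower()
    if 'a' <= first <= 'b':
        return 1
    if 'c' <= first <= 'd':
        return 2
    if 'e' <= first <= 'f':
        return 3
    if 'g' <= first <= 'z':
        return 4
    raise ValueError(f"no bucket defined for key {key!r}")

# Count how many items fall into each bucket.
loads = Counter(crude_hash(k) for k in data)
for bucket in range(1, 5):
    print(f"Bucket{bucket}: {loads[bucket]} item(s)")
```

Comparing the four counts against the ideal of 16 / 4 = 4 items per bucket gives a concrete basis for the explanation the question asks for.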

b. Design your own (good) hash function based on the given data and using exactly 5 buckets. In this case, the quality of the function is measured by how evenly it balances the data across the buckets.
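One possible approach to part (b), offered as a sketch and not as the expected answer, is a range-based function whose split points are picked by inspecting the sorted data. The split points in `BOUNDARIES` and the name `range_hash` are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import Counter

data = ['bab3', 'bc01', 'cc2', 'cd5', 'cd3', 'cdx2', 'cdx1', 'e01',
        'g02', 'ha1', 'hb1', 'hc8', 'hz5', 'z00', 'z01', 'bc01']

# Hypothetical split points, chosen by looking at the sorted data.
# Four boundaries divide the key space into five ranges.
BOUNDARIES = ['c', 'cdx', 'h', 'hz']

def range_hash(key):
    """Return the bucket number (1-5) of the range that contains the key."""
    return bisect_right(BOUNDARIES, key) + 1

# Count how many items fall into each of the five buckets.
loads = Counter(range_hash(k) for k in data)
for bucket in range(1, 6):
    print(f"Bucket{bucket}: {loads[bucket]} item(s)")
```

With 16 items and 5 buckets, a perfectly even split is impossible, so the best achievable outcome is four buckets of 3 and one of 4. A function tuned this closely to one dataset may balance poorly on different data.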

c. Suppose that the input dataset is:

[‘a1’, ‘a1’, ‘b1’, ‘d1’, ‘a1’, ‘a1’, ‘b1’, ‘c1’, ‘a2’, ‘c1’, ‘c1’, ‘a1’, ‘d2’, ‘d1’].

How would you design a hash function to partition this data into 3 buckets? Once again, the quality of the hash function is measured by how evenly it distributes the data (as evenly as possible).
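Because part (c) contains many repeated values, and equal keys must always map to the same bucket, one way to explore it is to count how often each key occurs and then assign keys to buckets greedily, heaviest first. This is a sketch of one possible approach under that assumption; `build_lookup` and `table_hash` are hypothetical names.

```python
from collections import Counter

data = ['a1', 'a1', 'b1', 'd1', 'a1', 'a1', 'b1', 'c1',
        'a2', 'c1', 'c1', 'a1', 'd2', 'd1']

def build_lookup(keys, num_buckets):
    """Greedily assign each distinct key to the currently lightest bucket."""
    counts = Counter(keys)
    loads = [0] * num_buckets
    table = {}
    # Process the most frequent keys first; break ties by key for determinism.
    for key, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = loads.index(min(loads))
        table[key] = target
        loads[target] += n
    return table, loads

table, loads = build_lookup(data, 3)

def table_hash(key):
    """Look up the bucket (0-2) for a key seen when the table was built."""
    return table[key]

print(table)
print(loads)
```

A lookup table like this only covers keys that were present when it was built, so unseen keys would need a fallback rule.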