《数据库理论与技术》期终考试复习题⼀、选择题:
1.Chubby presents itself to clients as this service:
a) Centralized mutual exclusion b) Hierarchical mutual exclusionc) Token-based mutual exclusion d) Contention-based mutual exclusion.2.Which statement best describes a client's interaction with Chubby?
a) A client contacts any Chubby server, which then forwards the request to the appropriateserver in the Chubby cell.
b) A client contacts any Chubby server, which can process the request since all Chubbyservers are replicated.
c) A client contacts the Chubby master, which then forwards the request to the appropriateserver in the Chubby cell.
d) A client contacts the Chubby master, which handles all client requests.
3.By sending messages through Paxos, processes can ensure that concurrent messages become:a) Global time ordered multicasts. b) Totally ordered multicasts.c) Causally ordered multicasts. d) Unordered multicasts.
4.Based on their vector timestamps, which event causally precedes (4, 2, 8, 5)?a) (3, 1, 7, 7) b) (5, 1, 6, 2) c) (4, 2, 8, 4) d) (4, 3, 8, 5)
5.With Paxos, message sequence numbers are assigned by the:a) Acceptor b) Client c) Learner d) Proposer6.Paxos reaches agreement when:
a)All proposers agree on a value to send to the acceptors.b)All acceptors agree to a proposed value.
c)The majority proposers agree on a value to send to the acceptors.d)The majority of acceptors agree to a proposed value.7. A write-ahead log does NOT enable a process to:
a) Undo changes in case of an abort. b) Record that a transaction has committed.c) Record the response that was sent to a vote in a commit protocol.d) Achieve higher performance by prefetching data.8.To abort a distributed transaction:a)At least one participant must vote to abort.b)The majority of participants must vote to abort.c)All of the participants must vote to abort.d)All live participants must vote to abort.
9.The motivation for an eventual consistency model was:
a)It is impossible to have highly available replicated data that is fully consistent in asystem that can survive network partitioning.
b)Data inconsistencies are highly undesirable, so the primary focus should be on ensuringthat all replicas are consistent.
c)Distributed commit protocols can never work reliably, so data is bound to becomeinconsistent on some participants.
d)Data might be in an inconsistent state during the execution of transactions but will bemade consistent when they commit.
10.In Bigtable, each server is responsible for serving tablets. A tablet is:a)A subset of rows and column families.
b)A subset of rows but stores all column family data for those rows.
c)A set of one or more column families but it stores all row data for those column families.d)Exactly one row and one column family.11.Differing from a distributed hash table, Bigtable:a) Can support replication for fault tolerance.
b) Supports more than one primary key for looking up data.c) Can associate large amounts of data with a key.
d) Sorts its data by the key and allows iteration over rows of sorted data.
12.Which of these operations is most efficiently implemented on a large-scale GFS (Google FileSystem) system?
a) Read one 1 TB file. b) Read 1 million 1 MB files.c) Write one 1 TB file. d) Write 1 million 1 MB files.13.The partitioning function in MapReduce
a) Determines which shard will be assigned to a specific map worker.
b) Filters out unnecessary input data prior to being processed by the map worker.c) Determines the division of available servers into map workers and reduce workers.d) Determines which reduce worker will process data associated with a particular key.
14.Amazon Dynamo allows an administrator to add a greater load to a bigger server in the groupby
a) Configuring the client library to issue a higher percentage of its requests to that server.b) Configuring other servers to forward some percentage of their requests to the biggerserver.
c) Using consistent hashing and giving a larger consecutive range of the hash space to theserver.
d) Assigning more virtual nodes to that server.
15.Which of the following is NOT a responsibility of a map worker in the MapReduceframework:
a) Generate (key, value) pairs.
b) Target (key, value) data for one of Rreduce workers.
c) Partition original data into shards. d) Discard data of no interest.16. A key design principle of the Google cluster architecture is
a) Merging is fast; break up databases into lots of pieces and use lots of processes, eachworking on a piece of the data.
b) Minimize the number of phases in a task; create many replicas of a database and alloweach system full access to it without the need for sharing or locking.
c) Minimize context switch overhead; machines are cheap and it's more efficient to devote anode to one task exclusively.
d) Machines and software fail; run the same task redundantly in parallel to ensure successfulexecution.⼆、简答题:
1.考虑⽤⼆元联系(图1)对三元联系(图2)的表⽰:
图1
图2
1)分别给出图1中E,A,B,C,R A,R B和R C的⼀个实例,这些实例不对应图2中A,B,C和R的任何实例;
2)更改图1中的ER图,引⼊适当的约束以确保满⾜约束的E,A,B,C,R A,R B和RC的任何实例都对应于A,B,C和R的⼀个实例;3)更改以上的转化以表⽰在三元联系上的全参与约束;
2. Suppose that we are using extendable hashing on a file that containsrecords with the following search-key values:2, 3, 5, 7, 11, 17, 19, 23, 29, 31,35,27
1) Show the extendable hash structure for this file if the hash function ish(x) = x mod 11 and buckets can hold three records.
2) Show how the extendable hash structure of part 1) changes as theresult of each of the following steps:a. Delete 11.b. Insert 15.c. Delete 31.d. Insert 25.
3. The key-value store uses quorums for consistency. The total number of
replicas, N, for a key, is fixed – however, N may be different for different keys. Each read has to access at least r replicas (and returns if all of them agree), while eachwrite has to write to at least w replicas.For each of the following design choices, say whether it (by itself) does or does not guarantee strong consistency, i.e., one copyserializability?a. w=3, r=1b. w = 2N/3c. r+w = Nd. w = Ne. r + w > N/2f. r + w > 3N/2
g. r + w > 3N/2, w > 2N/3
4. The following scenario is a distributed file system which spans multiple
datacenters. There are several choices for how to handle network partitions that occur in between datacenters (given below). For each of these choices, tell me: if itviolates consistency?
a. Allow all partitions to process both reads and writes.
b. Allow all partitions to process reads, but only one special partition(prechosen)to process writes.
c. Allow only the partition which has at least a quorum number of servers(measured across all datacenters) to execute writes.d. Until partitions are repaired, allow only reads but no writes.e. Allow only partitions with a quorum of servers (measured across alldata centers) to execute writes and reads.
5. Lamport’s Logical Clocks, amon g other things, help make inferences about
consistency of replicated data items operated upon concurrently. The following figure shows the time-line diagrams of three processes P, Q and R.a. Fill in the logical times for each event (a)–(o) in the three processesusing the rules of local ordering, send/receive ordering and transitiveclosure.
b.Assuming you are given only the logical times for each event in a process,answer the following True, False or Unknown questionsi.Event b happened before event a.ii.Events a, h and m happened concurrently.iii.Event n happened before event c.
iv.Events d, k and n occur at the same time instant.v.Event g did not happen before event l.vi.Event o happened after event l.
6.设某WEB内容缓存系统采⽤⼀致性哈希算法进⾏内容分发管理。其环形哈希表地址空间⼤⼩N=4096,存储节点A, B, C, D的哈希值分别为: 13,1025,2047,3011;
给定数据对象obj1~obj14,其ID值分别为:56,256,643,654,726,1365,1276,652,1887,1535,827,2323,892, 1337。哈希值计算⽅式为:h(id) = (3*id)mod 4096
a.对给定数据对象obj1~obj14进⾏哈希计算,按顺时针⽅向将其映射到离其最近的存储节点节点上;
b.在a)的结果基础上新增存储节点E和F(哈希值分别为1657,3987),并对数据对象重新进⾏映射;
c.在b)的结果上插⼊数据对象obj15,obj16,obj17,其ID值分别如下:835,1765,2325
d.在c)的结果基础上删除存储节点D, 并对数据对象重新进⾏映射;
7. 假设⼀个事务应⽤是⽤SQL嵌⼊到C中编写的,其中80%的时间花费在运⾏SQL
语句上,其余20%的时间花费在运⾏C代码上,如果只对SQL进⾏了并⾏化,那么可以期望得到多⼤的加速⽐? 为什么?8. 利⽤频率分布直⽅图建⽴负载均衡的范围划分向量:
1) 假设有⼀个取值范围为1~100的直⽅图,划分为以下10个范围:1~10,11~20,…,91~100。其出现频率分别为: 15,5,20,10,10,5,5,20,5,5。请给出⼀个负载均衡的范围划分向量,将这些值划分为5个部分。
2) 对于给定的包括n个范围的频率分布直⽅图,请说明应如何计算出划分为p(p
因篇幅问题不能全部显示,请点此查看更多更全内容