Goals (shared storage)

  • capacity
    • e.g., 1000 servers, 300 TB
  • performance
  • fault tolerance
    • e.g., MapReduce jobs span many machines, so failures are the norm


  • filesystem-like API
    • proprietary library (write/read/append)
    • not POSIX
  • single master (metadata)
    • (filename, chunk index) -> chunk locations (chunk index = offset / chunk size)
  • chunk size: 64 MB. why not smaller, e.g., 4 KB (HDD) or 4 MB (SSD)?
  • 3-way replication
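The chunk-size question yields to quick arithmetic, using the example figures above (300 TB total; a sketch, not actual GFS sizing): smaller chunks multiply the metadata a single master must hold.

```python
# Rough metadata sizing for a single master, using the example figures above.
TOTAL_BYTES = 300 * 10**12              # 300 TB of stored data

def chunk_count(chunk_size_bytes):
    """How many chunks the master must track at a given chunk size."""
    return TOTAL_BYTES // chunk_size_bytes

print(chunk_count(64 * 10**6))          # 4687500  -> a few million entries fit in RAM
print(chunk_count(4 * 10**3))           # 75000000000 -> tens of billions do not
```

A few million entries keep the whole chunk table in one machine's memory, which is what makes a single metadata master feasible.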

How GFS works

  • client asks the master for chunk locations
  • client caches (buffers) the locations locally
  • client then reads/writes the chunkservers directly
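The steps above can be sketched as follows (names and classes are hypothetical, not the real GFS RPC interface): the client maps an offset to a chunk index, asks the master once, caches the answer, and thereafter bypasses the master.

```python
CHUNK_SIZE = 64 * 10**6   # 64 MB

class Master:
    """Metadata only: (filename, chunk index) -> chunkserver addresses."""
    def __init__(self, table):
        self.table = table

    def lookup(self, filename, chunk_index):
        return self.table[(filename, chunk_index)]

class Client:
    def __init__(self, master):
        self.master = master
        self.cache = {}                         # locations buffered locally

    def locate(self, filename, offset):
        chunk_index = offset // CHUNK_SIZE      # offset -> chunk index
        key = (filename, chunk_index)
        if key not in self.cache:               # contact the master only on a miss
            self.cache[key] = self.master.lookup(filename, chunk_index)
        return self.cache[key]                  # then talk to chunkservers directly

master = Master({("log", 0): ["s1", "s2", "s3"]})
client = Client(master)
print(client.locate("log", 10 * 10**6))         # ['s1', 's2', 's3'] (chunk 0)
```

Because data never flows through the master, its load is a function of lookups, not bytes.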


  • why are a single master and 64 MB chunks sufficient?
    • workload: large files; sequential read/write
  • not a good design if
    • many small files (metadata aggregates at the single master)
    • random accesses (the client's location buffer rarely hits)
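A toy count of master contacts illustrates the random-access parenthetical (a sketch with made-up access patterns): sequential reads stay inside one cached 64 MB chunk, while scattered reads touch a new chunk almost every time.

```python
CHUNK_SIZE = 64 * 10**6

def master_lookups(offsets):
    """Each distinct chunk touched costs one master round trip (cached afterward)."""
    return len({off // CHUNK_SIZE for off in offsets})

# Sequential: 64 reads of 1 MB each, all inside chunk 0 -> 1 lookup.
sequential = [i * 10**6 for i in range(64)]
# Scattered: 64 reads landing in 64 different chunks -> 64 lookups.
scattered = [i * CHUNK_SIZE for i in range(64)]
print(master_lookups(sequential), master_lookups(scattered))  # 1 64
```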


  • correctness: outcome matches expectation, even under
    • concurrency
    • failures
  • tradeoff
    • weak consistency: easier to implement, harder to use
    • strong consistency: harder to implement, easier to use

Case 1 (strawman: inconsistent under concurrency)

S1: C1  C2
S2: C2  C1
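The timeline above, simulated (a sketch): each client sends its write to every replica directly, the network delivers in different orders, and applying writes in arrival order leaves S1 and S2 disagreeing.

```python
def apply(arrival_order):
    """Replica state after applying whole-record overwrites as they arrive."""
    state = None
    for write in arrival_order:
        state = write               # each write overwrites the record
    return state

s1 = apply(["C1", "C2"])            # S1 receives C1 then C2 -> ends with C2
s2 = apply(["C2", "C1"])            # S2 receives C2 then C1 -> ends with C1
print(s1, s2)                       # C2 C1: the replicas have diverged
```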

Case 2 (consistent under concurrency)

S1(P): C1 C2 C1-id C2-id      
S2   : C2 C1             S1-id-C1 S1-id-C2
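The fix in the timeline above, simulated (a sketch): the primary S1 assigns each write a serial number; a secondary may receive writes in any order but applies them in serial order, so all replicas converge to the same state.

```python
def primary_assign(arrival_order):
    """The primary numbers writes in its own arrival order."""
    return {write: serial for serial, write in enumerate(arrival_order)}

def replica_apply(received, serials):
    """A replica buffers what it receives and applies it in serial-number order."""
    state = None
    for write in sorted(received, key=lambda w: serials[w]):
        state = write
    return state

serials = primary_assign(["C1", "C2"])       # S1 (primary): C1 -> 0, C2 -> 1
s1 = replica_apply(["C1", "C2"], serials)
s2 = replica_apply(["C2", "C1"], serials)    # received out of order, applied in order
assert s1 == s2 == "C2"                      # all replicas agree
```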

Case 3 (consistent but undefined)

  • a write operation breaks into many smaller write operations
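Simulated (a sketch with made-up region indices): each client's large write splits into per-region pieces, and the primary serializes all pieces in one interleaved order. Every replica applies the same order and ends byte-identical (consistent), yet the final region mixes both clients' data (undefined).

```python
# Each client's large write splits into two pieces for adjacent regions 0 and 1.
c1_pieces = [(0, "A1"), (1, "A2")]           # client C1's record
c2_pieces = [(0, "B1"), (1, "B2")]           # client C2's record

# One interleaving the primary might serialize; every replica applies it identically.
serialized = [c1_pieces[0], c2_pieces[0], c2_pieces[1], c1_pieces[1]]

def apply(ops):
    region = {}
    for idx, data in ops:
        region[idx] = data
    return (region[0], region[1])

s1, s2 = apply(serialized), apply(serialized)
assert s1 == s2                                   # consistent: replicas identical
assert s1 not in [("A1", "A2"), ("B1", "B2")]     # undefined: a mix of both records
print(s1)                                         # ('B1', 'A2')
```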

Case 4 (inconsistent with failures)

  • follower fails
  • primary fails
  • two primaries? (what about serial numbers?)
    • leases (with expiry) avoid two primaries
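The lease idea can be sketched with explicit timestamps (a toy under stated assumptions, not the real GFS protocol): the master grants a chunk lease with an expiry and refuses to grant a new one until the old lease has provably expired, so at most one primary is valid at any time.

```python
LEASE_SECONDS = 60

class Master:
    def __init__(self):
        self.holder, self.expires = None, 0

    def grant(self, server, now):
        """Grant the chunk lease only if no unexpired lease is outstanding."""
        if self.holder is not None and now < self.expires:
            return False                       # the old primary may still be acting
        self.holder, self.expires = server, now + LEASE_SECONDS
        return True

m = Master()
assert m.grant("S1", now=0)        # S1 becomes primary
assert not m.grant("S2", now=30)   # S1's lease unexpired: refuse -> never two primaries
assert m.grant("S2", now=61)       # lease expired: S2 may safely take over
```

The cost is availability: after a primary crashes, writes stall until its lease runs out.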

Case 5 (more consistency anomalies)

S1(P): C1 C2 C2-id C1-id C3-read
S2   : C2 C1                     S1-id-C2 C3-read S1-id-C1 C3-read
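The timeline above, simulated (a sketch): C3's reads race with replication. S1 has applied both serialized writes while S2 lags, so the value C3 observes depends on which replica it asks and when.

```python
def state_after(applied):
    """Replica value after the overwrites it has applied so far (in serial order)."""
    value = None
    for write in applied:
        value = write
    return value

s1 = state_after(["C2", "C1"])     # S1 (primary) has applied both: C2's id, then C1's
s2 = state_after(["C2"])           # S2 lags: has only applied C2's write so far
print(s1, s2)                      # C1 C2: C3's answer depends on which replica it asks
s2 = state_after(["C2", "C1"])     # once S2 catches up ...
assert s1 == s2                    # ... the reads agree again
```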