Two Phase Locking with two-phase commit (2PL+2PC)

How to ensure a serializable schedule?

Locking-based approach
Strawman solution 1:
- Grab global lock when transaction starts
- release global lock when transaction finishes committing
Strawman solution 2:
- grab lock on item X before read/write X
- release lock on item X after read/write X

Problem with strawman 2?

Permits this non-serializable interleaving

R_1(C), R_1(S), W_1(C), R_3(C), R_3(S), Commit3, W_1(S), Commit1

Look at W_1(C), if lock on C is released, then another transaction T’ can read new value of C, but T’ must be able to read new value of S that T has not yet written yet.
If write lock is not held, —> read of uncommitted value
So, write lock must be held till the transaction commits

(Extra question) How about read locks, must it also be held till transaction commits?

R_3(C), R_1(C), R_1(S), W_1(S), W_1(S), Commit1, R_3(S), Commit3

This is a non-serializable schedule
- R_3(C) reads old value
- R_3(S) reads new value (non-repeatable reads)
So, read locks must also be held till commit time

2-phase-locking (2PL)

a growing phase in which transaction is acquiring locks
a shrinking phase in which locks are released
in practice
- growing phase is the entire transaction
- shrinking phase is during commit
Optimization: use read/write locks instead of exclusive locks
More on 2PL:
- what is a lock is unavailable?
- deadlock possible?
- how to cope with deadlocks?
  - grab locks in order? not always possible
  - (central) system detects deadlock cycles and aborts involved transactions
  - deadlock prevention
    - wound-wait
    - wait-die

2-phase-commit (2PC)

lock without 2PC
- coordinator –> server-a: lock a, log a=1, write a=1 to database state, unlock a
- coordinator –> server-b: lock b, log b=1, write b=1 to database state, unlock b
Problem (failure to commit)
- what if transaction cannot commit (e.g. deadlocks)
Problem (failure):
- coordinator crashes after message to a (before message to b).
A later transaction T2 sees a=1, but the information about b=1 is permanently lost!
Problem (serializability violation)
- a’s message to server A arrives read(a = 1) read(b = 0)
- b’s message to server B arrives
2PC (Two-phase commit)
- coordinator –> server-a: prepare-T1: server-a lock a, logs a=1,
- coordinator –> server-b: prepare-T1: server-b lock b, logs b=1
- coordinator –> server-a: commit-T1: write a=1 to database state, unlock a
- coordinator –> server-b: commit-T1: write b=1 to database state, unlock b
Now if coordinator crashes after prepare-a before prepare-b, a recovery protocol should abort T1
- (no other transactions can read a=1 since a is still locked)
Now if coordinator crashes after commit-a before commit-b, a recovery protocol should send commit-T1 to b.
How does the recovery protocol work? Two options:
- Option 1:
  - Coordinator can unilaterially determine the commit status of a transaction
  - e.g. Coordinator receives prepare-ok(T1) from server-a, but times out on server-b,
    - Coordinator can abort T1 (even if server-b has successfully prepared T1).
  - Coordinator durably logs its decision (e.g. to a Paxos RSM).
  - Recovery protocol reads from coordinator’s log to decide to commit or abort.
- Option 2:
  - Coordinator can not unilaterially determine the commit status of a transaction
  - If both server-a and server-b successfully prepared T1, then T1 must commit
    - participants log must be durable against failure (e.g. log replicated via Paxos RSM)
  - Recovery protocol must read all participating server’s log to decide commit or abort.