Raft的TLA+验证介绍

介绍使用TLA+对Raft的验证工作。

Ongaro等已经在Github上提供了一版TLA+的实现

运行

容易发现,源码里面不包含TLC的Config,但是我们在Issue 1里面能找到一个。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
CONSTANTS Server = {r1,r2,r3}
Value = {v1,v2}
Follower = Follower
Candidate = Candidate
Leader = Leader
Nil = Nil
RequestVoteRequest = RequestVoteRequest
RequestVoteResponse = RequestVoteResponse
AppendEntriesRequest = AppendEntriesRequest
AppendEntriesResponse = AppendEntriesResponse
TLC_MAX_TERM = 3
TLC_MAX_ENTRY = 1
TLC_MAX_MESSAGE = 1
\* PNat = {1,2,3,4,5}
\* Nat = {0,1,2,3,4,5}
\*SYMMETRY Perms
SPECIFICATION Spec
\*CONSTRAINT TermConstraint
\*CONSTRAINT LogConstraint
\*CONSTRAINT MessageConstraint
\*INVARIANT AtMostOneLeaderPerTerm
\*INVARIANT TermAndIndexDeterminesLogPrefix
\*INVARIANT StateMachineSafety
\*INVARIANT NewLeaderHasCompleteLog
\*INVARIANT CommitInOrder

\*INVARIANT MessageTypeInv
\*INVARIANT TypeInv

需要限制一下Model

1
2
3
4
5
6
/\ \A i \in Server : Len(log[i]) =< maxLength
/\ \A i \in Server : currentTerm[i] =< maxTerm
/\ Cardinality(DOMAIN messages) =< maxMessage
/\ restartNum =< maxRestartNum
/\ timeoutNum =< maxTimeoutNum
/\ actionNum =< maxActionNum

timeoutNum必须至少为1,不然不能触发选举

列出一下Safety条件

1
OneLeader == Cardinality( {i \in Server : state[i] = Leader} ) <= 1

注意如下错误

1
2
The subscript of the next-state relation specified by the specification
does not seem to contain the state variable xxx

看看xxx是不是在vars里面

1
Spec == Init /\ [][Next]_vars

打印语句

1
Print(内容, Print语句的值)

发现maxMessage必须为1,其他都可以变。

总体模型

变量介绍

下面的变量是全局的:

  1. messages
    表示从一个Server发送到另一个Server的消息。因为TLAPS不支持Bags模块(也就是所谓的multiset),所以这是一个将消息映射为Nat的函数
    通过WithMessage(m, msgs)messages添加一条消息,其中m是我们需要发送的消息,msgs就是messages这个变量。

    1
    2
    3
    4
    5
    WithMessage(m, msgs) ==
    IF m \in DOMAIN msgs THEN
    [msgs EXCEPT ![m] = msgs[m] + 1]
    ELSE
    msgs @@ (m :> 1)

    消息的格式如下图所示

    1
    2
    3
    4
    5
    6
    [mtype         |-> RequestVoteRequest,
    mterm |-> currentTerm[i],
    mlastLogTerm |-> LastTerm(log[i]),
    mlastLogIndex |-> Len(log[i]),
    msource |-> i,
    mdest |-> j]
  2. elections
    仅在证明(proof)中需要,实现(implementation)并不需要。
    这个用来记录所有成功的选举。包括Leader和Voter的日志。其中一个元素类似下面。

    1
    2
    3
    4
    5
    {[ eterm     |-> currentTerm[i],
    eleader |-> i,
    elog |-> log[i],
    evotes |-> votesGranted[i],
    evoterLog |-> voterLog[i]]}
  3. allLogs
    同样仅在证明中需要。
    集合了每个Server上的日志

    1
    /\ allLogs' = allLogs \cup {log[i] : i \in Server}

下面的变量是Per Server的:

  1. serverVars
    <<currentTerm, state, votedFor>>
    在Init中,currentTerm初始化为1,state初始化为Follower,voteFor初始化为Nil。

  2. logVars
    <<log, commitIndex>>
    其中log是A Sequence of log entries,log[i]表示Server i上的日志。
    在Init中,log初始化为

    1
    log = [i \in Server |-> << >>]

    每个Log Entry的格式是

    1
    2
    [term  |-> currentTerm[i],
    value |-> v]

    commitIndex初始化为0。

  3. candidateVars
    <<votesResponded, votesGranted, voterLog>>

  4. leaderVars
    <<nextIndex, matchIndex, elections>>
    在Init中,nextIndex全部被初始化为1,matchIndex全部被初始化为0。

Helper

Quorum实际上是一个集合的集合。表示Server中所有可以形成Quorum的子集。

1
Quorum == {i \in SUBSET(Server) : Cardinality(i) * 2 > Cardinality(Server)}

周边Action

Spec

1
2
3
4
5
6
7
8
9
10
11
12
Next == /\ \/ \E i \in Server : Restart(i)
\/ \E i \in Server : Timeout(i)
\/ \E i,j \in Server : RequestVote(i, j)
\/ \E i \in Server : BecomeLeader(i)
\/ \E i \in Server, v \in Value : ClientRequest(i, v)
\/ \E i \in Server : AdvanceCommitIndex(i)
\/ \E i,j \in Server : AppendEntries(i, j)
\/ \E m \in DOMAIN messages : Receive(m)
\/ \E m \in DOMAIN messages : DuplicateMessage(m)
\/ \E m \in DOMAIN messages : DropMessage(m)
\* History variable that tracks every log ever:
/\ allLogs' = allLogs \cup {log[i] : i \in Server}

主要Acttion

AppendEntries

Enabling条件,就是自己是Leader,并且不能自己给自己发消息。
SubSeq(s,m,n)表示<< s[m], s[m+1], ..., s[n] >>

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
AppendEntries(i, j) ==
/\ i /= j
/\ state[i] = Leader
/\ LET prevLogIndex == nextIndex[i][j] - 1
prevLogTerm == IF prevLogIndex > 0 THEN
log[i][prevLogIndex].term
ELSE
0
\* Send up to 1 entry, constrained by the end of the log.
lastEntry == Min({Len(log[i]), nextIndex[i][j]})
entries == SubSeq(log[i], nextIndex[i][j], lastEntry)
IN Send([mtype |-> AppendEntriesRequest,
mterm |-> currentTerm[i],
mprevLogIndex |-> prevLogIndex,
mprevLogTerm |-> prevLogTerm,
mentries |-> entries,
\* mlog is used as a history variable for the proof.
\* It would not exist in a real implementation.
mlog |-> log[i],
mcommitIndex |-> Min({commitIndex[i], lastEntry}),
msource |-> i,
mdest |-> j])
/\ UNCHANGED <<serverVars, candidateVars, leaderVars, logVars>>

对端处理AppendEntries

1
2
3
4
5
6
HandleAppendEntriesRequest(i, j, m) ==
LET logOk == \/ m.mprevLogIndex = 0
\/ /\ m.mprevLogIndex > 0
/\ m.mprevLogIndex <= Len(log[i])
/\ m.mprevLogTerm = log[i][m.mprevLogIndex].term
IN /\ m.mterm <= currentTerm[i]

下面处理分为三个部分:
拒绝

1
2
3
4
5
6
7
8
9
10
11
12
13
14
...
/\ \/ /\ \* reject request
\/ m.mterm < currentTerm[i]
\/ /\ m.mterm = currentTerm[i]
/\ state[i] = Follower
/\ \lnot logOk
/\ Reply([mtype |-> AppendEntriesResponse,
mterm |-> currentTerm[i],
msuccess |-> FALSE,
mmatchIndex |-> 0,
msource |-> i,
mdest |-> j],
m)
/\ UNCHANGED <<serverVars, logVars>>

切换为Follower

1
2
3
4
5
6
...
\/ \* return to follower state
/\ m.mterm = currentTerm[i]
/\ state[i] = Candidate
/\ state' = [state EXCEPT ![i] = Follower]
/\ UNCHANGED <<currentTerm, votedFor, logVars, messages>>

接受

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
...
\/ \* accept request
/\ m.mterm = currentTerm[i]
/\ state[i] = Follower
/\ logOk
/\ LET index == m.mprevLogIndex + 1
IN \/ \* already done with request
/\ \/ m.mentries = << >>
\/ /\ m.mentries /= << >>
/\ Len(log[i]) >= index
/\ log[i][index].term = m.mentries[1].term
\* This could make our commitIndex decrease (for
\* example if we process an old, duplicated request),
\* but that doesn't really affect anything.
/\ commitIndex' = [commitIndex EXCEPT ![i] =
m.mcommitIndex]
/\ Reply([mtype |-> AppendEntriesResponse,
mterm |-> currentTerm[i],
msuccess |-> TRUE,
mmatchIndex |-> m.mprevLogIndex +
Len(m.mentries),
msource |-> i,
mdest |-> j],
m)
/\ UNCHANGED <<serverVars, log>>
\/ \* conflict: remove 1 entry
/\ m.mentries /= << >>
/\ Len(log[i]) >= index
/\ log[i][index].term /= m.mentries[1].term
/\ LET new == [index2 \in 1..(Len(log[i]) - 1) |->
log[i][index2]]
IN log' = [log EXCEPT ![i] = new]
/\ UNCHANGED <<serverVars, commitIndex, messages>>
\/ \* no conflict: append entry
/\ m.mentries /= << >>
/\ Len(log[i]) = m.mprevLogIndex
/\ log' = [log EXCEPT ![i] =
Append(log[i], m.mentries[1])]
/\ UNCHANGED <<serverVars, commitIndex, messages>>
/\ UNCHANGED <<candidateVars, leaderVars>>

AppendEntries处理对端返回

RequestVote

Reference

  1. https://github.com/dranov/raft-tla/blob/master/thirdparty/raft_membership.tla
    Raft的Membership Change证明
  2. https://github.com/dranov/raft-tla/blob/master/thirdparty/raft_dricketts.tla
    一个带有很多Inv的Raft的TLA+
  3. https://github.com/tlaplus/CommunityModules/blob/master/modules/SequencesExt.tla
    Sequences扩展