GhostCell

GhostCell 是另一个内部可变性的实现。相比 RefCell,它实现的是编译期的检查。

https://plv.mpi-sws.org/rustbelt/ghostcell/paper.pdf

问题

Rust 中面临所谓 AXM 的问题。A 指的是 Aliasing,M 指的是 Mutability。AXM 表示 A Xor M,也就是这两个只能保证其一。

这让 Rust 容易实现树状结构,而难以实现图状结构(比如双向链表、图等)。其原因是图状结构中有环,环意味着 internal sharing。

对此,Rust 一般提供两种做法:

  1. Raw 指针
    具有下面的特点:
    • Unsafe,也就是 Rust 的类型系统无法进行约束
    • 没有 AXM 限制
  2. Interior Mutability
    • 非线程安全的有 Cell 和 RefCell,它们允许通过一个不可变借用 &T 去修改内部的对象
    • 线程安全的有 Mutex 和 RwLock

比如双向链表就可以写成

1
2
3
4
5
6
7
struct Node<T> {
data: T,
prev: Option<NodeRef<T>>,
next: Option<NodeRef<T>>,
}

type NodeRef<T>= &RwLock<Node<T>>;

作者就说这样的话一个 Node 会有一个 RwLock。如果希望链表可以被多个线程同时修改,这是有用的,但如果只是希望在整个链表上实现 AXM,就有点 overkill 了。

当然,我觉得始终可以去把整个链表外面包一层来解决 AXM 的问题的吧。实际上往后看,使用这样粗粒度的锁,也恰恰是 GhostCell 的 idea。

介绍

作者宣称 GhostCell 有如下的特性:

  1. 支持 internal sharing
  2. zero cost abstraction
  3. safe,通过 Coq 证明
  4. flexible,是 thread-safe 的,对 type T 没有要求
方案 访问延迟 (ns) 内存开销 (%) 线程安全
GhostCell 2.1 0
RwLock 42.7 300
RefCell 7.3 100
UnsafeCell 1.9 0

原理简介

作者说 Rust 将 permission 和 data 绑定了。所以这导致了 AXM 在一个很细的粒度上,比如链表中的每个节点。

我觉得这有点类似于 shared_ptr 的 alias 构造函数。

因此,GhostCell 将 permission 和 data 分开来。让一个 permission 可以和一整个 data 关联,比如整个链表。

使用 Brand Types 即 'id 去表示某个有状态的数据结构,来自于 ST monad 的启发。涉及使用 phantom types 和 rank-2 polymorphism 去模拟一个 lightweight form of stateful, dependently-typed API。

The basic technique of branded typesÐusing a static brand like ‘id as a representative of some dynamic, stateful data structureÐis an old one, dating back at least to the work of Launchbury and Peyton Jones [1995] on the ST monad in Haskell.

GhostCell 表示被 share 的某个对象。它常常是一个大结构里面的某个小元素,比如链表中的某个 Node。
GhostToken 是一个 Permission,可以用来访问任意的标有 'id 的 GhostCell。

在论文中,对 'id 进行了解释。它其实并不是一个 lifetime 的概念,而 GhostCell<'id, _> 都属于某个大结构,它被抽象为 'id。这样的 lifetime 称为 phantom lifetime。

rather, it merely serves as a static representative of the collection to which all the nodes of type GhostCell<'id, _> belong。

以链表为例,可以将 NodeRef 替换为 GhostCell 即可使用。

Ownership of GhostToken<'id> plays the role that in Rust is usually played by ownership of the Container: given a mutable reference to the token.

但是,和 Container 不同的是,我们不需要使用 RwLock<Container> 一样去 RwLock<GhostToken>。可以直接使用 GhostToken。无论底层同步是 sync::mpsc 还是 rayon::join 或者其他的方案。

这里的 tradeoff 是,RwLock 这样细粒度的锁允许链表中的不同节点被并发地修改或者读取。但是 GhostToken 是粗粒度的 permission tracking,因此并不支持这样。

Furthermore, the GhostCell API does not allow the user to mutate one GhostCell-wrapped node in a collection while simultaneously holding interior pointers into other nodes from the same collection. Still, for the common case where these restrictions are acceptable

使用

如下所示,创建一个 vec,其中元素都为 cell 的引用。我修改 vec[n / 2] 的值,然后再读取 cell,会发现 cell 的值也被修改了。

这里的逻辑没问题,但是 GhostToken 能够避免 borrow checker 的检查报错。

1
2
3
4
5
6
7
8
9
10
11
12
use ghost_cell::{GhostToken, GhostCell};

fn demo(n: usize) {
let value = GhostToken::new(|mut token| {
let cell = GhostCell::new(42);
let vec: Vec<_> = (0..n).map(|_| &cell).collect();
*vec[n / 2].borrow_mut(&mut token) = 33;
*cell.borrow(&token)
});

assert_eq!(value, 33);
}

原理详解

Phantom Lifetimes and Unchecked Indexing

An API for Branded Vectors

“unchecked indexing” 指的是,使用 vec[i] 访问元素时,会动态检查是否下标越界。但在一些程序中,这样的检查是可以被静态进行的。

这个 vector 的定义如下所示

push 会返回一个 BrandedIndex<'id>,通过它可以访问 bvec 中的元素。
可以通过 get_index,指定下标 index,去得到一个安全的 BrandedIndex。
从作者给的代码来看,bvec 也是嵌套在一个 lambda 里面作为参数给出的,写起来感觉挺丑的。原因也是为了构造 rank-2 多态。

1
2
3
4
5
6
7
8
9
10
11
12
let vec1: Vec<u8> = vec![10, 11];
let vec2: Vec<u8> = vec![20, 21];
BrandedVec::new(vec1, move |mut bvec1: BrandedVec<u8>| { // bvec1: BrandedVec<'id1, u8>
bvec1.push(12); let i= bvec1.push(13); // i1: BrandedIndex<'id1> is an index into bvec1
BrandedVec::new(vec2, move |mut bvec2: BrandedVec<u8>| { // bvec2: BrandedVec<'id2, u8>
let i= bvec2.push(22); // i2: BrandedIndex<'id2> is an index into bvec2
*bvec2.get_mut(i2) -= 1; // No bounds check! Updates to 21
println!("{:?}", bvec2.get(i2)); // No bounds check! Prints 21
println!("{:?}", bvec1.get(i1)); // No bounds check! Prints 13
// println!("{:?}", bvec2.get(i1)); // Rejected: iis not an index of bvec2
}); // end of `bvec2`'s closure
}); // end of `bvec1`'s closure

Implementation of Branded Vectors

BrandedVec 和 BrandedIndex 就是简单封装了下 Vec 以及 usize,但是带上了 InvariantLifetime。

1
2
3
4
// Phantom lifetime type that can only be subtyped by exactly the same lifetime `'id`.
struct InvariantLifetime<'id>(PhantomData<*mut &'id ()>);
struct BrandedVec<'id, T> { inner: Vec<T>, _marker: InvariantLifetime<'id> }
struct BrandedIndex<'id> { idx: usize, _marker: InvariantLifetime<'id> }

意思是 'id 会被编译期推断为是 invariant 的。这保障了我们不能通过 subtyping 去改变 BrandedVec 和 BrandedIndex 的类型。因此,同样的 'id 只能用来访问同一个 BrandedVec。

In general, the Rust compiler will automatically infer the variance of lifetime and type parameters such as ‘id and T, and here the type InvariantLifetime<’id> is carefully defined so that the compiler will infer the lifetime ‘id to be invariant. This ensures that we cannot change the brand of a BrandedVec<’id, T> or BrandedIndex<’id> via subtyping.

As mentioned in the introduction, implementing this API requires unsafe operations (code marked with an unsafe block) in the body of the get and get_mut functions, since they perform a vector access without bounds checks. This implies that we, as the developer of the library, have the obligation to prove that such accesses are actually safe under all possible interactions with well-typed clients.

GHOSTCELL: ZERO-OVERHEAD, THREAD-SAFE INTERIOR MUTABILITY

代码实现

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
type InvariantLifetime<'brand> = PhantomData<fn(&'brand ()) -> &'brand ()>;

/// A `GhostToken<'x>` is _the_ key to access the content of any `&GhostCell<'x, _>` sharing the same brand.
///
/// Each `GhostToken<'x>` is created alongside a unique brand (its lifetime), and each `GhostCell<'x, T>` is associated
/// to one, and only one, `GhostToken` at a time via this brand. The entire set of `GhostCell<'x, T>` associated to a
/// given `GhostToken<'x>` creates a pool of cells all being accessible solely through the one token they are associated
/// to.
///
/// The pool of `GhostCell` associated to a token need not be homogeneous, each may own a value of a different type.
pub struct GhostToken<'brand> {
_marker: InvariantLifetime<'brand>,
}

impl<'brand> GhostToken<'brand> {
/// Creates a fresh token to which `GhostCell`s can be tied to later.
///
/// Due to the use of a lifetime, the `GhostCell`s tied to a given token can only live within the confines of the
/// invocation of the `fun` closure.

#[allow(clippy::new_ret_no_self)]
pub fn new<R, F>(fun: F) -> R
where
for<'new_brand> F: FnOnce(GhostToken<'new_brand>) -> R,
{
let token = Self {
_marker: InvariantLifetime::default(),
};
fun(token)
}
}

impl<'brand, T: ?Sized> GhostCell<'brand, T> {
pub fn borrow<'a>(&'a self, _: &'a GhostToken<'brand>) -> &'a T {
// Safety:
// - The cell is borrowed immutably by this call, it therefore cannot already be borrowed mutably.
// - The token is borrowed immutably by this call, it therefore cannot be already borrowed mutably.
// - `self.value` therefore cannot be already borrowed mutably, as doing so requires calling either:
// - `borrow_mut`, which would borrow the token mutably.
// - `get_mut`, which would borrow the cell mutably.
unsafe { &*self.value.get() }
}

pub fn borrow_mut<'a>(&'a self, _: &'a mut GhostToken<'brand>) -> &'a mut T {
// Safety:
// - The cell is borrowed immutably by this call, it therefore cannot already be borrowed mutably.
// - The token is borrowed mutably by this call, it therefore cannot be already borrowed.
// - `self.value` therefore cannot already be borrowed, as doing so requires calling either:
// - `borrow` or `borrow_mut`, which would borrow the token.
// - `get_mut`, which would borrow the cell mutably.
unsafe { &mut *self.value.get() }
}

pub fn from_mut(t: &mut T) -> &mut Self {
// Safety:
// - `t` is mutably borrowed for the duration.
// - `GhostCell<'_, T>` has the same in-memory representation as `T`.
unsafe { mem::transmute(t) }
}
}

Reference

  1. https://www.bilibili.com/video/BV1HP4y1s762/