Published on 2022-05-11

The C++ Memory Model

Memory models

Static memory model

This concerns how a class (or struct) object is laid out in memory, i.e. where each member of the class (or struct) is stored.
See the book Inside the C++ Object Model for details.

Dynamic memory model

Viewed behaviorally, this is the set of constraints on what happens when multiple threads read and write the same object concurrently.
When people say "memory model" without qualification, they usually mean the dynamic one.

  • It describes the order in which a programming language lets multithreaded programs access shared memory
  • It also covers how far the CPU may reorder instructions even in single-threaded code

Why memory models exist

  • Consider a dual-core CPU where each core has a private 64 KB L1 cache and the two cores share a 4 MB L2 cache and 8 GB of RAM. On such an architecture the CPU does not read and write RAM directly; data passes through L1 and L2. On a write, the CPU writes into its L1 cache and the data is later flushed to RAM. Reads are similar: the CPU reads from L1 first and falls back to RAM only on a miss.
  • The compiler behaves the same way: to get more performance, it may also reorder the statements it generates.

CPU memory-ordering models

Strong ordering (Total Store Order, TSO)

Memory writes have a single global order that every observer sees identically: it is as if every store to memory joins one queue and completes one at a time, and that queue follows your program order.
The observable behaviors are therefore just interleavings of each CPU's program order; nothing inconsistent with program order occurs. (Strictly, TSO still lets a load pass an earlier store to a different address through the store buffer, but the stores themselves are totally ordered.)
TSO makes multithreaded programs easier to write and is friendlier to the programmer, but less friendly to the chip designer: to keep the TSO promise, the CPU sacrifices some concurrent execution efficiency.

  • x86_64 and SPARC are strongly ordered (TSO)

Weak ordering (Weak Memory Ordering)

The CPU does not promise such an order (beyond dependencies within a single CPU); the programmer must insert memory-barrier instructions explicitly to enforce this "visibility".

  • ARMv8, PowerPC, MIPS and similar architectures are weakly ordered.
  • Each weakly ordered architecture has its own barrier instructions, and their semantics are not identical.
  • A weak model is simpler to implement in hardware and executes more efficiently: as long as no explicit barrier instruction is encountered, the CPU is free to reorder nearby instructions for speed.

How to guarantee the ordering of memory accesses and data

  • If the object is not an atomic type, you must provide sufficient synchronization (mutexes, semaphores, condition variables, and so on)
  • Use atomic types

Besides built-in atomic types, the more general approach is a generic atomic type: C++11 provides the std::atomic class template, while C11 (shown below) uses the _Atomic qualifier.

#include <stdatomic.h>

_Atomic(int) a; // or
_Atomic int b;  // both are atomic integers

struct Node
{
    int data;
    struct Node *next;
};
_Atomic struct Node s; // s is an atomic type too

The memory_order values of std::atomic

enum class memory_order
{
    memory_order_relaxed,
    memory_order_consume, // load-consume
    memory_order_acquire, // load-acquire
    memory_order_release, // store-release
    memory_order_acq_rel, // store-release + load-acquire
    memory_order_seq_cst  // sequentially consistent: acq_rel plus a single total order
};

memory_order_relaxed

The operation is ordered to happen atomically at some point.
This is the loosest memory order, providing no guarantees on how memory accesses in different threads are ordered with respect to the atomic operation.

#include <vector>
#include <iostream>
#include <thread>
#include <atomic>
 
std::atomic<int> cnt = {0};
 
void f()
{
    for (int n = 0; n < 1000; ++n) {
        cnt.fetch_add(1, std::memory_order_relaxed);
    }
}
 
int main()
{
    std::vector<std::thread> v;
    for (int n = 0; n < 10; ++n) {
        v.emplace_back(f);
    }
    for (auto& t : v) {
        t.join();
    }
    std::cout << "Final counter value is " << cnt << '\n';
}

Output:

Final counter value is 10000

memory_order_consume

Applies to loading operations
The operation is ordered to happen once all accesses to memory in the releasing thread that carry a dependency on the releasing operation (and that have visible side effects on the loading thread) have happened.

  • consume is cheaper than acquire. Every CPU except DEC Alpha (famous for its extremely weak memory model) provides it for free, unlike acquire. (On x86 and SPARC-TSO the distinction vanishes, because the hardware already gives acquire/release ordering without extra barriers or special instructions.)

  • On weakly ordered ISAs such as ARM, AArch64, PowerPC and MIPS, consume and relaxed are the only orderings that need no extra barrier, just an ordinary cheap load instruction; in effect, every plain asm load is (at least) a consume load, except on Alpha. acquire additionally requires LoadLoad and LoadStore ordering, which is a cheaper barrier instruction than the full barrier seq_cst needs, but still more expensive than nothing.

  • consume can always be safely replaced by the stronger acquire, at the cost of that extra barrier on weak ISAs. (In practice, compilers have found the dependency-tracking rules too hard to honor and typically promote consume to acquire; C++17 discourages its use until the specification is revised.)

memory_order_acquire

Applies to loading operations
The operation is ordered to happen once all accesses to memory in the releasing thread (that have visible side effects on the loading thread) have happened.

memory_order_release

Applies to storing operations
The operation is ordered to happen before a consume or acquire operation, serving as a synchronization point for other accesses to memory that may have visible side effects on the loading thread.

#include <thread>
#include <atomic>
#include <cassert>
#include <string>
 
std::atomic<std::string*> ptr;
int data;
 
void producer()
{
    std::string* p  = new std::string("Hello");
    data = 42;
    ptr.store(p, std::memory_order_release);
}
 
void consumer()
{
    std::string* p2;
    while (!(p2 = ptr.load(std::memory_order_acquire)))
        ;
    assert(*p2 == "Hello"); // never fires
    assert(data == 42); // never fires
}
 
int main()
{
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join(); t2.join();
}

memory_order_acq_rel

Applies to loading/storing operations
The operation loads acquiring and stores releasing (as defined above for memory_order_acquire and memory_order_release).

memory_order_seq_cst

The operation is ordered in a sequentially consistent manner: All operations using this memory order are ordered to happen once all accesses to memory that may have visible side effects on the other threads involved have already happened.
This is the strictest memory order, guaranteeing the least unexpected side effects between thread interactions though the non-atomic memory accesses.
For consume and acquire loads, sequentially consistent store operations are considered releasing operations.

#include <thread>
#include <atomic>
#include <cassert>
 
std::atomic<bool> x = {false};
std::atomic<bool> y = {false};
std::atomic<int> z = {0};
 
void write_x()
{
    x.store(true, std::memory_order_seq_cst);
}
 
void write_y()
{
    y.store(true, std::memory_order_seq_cst);
}
 
void read_x_then_y()
{
    while (!x.load(std::memory_order_seq_cst))
        ;
    if (y.load(std::memory_order_seq_cst)) {
        ++z;
    }
}
 
void read_y_then_x()
{
    while (!y.load(std::memory_order_seq_cst))
        ;
    if (x.load(std::memory_order_seq_cst)) {
        ++z;
    }
}
 
int main()
{
    std::thread a(write_x);
    std::thread b(write_y);
    std::thread c(read_x_then_y);
    std::thread d(read_y_then_x);
    a.join(); b.join(); c.join(); d.join();
    assert(z.load() != 0);  // will never happen
}

Further reading

https://en.cppreference.com/w/cpp/atomic/memory_order

