A Full Walkthrough of CAS, Down to the Assembly Level

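I. Introduction

For Java developers, CAS is usually something we use as a black box, with no need to open it.

But as the industry matures, it is worth opening that black box and seeing what is inside.

This installment of the source-code series walks through how CAS is implemented.

Drawing on lessons from the earlier Spring, Kafka, and JUC source-code articles, this one does not go through the source line by line. Less important parts are treated as black boxes so that we can read the source faster and more effectively.

The internet industry may be going through a winter, but nothing is settled yet, and any of us could still be the dark horse!

Enough talk, let's go!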

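II. Usage

In Java, CAS operations are exposed through the Atomic family of classes in the JDK's java.util.concurrent.atomic package.

For example, AtomicInteger provides atomic add, subtract, and compare-and-set operations, all of which are built on CAS.

CAS operations in Java are typically implemented with the sun.misc.Unsafe class, because CAS needs to operate on memory directly and Unsafe provides methods for doing so.

Although Unsafe is an internal implementation detail of the Java platform, some high-performance concurrency libraries and frameworks still use it to perform CAS operations.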

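import java.util.concurrent.atomic.AtomicInteger;

public class Test {
    public static void main(String[] args) {
        // The no-argument constructor starts the counter at 0
        AtomicInteger atomicInteger = new AtomicInteger();
        for (int i = 0; i < 10; i++) {
            // Atomically adds 1 and returns the previous value
            atomicInteger.getAndIncrement();
        }
        System.out.println(atomicInteger.get()); // prints 10
    }
}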
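The example above calls getAndIncrement, which hides the CAS step. As a minimal sketch of what "built on CAS" means, the code below performs the same increment with an explicit compareAndSet retry loop and runs it from several threads. The class name CasIncrementDemo and the helpers incrementWithCas and runConcurrently are hypothetical names chosen for illustration; only AtomicInteger.get, AtomicInteger.compareAndSet, and Thread come from the JDK. It is not meant to be the JDK's actual implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not part of the JDK.
final class CasIncrementDemo {

    // Increment by retrying compareAndSet until no other thread interfered.
    static int incrementWithCas(AtomicInteger counter) {
        while (true) {
            int current = counter.get();   // read the current value
            int next = current + 1;        // compute the new value
            if (counter.compareAndSet(current, next)) {
                return current;            // CAS succeeded: return the previous value
            }
            // CAS failed: another thread changed the value first, so retry.
        }
    }

    // Start `threads` threads, each incrementing `perThread` times; return the final count.
    static int runConcurrently(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    incrementWithCas(counter);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();             // wait for every worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 10_000)); // prints 40000
    }
}
```

No increment is lost even though no lock is taken: a thread whose compareAndSet fails simply re-reads the value and tries again.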

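III. How It Works

CAS (Compare and Swap) is an atomic operation commonly used in concurrent programming to implement synchronization and mutual exclusion between threads.

A CAS operation takes three parameters:

the memory address V
the original (expected) value A
the new value B

If the value currently stored at the memory address equals the original value A, the value at that address is changed to the new value B; otherwise nothing is done.

CAS is atomic: when several threads attempt it on the same address at the same moment, only one of them can succeed.

The flow is:

Step 1: the CPU reads the data V at the memory address
Step 2: the CPU compares the original value with the data V
Step 3:
- if they are equal, the data V at the memory address is replaced with the new value
- if they are not equal, nothing is done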
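The three steps can be sketched in plain Java. This is only a model of the semantics, under the assumption that a single int field stands in for the memory cell and that the synchronized keyword stands in for the atomicity that real hardware provides; the class name SimulatedCas and its methods are hypothetical.

```java
// Hypothetical model of one memory cell. `synchronized` is a stand-in for the
// atomicity of the hardware instruction; a real CAS does not take a Java lock.
final class SimulatedCas {
    private int value; // the data stored at "memory address V"

    SimulatedCas(int initial) {
        this.value = initial;
    }

    synchronized boolean compareAndSwap(int expected, int newValue) {
        int current = value;        // step 1: read the data at the address
        if (current == expected) {  // step 2: compare it with the original value A
            value = newValue;       // step 3a: equal, so store the new value B
            return true;
        }
        return false;               // step 3b: not equal, so do nothing
    }

    synchronized int get() {
        return value;
    }

    public static void main(String[] args) {
        SimulatedCas cell = new SimulatedCas(10);
        System.out.println(cell.compareAndSwap(10, 11)); // true: 10 matched, cell is now 11
        System.out.println(cell.compareAndSwap(10, 12)); // false: cell holds 11, not 10
        System.out.println(cell.get());                  // 11
    }
}
```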

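IV. Source Code

That covers the basic usage and principle of CAS. As long-time readers know, Xiao Huang's specialty is hardcore source-code analysis.

Let's go on to look at how HotSpot implements it.

In the Java code, once we reach the following line we cannot trace any further: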

public final native boolean compareAndSwapInt(Object var1, long var2, int var4, int var5);

Opening the HotSpot source, we find CAS call sites such as this one:

Atomic::cmpxchg_ptr(lock, obj()->mark_addr(), mark)

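The implementation differs from one operating system to another.

1. Linux source code

Taking Linux x86 as an example, its CAS implementation for the int type is as follows:

The first parameter is exchange_value (the new value)
The second parameter is dest (the target address)
The third parameter is compare_value (the original value)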

inline void* Atomic::cmpxchg_ptr(void* exchange_value, volatile void* dest, void* compare_value) {
  return (void*)cmpxchg((jint)exchange_value, (volatile jint*)dest, (jint)compare_value);
}

Let's keep tracing:

inline jint Atomic::cmpxchg(jint exchange_value, volatile jint* dest, jint compare_value) {
  int mp = os::is_MP();
  __asm__ volatile (LOCK_IF_MP(%4) "cmpxchgl %1,(%3)"
                    : "=a" (exchange_value)
                    : "r" (exchange_value), "a" (compare_value), "r" (dest), "r" (mp)
                    : "cc", "memory");
  return exchange_value;
}

On Linux, this is the function that ultimately gets called.
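Two details of this function are easy to miss: the new value comes first in the argument list, and the return value is whatever was in memory before the instruction ran, not a boolean. A boolean result like the one compareAndSwapInt returns can be derived by comparing that return value with the expected one. The sketch below models this layering in plain Java. The class CmpxchgLayering, the one-element array standing in for a memory address, and the synchronized block standing in for the hardware's atomicity are all assumptions for illustration, not HotSpot code.

```java
// Hypothetical Java model of the layering; not HotSpot code.
final class CmpxchgLayering {

    // Mirrors the parameter order of Atomic::cmpxchg: new value, destination,
    // expected value. Returns the value that was in `dest` before the call.
    static int cmpxchg(int exchangeValue, int[] dest, int compareValue) {
        synchronized (dest) {              // stand-in for the atomic instruction
            int old = dest[0];
            if (old == compareValue) {
                dest[0] = exchangeValue;   // swap only when the expected value matched
            }
            return old;
        }
    }

    // Boolean wrapper: the swap happened exactly when the old value equals the expected one.
    static boolean compareAndSwapInt(int[] dest, int expected, int newValue) {
        return cmpxchg(newValue, dest, expected) == expected;
    }

    public static void main(String[] args) {
        int[] cell = {5};
        System.out.println(cmpxchg(6, cell, 5));           // 5: old value returned, cell is now 6
        System.out.println(compareAndSwapInt(cell, 5, 7)); // false: cell holds 6, not 5
        System.out.println(compareAndSwapInt(cell, 6, 7)); // true: cell is now 7
    }
}
```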

2. Windows source code

In practice, though, the Linux version is not very convenient to read, so let's look at the Windows implementation:

// atomic_windows_x86.inline.hpp
#define LOCK_IF_MP(mp) __asm cmp mp, 0  \
                       __asm je L0      \
                       __asm _emit 0xF0 \
                       __asm L0:
            
inline jint Atomic::cmpxchg (jint exchange_value, volatile jint* dest, jint compare_value) {
  // alternative for InterlockedCompareExchange
  int mp = os::is_MP();
  __asm {
    mov edx, dest
    mov ecx, exchange_value
    mov eax, compare_value
    LOCK_IF_MP(mp)
    cmpxchg dword ptr [edx], ecx
  }
}

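Let's analyze it line by line:

mov edx, dest: copies the pointer dest (the target memory address) into the edx register
mov ecx, exchange_value: puts the new value into the ecx register
mov eax, compare_value: puts the original value into the eax register
LOCK_IF_MP(mp): adds the lock prefix to the next instruction if the machine has multiple processors

Of course, none of the above is our focus. The focus is the following line: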

cmpxchg dword ptr [edx], ecx

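First, let's look at what dword ptr [edx] means.

dword: short for doubleword (32 bits on x86)

ptr: short for pointer; used together with the preceding dword, it indicates that the memory unit being accessed is a doubleword

[edx]: denotes a memory unit. edx is a register that holds the value of the dest pointer, so [edx] is the memory unit at address dest

So dword ptr [edx] means: access the doubleword memory unit at address dest.

Some readers may wonder: the eax register we loaded above does not appear in this instruction at all.

Don't worry, the secret lies in cmpxchg itself.

Let's look at the official definition of the cmpxchg instruction: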

Compares the value in the AL, AX, EAX, or RAX register with the first operand (destination operand). If the two values are equal, the second operand (source operand) is loaded into the destination operand. Otherwise, the destination operand is loaded into the AL, AX, EAX or RAX register. RAX register is available only in 64-bit mode.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor’s bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)
    
    

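So this is where the EAX register comes in. The sentence "Compares the value in the AL, AX, EAX, or RAX register with the first operand (destination operand). If the two values are equal, the second operand (source operand) is loaded into the destination operand." matches our definition of CAS.

The key question now: four registers are listed, so how do we know that EAX is the one being used?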
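To make the role of the implicit accumulator concrete, here is a minimal Java model of what the quoted definition says for the doubleword case. The class CmpxchgModel, its fields named after registers, and the one-element array standing in for the memory unit at [edx] are hypothetical; the model ignores the LOCK prefix and atomicity entirely and only shows the data movement.

```java
// Hypothetical model of `cmpxchg dword ptr [edx], ecx`; data movement only.
final class CmpxchgModel {
    int eax;                       // accumulator: holds compare_value before the instruction
    int ecx;                       // source operand: holds exchange_value
    final int[] memory = {0};      // destination operand: the doubleword at [edx]

    // Returns true when the accumulator matched the destination.
    boolean cmpxchg() {
        if (eax == memory[0]) {
            memory[0] = ecx;       // equal: the source is stored into the destination
            return true;
        }
        eax = memory[0];           // not equal: the destination is loaded into the accumulator
        return false;
    }

    public static void main(String[] args) {
        CmpxchgModel m = new CmpxchgModel();
        m.memory[0] = 5;
        m.eax = 5;
        m.ecx = 6;
        System.out.println(m.cmpxchg() + " mem=" + m.memory[0] + " eax=" + m.eax); // true mem=6 eax=5

        m.eax = 5;                 // stale expected value
        m.ecx = 7;
        System.out.println(m.cmpxchg() + " mem=" + m.memory[0] + " eax=" + m.eax); // false mem=6 eax=6
    }
}
```

In both cases the accumulator ends up holding the value that was in memory before the instruction, which is why the caller can read the old value back from eax.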

Accumulator = AL, AX, EAX, or RAX depending on whether a byte, word, doubleword, or quadword comparison is being performed


Here we can see that the register used depends on the size of the memory unit being accessed: