Debugging Multi-Process Applications

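Tools like cargo miri are typically complex, nested applications, because running them involves a chain of process invocations:

  1. cargo identifies the subcommand from its arguments and invokes cargo-miri
  2. cargo-miri is a wrapper that invokes miri
  3. miri is an application dynamically linked against librustc
  4. While building the project, cargo-miri calls cargo build, which in turn invokes rustc, rustup, and other programs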

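As a result, debugging cargo miri means passing through dozens of child processes before reaching the miri process, and cargo, cargo-miri, and miri each run more than once. The debugger, GDB, therefore has to be able to control parent and child processes as well as their threads.

GDB offers several settings and techniques for multi-process debugging. These are my notes on using them.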

detach-on-fork

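GDB defaults to set detach-on-fork on, which means that whenever a fork happens, the new child process is detached from GDB and runs independently.

Setting it to off makes GDB take over the process created by fork, so both parent and child are managed by GDB.

If the parent blocks waiting for the child while GDB lets only one process run, setting detach-on-fork off leaves the two processes waiting on each other: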

>>> si
Can not resume the parent process over vfork in the foreground while
holding the child stopped.  Try "set detach-on-fork" or "set schedule-multiple".
clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
62      in ../sysdeps/unix/sysv/linux/x86_64/clone3.S

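On modern glibc, fork() may be implemented on top of the clone3 system call, and certain combinations of clone flags trigger vfork-like behavior (the parent blocks until the child releases the address space).

set schedule-multiple on lets GDB resume parent and child at the same time, so we need the settings below for GDB to enter the child process smoothly.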

# Keep child processes under GDB's control
set detach-on-fork off
# Schedule multiple processes
set schedule-multiple on
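
When these settings live in a script rather than being typed by hand, they can also be applied from GDB's embedded Python. The following is only a minimal sketch: the helper name apply_fork_settings and its injectable execute parameter are my own illustrative assumptions, while gdb.execute is the real call used inside a GDB session.

```python
# Sketch: apply the multi-process settings from a Python script.
# `apply_fork_settings` is a hypothetical helper name, not part of GDB.
FORK_SETTINGS = (
    "set detach-on-fork off",    # keep forked children under GDB's control
    "set schedule-multiple on",  # let all inferiors resume together
)


def apply_fork_settings(execute=None):
    """Run each setting through `execute` (defaults to gdb.execute inside GDB)."""
    if execute is None:
        import gdb  # only importable inside a GDB session
        execute = gdb.execute
    for command in FORK_SETTINGS:
        execute(command)
    return list(FORK_SETTINGS)


if __name__ == "__main__":
    try:
        apply_fork_settings()
    except ImportError:
        # Outside GDB: just print what would have been executed.
        apply_fork_settings(print)
```

Inside GDB the file can be loaded with source; outside GDB it only prints the two commands, which makes it easy to check.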

Example: debugging the following Rust code with the settings above

fn main() {
    println!("main thread: start");
    let output = std::process::Command::new("echo")
        .arg("hi")
        .stdout(std::process::Stdio::piped())
        .spawn()
        .unwrap()
        .wait_with_output()
        .unwrap();
    let hi = std::str::from_utf8(&output.stdout).unwrap();
    println!("main thread: {hi}");
}
>>> run # automatically switches to the child process and runs it
process 2910868 is executing new program: /usr/bin/echo
[Inferior 2 (process 2910868) exited normally]
>>> i inferiors # inspect the processes: echo has finished, and the parent process hi is still held
  Num  Description       Connection           Executable        
  1    process 2910865   1 (native)           /home/gh-zjp-CN/tmp/hi/target/debug/hi 
* 2    <null>                                 /usr/bin/echo     

>>> inferior 1 # switch to the parent process
[Switching to inferior 1 [process 2910865] (/home/gh-zjp-CN/tmp/hi/target/debug/hi)]
[Switching to thread 1.1 (Thread 0x7ffff7f82780 (LWP 2910865))]
#0  0x00007ffff7d107d7 in __GI___wait4 (pid=2910868, stat_loc=0x7fffffffd4e0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30     ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
>>> i inferiors 
  Num  Description       Connection           Executable        
* 1    process 2910865   1 (native)           /home/gh-zjp-CN/tmp/hi/target/debug/hi 
  2    <null>                                 /usr/bin/echo     
>>> c
Continuing.
main thread: hi

[Inferior 1 (process 2910865) exited normally]

catch exec

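The example above successfully switched to the child process after it was spawned and ran it. But that is not enough: we need the switch to happen automatically without the child also running automatically.

We can use catch exec to set a kind of "breakpoint": it catches the exec system call and pauses after switching to the child process, at which point we can debug it normally, for example by looking up symbols, setting breakpoints, or running continue.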

>>> set detach-on-fork off
>>> set schedule-multiple on
>>> catch exec
Catchpoint 1 (exec)
>>> r
Starting program: /home/gh-zjp-CN/tmp/hi/target/debug/hi 

main thread: start
[New inferior 2 (process 2914989)]
[Switching to process 2914989]

Thread 2.1 "echo" hit Catchpoint 1 (exec'd /usr/bin/echo), 0x00007ffff7fe4540 in _start () from /lib64/ld-linux-x86-64.so.2
>>> i inferiors # note that the echo process is connected, not <null>
  Num  Description       Connection           Executable        
  1    process 2914985   1 (native)           /home/gh-zjp-CN/tmp/hi/target/debug/hi 
* 2    process 2914989   1 (native)           /usr/bin/echo     
>>> f # show the current frame: we are stopped at the child's program entry point
#0  0x00007ffff7fe4540 in _start () from /lib64/ld-linux-x86-64.so.2

follow-fork-mode

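With every setting at its default, follow-fork-mode is parent, which means that after the program spawns a child process, GDB does not follow execution into the child.

We can set it to child so that GDB follows the child whenever a fork happens: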

>>> set follow-fork-mode child

>>> r
Starting program: /home/gh-zjp-CN/tmp/hi/target/debug/hi 
main thread: start
[Attaching after Thread 0x7ffff7f82780 (LWP 2968237) vfork to child process 2968252]
[New inferior 2 (process 2968252)]
[Detaching vfork parent process 2968237 after child exec]
[Inferior 1 (process 2968237) detached]
process 2968252 is executing new program: /usr/bin/echo
warning: could not find '.gnu_debugaltlink' file for /usr/bin/echo
main thread: hi

[Inferior 2 (process 2968252) exited normally]

>>> i inferiors 
  Num  Description       Connection           Executable        
  1    <null>                                 /usr/bin/echo     
* 2    <null>                                 /usr/bin/echo     

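As the output shows, with only follow-fork-mode set to child, GDB switches to the child as soon as it is created and runs it, releasing control of the parent. Both processes eventually run to completion without stopping along the way.

Adding catch exec makes GDB stop after entering the child, but the parent still runs outside GDB's control: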

>>> set follow-fork-mode child

>>> catch exec

>>> r
Starting program: /home/gh-zjp-CN/tmp/hi/target/debug/hi 
main thread: start
[Attaching after Thread 0x7ffff7f82780 (LWP 2997061) vfork to child process 2997064]
[New inferior 2 (process 2997064)]
[Detaching vfork parent process 2997061 after child exec]
[Inferior 1 (process 2997061) detached]
process 2997064 is executing new program: /usr/bin/echo
[Switching to process 2997064]

Thread 2.1 "echo" hit Catchpoint 1 (exec'd /usr/bin/echo), 0x00007ffff7fe4540 in _start () from /lib64/ld-linux-x86-64.so.2
>>> i inferiors 
  Num  Description       Connection           Executable        
  1    <null>                                 /usr/bin/echo     
* 2    process 2997064   1 (native)           /usr/bin/echo     

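But this mode only suits linear process debugging and is not much use to us, because it

  • runs every child process as soon as one appears
  • loses control of the parent process

What we need instead is to:

  • keep GDB in control of the parent processes of cargo miri
  • check, whenever a child process appears, whether it is miri:
    • if so: stop
    • if not: run it to completion, then return to the parent and continue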
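
The "is it miri?" check can be sketched with GDB's Python API. This is a rough, untested-against-cargo-miri sketch under several assumptions of mine: the target binary is named exactly miri, the names should_stop and on_stop are hypothetical, and a stop-event handler combined with catch exec is just one possible way to wire it up. The calls gdb.events.stop.connect, gdb.post_event, gdb.selected_inferior and Progspace.filename are existing API.

```python
import os

TARGET_NAME = "miri"  # assumption: the binary of interest is named exactly "miri"


def should_stop(executable, target=TARGET_NAME):
    """Return True when the exec'd program is the one we want to debug."""
    if not executable:
        return False
    return os.path.basename(executable) == target


try:
    import gdb  # only available inside a GDB session
except ImportError:
    gdb = None


def on_stop(event):
    """After a stop, resume automatically unless the current inferior is the target."""
    filename = gdb.selected_inferior().progspace.filename
    if not should_stop(filename):
        # Resuming directly inside an event handler is unsafe, so defer it.
        gdb.post_event(lambda: gdb.execute("continue"))


if gdb is not None:
    gdb.execute("set detach-on-fork off")
    gdb.execute("set schedule-multiple on")
    gdb.execute("catch exec")
    gdb.events.stop.connect(on_stop)
```

Note the limits of this sketch: it resumes after every stop outside the target, so breakpoints set in other processes would be skipped as well, and it does not implement the "return to the parent" step after a child exits.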

GDB Python API

The gdb object

Beyond the official Python API documentation, we can read an object's docstring and list the methods it offers:

>>> python help(gdb.inferiors)
Help on built-in function inferiors in module _gdb:

inferiors(...)
    inferiors () -> (gdb.Inferior, ...).
    Return a tuple containing all inferiors.
>>> python print(dir(gdb.inferiors()[0]))
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__',
 '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__
', '__sizeof__', '__str__', '__subclasshook__', 'architecture', 'arguments', 'clear_env', 'connection', 'connection_num', 'is_valid', 'main_na
me', 'num', 'pid', 'progspace', 'read_memory', 'search_memory', 'set_env', 'thread_from_handle', 'thread_from_thread_handle', 'threads', 'unse
t_env', 'was_attached', 'write_memory']
>>> python print(gdb.inferiors()[0].num)
1
>>> python print(gdb.inferiors()[0].pid)
2315217
>>> i inferiors 
  Num  Description       Connection           Executable        
* 1    process 2315217   3 (native)           /home/gh-zjp-CN/tmp/hi/target/debug/hi 
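
The attributes explored above are enough to print a rough equivalent of info inferiors from Python. A minimal sketch, where describe_inferiors is a hypothetical helper name of mine and the column layout is only an approximation of GDB's own table; it assumes pid is 0 for an inferior without a running process:

```python
def describe_inferiors(inferiors):
    """Build one 'num  state  executable' row per inferior, similar to `info inferiors`."""
    rows = []
    for inf in inferiors:
        # Assumption: pid is 0 when the inferior has no running process.
        state = f"process {inf.pid}" if inf.pid else "<null>"
        rows.append(f"{inf.num:<4} {state:<17} {inf.progspace.filename}")
    return rows


if __name__ == "__main__":
    try:
        import gdb  # only importable inside a GDB session
        print("\n".join(describe_inferiors(gdb.inferiors())))
    except ImportError:
        pass  # outside GDB there is nothing to list
```

The function only reads num, pid and progspace.filename, so it can be exercised outside GDB with simple stand-in objects.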

Installing the latest GDB

Interfaces such as events.selected_context require GDB 15.1 or later, while the GDB installed by apt is fairly old, so it has to be built from source.
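
Before building anything, it may be worth checking which version the current GDB reports. A small sketch using gdb.VERSION, which is part of the Python API; the helper names parse_version and is_new_enough are my own, and the minimum of 15.1 is simply the requirement stated above:

```python
import re

MIN_VERSION = (15, 1)  # taken from the requirement stated above


def parse_version(text):
    """Extract (major, minor) from a version string such as '17.1' or '12.0.90'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_new_enough(text, minimum=MIN_VERSION):
    """Compare the parsed (major, minor) pair against the minimum."""
    return parse_version(text) >= minimum


if __name__ == "__main__":
    try:
        import gdb  # only importable inside a GDB session
        print(gdb.VERSION, "ok" if is_new_enough(gdb.VERSION) else "too old")
    except ImportError:
        pass  # outside GDB there is no version to report
```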

Check for the latest version first; the commands below install GDB locally as a non-root user:

wget https://ftp.gnu.org/gnu/gdb/gdb-17.1.tar.xz
tar -xf gdb-17.1.tar.xz
rm ./gdb-17.1/build/ -rf
rm ~/.local/bin/gdb/ -rf
mkdir ./gdb-17.1/build
cd ./gdb-17.1/build

export MY_GDB_ROOT=$HOME/.local/bin/gdb

wget https://ftp.gnu.org/gnu/ncurses/ncurses-6.6.tar.gz
tar -xf ncurses-6.6.tar.gz
cd ncurses-6.6
./configure --prefix=$MY_GDB_ROOT \
  --with-shared \
  --enable-widec \
  --with-termlib \
  --enable-pc-files \
  --with-pkg-config-libdir=$MY_GDB_ROOT/lib/pkgconfig
make -j$(nproc)
make install
cd ..

wget https://ftp.gnu.org/gnu/readline/readline-8.2.tar.gz
tar -xf readline-8.2.tar.gz
cd readline-8.2
./configure --prefix=$MY_GDB_ROOT \
  --with-curses \
  LDFLAGS="-L$MY_GDB_ROOT/lib" \
  CPPFLAGS="-I$MY_GDB_ROOT/include" \
  SHLIB_LIBS="-lncursesw"
make SHLIB_LIBS="-L$MY_GDB_ROOT/lib -lncursesw" -j$(nproc)
make install
cd ..

wget https://ftp.gnu.org/gnu/gmp/gmp-6.3.0.tar.xz
tar -xf gmp-6.3.0.tar.xz
cd gmp-6.3.0
./configure --prefix=$MY_GDB_ROOT
make -j$(nproc)
make install
cd ..

wget https://ftp.gnu.org/gnu/mpfr/mpfr-4.2.2.tar.xz
tar -xf mpfr-4.2.2.tar.xz
cd mpfr-4.2.2
./configure --prefix=$MY_GDB_ROOT --with-gmp=$MY_GDB_ROOT
make -j$(nproc)
make install
cd ..

wget https://github.com/libexpat/libexpat/releases/download/R_2_7_5/expat-2.7.5.tar.xz
tar -xf expat-2.7.5.tar.xz
cd expat-2.7.5
./configure --prefix=$MY_GDB_ROOT
make -j$(nproc)
make install
cd ..

git clone https://github.com/intel/libipt.git
cd libipt
mkdir build && cd build
cmake .. \
  -DCMAKE_INSTALL_PREFIX=$MY_GDB_ROOT \
  -DCMAKE_INSTALL_LIBDIR=lib \
  -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
make install
cd ../..

# liblzma
wget https://github.com/tukaani-project/xz/releases/download/v5.8.2/xz-5.8.2.tar.xz
tar -xf xz-5.8.2.tar.xz
cd xz-5.8.2
./configure --prefix=$MY_GDB_ROOT --disable-doc
make -j$(nproc)
make install
cd ..

export LDFLAGS="-L$MY_GDB_ROOT/lib -L$MY_GDB_ROOT/lib64 -Wl,-rpath,$MY_GDB_ROOT/lib -Wl,-rpath,$MY_GDB_ROOT/lib64"
export CPPFLAGS="-I$MY_GDB_ROOT/include"
export CFLAGS="-O2 -I$MY_GDB_ROOT/include"
export CXXFLAGS="-O2 -I$MY_GDB_ROOT/include"

../configure --prefix=$MY_GDB_ROOT \
  --with-python=$(which python3) \
  --enable-targets=all \
  --enable-64-bit-bfd \
  --with-guile=no \
  --enable-tui \
  --with-source-highlight \
  --with-expat \
  --with-system-readline \
  --with-libipt-prefix=$MY_GDB_ROOT \
  --with-libreadline-prefix=$MY_GDB_ROOT \
  --with-libgmp-prefix=$MY_GDB_ROOT \
  --with-libmpfr-prefix=$MY_GDB_ROOT \
  --with-libexpat-prefix=$MY_GDB_ROOT \
  --with-liblzma-prefix=$MY_GDB_ROOT \
  --enable-sim \
  LDFLAGS="$LDFLAGS" \
  CPPFLAGS="$CPPFLAGS" \
  CFLAGS="$CFLAGS" \
  CXXFLAGS="$CXXFLAGS"
# --with-intel-pt \

make -j$(nproc)
make install

Add the locally installed GDB's shared libraries and binaries to the environment:

# ~/.bashrc
LOCAL_BIN=${HOME}/.local/bin
GDB_HOME="$LOCAL_BIN/gdb"
export PATH=${GDB_HOME}/bin:$PATH
export LD_LIBRARY_PATH=${GDB_HOME}/lib:$LD_LIBRARY_PATH

Other resources