Skip to content

[🍒][lldb][6.3] Android Support in LLDB#12312

Open
gmondada wants to merge 8 commits intoswiftlang:swift/release/6.3from
gmondada:gab/6.3/lldb-android-patch1+2
Open

[🍒][lldb][6.3] Android Support in LLDB#12312
gmondada wants to merge 8 commits intoswiftlang:swift/release/6.3from
gmondada:gab/6.3/lldb-android-patch1+2

Conversation

@gmondada
Copy link

@gmondada gmondada commented Feb 9, 2026

Fixes cherry-picked from stable/21.x, previously cherry-picked from llvm

Original PRs:

Risks:

  • The majority of the changes concern Android only, solving crucial bugs on this platform. Risk for breaking things outside Android support is very low.
  • Some changes are in the ModuleList. Here, risk is not zero, but these changes are part of LLVM since months.

Story:

cs01 and others added 8 commits February 9, 2026 20:43
…#159676)

Reattempt at llvm#145382 (cc
@labath). This time setenv() was replaced with set_env()
(llvm#145382 (comment)).

---------

Co-authored-by: Chad Smith <cssmith@meta.com>
(cherry picked from commit 223cfa8)
(cherry picked from commit fb5d4fe)
## Bug
Trying to attach to an android process by name fails:
```
(lldb) process attach -n com.android.bluetooth
error: attach failed: could not find a process named com.android.bluetooth
```

## Root Cause
PlatformAndroid does not implement the `FindProcesses` method.

As described in `include/lldb/Target/Platform.h`:
```
  // The base class Platform will take care of the host platform. Subclasses
  // will need to fill in the remote case.
  virtual uint32_t FindProcesses(const ProcessInstanceInfoMatch &match_info,
                                 ProcessInstanceInfoList &proc_infos);
```

## Fix
Implement the `FindProcesses` method in PlatformAndroid. Use adb to get
the pid of the process name on the device with the adb client connection
using `adb shell pidof <name>`.

## Reproduce
With an android device connected, run the following

Install and start lldb-server on device
```
adb push lldb-server /data/local/tmp/
adb shell chmod +x /data/local/tmp/lldb-server
adb shell /data/local/tmp/lldb-server platform  --listen 127.0.0.1:9500 --server
```

Start lldb, and run
```
platform select remote-android
platform connect connect://127.0.0.1:9500
log enable lldb platform
```

Connect to the process by name:
```
process attach -n com.android.bluetooth
```

## Test Plan
Before:
```
(lldb) process attach -n com.android.bluetooth
error: attach failed: could not find a process named com.android.bluetooth
```

After:
```
(lldb) process attach -n com.android.bluetooth
lldb             AdbClient::ResolveDeviceID Resolved device ID: 127.0.0.1
lldb             AdbClient::AdbClient(device_id='127.0.0.1') - Creating AdbClient with device ID
lldb             Connecting to ADB server at connect://127.0.0.1:5037
lldb             Connected to Android device "127.0.0.1"
lldb             Forwarding remote TCP port 38315 to local TCP port 35243
lldb             AdbClient::~AdbClient() - Destroying AdbClient for device: 127.0.0.1
lldb             gdbserver connect URL: connect://127.0.0.1:35243
lldb             AdbClient::AdbClient(device_id='127.0.0.1') - Creating AdbClient with device ID
lldb             Connecting to ADB server at connect://127.0.0.1:5037
lldb             Selecting device: 127.0.0.1
lldb             PlatformAndroid::FindProcesses found process 'com.android.bluetooth' with PID 2223
lldb             AdbClient::~AdbClient() - Destroying AdbClient for device: 127.0.0.1
llvm-worker-48   PlatformRemoteGDBServer::GetModuleSpec - got module info for (/apex/com.android.art/lib64/libc++.so:aarch64-unknown-linux-android) : file = '/apex/com.android.art/lib64/libc++.so', arch = aarch64-unknown-linux-android, uuid = 552995D0-A86D-055F-1F03-C13783A2A16C, object size = 754128
llvm-worker-9    PlatformRemoteGDBServer::GetModuleSpec - got module info for (/apex/com.android.art/lib64/liblzma.so:aarch64-unknown-linux-android) : file = '/apex/com.android.art/lib64/liblzma.so', arch = aarch64-unknown-linux-android, uuid = E51CAC98-666F-6C3F-F605-1498079542E0, object size = 179944
Process 2223 stopped
```

(cherry picked from commit a19c9a8)
(cherry picked from commit 651b8f2)
…HostInfo' (llvm#163075)

This should fix
llvm#163050 (comment)

Thank you for tagging me @trcrsired!

```
[105/1932] Building CXX object tools/lldb/source/Plugins/Platfor...d/CMakeFiles/lldbPluginPlatformAndroid.dir/PlatformAndroid.cpp.
FAILED: [code=1] tools/lldb/source/Plugins/Platform/Android/CMakeFiles/lldbPluginPlatformAndroid.dir/PlatformAndroid.cpp.o
/home/cqwrteur/toolchains/llvm/x86_64-linux-gnu/llvm/bin/clang++ --target=aarch64-linux-android30 --sysroot=/home/cqwrteur/toolchains/llvm/aarch64-linux-android30/aarch64-linux-android30 -DGTEST_HAS_RTTI=0 -DHAVE_ROUND -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/cqwrteur/toolchains_build/toolchainbuildscripts/llvm/.artifacts/llvm/aarch64-linux-android30/llvm/tools/lldb/source/Plugins/Platform/Android -I/home/cqwrteur/toolchains_build/llvm-project/lldb/source/Plugins/Platform/Android -I/home/cqwrteur/toolchains_build/llvm-project/lldb/include -I/home/cqwrteur/toolchains_build/toolchainbuildscripts/llvm/.artifacts/llvm/aarch64-linux-android30/llvm/tools/lldb/include -I/home/cqwrteur/toolchains_build/toolchainbuildscripts/llvm/.artifacts/llvm/aarch64-linux-android30/llvm/include -I/home/cqwrteur/toolchains_build/llvm-project/llvm/include -I/home/cqwrteur/toolchains_build/llvm-project/llvm/../clang/include -I/home/cqwrteur/toolchains_build/toolchainbuildscripts/llvm/.artifacts/llvm/aarch64-linux-android30/llvm/tools/lldb/../clang/include -I/home/cqwrteur/toolchains_build/llvm-project/lldb/source -I/home/cqwrteur/toolchains_build/toolchainbuildscripts/llvm/.artifacts/llvm/aarch64-linux-android30/llvm/tools/lldb/source -isystem /home/cqwrteur/toolchains/llvm/aarch64-linux-android30/aarch64-linux-android30/usr/include/libxml2 -fuse-ld=lld -fuse-lipo=llvm-lipo -flto=thin -Wno-unused-command-line-argument -rtlib=compiler-rt -stdlib=libc++ --unwindlib=libunwind -lc++abi -stdlib=libc++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wno-pass-failed -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -flto=thin -Wno-unknown-pragmas -Wno-strict-aliasing -Wno-vla-extension -O3 -DNDEBUG -std=c++17 -fPIC  -fno-exceptions -funwind-tables -fno-rtti -MD -MT tools/lldb/source/Plugins/Platform/Android/CMakeFiles/lldbPluginPlatformAndroid.dir/PlatformAndroid.cpp.o -MF tools/lldb/source/Plugins/Platform/Android/CMakeFiles/lldbPluginPlatformAndroid.dir/PlatformAndroid.cpp.o.d -o tools/lldb/source/Plugins/Platform/Android/CMakeFiles/lldbPluginPlatformAndroid.dir/PlatformAndroid.cpp.o -c /home/cqwrteur/toolchains_build/llvm-project/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp
/home/cqwrteur/toolchains_build/llvm-project/lldb/source/Plugins/Platform/Android/PlatformAndroid.cpp:66:48: error: use of undeclared identifier 'HostInfo'
   66 |     default_platform_sp->SetSystemArchitecture(HostInfo::GetArchitecture());
      |                                                ^
1 error generated.
[118/1932] Linking CXX shared library lib/libLLVMAnalysis.so.22.0git
ninja: build stopped: subcommand failed.
llvm: Ninja build failed for aarch64-linux-android30
```

(cherry picked from commit 55b0d14)
(cherry picked from commit 5073bd8)
## Summary

Fix `FindProcesses` to respect Android's `hidepid=2` security model and
enable name matching for Android apps.

## Problem

1. Called `adb shell pidof` or `adb shell ps` directly, bypassing
Android's process visibility restrictions
2. Name matching failed for Android apps - searched for
`com.example.myapp` but GDB Remote Protocol reports `app_process64`

Android apps fork from Zygote, so `/proc/PID/exe` points to
`app_process64` for all apps. The actual package name is only in
`/proc/PID/cmdline`. The previous implementation applied name filters
without supplementing with cmdline, so searches failed.

## Fix

- Delegate to lldb-server via GDB Remote Protocol (respects `hidepid=2`)
- Get all visible processes, supplement zygote/app_process entries with
cmdline, then apply name matching
- Only fetch cmdline for zygote apps (performance), parallelize with
`xargs -P 8`
- Remove redundant code (GDB Remote Protocol already provides GID/arch)

## Test Results

### Before this fix:

```
(lldb) platform process list
error: no processes were found on the "remote-android" platform

(lldb) platform process list -n com.example.hellojni
1 matching process was found on "remote-android"
PID    PARENT USER       TRIPLE                         NAME
====== ====== ========== ============================== ============================
5276   359    u0_a192                                   com.example.hellojni
                         ^^^^^^^^ Missing triple!
```

### After this fix:

```
(lldb) platform process list
PID    PARENT USER       TRIPLE                         NAME
====== ====== ========== ============================== ============================
1      0      root       aarch64-unknown-linux-android  init
2      0      root                                      [kthreadd]
359    1      system     aarch64-unknown-linux-android  app_process64
5276   359    u0_a192    aarch64-unknown-linux-android  com.example.hellojni
5357   5355   u0_a192    aarch64-unknown-linux-android  sh
5377   5370   u0_a192    aarch64-unknown-linux-android  lldb-server
                          ^^^^^^^^ User-space processes now have triples!

(lldb) platform process list -n com.example.hellojni
1 matching process was found on "remote-android"
PID    PARENT USER       TRIPLE                         NAME
====== ====== ========== ============================== ============================
5276   359    u0_a192    aarch64-unknown-linux-android  com.example.hellojni

(lldb) process attach -n com.example.hellojni
Process 5276 stopped
* thread llvm#1, name = 'example.hellojni', stop reason = signal SIGSTOP
```

## Test Plan

With an Android device/emulator connected:

1. Start lldb-server on device:
```bash
adb push lldb-server /data/local/tmp/
adb shell chmod +x /data/local/tmp/lldb-server
adb shell /data/local/tmp/lldb-server platform  --listen 127.0.0.1:9500 --server
```

2. Connect from LLDB:
```
(lldb) platform select remote-android
(lldb) platform connect connect://127.0.0.1:9500
(lldb) platform process list
```

3. Verify:
- `platform process list` returns all processes with triple information
- `platform process list -n com.example.app` finds Android apps by
package name
- `process attach -n com.example.app` successfully attaches to Android
apps

## Impact

Restores `platform process list` on Android with architecture
information and package name lookup. All name matching modes now work
correctly.

Fixes llvm#164192

(cherry picked from commit f8cb6cd)
(cherry picked from commit b7e8de3)
…160771)

llvm#159676 was recently landed.
After it was landed, additional tests were ran, where I saw an assertion
error on windows only. I am not able to test on windows, and the test is
a new test to mock an adb server. The mock server fails to start on
windows, so I am disabling to fix the breakage on main. This is not an
issue with the main code, only a test issue.

Relevant error output:
```
Step 8 (test-check-lldb-unit) failure: Test just built components: check-lldb-unit completed (failure)
******************** TEST 'lldb-unit :: Platform/Android/./AdbClientTests.exe/7/20' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\buildbot\as-builder-10\lldb-x86-64\build\tools\lldb\unittests\Platform\Android\.\AdbClientTests.exe-lldb-unit-30696-7-20.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=20 GTEST_SHARD_INDEX=7 C:\buildbot\as-builder-10\lldb-x86-64\build\tools\lldb\unittests\Platform\Android\.\AdbClientTests.exe
--

Note: This is test shard 8 of 20.

[==========] Running 1 test from 1 test suite.

[----------] Global test environment set-up.

[----------] 1 test from AdbClientTest

[ RUN      ] AdbClientTest.RealTcpConnection

Assertion failed: error.Fail(), file C:\buildbot\as-builder-10\lldb-x86-64\llvm-project\lldb\source\Host\common\TCPSocket.cpp, line 254
```

(cherry picked from commit a85d3a5)
(cherry picked from commit bbf9afb)
…ith_unusual_process_name (llvm#173250)

This patch skips
TestPlatformProcessLaunchGDBServer.test_launch_with_unusual_process_name
on Windows which is flaky.

The test will be reenabled once it has been fixed.

(cherry picked from commit ec18557)
(cherry picked from commit 2d7e0b4)
…erver (llvm#148774)

Summary:
There was a deadlock was introduced by [PR
llvm#146441](llvm#146441) which changed
`CurrentThreadIsPrivateStateThread()` to
`CurrentThreadPosesAsPrivateStateThread()`. This change caused the
execution path in
[`ExecutionContextRef::SetTargetPtr()`](https://github.com/llvm/llvm-project/blob/10b5558b61baab59c7d3dff37ffdf0861c0cc67a/lldb/source/Target/ExecutionContext.cpp#L513)
to now enter a code block that was previously skipped, triggering
[`GetSelectedFrame()`](https://github.com/llvm/llvm-project/blob/10b5558b61baab59c7d3dff37ffdf0861c0cc67a/lldb/source/Target/ExecutionContext.cpp#L522)
which leads to a deadlock.

Thread 1 gets m_modules_mutex in
[`ModuleList::AppendImpl`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Core/ModuleList.cpp#L218),
Thread 3 gets m_language_runtimes_mutex in
[`GetLanguageRuntime`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Target/Process.cpp#L1501),
but then Thread 1 waits for m_language_runtimes_mutex in
[`GetLanguageRuntime`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Target/Process.cpp#L1501)
while Thread 3 waits for m_modules_mutex in
[`ScanForGNUstepObjCLibraryCandidate`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Plugins/LanguageRuntime/ObjC/GNUstepObjCRuntime/GNUstepObjCRuntime.cpp#L57).

This fixes the deadlock by adding a scoped block around the mutex lock
before the call to the notifier, and moved the notifier call outside of
the mutex-guarded section. The notifier call
[`NotifyModuleAdded`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Target/Target.cpp#L1810)
should be thread-safe, since the module should be added to the
`ModuleList` before the mutex is released, and the notifier doesn't
modify the module list further, and the call is operates on local state
and the `Target` instance.

### Deadlocked Thread backtraces:
```
* thread llvm#3, name = 'dbg.evt-handler', stop reason = signal SIGSTOP
  * frame #0: 0x00007f2f1e2973dc libc.so.6`futex_wait(private=0, expected=2, futex_word=0x0000563786bd5f40) at    futex-internal.h:146:13
   /*... a bunch of mutex related bt ... */
   liblldb.so.21.0git`std::lock_guard<std::recursive_mutex>::lock_guard(this=0x00007f2f0f1927b0, __m=0x0000563786bd5f40) at std_mutex.h:229:19
    frame llvm#8: 0x00007f2f27946eb7 liblldb.so.21.0git`ScanForGNUstepObjCLibraryCandidate(modules=0x0000563786bd5f28, TT=0x0000563786bd5eb8) at GNUstepObjCRuntime.cpp:60:41
    frame llvm#9: 0x00007f2f27946c80 liblldb.so.21.0git`lldb_private::GNUstepObjCRuntime::CreateInstance(process=0x0000563785e1d360, language=eLanguageTypeObjC) at GNUstepObjCRuntime.cpp:87:8
    frame llvm#10: 0x00007f2f2746fca5 liblldb.so.21.0git`lldb_private::LanguageRuntime::FindPlugin(process=0x0000563785e1d360, language=eLanguageTypeObjC) at LanguageRuntime.cpp:210:36
    frame llvm#11: 0x00007f2f2742c9e3 liblldb.so.21.0git`lldb_private::Process::GetLanguageRuntime(this=0x0000563785e1d360, language=eLanguageTypeObjC) at Process.cpp:1516:9
    ...
    frame llvm#21: 0x00007f2f2750b5cc liblldb.so.21.0git`lldb_private::Thread::GetSelectedFrame(this=0x0000563785e064d0, select_most_relevant=DoNoSelectMostRelevantFrame) at Thread.cpp:274:48
    frame llvm#22: 0x00007f2f273f9957 liblldb.so.21.0git`lldb_private::ExecutionContextRef::SetTargetPtr(this=0x00007f2f0f193778, target=0x0000563786bd5be0, adopt_selected=true) at ExecutionContext.cpp:525:32
    frame llvm#23: 0x00007f2f273f9714 liblldb.so.21.0git`lldb_private::ExecutionContextRef::ExecutionContextRef(this=0x00007f2f0f193778, target=0x0000563786bd5be0, adopt_selected=true) at ExecutionContext.cpp:413:3
    frame llvm#24: 0x00007f2f270e80af liblldb.so.21.0git`lldb_private::Debugger::GetSelectedExecutionContext(this=0x0000563785d83bc0) at Debugger.cpp:1225:23
    frame llvm#25: 0x00007f2f271bb7fd liblldb.so.21.0git`lldb_private::Statusline::Redraw(this=0x0000563785d83f30, update=true) at Statusline.cpp:136:41
    ...
* thread llvm#1, name = 'lldb', stop reason = signal SIGSTOP
  * frame #0: 0x00007f2f1e2973dc libc.so.6`futex_wait(private=0, expected=2, futex_word=0x0000563785e1dd98) at futex-internal.h:146:13
   /*... a bunch of mutex related bt ... */
   liblldb.so.21.0git`std::lock_guard<std::recursive_mutex>::lock_guard(this=0x00007ffe62be0488, __m=0x0000563785e1dd98) at std_mutex.h:229:19
    frame llvm#8: 0x00007f2f2742c8d1 liblldb.so.21.0git`lldb_private::Process::GetLanguageRuntime(this=0x0000563785e1d360, language=eLanguageTypeC_plus_plus) at Process.cpp:1510:41
    frame llvm#9: 0x00007f2f2743c46f liblldb.so.21.0git`lldb_private::Process::ModulesDidLoad(this=0x0000563785e1d360, module_list=0x00007ffe62be06a0) at Process.cpp:6082:36
    ...
    frame llvm#13: 0x00007f2f2715cf03 liblldb.so.21.0git`lldb_private::ModuleList::AppendImpl(this=0x0000563786bd5f28, module_sp=ptr = 0x563785cec560, use_notifier=true) at ModuleList.cpp:246:19
    frame llvm#14: 0x00007f2f2715cf4c liblldb.so.21.0git`lldb_private::ModuleList::Append(this=0x0000563786bd5f28, module_sp=ptr = 0x563785cec560, notify=true) at ModuleList.cpp:251:3
    ...
    frame llvm#19: 0x00007f2f274349b3 liblldb.so.21.0git`lldb_private::Process::ConnectRemote(this=0x0000563785e1d360, remote_url=(Data = "connect://localhost:1234", Length = 24)) at Process.cpp:3250:9
    frame llvm#20: 0x00007f2f27411e0e liblldb.so.21.0git`lldb_private::Platform::DoConnectProcess(this=0x0000563785c59990, connect_url=(Data = "connect://localhost:1234", Length = 24), plugin_name=(Data = "gdb-remote", Length = 10), debugger=0x0000563785d83bc0, stream=0x00007ffe62be3128, target=0x0000563786bd5be0, error=0x00007ffe62be1ca0) at Platform.cpp:1926:23
```

## Test Plan:
Built a hello world a.out
Run server in one terminal:
```
~/llvm/build/Debug/bin/lldb-server g :1234 a.out
```
Run client in another terminal
```
~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b hello.cc:3"
```

Before:
Client hangs indefinitely
```
~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b main"
(lldb) gdb-remote 1234

^C^C
```

After:
```
~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b hello.cc:3"
(lldb) gdb-remote 1234
Process 837068 stopped
* thread llvm#1, name = 'a.out', stop reason = signal SIGSTOP
    frame #0: 0x00007ffff7fe4a60
ld-linux-x86-64.so.2`_start:
->  0x7ffff7fe4a60 <+0>: movq   %rsp, %rdi
    0x7ffff7fe4a63 <+3>: callq  0x7ffff7fe5780 ; _dl_start at rtld.c:522:1

ld-linux-x86-64.so.2`_dl_start_user:
    0x7ffff7fe4a68 <+0>: movq   %rax, %r12
    0x7ffff7fe4a6b <+3>: movl   0x18067(%rip), %eax ; _dl_skip_args
(lldb) b hello.cc:3
Breakpoint 1: where = a.out`main + 15 at hello.cc:4:13, address = 0x00005555555551bf
(lldb) c
Process 837068 resuming
Process 837068 stopped
* thread llvm#1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x00005555555551bf a.out`main at hello.cc:4:13
   1   	#include <iostream>
   2
   3   	int main() {
-> 4   	  std::cout << "Hello World" << std::endl;
   5   	  return 0;
   6   	}
```

(cherry picked from commit 7fb620a)
(cherry picked from commit f4054a6)
Another attempt at resolving the deadlock issue @GeorgeHuyubo discovered
(his previous
[attempt](llvm#160225)).

This change can be summarized as the following:
* Plumb through a boolean flag to force no preload in
`GetOrCreateModules` all the way through to `LoadModuleAtAddress`.
* Parallelize `Module::PreloadSymbols` separately from
`Target::GetOrCreateModule` and its caller `LoadModuleAtAddress` (this
is what avoids the deadlock).

These changes roughly maintain the performance characteristics of the
previous implementation of parallel module loading. Testing on targets
with between 5000 and 14000 modules, I saw similar numbers as before,
often more than 10% faster in the new implementation across multiple
trials for these massive targets. I think it's because we have less lock
contention with this approach.

See [bt.txt](https://github.com/user-attachments/files/22524471/bt.txt)
for a sample backtrace of LLDB when the deadlock occurs.

As @GeorgeHuyubo explains in his
[PR](llvm#160225), the deadlock
occurs from an ABBA deadlock that happens when a thread context-switches
out of `Module::PreloadSymbols`, goes into `Target::GetOrCreateModule`
for another module, possibly entering this block:
```
      if (!module_sp) {
        // The platform is responsible for finding and caching an appropriate
        // module in the shared module cache.
        if (m_platform_sp) {
          error = m_platform_sp->GetSharedModule(
              module_spec, m_process_sp.get(), module_sp, &search_paths,
              &old_modules, &did_create_module);
        } else {
          error = Status::FromErrorString("no platform is currently set");
        }
      }
```
`Module::PreloadSymbols` holds a module-level mutex, and then
`GetSharedModule` *attempts* to hold the mutex of the global shared
`ModuleList`. So, this thread holds the module mutex, and waits on the
global shared `ModuleList` mutex.

A competing thread may execute `Target::GetOrCreateModule`, enter the
same block as above, grabbing the global shared `ModuleList` mutex.
Then, in `ModuleList::GetSharedModule`, we eventually call
`ModuleList::FindModules` which eventually waits for the `Module` mutex
held by the first thread (via `Module::GetUUID`). Thus, we deadlock.

It might be worth noting that I've never been able to observe this
deadlock issue during live debugging (e.g. launching or attaching to
processes), however we were able to consistently reproduce this issue
with coredumps when using the following settings:
```
(lldb) settings set target.parallel-module-load true
(lldb) settings set target.preload-symbols true
(lldb) settings set symbols.load-on-demand false
(lldb) target create --core /some/core/file/here
```

This change avoids concurrent executions of `Module::PreloadSymbols`
with `Target::GetOrCreateModule` by waiting until after the
`Target::GetOrCreateModule` executions to run `Module::PreloadSymbols`
in parallel. This avoids the ordering of holding a Module lock *then*
the ModuleList lock, as `Target::GetOrCreateModule` executions maintain
the ordering of the shared ModuleList lock first (from what I've read
and tested).

Some feedback in llvm#160225 was to
modify mutexes used in these components with read-write locks. This
might be a good idea overall, but I don't think it would *easily*
resolve this specific deadlock. `Module::PreloadSymbols` would probably
need a write lock to Module, so even if we had a read lock in
`Module::GetUUID` we would still contend. Maybe the `ModuleList` lock
could be a read lock that converts to a write lock if it chooses to
update the module, but it seems likely that some thread would try to
update the shared module list and then the write lock would contend
again.

Perhaps with deeper architectural changes, we could fix this issue?

One downside of this approach (and the former approach of parallel
module loading) is that each DYLD would need to implement this pattern
themselves. With @clayborg's help, I looked at a few other approaches:
* In `Target::GetOrCreateModule`, backgrounding the
`Module::PreloadSymbols` call by adding it directly to the thread pool
via `Debugger::GetThreadPool().async()`. This required adding a lock to
`Module::SetLoadAddress` (probably should be one there already) since
`ObjectFileELF::SetLoadAddress` is not thread-safe (updates sections).
Unfortunately, during execution, this causes the preload symbols to run
synchronously with `Target::GetOrCreateModule`, preventing us from truly
parallelizing the execution.
* In `Module::PreloadSymbols`, backgrounding the `symtab` and `sym_file`
`PreloadSymbols` calls individually, but similar issues as the above.
* Passing a callback function like
swiftlang#10746 instead of the
boolean I use in this change. It's functionally the same change IMO,
with some design tradeoffs:
* Pro: the caller doesn't need to explicitly call
`Module::PreloadSymbols` itself, and can instead call whatever function
is passed into the callback.
* Con: the caller needs to delay the execution of the callback such that
it occurs after the `GetOrCreateModule` logic, otherwise we run into the
same issue. I thought this would be trickier for the caller, requiring
some kinda condition variable or otherwise storing the calls to execute
afterwards.

```
ninja check-lldb
```

---------

Co-authored-by: Tom Yang <toyang@fb.com>
(cherry picked from commit 66d5f6a)
(cherry picked from commit 3cad3fc)
@gmondada gmondada marked this pull request as ready for review February 9, 2026 20:14
@gmondada gmondada requested a review from a team as a code owner February 9, 2026 20:14
@finagolfin
Copy link
Member

@swift-ci test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants