Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
87 changes: 87 additions & 0 deletions docs/194717-ensure-code-executed-on-worker-is-confined.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
# 194717 Ensure code executed on the worker is confined
Issue: https://progress.opensuse.org/issues/194717

## Summary

Test modules run as Perl code inside the `isotovideo` process on the **host**, not inside the VM.
That means a test module can read files, write to the filesystem, or make network requests as the worker OS user — even on a production server with a dedicated account.

This document describes what was investigated, what works, and what still needs to be done.

## Investigation

To demonstrate the attack surface I wrote a test module (`tests/install/explore.pm`) that uses standard testapi calls (`script_run`, `upload_logs`) alongside direct Perl code to do sensitive things from inside the worker process on the host side.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote a test module (tests/install/explore.pm)

where can I find this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that not really important. I will add something similar to the openqa-in-opena as suggested in the RFD.

TThis covers the main ways a malicious test module could exploit the worker: testapi calls that run commands in the VM but also give access to the host process, direct host file reads, and leaking files via `upload_logs` to the web UI.
Other attack vectors acknowledged but not demonstrated: external code pulled in via container images or git repos during a job, and arbitrary commands passed through job variables or openQA settings.

I ran the same test module against a confined and an unconfined worker to compare results.

### systemd hardening — tested and working

I added standard systemd hardening directives to the worker unit. This solution seems to be the less efortless with great value return, which should be applied first .
The key confirmed results:

- Writing to `/etc` fails with `Read-only file system` and this confirms systemd is enforcing the restriction at the kernel level, not just relying on normal file permissions.
- Writing to the home directory also fails the same way once `ProtectHome` is active.
- Other users' processes are hidden in `/proc`. Processes running as the same user are still visible (this is expected behavior!).

The full set of directives tested:

```ini
ProtectSystem=strict
ProtectHome=yes
ProtectProc=invisible
NoNewPrivileges=yes
PrivateTmp=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectKernelLogs=yes
ProtectControlGroups=yes
ProtectClock=yes
ProtectHostname=yes
RestrictSUIDSGID=yes
RestrictRealtime=yes
LockPersonality=yes
```

Several directives cannot be used because they break QEMU:

| Directive | Why excluded |
|---|---|
| `MemoryDenyWriteExecute=yes` | QEMU needs to write and execute memory for its JIT compiler |
| `PrivateDevices=yes` | Would remove `/dev/kvm` access |
| `PrivateNetwork=yes` | Worker needs network to reach the WebUI and cache service |
| `RestrictNamespaces=yes` | QEMU networking needs this |
| `SystemCallFilter=` | QEMU uses too many system calls to filter safely |

### AppArmor — gaps identified, follow-up needed

I looked at the existing profile (`/etc/apparmor.d/usr.share.openqa.script.worker`) and mainly there are alrady several rules but found two gaps:

1. The profile allows reading `/etc/openqa/client.conf` — any test module can read the worker API key.
2. `isotovideo` runs with the same permissions as the worker daemon — there is no separation between the two.

I drafted a tighter sub-profile for `isotovideo` that would close these gaps, but could not test it locally because the worker in my dev setup runs as a user session service, which causes conflicts with AppArmor that do not exist on production. The approach is valid and needs to be tested on a production worker.

## Known Limitations

- **Same-user `/proc` visibility:** `ProtectProc=invisible` hides processes of other users but not processes running as the same user. If all openQA services share one account, a test module can still read their environment variables.
- **Network not yet restricted:** A test module can still make outbound connections. Adding network restrictions depends on various factors which I couldnt determine in a timebox. I could not come up with any one solid idea.


## Proposal and followup

- [ ] Add a test module in openqa-to-openqa for a list of confined operations
- [ ] Add systemd hardening directives to the production worker unit
- use one dedicate worker slot for starters
- need to test among a variety of test cases
- [ ] Test the AppArmor `isotovideo` sub-profile on production in complain mode
- Already mostly in place

## Out of Scope

- **Network restriction** — as mentioned above
- **Containers, lxc, cgroups/namespaces** — I assessed these as confinement options. They considered the industry standard in a review of other tools but quite complex for a timeboxed experiment. Compared to systemd and AppArmor solutions, the effort is much higher for a similar result on a worker.
- **SELinux** — production uses AppArmor; switching would need a follow up once we switch to it
- **Syscall filtering** — needs detailed profiling of QEMU, high risk of breakage;
- **Code scanning** — Semgrep with custom rules could flag dangerous patterns in test modules; out of scope here but worth a separate investigation
Loading