Merged
35 changes: 18 additions & 17 deletions .github/workflows/deploy-arc-runners.yml
@@ -27,15 +27,14 @@ on:
env:
TOFU_VERSION: "1.8.8"
STACK_DIR: tofu/stacks/arc-runners
STATE_NAME: arc-runners-dev
TF_IN_AUTOMATION: "true"
TF_INPUT: "false"
CLUSTER_NAME: tinyland

jobs:
plan:
name: Plan ARC Runners
runs-on: ubuntu-latest
runs-on: tinyland-docker
if: >-
github.event_name == 'pull_request'
|| github.event_name == 'push'
@@ -50,6 +49,12 @@ jobs:
uses: opentofu/setup-opentofu@v1
with:
tofu_version: ${{ env.TOFU_VERSION }}
tofu_wrapper: false

- name: Install kubectl
Comment on lines 51 to +54

P2 kubectl installed with dynamic version and no checksum verification

The Install kubectl step fetches the version dynamically from stable.txt at runtime and does not verify the binary's checksum. This means:

  1. The kubectl version is non-deterministic between runs (could silently change)
  2. The binary is not verified against a known-good SHA256, which is a supply-chain risk

Consider pinning a specific version and verifying the checksum. This same pattern is duplicated in the apply job (~line 142) and both should be updated together.
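
One way to act on this, sketched under assumptions: the version tag and the expected digest are both pinned in the workflow, and the binary is checked before it is installed. The helper names `verify_sha256` and `install_kubectl` are hypothetical and not part of this PR, and the sketch assumes `sha256sum` is available on the runner image.

```shell
# Hypothetical helpers; names are illustrative and not part of this PR.

# verify_sha256 FILE EXPECTED_HEX
# Succeeds only when FILE's SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
  vs_actual="$(sha256sum "$1" | cut -d' ' -f1)"
  [ "$vs_actual" = "$2" ]
}

# install_kubectl VERSION EXPECTED_HEX
# Downloads a pinned kubectl release and installs it only if the digest matches.
install_kubectl() {
  ik_version="$1"
  ik_expected="$2"
  curl -fsSLO "https://dl.k8s.io/release/${ik_version}/bin/linux/amd64/kubectl" || return 1
  if ! verify_sha256 kubectl "$ik_expected"; then
    echo "kubectl ${ik_version}: SHA-256 mismatch, refusing to install" >&2
    rm -f kubectl
    return 1
  fi
  chmod +x kubectl && sudo mv kubectl /usr/local/bin/
}
```

Both the plan and apply jobs could then call `install_kubectl "$KUBECTL_VERSION" "$KUBECTL_SHA256"`, with the two values set once in the workflow-level `env` block (both variable names are assumptions). The digest can be copied from the `kubectl.sha256` file published next to each release binary on dl.k8s.io and recorded at review time, rather than fetched on every run.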

run: |
curl -sLO "https://dl.k8s.io/release/$(curl -sL https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl && sudo mv kubectl /usr/local/bin/

- name: Install Civo CLI
run: |
@@ -68,13 +73,8 @@ jobs:
working-directory: ${{ env.STACK_DIR }}
run: |
tofu init -reconfigure \
-backend-config="address=https://gitlab.com/api/v4/projects/79706605/terraform/state/${STATE_NAME}" \
-backend-config="lock_address=https://gitlab.com/api/v4/projects/79706605/terraform/state/${STATE_NAME}/lock" \
-backend-config="unlock_address=https://gitlab.com/api/v4/projects/79706605/terraform/state/${STATE_NAME}/lock" \
-backend-config="lock_method=POST" \
-backend-config="unlock_method=DELETE" \
-backend-config="username=tofu-ci" \
-backend-config="password=${{ secrets.GITLAB_PAT }}"
-backend-config="access_key=${{ secrets.RUSTFS_ACCESS_KEY }}" \
-backend-config="secret_key=${{ secrets.RUSTFS_SECRET_KEY }}"

- name: Plan
id: plan
@@ -124,7 +124,7 @@ jobs:

apply:
name: Apply ARC Runners
runs-on: ubuntu-latest
runs-on: tinyland-docker
needs: plan
if: >-
needs.plan.outputs.has-changes == 'true'
@@ -142,6 +142,12 @@ jobs:
uses: opentofu/setup-opentofu@v1
with:
tofu_version: ${{ env.TOFU_VERSION }}
tofu_wrapper: false

- name: Install kubectl
run: |
curl -sLO "https://dl.k8s.io/release/$(curl -sL https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl && sudo mv kubectl /usr/local/bin/

- name: Install Civo CLI
run: |
@@ -160,13 +166,8 @@ jobs:
working-directory: ${{ env.STACK_DIR }}
run: |
tofu init -reconfigure \
-backend-config="address=https://gitlab.com/api/v4/projects/79706605/terraform/state/${STATE_NAME}" \
-backend-config="lock_address=https://gitlab.com/api/v4/projects/79706605/terraform/state/${STATE_NAME}/lock" \
-backend-config="unlock_address=https://gitlab.com/api/v4/projects/79706605/terraform/state/${STATE_NAME}/lock" \
-backend-config="lock_method=POST" \
-backend-config="unlock_method=DELETE" \
-backend-config="username=tofu-ci" \
-backend-config="password=${{ secrets.GITLAB_PAT }}"
-backend-config="access_key=${{ secrets.RUSTFS_ACCESS_KEY }}" \
-backend-config="secret_key=${{ secrets.RUSTFS_SECRET_KEY }}"

- name: Download plan
uses: actions/download-artifact@v4
2 changes: 1 addition & 1 deletion docs/runners/hpa-tuning.md
@@ -52,7 +52,7 @@ kubectl describe hpa runner-docker -n {org}-runners
(1--5) is usually sufficient.
- **dind**: Container builds are bursty. If builds queue during peak hours,
consider increasing the maximum.
- **rocky8/rocky9**: Typically low utilization. Minimum of 1 keeps a warm
- **tinyland-docker/tinyland-nix**: Typically low utilization. Minimum of 1 keeps a warm
pod available.
- **nix**: CPU-intensive builds can saturate a pod quickly. Monitor
utilization and adjust the maximum if builds are queuing.
8 changes: 4 additions & 4 deletions docs/runners/load-testing.md
@@ -100,14 +100,14 @@ test-dind:
- docker pull alpine:latest
- docker images

test-rocky8:
tags: [rocky8]
test-tinyland-docker:
tags: [tinyland-docker]
script:
- cat /etc/redhat-release
- dnf list installed | head -20

test-rocky9:
tags: [rocky9]
test-tinyland-nix:
tags: [tinyland-nix]
script:
- cat /etc/redhat-release
- dnf list installed | head -20
Comment on lines +103 to 113

P1 RHEL-specific scripts on non-RHEL runners

Both test-tinyland-docker and test-tinyland-nix still run cat /etc/redhat-release and dnf list installed, which are RHEL/Rocky Linux-specific commands. Given that tinyland-docker is a generic Docker runner and tinyland-nix is a Nix runner, these commands would fail (file not found / dnf not available) on the actual runner environments after the rename. If this is intentional documentation of what these runners replaced (i.e., they still run RHEL images), a clarifying comment would help. Otherwise the scripts should be updated to reflect the actual runner environments.
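
If the images behind these tags are not known in advance, the job script could probe the environment instead of assuming RHEL. The following is a minimal sketch; `describe_runner` is a hypothetical helper, and the list of package managers it checks is an assumption about what the runner images might contain.

```shell
# Hypothetical helper; not part of this PR.

# describe_runner
# Prints the OS identity and a short package listing using whatever the
# image actually provides, rather than assuming /etc/redhat-release and dnf.
describe_runner() {
  if [ -r /etc/os-release ]; then
    # Source in a subshell so os-release variables do not leak into the job.
    ( . /etc/os-release; echo "os: ${PRETTY_NAME:-unknown}" )
  elif [ -r /etc/redhat-release ]; then
    echo "os: $(cat /etc/redhat-release)"
  else
    echo "os: $(uname -sr)"
  fi

  if command -v dnf >/dev/null 2>&1; then
    dnf list installed | head -20
  elif command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W | head -20
  elif command -v apk >/dev/null 2>&1; then
    apk info | head -20
  elif command -v nix >/dev/null 2>&1; then
    nix --version
  else
    echo "packages: no known package manager found"
  fi
}
```

Calling `describe_runner` from the `script:` section of both test jobs would let the load test succeed on either kind of image while still recording what the runner is.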

8 changes: 4 additions & 4 deletions docs/runners/migration-guide.md
@@ -12,8 +12,8 @@ The legacy setup required operators to:
```
TF_VAR_docker_runner_token=glrt-...
TF_VAR_dind_runner_token=glrt-...
TF_VAR_rocky8_runner_token=glrt-...
TF_VAR_rocky9_runner_token=glrt-...
TF_VAR_tinyland-docker_runner_token=glrt-...
TF_VAR_tinyland-nix_runner_token=glrt-...
Comment on lines +15 to +16

P1 Hyphens in TF_VAR_* env var names are invalid in POSIX shells

The new names TF_VAR_tinyland-docker_runner_token and TF_VAR_tinyland-nix_runner_token contain hyphens. POSIX shell variable names may contain only letters, digits, and underscores, so a hyphen is not a valid character. Anyone attempting to export these directly in a shell script would get an error (bash reports "not a valid identifier"), and the variable would never be set.

If the underlying Terraform variable names actually use hyphens, they need to be set via a .tfvars file rather than environment variables. If the intended Terraform variable names use underscores (the standard convention), the documentation should be updated to replace the hyphens with underscores: TF_VAR_tinyland_docker_runner_token and TF_VAR_tinyland_nix_runner_token.
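
A small check like the one below could catch this in documentation or CI scripts. It is a sketch only: `is_valid_env_name` and `to_env_name` are hypothetical helpers, and rewriting hyphens to underscores only helps if the Terraform variable itself is declared with underscores, because the suffix after `TF_VAR_` must match the declared variable name exactly.

```shell
# Hypothetical helpers; names are illustrative and not part of this PR.

# is_valid_env_name NAME
# Succeeds if NAME is a valid POSIX shell variable name:
# letters, digits, and underscores only, not starting with a digit.
is_valid_env_name() {
  case "$1" in
    ''|[0-9]*|*[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

# to_env_name NAME
# Prints NAME with every hyphen replaced by an underscore.
to_env_name() {
  printf '%s\n' "$1" | tr '-' '_'
}
```

For example, `is_valid_env_name "TF_VAR_tinyland-docker_runner_token"` fails, while the output of `to_env_name` for the same input passes the check.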

TF_VAR_nix_runner_token=glrt-...
```
3. Rotate tokens manually when they expired or were revoked.
@@ -130,8 +130,8 @@ include:
| ------------------------------- | -------------- |
| `ci-templates/docker.yml` | `docker-job` |
| `ci-templates/dind.yml` | `dind-job` |
| `ci-templates/rocky8.yml` | `rocky8-job` |
| `ci-templates/rocky9.yml` | `rocky9-job` |
| `ci-templates/tinyland-docker.yml` | `tinyland-docker-job` |
| `ci-templates/tinyland-nix.yml` | `tinyland-nix-job` |
| `ci-templates/nix.yml` | `nix-job` |
| `ci-templates/docker-build.yml` | `docker-build` |
| `ci-templates/k8s-deploy.yml` | `k8s-deploy` |
10 changes: 5 additions & 5 deletions docs/runners/project-onboarding.md
@@ -24,8 +24,8 @@ Use the [Runner Selection Guide](runner-selection.md) to map each CI job to a ru
| Python lint/test | docker | `docker` |
| Nix flake builds | nix | `nix` |
| Container image builds | dind | `dind` |
| RHEL 8 packaging | rocky8 | `rocky8` |
| RHEL 9 packaging | rocky9 | `rocky9` |
| RHEL 8 packaging | tinyland-docker | `tinyland-docker` |
| RHEL 9 packaging | tinyland-nix | `tinyland-nix` |

## Step 2: Add Tags to Jobs

@@ -70,7 +70,7 @@ build:nix:
## Step 4: Verify Pipeline

1. Push your branch and check the pipeline
2. Click on a tagged job — the runner name should show `bates-docker`, `bates-nix`, etc.
2. Click on a tagged job — the runner name should show `tinyland-docker`, `tinyland-nix`, etc.
3. Check job duration against baseline (SaaS runner times)
4. For Nix jobs, verify cache hits in the build log

@@ -91,8 +91,8 @@ The `upgrading-dw` project migrated all CI jobs to dedicated runners:
| build:orchestrator-nix, build:orchestrator-nix-release | `nix` | Haskell builds |
| build:orchestrator-nix-static, build:orchestrator-nix-musl-upx | `nix` | MUSL static builds |
| build:orchestrator (buildah fallback) | `dind` | Container builds |
| package:fpm:el8 | `rocky8` | EL8 RPM packaging |
| package:fpm:el9 | `rocky9` | EL9 RPM packaging |
| package:fpm:el8 | `tinyland-docker` | EL8 RPM packaging |
| package:fpm:el9 | `tinyland-nix` | EL9 RPM packaging |
| deploy:repo | `mgr` | PVE node (unchanged) |
| deploy:dev | `docker` | SSH-based deployment |

6 changes: 3 additions & 3 deletions docs/runners/resource-limits.md
@@ -11,8 +11,8 @@ values in their `*.tfvars` files.
|--------|------------|-----------|----------------|--------------|
| docker | 100m | 2 | 256Mi | 2Gi |
| dind | 500m | 4 | 1Gi | 8Gi |
| rocky8 | 100m | 2 | 256Mi | 2Gi |
| rocky9 | 100m | 2 | 256Mi | 2Gi |
| tinyland-docker | 100m | 2 | 256Mi | 2Gi |
| tinyland-nix | 100m | 2 | 256Mi | 2Gi |
| nix | 500m | 4 | 1Gi | 8Gi |

## Typical Workload Profiles
@@ -25,7 +25,7 @@ values in their `*.tfvars` files.
| GHC build (warm cache) | 500m-1 | 512Mi-1Gi | nix |
| GHC build (cold cache) | 2-4 | 2-4Gi | nix |
| MUSL static build | 1-2 | 1-2Gi | nix |
| FPM RPM packaging | 100-500m | 256-512Mi | rocky8/rocky9 |
| FPM RPM packaging | 100-500m | 256-512Mi | tinyland-docker/tinyland-nix |
| Docker image build | 500m-2 | 512Mi-2Gi | dind |

## Namespace Quota
2 changes: 1 addition & 1 deletion docs/runners/security-model.md
@@ -12,7 +12,7 @@ runner infrastructure.

Only the `dind` runner operates in privileged mode. This is required for the
Docker daemon sidecar and cannot be avoided for Docker-in-Docker builds. All
other runner types (`docker`, `rocky8`, `rocky9`, `nix`) run as unprivileged
other runner types (`docker`, `tinyland-docker`, `tinyland-nix`, `nix`) run as unprivileged
containers.

For container builds that do not require a full Docker daemon, consider using
8 changes: 4 additions & 4 deletions docs/runners/self-service-enrollment.md
@@ -36,8 +36,8 @@ syntax instead of writing jobs from scratch.
| -------------- | ------------------------------------- |
| `docker-job` | Standard Docker runner job |
| `dind-job` | Docker-in-Docker job |
| `rocky8-job` | Rocky 8 runner job |
| `rocky9-job` | Rocky 9 runner job |
| `tinyland-docker-job` | Rocky 8 runner job |
| `tinyland-nix-job` | Rocky 9 runner job |
Comment on lines +39 to +40

P1 Stale descriptions not updated

The component descriptions still reference the old runner names. tinyland-docker-job is labeled "Rocky 8 runner job" and tinyland-nix-job is labeled "Rocky 9 runner job" — these weren't updated alongside the component name changes, leaving misleading documentation.

Suggested change
- | `tinyland-docker-job` | Rocky 8 runner job |
- | `tinyland-nix-job` | Rocky 9 runner job |
+ | `tinyland-docker-job` | tinyland-docker runner job |
+ | `tinyland-nix-job` | tinyland-nix runner job |

| `nix-job` | Nix runner job with Attic cache |
| `docker-build` | Build and push container images |
| `k8s-deploy` | Deploy to Kubernetes via GitLab Agent |
@@ -142,7 +142,7 @@ Each job runs in an ephemeral `ci-job-*` Kubernetes namespace with:
- **LimitRange**: sensible container defaults
- **RBAC**: read-only access to pods, deployments, HPAs, jobs, events

This applies to `docker`, `rocky8`, `rocky9`, and `nix` runners. The `dind`
This applies to `docker`, `tinyland-docker`, `tinyland-nix`, and `nix` runners. The `dind`
runner is the exception -- it uses a shared namespace with privileged access.
See [security-model.md](security-model.md) for full details.

@@ -159,7 +159,7 @@ gitlab.com but not github.com), this causes failures when jobs try to clone
from GitHub.

**Rule of thumb:** self-hosted runners should only be tagged with their
workload type (`docker`, `nix`, `rocky8`, etc). Projects request specific
workload type (`docker`, `nix`, `tinyland-docker`, etc). Projects request specific
runners by matching these workload tags. Generic infrastructure jobs stay on
SaaS shared runners.

20 changes: 16 additions & 4 deletions tofu/stacks/arc-runners/backend.tf
@@ -1,8 +1,20 @@
# GitLab Managed Terraform State
# RustFS S3-compatible state backend
#
# State is stored in GitLab's built-in Terraform state management.
# Access via CI_JOB_TOKEN in pipelines.
# State stored in RustFS (on-cluster MinIO-compatible store in nix-cache namespace).
# In-cluster runners access via: http://attic-rustfs-hl.nix-cache.svc:9000
# Local dev: kubectl port-forward -n nix-cache svc/attic-rustfs 9000:9000
# then: tofu init -backend-config="endpoint=http://localhost:9000" -backend-config="access_key=..." -backend-config="secret_key=..."

terraform {
backend "http" {}
backend "s3" {
bucket = "tofu-state"
key = "arc-runners/terraform.tfstate"
region = "us-east-1"
endpoint = "http://attic-rustfs-hl.nix-cache.svc:9000"
skip_credentials_validation = true
skip_metadata_api_check = true
skip_requesting_account_id = true
skip_s3_checksum = true
use_path_style = true
}
}
50 changes: 20 additions & 30 deletions tofu/stacks/attic/backend.tf
@@ -1,43 +1,33 @@
# Attic Stack - Backend Configuration
#
# Uses GitLab Managed Terraform State for state storage and locking.
# This enables collaboration and state versioning through GitLab.
# Uses RustFS S3-compatible state backend (on-cluster MinIO-compatible store).
#
# Backend Configuration Methods:
#
# 1. CI/CD (automatic):
# Environment variables are set by .gitlab-ci.yml templates:
# TF_HTTP_ADDRESS, TF_HTTP_LOCK_ADDRESS, TF_HTTP_UNLOCK_ADDRESS
# TF_HTTP_USERNAME (gitlab-ci-token), TF_HTTP_PASSWORD (CI_JOB_TOKEN)
# 1. CI/CD (ARC runners, in-cluster):
# Endpoint in backend block points to cluster-internal DNS.
# Credentials passed via -backend-config or env vars:
# -backend-config="access_key=..." -backend-config="secret_key=..."
#
# 2. Local development with GitLab state:
# Use Justfile commands which configure backend via -backend-config:
# just init # Initialize with GitLab backend
# just plan # Plan changes
# just apply # Apply changes
#
# Or manually:
# export TF_HTTP_PASSWORD="glpat-your-token"
# tofu init -backend-config=backend.local.hcl
# 2. Local development:
# Port-forward RustFS and override endpoint:
# kubectl port-forward -n nix-cache svc/attic-rustfs 9000:9000
# tofu init -backend-config="endpoint=http://localhost:9000" \
# -backend-config="access_key=..." -backend-config="secret_key=..."
#
# 3. Local-only state (not recommended for shared infrastructure):
# tofu init -backend=false
# # Uses in-memory state, changes are not persisted

terraform {
# HTTP backend for GitLab Managed Terraform State
# All configuration provided via environment variables or -backend-config
backend "http" {
# Required TF_HTTP_* environment variables:
# TF_HTTP_ADDRESS - State read/write URL
# TF_HTTP_LOCK_ADDRESS - Lock URL
# TF_HTTP_UNLOCK_ADDRESS - Unlock URL
# TF_HTTP_USERNAME - GitLab username or "gitlab-ci-token"
# TF_HTTP_PASSWORD - Personal access token or CI_JOB_TOKEN
#
# Optional:
# TF_HTTP_LOCK_METHOD - POST (default)
# TF_HTTP_UNLOCK_METHOD - DELETE (default)
# TF_HTTP_RETRY_WAIT_MIN - Retry wait time (default: 1s)
backend "s3" {
bucket = "tofu-state"
key = "attic/terraform.tfstate"
region = "us-east-1"
endpoint = "http://attic-rustfs-hl.nix-cache.svc:9000"
skip_credentials_validation = true
skip_metadata_api_check = true
skip_requesting_account_id = true
skip_s3_checksum = true
use_path_style = true
}
}
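
The local-development path described in the backend comments above could be scripted roughly as follows. This is a sketch under assumptions: `local_endpoint_flag` and `init_with_local_rustfs` are hypothetical helpers, the two-second wait is arbitrary, and credentials are taken from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables (which the S3 backend reads) instead of being passed on the command line.

```shell
# Hypothetical helpers; names are illustrative and not part of this PR.

# local_endpoint_flag [PORT]
# Prints the init flag that points the S3 backend at a forwarded local port.
local_endpoint_flag() {
  printf '%s%s\n' '-backend-config=endpoint=http://localhost:' "${1:-9000}"
}

# init_with_local_rustfs STACK_DIR
# Forwards RustFS to localhost, runs tofu init against it, then stops the tunnel.
# Credentials are expected in AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.
init_with_local_rustfs() {
  kubectl port-forward -n nix-cache svc/attic-rustfs 9000:9000 >/dev/null &
  pf_pid=$!
  sleep 2  # crude wait for the tunnel to come up (assumption; tune as needed)
  init_status=0
  ( cd "$1" && tofu init -reconfigure "$(local_endpoint_flag 9000)" ) || init_status=$?
  kill "$pf_pid" 2>/dev/null
  return "$init_status"
}
```

A developer would run, for example, `init_with_local_rustfs tofu/stacks/attic` after exporting the two credential variables. Keeping the keys in the environment avoids exposing them in the process list or shell history.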
19 changes: 15 additions & 4 deletions tofu/stacks/gitlab-runners/backend.tf
@@ -1,8 +1,19 @@
# GitLab Managed Terraform State
# RustFS S3-compatible state backend
#
# State is stored in GitLab's built-in Terraform state management.
# Access via CI_JOB_TOKEN in pipelines.
# State stored in RustFS (on-cluster MinIO-compatible store in nix-cache namespace).
# In-cluster runners: http://attic-rustfs-hl.nix-cache.svc:9000
# Local dev: kubectl port-forward -n nix-cache svc/attic-rustfs 9000:9000

terraform {
backend "http" {}
backend "s3" {
bucket = "tofu-state"
key = "gitlab-runners/terraform.tfstate"
region = "us-east-1"
endpoint = "http://attic-rustfs-hl.nix-cache.svc:9000"
skip_credentials_validation = true
skip_metadata_api_check = true
skip_requesting_account_id = true
skip_s3_checksum = true
use_path_style = true
}
}
19 changes: 15 additions & 4 deletions tofu/stacks/runner-dashboard/backend.tf
@@ -1,8 +1,19 @@
# Backend Configuration
# RustFS S3-compatible state backend
#
# Uses GitLab managed Terraform state.
# Initialize with: just init
# State stored in RustFS (on-cluster MinIO-compatible store in nix-cache namespace).
# In-cluster runners: http://attic-rustfs-hl.nix-cache.svc:9000
# Local dev: kubectl port-forward -n nix-cache svc/attic-rustfs 9000:9000

terraform {
backend "http" {}
backend "s3" {
bucket = "tofu-state"
key = "runner-dashboard/terraform.tfstate"
region = "us-east-1"
endpoint = "http://attic-rustfs-hl.nix-cache.svc:9000"
skip_credentials_validation = true
skip_metadata_api_check = true
skip_requesting_account_id = true
skip_s3_checksum = true
use_path_style = true
}
}