Skip to content

fix: use network-online.target for reliable network dependency#16

Draft
MartinLoeper wants to merge 5 commits intomainfrom
fix-network-target-dependency
Draft

fix: use network-online.target for reliable network dependency#16
MartinLoeper wants to merge 5 commits intomainfrom
fix-network-target-dependency

Conversation

@MartinLoeper
Copy link
Copy Markdown
Owner

@MartinLoeper MartinLoeper commented Jan 5, 2026

Goal

Fix the systemd service network dependency by using network-online.target instead of network.target. The service makes HTTP requests to the Claude API, so it needs the network to be fully routable, not just configured.

Tasks

  • Update After to use network-online.target
  • Add Wants = [ "network-online.target" ]
  • Bump version

Implementation

Changed nix/hm-module.nix:

# Before
After = [ "network.target" ];

# After
After = [ "network-online.target" ];
Wants = [ "network-online.target" ];

Notes

  • network.target only means network interfaces are configured (~3s after boot)
  • network-online.target means network is actually routable (~15s after boot)
  • The ~12 second gap was causing initial API requests to fail on boot

MartinLoeper and others added 2 commits January 5, 2026 11:48
The service makes HTTP requests to the Claude API, so it needs the
network to be fully routable (network-online.target), not just
configured (network.target).

On typical systems, network.target is reached ~12 seconds before
network-online.target, causing initial API requests to fail on boot.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix network dependency issues for the claude-o-meter systemd service by switching from network.target to network-online.target to ensure the network is fully routable before the service starts.

  • Updated systemd service to use network-online.target instead of network.target
  • Added Wants dependency on network-online.target
  • Bumped version from 2.0.76-14 to 2.0.76-15

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
nix/hm-module.nix Modified systemd user service unit to depend on network-online.target for reliable network availability
VERSION Incremented patch version to 2.0.76-15

Comment thread nix/hm-module.nix
Comment on lines +180 to +181
After = [ "network-online.target" ];
Wants = [ "network-online.target" ];
Copy link

Copilot AI Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The network-online.target is a system-level target that may not be available or may not work reliably with systemd user services. User services are started in a different context and don't have access to all system targets.

For user services that need network connectivity, consider one of these alternatives:

  1. Use graphical-session.target (if this runs in a user session with a GUI)
  2. Remove the network dependency entirely and rely on the service's restart mechanism (RestartSec = "10s" is already configured) to handle transient network failures
  3. If network dependency is critical, implement retry logic within the application itself

Since the service already has "Restart = always" and "RestartSec = 10s" configured, removing the network dependency and letting the service restart on failures may be the most reliable approach for a user service.

Suggested change
After = [ "network-online.target" ];
Wants = [ "network-online.target" ];

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our strategy: test this on NixOS unstable first. If network-online.target doesn't work reliably for user services, we'll explore alternatives (e.g., blocking ExecStartPre with a network check).

If it does work, we'll make it an optional Home Manager module option so users can opt-in to the network dependency behavior.

@MartinLoeper
Copy link
Copy Markdown
Owner Author

It looks that the issue is not related to network being offline. I can observe that with the fix that claude-o-meter starts after network-online target is ready. However, its till hangs... maybe internet availability was not currectly detected. We should try the PreStart systemd property and see whether network is really available at that point.

@MartinLoeper
Copy link
Copy Markdown
Owner Author

Quick note: if this error really turns out to be related to network connectivity, I feel like we are really working around the lack of proper error reporting in claude code. We should definitely upstream the issue in that case.

@MartinLoeper
Copy link
Copy Markdown
Owner Author

@claude @copilot what do you both think of that?

Copy link
Copy Markdown

Copilot AI commented Jan 5, 2026

@MartinLoeper I've opened a new pull request, #17, to work on those changes. Once the pull request is ready, I'll request review from you.

@claude
Copy link
Copy Markdown

claude Bot commented Jan 5, 2026

Claude finished @MartinLoeper's task in 1m 19s —— View job


Analysis Complete

Tasks

  • Read current systemd service configuration
  • Read main.go to understand network request handling
  • Analyze whether issue is network-related or error handling
  • Provide recommendations

My Assessment

After analyzing the code, I agree with your observation that this might not be a network availability issue at all. Here's why:

The Real Problem: Poor Error Handling in executeClaudeCLI

Looking at main.go:720-818, the executeClaudeCLI function has a critical weakness:

  1. It runs claude /usage via PTY using the script command
  2. It polls for % used or % left patterns to detect success
  3. If those patterns don't appear within 30s, it times out
  4. But it doesn't distinguish between:
    • Network not being available
    • claude CLI not being ready
    • claude CLI hanging for some other reason
    • Authentication errors (though there is auth error detection on line 767-769)

Why network-online.target Doesn't Help

The issue you're seeing (service starts after network-online.target is ready but still hangs) strongly suggests:

  • The network is actually online when the service starts
  • But claude CLI is still not able to connect for some other reason:
    • DNS might not be fully ready
    • Claude API endpoints might not be reachable yet (routing tables, firewall rules, etc.)
    • The claude CLI itself might be waiting for some internal initialization
    • HTTP requests are timing out at the TCP/TLS level before reaching application layer

Your PreStart Suggestion is Good for Diagnosis

Using ExecStartPre to test network connectivity is an excellent diagnostic approach:

Service = {
  ExecStartPre = "${pkgs.bash}/bin/bash -c '${pkgs.curl}/bin/curl -s --max-time 5 https://api.anthropic.com/v1/health || echo \"Network check failed at $(date)\" >> /tmp/claude-o-meter-prestart.log'";
  ExecStart = "...";
};

This would tell us definitively whether network is truly available.

Root Cause: Lack of Error Reporting in Claude CLI

I strongly agree with your second comment about upstreaming this to Claude Code. The fundamental issue is:

The claude CLI doesn't provide machine-readable error codes or structured output when it fails.

If claude /usage could return:

  • Exit codes indicating different failure types (no network, auth error, etc.)
  • JSON error output with error codes
  • Faster failure modes when network isn't available

Then claude-o-meter could:

  • Detect network issues immediately (instead of waiting 30s)
  • Provide specific error messages
  • Retry with appropriate backoff strategies

Recommendations

Short term (for this PR):

  1. Add PreStart diagnostic as you suggested to confirm network availability
  2. Add better logging around the executeClaudeCLI call to capture the exact failure mode
  3. Consider a network health check before calling claude:
    // Before executeClaudeCLI, try a quick HTTP request to api.anthropic.com
    // If that fails, we know it's a network issue and can log appropriately

Medium term:

  1. Upstream to Claude Code - file an issue requesting:

    • Faster failure modes when network unavailable
    • Structured error output (JSON with error codes)
    • Exit codes that distinguish failure types
    • --timeout flag for claude /usage
  2. Add retry logic with exponential backoff in claude-o-meter specifically for startup failures

Alternative approach (if network-online.target proves unreliable):

Follow Copilot's suggestion and remove network dependency entirely, relying on:

  • The existing Restart = always + RestartSec = 10s
  • The startup retry logic (startupRetryInterval = 5s in main.go:1144)
  • This is actually quite robust already!

Verdict

The current implementation in main.go already has good retry logic for handling failures. The network-online.target dependency might be working around a problem that doesn't exist - the real issue is that claude CLI doesn't fail fast or provide useful error information when the network isn't ready.

I'd recommend:

  1. ✅ Test with PreStart to confirm network is available
  2. ✅ If network IS available but claude still hangs → upstream to Claude Code
  3. ✅ Consider removing network-online.target dependency and just let the retry logic handle it
  4. ✅ Add explicit network health check before calling claude if you want faster startup

@MartinLoeper
Copy link
Copy Markdown
Owner Author

Here are the logs for the error case:

-- Boot 14f7d4e4cd3647329c2904d9b4c78341 --
Jan 05 22:25:03 nesto-office-mloeper systemd[13965]: Starting Claude usage metrics daemon...
Jan 05 22:25:03 nesto-office-mloeper systemd[13965]: Started Claude usage metrics daemon.
Jan 05 22:25:03 nesto-office-mloeper claude-o-meter[14029]: 2026/01/05 22:25:03 Starting daemon: interval=5m0s, output=/home/mloeper/.cache/claude-o-meter.json, debug=false, dbus=true
Jan 05 22:25:03 nesto-office-mloeper claude-o-meter[14029]: 2026/01/05 22:25:03 Notifications enabled: threshold=90%, timeout=0ms, icon=/nix/store/rq234db5mfqv1p16rzhq804y6aqkqkd7-claude-o-meter-2.0.7>
Jan 05 22:25:03 nesto-office-mloeper claude-o-meter[14029]: 2026/01/05 22:25:03 D-Bus service registered: com.github.MartinLoeper.ClaudeOMeter
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: 2026/01/05 22:25:33 Query failed: command timed out after 30s
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: 2026/01/05 22:25:33 Raw CLI output:
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]:
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: ╭─── Claude Code v2.0.76 ──────────────────────────────────────────────────────╮
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: │                                                    │ Tips for getting        │
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: │                Welcome back Martin!                │ started                 │
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: │                                                    │ Run /init to create a … │
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: │                     * ▐▛███▜▌ *                    │ ─────────────────────── │
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: │                    * ▝▜█████▛▘ *                   │ Recent activity         │
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: │                     *  ▘▘ ▝▝  *                    │ No recent activity      │
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: │                                                    │                         │
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: │ Opus 4.5 · Claude Max · redacted@gmail.com's  │                         │
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: │ Organization                                       │                         │
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: │                        /tmp                        │                         │
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: ╰──────────────────────────────────────────────────────────────────────────────╯
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]:
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: ────────────────────────────────────────────────────────────────────────────────
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: > Try "edit <filepath> to..."
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: ────────────────────────────────────────────────────────────────────────────────
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]:   -- INSERT --
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]:
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: > /usage
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]:
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: · Scheming… (esc to interrupt)
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]:
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: ────────────────────────────────────────────────────────────────────────────────
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: > Try "edit <filepath> to..."
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: ────────────────────────────────────────────────────────────────────────────────
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]:   -- INSERT --
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]:
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: > /usage
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: ────────────────────────────────────────────────────────────────────────────────
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]:  Settings:  Status   Config   Usage  (tab to cycle)
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]:
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]:  Loading usage data…
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]:
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]:  Esc to cancel
Jan 05 22:25:33 nesto-office-mloeper claude-o-meter[14029]: 2026/01/05 22:25:33 Initial query failed (startup mode), retrying in 5s

@claude does that change your opinion?

@claude
Copy link
Copy Markdown

claude Bot commented Jan 5, 2026

Claude finished @MartinLoeper's task in 1m 20s —— View job


Analysis Update

Tasks

  • Read current systemd service configuration
  • Read main.go to understand network request handling
  • Analyze new log output
  • Provide updated assessment

YES - This Changes Everything

The logs are extremely revealing. Here's what's happening:

What the Logs Show

Jan 05 22:25:03 - Service starts (network-online.target is ready)
Jan 05 22:25:03 - claude CLI launches successfully
Jan 05 22:25:03 - Shows welcome screen with user info
Jan 05 22:25:03 - Accepts "/usage" command
Jan 05 22:25:03 - Shows "Loading usage data…"
Jan 05 22:25:33 - Timeout after exactly 30 seconds

The critical observation: The CLI gets all the way to "Loading usage data…" which means it's trying to make the API request, but that request never completes.

Root Cause Analysis

Looking at main.go:760-763, the code polls for these patterns:

hasUsageData := func(output string) bool {
    return strings.Contains(output, "% used") || strings.Contains(output, "% left")
}

The CLI shows "Loading usage data…" but never displays the percentage patterns. This means:

  1. ✅ Network is configured (service started after network-online.target)
  2. claude CLI is authenticated (shows "Welcome back Martin!")
  3. ✅ The CLI accepts the /usage command
  4. The HTTP request to the Claude API hangs indefinitely

This IS a Network Issue - But Not What We Thought

The issue isn't that the network is "offline" - it's that:

The network is technically "online" but the Claude API endpoint is not reachable yet.

This could be due to:

  • DNS resolution not fully working (name servers not yet reachable)
  • Routing tables not fully populated
  • Firewall rules not yet applied
  • VPN not yet connected
  • Corporate proxy not yet available
  • TLS/SSL certificate validation failing due to time sync issues

network-online.target only guarantees that a route to the internet exists, not that all network services are fully functional.

Why network-online.target Doesn't Help

From the systemd documentation:

network-online.target [...] should pull in a service that delays until the network is actually configured and usable.

But "usable" is vague. On your system, it appears the network is considered "online" when interfaces are up and default routes exist, but before all network infrastructure (DNS, routing, etc.) is fully operational.

Updated Recommendations

Immediate fix (for this PR):

The network-online.target approach won't work reliably. Instead, I recommend Copilot's original suggestion: Remove the network dependency entirely and rely on the existing retry logic.

Your service already has excellent retry configuration:

Restart = "always";
RestartSec = "10s";

And the Go code has startup retry logic:

startupRetryInterval = 5 * time.Second

This is the correct Unix philosophy: let the service fail fast, and let systemd restart it.

Better approach - Add network health check:

Add an ExecStartPre that blocks until the Claude API is actually reachable:

Service = {
  ExecStartPre = pkgs.writeShellScript "wait-for-claude-api" ''
    for i in {1..30}; do
      if ${pkgs.curl}/bin/curl -s --max-time 3 https://api.anthropic.com/v1/health > /dev/null 2>&1; then
        exit 0
      fi
      sleep 1
    done
    # Don't fail - let the service try anyway
    exit 0
  '';
  ExecStart = "...";
};

This:

  • Actually tests reachability to the Claude API
  • Gives up after 30 seconds (same as claude-o-meter's timeout)
  • Doesn't fail the service startup (always exits 0)
  • Provides a real network health gate

Best long-term solution:

Upstream to Claude Code requesting:

  1. Faster timeout for /usage in non-interactive mode
  2. Better error messages when network requests fail
  3. A --timeout flag for /usage
  4. JSON output mode that includes error details

This would allow claude-o-meter to:

  • Use a shorter timeout (e.g., 10s instead of 30s)
  • Distinguish between "network unreachable" and other errors
  • Fail fast and let systemd retry

Verdict

My original assessment was wrong. This IS a network availability issue, but network-online.target can't fix it because:

  • The target is too vague about what "online" means
  • Your system reaches the target before DNS/routing is fully operational
  • The Claude API specifically is unreachable for ~30+ seconds after boot

Recommended path forward:

  1. ❌ Don't rely on network-online.target - it's insufficient
  2. ✅ Add ExecStartPre health check to api.anthropic.com (diagnostic + blocking)
  3. ✅ Keep the existing retry logic as fallback
  4. ✅ Upstream issue to Claude Code about faster failures

Would you like me to implement the ExecStartPre health check approach?


Copilot AI and others added 2 commits January 5, 2026 22:19
Clarify that Copilot mentions in comments should result in advisory
responses without code changes unless explicitly requested. This project
uses Copilot primarily for code reviews, not automatic code generation.

Co-authored-by: MartinLoeper <5209395+MartinLoeper@users.noreply.github.com>
docs: add Copilot interaction guidelines to prevent unnecessary commits
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants