2026-06-07 — Dan Billings

Tracing requests across three GPUs and two operating systems: Jaeger without containers

This post outlines the architecture and telemetry paths needed to set up end-to-end distributed tracing across a home LLM cluster: a macOS client (Hermes Agent), an Arch Linux service backend (Honcho API, nomic embeddings, PostgreSQL, and Jaeger), and a Windows/WSL2 inference runner (llama-server on an RTX 5090).

1. Introduction: The Latency Problem

The Symptom: Your local AI agent (Hermes + Honcho memory) feels sluggish. You upgraded the hardware (RTX 4090 on danarch, RTX 5090 on danwin), but responses still lag.
The Challenge: In a heterogeneous cluster (macOS client, Arch Linux backend services, WSL2/Windows inference nodes), looking at logs on a single machine doesn't show the full picture. You need to correlate client-side UI latency with server-side database queries and inference execution.
The Solution: Distributed tracing with OpenTelemetry (OTel) and Jaeger, allowing you to follow a request from the user's keystroke on the Mac, through Honcho's PostgreSQL lookups, to llama-server's token generation.

2. System Architecture & Observability Map (PlantUML)

This diagram shows how components communicate and how they export trace data back to the central Jaeger instance.

[Diagram: system architecture and observability map]

3. Telemetry Configuration on `danarch`

Deploying Jaeger: jaeger-all-in-one v1.60.0 runs natively under systemd, with no containers. It handles both OTLP span collection over gRPC (port 4317) and the Query UI (port 16686).
Instrumenting Honcho:
- OpenTelemetry auto-instrumentation runs through the opentelemetry-instrument wrapper, which instruments FastAPI, database queries (SQLAlchemy/psycopg), and HTTP clients automatically.
- systemd configuration injects OTEL_EXPORTER_OTLP_ENDPOINT, pointing at Jaeger, into the honcho-server and honcho-deriver services.
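The environment injection described above can be sketched as a small generator for systemd drop-in files. This is a minimal sketch, not the configuration actually used on danarch: the drop-in path, the `localhost:4317` endpoint (assuming Honcho and Jaeger share the host), and the function name `render_otel_dropin` are all assumptions. The `OTEL_*` variable names are the standard OpenTelemetry SDK environment variables.

```python
def render_otel_dropin(service_name, endpoint="http://localhost:4317"):
    """Render a systemd drop-in that sets standard OTel environment variables.

    The endpoint default is an assumption: Jaeger's OTLP gRPC port on the same host.
    """
    env = {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_TRACES_EXPORTER": "otlp",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
    }
    lines = ["[Service]"]
    lines += [f'Environment="{key}={value}"' for key, value in env.items()]
    return "\n".join(lines) + "\n"


# Unit names come from the post; the drop-in path is hypothetical.
for unit in ("honcho-server", "honcho-deriver"):
    print(f"# /etc/systemd/system/{unit}.service.d/otel.conf")
    print(render_otel_dropin(unit))
```

The drop-in only sets environment variables. The unit's start command still has to launch the process through opentelemetry-instrument for the variables to have any effect.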

4. Instrumenting the macOS Edge (`dans-mac-mini`)

Context Propagation: The client on the Mac passes trace context to Honcho on Arch in the traceparent HTTP header, so spans from both machines join the same trace.
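To make the header concrete, here is a minimal standard-library sketch of the W3C Trace Context `traceparent` format (version, 16-byte trace ID, 8-byte parent span ID, and flags, all lowercase hex). In practice the OTel instrumentation generates and parses this header itself. The helper names `new_traceparent` and `parse_traceparent` are hypothetical and exist only for illustration.

```python
import re
import secrets

# version-traceid-parentid-flags, as laid out by W3C Trace Context version 00
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent(sampled=True):
    """Build a traceparent value for a brand-new trace."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header):
    """Return the header's fields as a dict, or None if it is malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 1),
    }


headers = {"traceparent": new_traceparent()}
print(headers)
print(parse_traceparent(headers["traceparent"]))
```

If the trace ID seen in a Honcho span matches the one the Mac client sent, propagation is working. Two unrelated trace IDs for one request mean the header was dropped somewhere between the machines.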
Agent Instrumentation: Install the OTel SDK Python requirements into the Hermes virtual environment, then launch the agent through the telemetry agent so its calls are traced.

5. Interpreting Jaeger's Rich Traces

Finding the Bottleneck: Read the trace timeline to isolate database overhead, LLM inference latency (danwin), or network round-trip delay.
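The same reading can be done on exported data. The sketch below sums each span's self time (its duration minus the durations of its direct children) per service, using the field names of Jaeger's JSON trace export (`spans`, `processes`, `references`, and durations in microseconds). The sample trace, its service names, and its timings are invented for illustration, and the subtraction assumes that child spans do not overlap.

```python
from collections import defaultdict


def self_time_by_service(trace):
    """Return milliseconds of self time per service for one Jaeger trace."""
    spans = trace["spans"]
    # Total duration of the direct children of each span, keyed by parent span ID.
    child_total = defaultdict(int)
    for span in spans:
        for ref in span.get("references", []):
            if ref.get("refType") == "CHILD_OF":
                child_total[ref["spanID"]] += span["duration"]
    totals = defaultdict(float)
    for span in spans:
        service = trace["processes"][span["processID"]]["serviceName"]
        self_us = max(span["duration"] - child_total[span["spanID"]], 0)
        totals[service] += self_us / 1000.0
    return dict(totals)


# Hypothetical trace: service names and timings are made up.
sample_trace = {
    "processes": {
        "p1": {"serviceName": "hermes-agent"},
        "p2": {"serviceName": "honcho-server"},
        "p3": {"serviceName": "llama-server"},
    },
    "spans": [
        {"spanID": "a", "processID": "p1", "operationName": "chat",
         "startTime": 0, "duration": 5_000_000, "references": []},
        {"spanID": "b", "processID": "p2", "operationName": "POST /memory",
         "startTime": 100_000, "duration": 4_700_000,
         "references": [{"refType": "CHILD_OF", "spanID": "a"}]},
        {"spanID": "c", "processID": "p2", "operationName": "SELECT",
         "startTime": 200_000, "duration": 300_000,
         "references": [{"refType": "CHILD_OF", "spanID": "b"}]},
        {"spanID": "d", "processID": "p3", "operationName": "POST /completion",
         "startTime": 600_000, "duration": 4_000_000,
         "references": [{"refType": "CHILD_OF", "spanID": "b"}]},
    ],
}

breakdown = self_time_by_service(sample_trace)
slowest = max(breakdown, key=breakdown.get)
print(breakdown, "slowest:", slowest)
```

In this invented trace the inference span dominates. The 300 ms of self time on the client side is time not covered by any child span, which is where network round trips and client overhead would show up.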
Troubleshooting: Real-world cases, such as identifying when a rogue server process bypasses instrumentation, so the port it serves never shows up in Jaeger.
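One way to check a suspect process on Linux is to look at the environment it was started with. The sketch below parses the NUL-separated contents of `/proc/<pid>/environ` and reports which OTel variables are missing. It covers only one possible cause: a process can have the variables set and still bypass instrumentation if it was not launched through the wrapper. The helper names, the list of required variables, and the sample environments are assumptions made for illustration.

```python
from pathlib import Path

# Variables this sketch treats as required; adjust to your own setup.
REQUIRED_VARS = ("OTEL_SERVICE_NAME", "OTEL_EXPORTER_OTLP_ENDPOINT")


def parse_environ(raw):
    """Parse the NUL-separated KEY=VALUE bytes found in /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def missing_otel_vars(raw, required=REQUIRED_VARS):
    """List the required variables that are absent or empty."""
    env = parse_environ(raw)
    return [name for name in required if not env.get(name)]


def read_process_environ(pid):
    """Read a live process's startup environment (Linux only, needs permission)."""
    return Path(f"/proc/{pid}/environ").read_bytes()


# Hypothetical environments for an instrumented and a rogue process.
instrumented = (
    b"PATH=/usr/bin\0OTEL_SERVICE_NAME=honcho-server\0"
    b"OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317\0"
)
rogue = b"PATH=/usr/bin\0LANG=C.UTF-8\0"

print(missing_otel_vars(instrumented))
print(missing_otel_vars(rogue))
```

On a live system, pass the PID of whichever process holds the port to `read_process_environ` and feed the result to `missing_otel_vars`.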
