Tracing requests across three GPUs and two operating systems: Jaeger without containers
Dan Billings — 2026-06-07
This post outlines the architecture and telemetry pathways required to configure full distributed tracing across a home LLM cluster composed of a macOS client (Hermes Agent), an Arch Linux service backend (Honcho API, nomic embeddings, PostgreSQL, and Jaeger), and a Windows/WSL2 inference runner (llama-server with a 5090).
1. Introduction: The Latency Problem
- The Symptom: Your local AI agent (Hermes + Honcho memory) feels sluggish. You upgraded the hardware (RTX 4090 on danarch, RTX 5090 on danwin), but responses still lag.
- The Challenge: In a heterogeneous cluster (macOS client, Arch Linux backend services, WSL2/Windows inference nodes), looking at logs on a single machine doesn't show the full picture. You need to correlate client-side UI latency with server-side database queries and inference execution.
- The Solution: Distributed tracing with OpenTelemetry (OTel) and Jaeger, allowing you to follow a request from the user's keystroke on the Mac, through Honcho's PostgreSQL lookups, to llama-server's token generation.
2. System Architecture & Observability Map (PlantUML)
This diagram shows how components communicate and how they export trace data back to the central Jaeger instance.
3. Telemetry Configuration on danarch
- Deploying Jaeger: Running jaeger-all-in-one v1.60.0 natively via systemd. It handles both OTLP span collection (gRPC on port 4317) and the Query UI (port 16686).
- Instrumenting Honcho:
  - OpenTelemetry auto-instrumentation runs via the opentelemetry-instrument wrapper, which automatically instruments FastAPI, database queries (SQLAlchemy/psycopg), and HTTP clients.
  - Systemd configuration injection runs honcho-server and honcho-deriver with OTEL_EXPORTER_OTLP_ENDPOINT pointing to Jaeger.
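The systemd injection step can be sketched as a small Python helper that renders a drop-in override for one unit. This is a minimal sketch, not Honcho's shipped configuration. The following are all assumptions: the venv path, the ExecStart command, and setting OTEL_SERVICE_NAME, OTEL_TRACES_EXPORTER and OTEL_EXPORTER_OTLP_PROTOCOL alongside OTEL_EXPORTER_OTLP_ENDPOINT. The sketch also assumes that danarch resolves on the LAN and that the OTLP gRPC exporter package is installed in the same venv.

```python
"""Sketch: render a systemd drop-in that runs a service under opentelemetry-instrument."""

# Assumption: 'danarch' resolves on the LAN and Jaeger's OTLP gRPC receiver listens on 4317.
JAEGER_OTLP_GRPC = "http://danarch:4317"


def render_otel_dropin(service_name: str, exec_start: str, venv_bin: str,
                       endpoint: str = JAEGER_OTLP_GRPC) -> str:
    """Return the text of a [Service] drop-in that exports traces to Jaeger.

    service_name becomes OTEL_SERVICE_NAME (the name shown in Jaeger's UI);
    exec_start is the unit's original command; venv_bin is the virtualenv's
    bin directory, which holds the opentelemetry-instrument launcher.
    """
    env = {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_TRACES_EXPORTER": "otlp",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
    }
    lines = ["[Service]"]
    lines += [f'Environment="{key}={value}"' for key, value in env.items()]
    # An empty ExecStart= clears the original command; without it systemd
    # would see two ExecStart= lines and reject a non-oneshot service.
    lines.append("ExecStart=")
    lines.append(f"ExecStart={venv_bin}/opentelemetry-instrument {exec_start}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical venv path and command; substitute the real ExecStart of honcho-server.
    print(render_otel_dropin(
        "honcho-server",
        "/opt/honcho/.venv/bin/uvicorn src.main:app --host 0.0.0.0 --port 8000",
        venv_bin="/opt/honcho/.venv/bin",
    ))
```

To apply it, write the output to /etc/systemd/system/honcho-server.service.d/otel.conf, run systemctl daemon-reload, and restart the unit. Repeat with a different service name for honcho-deriver. The launcher is referenced by absolute path because systemd's default search path does not include a virtualenv's bin directory.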
4. Instrumenting the macOS Edge (dans-mac-mini)
- Context Propagation: Passing trace context via the W3C traceparent HTTP header from the client on the Mac down to Honcho on Arch.
- Agent Instrumentation: Installing the OTel SDK Python packages into the Hermes virtual environment and wrapping the agent invocation in the telemetry agent.
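To make the propagation step concrete, here is a stdlib-only sketch of the W3C Trace Context traceparent header. Version 00 of the header carries four fields: version, trace-id, parent-id and flags. The sketch is illustrative only, and the helper names are hypothetical. In the instrumented setup the OTel propagator builds and reads this header for you.

```python
import re
import secrets
from typing import Optional

# traceparent, version 00: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: random 16-byte trace-id, random 8-byte span id."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(header: str) -> Optional[dict]:
    """Return the header's fields, or None if it is malformed or uses invalid values."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Version ff and all-zero ids are explicitly invalid in the spec.
    invalid = (
        fields["version"] == "ff"
        or set(fields["trace_id"]) == {"0"}
        or set(fields["parent_id"]) == {"0"}
    )
    if invalid:
        return None
    fields["sampled"] = bool(int(fields["flags"], 16) & 0x01)
    return fields


def child_traceparent(incoming: str) -> str:
    """Header for an outgoing call: keep the trace-id and flags, mint a new span id."""
    parent = parse_traceparent(incoming)
    if parent is None:
        return new_traceparent()  # invalid input: start a fresh trace instead
    return f"00-{parent['trace_id']}-{secrets.token_hex(8)}-{parent['flags']}"


if __name__ == "__main__":
    root = new_traceparent()
    print("Mac client sends:          ", root)
    print("a downstream call would send:", child_traceparent(root))
```

In the real client, opentelemetry.propagate.inject(headers) fills in traceparent from the current span before the HTTP call to Honcho. The FastAPI instrumentation on danarch then extracts it, and that shared trace-id is what stitches the Mac and Arch spans into one trace.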
5. Interpreting the Jaeger Rich Traces
- Finding the Bottleneck: Reading trace timelines to isolate database overhead, LLM inference latency (danwin), or network round-trip delay.
- Troubleshooting: Working through real-world failures, such as identifying when a rogue server process bypasses instrumentation, so the ports it serves never report spans to Jaeger.
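Both bullets can be sketched against trace data pulled from Jaeger.

The first function computes self time per (service, operation): a span's duration minus the union of its children's intervals. This separates time spent inside Honcho from time spent waiting on PostgreSQL or llama-server. The residual on a client span is roughly network plus serialization overhead. Child intervals are clipped to the parent's bounds because spans from the Mac, Arch and Windows hosts are stamped by different clocks.

The second function compares the service list from the Query UI's /api/services endpoint against the services you expect. That endpoint is an internal HTTP API the UI itself calls, not a stable contract. A service that never appears is a hint that it is running outside instrumentation.

The Span fields and the service and operation names are simplified assumptions, not Jaeger's raw JSON schema.

```python
import json
import urllib.request
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Span:
    # Simplified view of a Jaeger span (assumed fields, not the raw JSON schema).
    span_id: str
    parent_id: Optional[str]
    service: str
    operation: str
    start_us: int      # start time in microseconds
    duration_us: int   # duration in microseconds


def _covered(intervals: Iterable[Tuple[int, int]], lo: int, hi: int) -> int:
    """Length of the union of intervals after clipping each one to [lo, hi]."""
    total = 0
    cur_start = cur_end = None
    for s, e in sorted((max(s, lo), min(e, hi)) for s, e in intervals):
        if e <= s:
            continue  # interval lies entirely outside the parent (e.g. clock skew)
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def self_time_by_operation(spans: List[Span]) -> Dict[Tuple[str, str], int]:
    """Sum each span's self time (duration minus time covered by its children)."""
    children: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for sp in spans:
        if sp.parent_id is not None:
            children[sp.parent_id].append((sp.start_us, sp.start_us + sp.duration_us))
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for sp in spans:
        end = sp.start_us + sp.duration_us
        busy_in_children = _covered(children.get(sp.span_id, []), sp.start_us, end)
        totals[(sp.service, sp.operation)] += sp.duration_us - busy_in_children
    return dict(totals)


def fetch_services(base_url: str = "http://danarch:16686") -> dict:
    """Ask the Jaeger Query service which service names have reported spans."""
    with urllib.request.urlopen(f"{base_url}/api/services", timeout=5) as resp:
        return json.load(resp)


def missing_services(payload: dict, expected: Iterable[str]) -> List[str]:
    """Expected services that Jaeger has never seen, e.g. an uninstrumented rogue process."""
    seen = set(payload.get("data") or [])
    return sorted(set(expected) - seen)
```

For example, suppose missing_services(fetch_services(), ["honcho-server", "honcho-deriver"]) returned ["honcho-deriver"]. That would suggest the deriver is running outside its instrumented systemd unit. A large self time on a database operation under honcho-server, by contrast, would point at query overhead rather than inference on danwin.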