2026-06-06 — Dan Billings

500 bytes instead of raw video: YOLO11 pose estimation as typed IaC

Streaming video is expensive and leaks privacy. A single compressed frame can easily run to 100 KB or more. A skeleton of 17 keypoints fits in about 500 bytes, roughly 200 times smaller.

I built a typed Infrastructure-as-Code module in ansible-scala to deploy YOLO11 pose estimation on edge nodes, extracting human posture from a webcam feed and rendering it as a glowing neon skeleton. No raw frames leave the device.

The neon rendering pipeline

Standard pose estimators draw a static stick figure. I wanted something visually distinct: a neon glow.

The rendering uses two layers. First, a glow layer accumulates thick, colored limbs and joints, and is then blurred with a Gaussian kernel. Second, a core layer draws thin, bright lines and precise joint circles on top of the blur. The result is a luminous skeleton that stands out against any background.

The colors are hardcoded for contrast: cyan for the torso, magenta for arms, green for legs, and white for joints. The head is rendered in bright cyan.

Privacy by design

Instead of streaming MJPEG or H.264, the edge node runs the YOLO model locally. It extracts the 17 COCO keypoints (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles) and their confidence scores.
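As a rough sketch of that extraction step, assuming the Ultralytics Python API (a pose result exposes keypoints.data with shape people x 17 x 3, holding x, y, and confidence): the helper names extract_people, load_model, and run_on_frame are hypothetical and not taken from the module's generated script.

```python
def extract_people(result):
    """Turn one Ultralytics pose result into per-person lists of [x, y, conf]."""
    kpts = getattr(result, "keypoints", None)
    if kpts is None:
        return []
    # kpts.data is assumed to have shape (people, 17, 3).
    return [
        [[float(x), float(y), float(c)] for x, y, c in person]
        for person in kpts.data.tolist()
    ]


def load_model(weights="yolo11m-pose.pt"):
    """Load the pose model once; the import is lazy so the helper above has no dependencies."""
    from ultralytics import YOLO
    return YOLO(weights)


def run_on_frame(model, frame):
    """Run inference on one BGR frame and return only the keypoints, never the pixels."""
    results = model(frame, verbose=False)
    return extract_people(results[0])
```

Only the nested list returned by run_on_frame would need to leave the device; the frame itself stays local.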

A JSON packet for a single person is roughly 500 bytes:

[
  [x1, y1, conf1],
  [x2, y2, conf2],
  ...
  [x17, y17, conf17]
]

The receiver sees motion, posture, and count, but never faces or backgrounds. That suits occupancy monitoring, gesture control, or remote activity dashboards over low-bandwidth links.
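The packet size is easy to check with a small sketch. It assumes coordinates are rounded before serialization and that compact JSON separators are used; the function name encode_packet, the rounding precision, and the example coordinates are illustrative choices, not values from the module. Unrounded floats would make each packet noticeably larger.

```python
import json


def encode_packet(keypoints, xy_decimals=1, conf_decimals=2):
    """Serialize 17 (x, y, conf) triples as compact JSON bytes."""
    rows = [
        [round(x, xy_decimals), round(y, xy_decimals), round(c, conf_decimals)]
        for x, y, c in keypoints
    ]
    # separators=(",", ":") drops the spaces json.dumps inserts by default.
    return json.dumps(rows, separators=(",", ":")).encode("utf-8")


# Made-up keypoints for one person, in pixel coordinates.
example = [(100.0 + 10.5 * i, 200.0 + 7.25 * i, 0.90) for i in range(17)]
packet = encode_packet(example)
print(len(packet), "bytes for", len(json.loads(packet)), "keypoints")
```

With rounding like this the payload stays under the 500-byte figure quoted above; the exact count depends on the coordinate values and the precision kept.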

Infrastructure as Code

The deployment isn't a manual script. It's a typed Scala DSL.

Yolo.setup(model) takes a YoloModel enum, not a raw string. The default is V11PoseM (yolo11m-pose.pt), the medium-size pose model in the Ultralytics YOLO11 family.

The InfraProgram resolves to four steps:

Install CUDA and cuDNN.
Create a uv Python environment at ~/.local/share/yolo/.
Sync dependencies: ultralytics, torch, torchvision, opencv-python, numpy.
Write the inference script with the neon rendering logic baked in.

Running the playbook provisions the edge node in one shot. No Docker, no systemd unit files to manage manually — just the typed declaration.

Why YOLO11?

Ultralytics released YOLO11 with improved backbone efficiency and pose head accuracy. yolo11m-pose.pt strikes the right balance for a single-GPU edge node: fast enough for real-time webcam inference, accurate enough to track all 17 keypoints reliably.

The ansible-scala module supports the full lineup from YOLOv8 to YOLO11, detection and pose, N through X sizes. Swapping models is changing one enum case and re-running the playbook.

See also: Type-Safe Home Cluster — how the typed DSL behind this module works.

The result

A privacy-preserving pose stream that looks like a Tron character. The edge node handles the heavy lifting — model inference and rendering — and only sends skeletal data upstream.

For a complete edge vision pipeline, this is the right abstraction. You get the visual feedback of video with the efficiency of telemetry.
