From signal to execution: polybot's NNG pipeline

How a strategy signal becomes an on-venue order, traced end-to-end through polybot's async service mesh. The design trade-offs, and why NNG instead of Redis or in-process queues.

Published Apr 9, 2026


polybot’s services communicate via NNG (nanomsg-next-gen) pub/sub and push/pull channels, not direct function calls. New operators ask why. The answer is in what happens when a strategy emits a signal — traced through five processes in under a second.

The actors

Five independent OS processes (or containers), each able to crash and restart without taking down the rest: the scanner (market-data ingest), the strategies (signal generation), the risk service (limit checks), the executor (order signing and submission), and analytics (persistence). All communicate through NNG sockets.

A trace

  1. T+0ms: Polymarket WebSocket emits a book update for market X.
  2. T+5ms: Scanner normalises to a BookUpdate, publishes on the prices PUB socket.
  3. T+6ms: arbitrage strategy (subscriber) receives it, computes edge. Edge > threshold. Emits Signal(market=X, side=YES, size=$80) on the signals PUSH socket.
  4. T+8ms: Risk service (PULL side) receives the signal. Checks limits. OK. Converts to Order. Pushes on orders PUSH socket.
  5. T+10ms: Executor pulls the order. Serialises to Polymarket’s CLOB format. Signs with EIP-712. Submits via py-clob-client.
  6. T+300ms: Venue ACKs (CLOB latency). Executor emits OrderAck on the executions PUB socket.
  7. T+350ms: Executor also pushes the paired NO leg (from the original Signal pair).
  8. T+600ms: Both legs fill. Executor emits Fill events. Analytics consumes, writes to DuckDB. Strategies receive, update internal state.

Total elapsed: ~600ms, dominated by on-chain venue latency. polybot’s in-process time is ~10ms.
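The strategy step of the trace can be sketched in a few lines. The field names and the edge formula below are illustrative assumptions, not polybot's actual schemas; the point is that each hop is a small, self-describing message serialised onto an NNG socket.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical message shapes for the hops above. Real polybot
# schemas may differ; JSON stands in for whatever wire format is used.

@dataclass
class BookUpdate:          # scanner -> strategies (prices PUB)
    market: str
    best_yes_ask: float
    best_no_ask: float

@dataclass
class Signal:              # strategies -> risk (signals PUSH)
    market: str
    side: str              # "YES" or "NO"
    size_usd: float

def encode(msg) -> bytes:
    """One JSON object per NNG message."""
    return json.dumps(asdict(msg)).encode()

def decode(cls, raw: bytes):
    return cls(**json.loads(raw))

# Step 3 of the trace: compute edge, compare to a threshold, emit.
update = BookUpdate(market="X", best_yes_ask=0.46, best_no_ask=0.51)
edge = 1.0 - (update.best_yes_ask + update.best_no_ask)  # YES+NO under $1
if edge > 0.02:                                          # assumed threshold
    wire = encode(Signal(market=update.market, side="YES", size_usd=80.0))
```

In the real pipeline `wire` would go out on the signals PUSH socket; here it just demonstrates that a round trip through the wire format is lossless.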

Why NNG and not alternatives

Not direct function calls

A monolithic async app can call risk.check(signal) directly. Splitting processes buys the properties the rest of this design relies on: a strategy that crashes mid-signal cannot take the executor down with it, the risk service can be restarted with new limits while strategies keep running, and each stage scales independently. The price is serialisation at every hop, a few milliseconds out of a ~10ms in-process budget.

Not Redis pub/sub

Redis was considered. NNG wins because it is brokerless: there is no Redis server to deploy, monitor, or lose, and messages travel peer-to-peer in one hop instead of two. Redis pub/sub is also fan-out only; getting the exactly-once work-queue semantics of PUSH/PULL would mean layering lists or streams on top.

Not Kafka

Overkill. polybot’s message rate is < 10k msgs/sec at peak. Kafka is the right answer at 1M msgs/sec with replay requirements. We don’t have either.

Not ZeroMQ

NNG (nanomsg-next-gen) is the successor to nanomsg, the library ZeroMQ's original author wrote to fix ZeroMQ's design issues (in-process threading, socket lifecycle). For greenfield Python code, NNG + pynng is the cleaner path.

Socket taxonomy

prices      PUB (scanner)  → SUB (strategies, analytics)
events      PUB (scanner)  → SUB (strategies, analytics)
signals     PUSH (strategies) → PULL (risk)                  [round-robin]
orders      PUSH (risk)    → PULL (executor)                 [round-robin]
executions  PUB (executor) → SUB (strategies, analytics)
control     REQ (cli)      → REP (services)                  [request/reply]

PUSH/PULL is used where every message must be processed exactly once (signals, orders). PUB/SUB is used where fan-out is desirable (prices, executions). REQ/REP is for the CLI to query state.

Backpressure

Each service publishes a lag metric to Prometheus. If the executor’s inbound queue grows beyond a threshold, strategies get a throttle signal on the control channel and slow down signal emission. This is cooperative backpressure — not bulletproof, but enough in practice. Hard backpressure (blocking publishers) was rejected because a slow executor should never block the scanner.
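The cooperative throttle can be sketched as two small state machines, one on each side of the control channel. The names, thresholds, and delay are hypothetical, not polybot's actual API; the hysteresis band (throttle high, resume low) is what keeps the system from oscillating around a single threshold.

```python
import time
from dataclasses import dataclass

THROTTLE_AT = 500   # assumed inbound-queue depth that triggers throttling
RESUME_AT = 100     # depth at which strategies may speed up again

@dataclass
class ThrottleController:
    """Executor side: turns a lag gauge into control-channel messages."""
    throttled: bool = False

    def on_lag_sample(self, queue_depth: int):
        """Return a control message to broadcast, or None if no change."""
        if not self.throttled and queue_depth > THROTTLE_AT:
            self.throttled = True
            return b"THROTTLE"
        if self.throttled and queue_depth < RESUME_AT:
            self.throttled = False
            return b"RESUME"
        return None

@dataclass
class StrategyEmitter:
    """Strategy side: inserts a delay between signals while throttled."""
    delay_s: float = 0.0

    def on_control(self, msg: bytes):
        self.delay_s = 0.05 if msg == b"THROTTLE" else 0.0

    def emit(self, signal: bytes, send):
        if self.delay_s:
            time.sleep(self.delay_s)  # cooperative slow-down, not a block
        send(signal)

ctrl = ThrottleController()
assert ctrl.on_lag_sample(600) == b"THROTTLE"  # crossed the high threshold
assert ctrl.on_lag_sample(300) is None         # inside the hysteresis band
assert ctrl.on_lag_sample(50) == b"RESUME"     # drained below the low mark
```

Note the asymmetry: the scanner and executor never block on each other; the only coupling is an advisory message that strategies are free to honour, which is the "cooperative, not bulletproof" trade described above.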

What this lets you do

Fault isolation (a crashed strategy never takes the executor with it), hot-reload of individual services, independent scaling of each stage, and a trivial backtesting story: point the strategies' SUB socket at a replayer instead of the live scanner and the rest of the pipeline is unchanged.

What this costs you

Deployment complexity: five processes to supervise instead of one, serialisation at every hop, and debugging that spans process boundaries instead of a single stack trace.

The design takeaway

Pick your coupling at the right layer. For polybot, the right layer was “services communicate by messages, not calls”. It cost us complexity in deployment and won us fault isolation, hot-reload, independent scaling, and a trivial backtesting story.

If you’re building a real-time agent system — prediction markets, algorithmic trading, any domain with async venue I/O and strict risk invariants — it’s the layout worth copying.

Need an agent system built like this?

Cryptuon builds production AI agents, MCP integrations, and trading systems. polybot is our open-source showcase.