Architecture
From signal to execution: polybot's NNG pipeline
How a strategy signal becomes an on-venue order, traced end-to-end through polybot's async service mesh. The design trade-offs, and why NNG instead of Redis or in-process queues.
Published Apr 9, 2026
polybot’s services communicate via NNG (nanomsg-next-gen) pub/sub and push/pull channels, not direct function calls. New operators ask why. The answer is in what happens when a strategy emits a signal — traced through five processes in under a second.
The actors
- Scanner service: Subscribes to venue WebSockets, normalises into `PriceUpdate`/`MarketEvent`, republishes to the event bus.
- Strategy services: Subscribe to events, run strategy logic, emit `Signal`s.
- Risk service: Validates signals, converts them to `Order`s.
- Executor service: Submits orders to venues, tracks lifecycle, emits `Fill` events.
- Analytics service: Subscribes to everything, writes to DuckDB.
All communicate through NNG sockets. Each is an independent OS process (or container) that can crash and restart without taking down the rest.
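On the wire these messages are just bytes. A minimal sketch of one message type, assuming a JSON encoding and illustrative field names (polybot's actual schema may differ):

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class Signal:
    market: str
    side: str        # "YES" or "NO"
    size_usd: float

    def to_bytes(self) -> bytes:
        # NNG sockets carry raw bytes; JSON keeps the sketch readable.
        return json.dumps(asdict(self)).encode()

    @classmethod
    def from_bytes(cls, raw: bytes) -> "Signal":
        return cls(**json.loads(raw))

sig = Signal(market="X", side="YES", size_usd=80.0)
assert Signal.from_bytes(sig.to_bytes()) == sig
```

Because every service speaks bytes over a socket, consumers in any language can join the bus as long as they agree on the encoding.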
A trace
- T+0ms: Polymarket WebSocket emits a book update for market X.
- T+5ms: Scanner normalises to a `BookUpdate`, publishes on the `prices` PUB socket.
- T+6ms: The `arbitrage` strategy (a subscriber) receives it, computes edge. Edge > threshold. Emits `Signal(market=X, side=YES, size=$80)` on the `signals` PUSH socket.
- T+8ms: Risk service (PULL side) receives the signal. Checks limits. OK. Converts to an `Order`. Pushes on the `orders` PUSH socket.
- T+10ms: Executor pulls the order. Serialises to Polymarket's CLOB format. Signs with EIP-712. Submits via py-clob-client.
- T+300ms: Venue ACKs (CLOB latency). Executor emits `OrderAck` on the `executions` PUB socket.
- T+350ms: Executor also pushes the paired NO leg (from the original `Signal` pair).
- T+600ms: Both legs fill. Executor emits `Fill` events. Analytics consumes, writes to DuckDB. Strategies receive, update internal state.
Total elapsed: ~600ms, dominated by on-chain venue latency. polybot’s in-process time is ~10ms.
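The T+6ms decision point reduces to a threshold check. A sketch with made-up field names and an assumed 2% edge threshold (not polybot's actual strategy logic):

```python
from typing import Optional

def maybe_signal(book: dict, threshold: float = 0.02) -> Optional[dict]:
    """Emit a YES-side signal when buying both legs costs less than
    1 - threshold. Illustrative arbitrage logic only."""
    edge = 1.0 - (book["yes_ask"] + book["no_ask"])
    if edge > threshold:
        return {"market": book["market"], "side": "YES", "size_usd": 80.0}
    return None

# YES at 0.55 + NO at 0.40 = 0.95 -> 5% edge, above the 2% threshold.
assert maybe_signal({"market": "X", "yes_ask": 0.55, "no_ask": 0.40}) is not None
# YES at 0.60 + NO at 0.39 = 0.99 -> 1% edge, below threshold.
assert maybe_signal({"market": "X", "yes_ask": 0.60, "no_ask": 0.39}) is None
```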
Why NNG and not alternatives
Not direct function calls
A monolithic async app can call `risk.check(signal)` directly. Why split processes?
- Fault isolation. If the executor hangs on a misbehaving venue, the scanner keeps producing prices. Strategies keep generating signals (which buffer or drop per backpressure policy). Risk keeps rejecting what it should reject. A service boundary is a blast-radius wall.
- Independent deployment. You can restart the executor to ship a venue fix without disturbing analytics. Strategies can hot-reload with zero downtime on the scanner.
- Concurrent scaling. Scanner and executor are I/O-bound; analytics is CPU-bound. Running them as separate processes lets each use the right runtime primitives without compromise.
Not Redis pub/sub
Redis was considered. NNG wins because:
- No external broker. One less production dependency to run, upgrade, and alert on.
- Lower latency. In-kernel IPC (`ipc://`) shaves hundreds of microseconds in the hot path.
- Exact semantics. PUSH/PULL delivers each message to exactly one consumer, round-robin across connected pullers; PUB/SUB is at-most-once fan-out. Redis Streams can approximate both, but at operational cost.
- No message persistence. polybot is a real-time system. Messages older than a few seconds are useless. Persistence is anti-feature noise.
Not Kafka
Overkill. polybot’s message rate is < 10k msgs/sec at peak. Kafka is the right answer at 1M msgs/sec with replay requirements. We don’t have either.
Not ZeroMQ
NNG is the successor to nanomsg, the redesign of ZeroMQ by ZeroMQ's original author that fixed its design issues (in-process threading, socket lifecycle). For greenfield Python code, NNG + pynng is the cleaner path.
Socket taxonomy
```
prices      PUB  (scanner)    → SUB  (strategies, analytics)
events      PUB  (scanner)    → SUB  (strategies, analytics)
signals     PUSH (strategies) → PULL (risk)       [round-robin]
orders      PUSH (risk)       → PULL (executor)   [round-robin]
executions  PUB  (executor)   → SUB  (strategies, analytics)
control     REQ  (cli)        → REP  (services)   [request/reply]
```
PUSH/PULL is used where every message must reach exactly one consumer (signals, orders). PUB/SUB is used where fan-out is desirable (prices, executions). REQ/REP is for the CLI to query state.
Backpressure
Each service publishes a lag metric to Prometheus. If the executor’s inbound queue grows beyond a threshold, strategies get a throttle signal on the control channel and slow down signal emission. This is cooperative backpressure — not bulletproof, but enough in practice. Hard backpressure (blocking publishers) was rejected because a slow executor should never block the scanner.
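The throttle decision itself can be as small as a hysteresis check. A sketch with made-up watermark values and control messages (polybot's actual thresholds may differ):

```python
from typing import Optional

def throttle_decision(queue_depth: int,
                      high_water: int = 500,
                      low_water: int = 100) -> Optional[str]:
    """Cooperative-backpressure control message: throttle above the
    high-water mark, resume below the low-water mark, stay quiet in
    between. The gap between marks avoids flapping."""
    if queue_depth > high_water:
        return "THROTTLE"
    if queue_depth < low_water:
        return "RESUME"
    return None

assert throttle_decision(800) == "THROTTLE"
assert throttle_decision(50) == "RESUME"
assert throttle_decision(300) is None
```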
What this lets you do
- Swap the executor. A backtest replays recorded `prices` and `events` into the same strategies; the "executor" is a mock that records signals to a file. Strategies don't know.
- Add new consumers. A compliance service can subscribe to the `orders` and `executions` channels without touching anything else. That's how the audit log works.
- Scale horizontally. Run N strategy processes, each handling disjoint markets. The risk service PULLs from all of them via a shared socket.
What this costs you
- Process lifecycle management. You need systemd, Docker, or something to keep services up. polybot ships a `docker-compose.yml` and a `polybot start` supervisor that handles this.
- Debugging is harder. A bug in the signal-to-order path involves at least three services. polybot's distributed tracing (optional, via OpenTelemetry) is the answer, if you use it.
- More moving parts. A monolith is simpler to reason about for small deployments. If you’re running a single strategy on your laptop, the service mesh is overkill. We ship it anyway because at $X/month in capital the simpler thing stops being simpler.
The design takeaway
Pick your coupling at the right layer. For polybot, the right layer was “services communicate by messages, not calls”. It cost us complexity in deployment and won us fault isolation, hot-reload, independent scaling, and a trivial backtesting story.
If you’re building a real-time agent system — prediction markets, algorithmic trading, any domain with async venue I/O and strict risk invariants — this is a layout worth copying.
Need an agent system built like this?
Cryptuon builds production AI agents, MCP integrations, and trading systems. polybot is our open-source showcase.