Agents & MCP
Writing a custom AI plugin for probability prediction
polybot's AIModelPlugin is 30 lines of interface. Here's how to wire up your own model — local, remote, or fine-tuned — and feed it into the ai_model strategy.
Published Apr 12, 2026
The built-in plugins (Anthropic, OpenAI, Perplexity) are useful, but the real power of polybot is that AIModelPlugin is a simple interface you can implement yourself. This guide builds one from scratch: a local classifier that predicts probabilities for political markets and plugs into the ai_model strategy.
The interface
# src/polybot/plugins/base.py
from abc import ABC, abstractmethod

from polybot.models import Market


class AIModelPlugin(ABC):
    name: str

    @abstractmethod
    async def probability(self, market: Market) -> tuple[float, float]:
        """Return (probability, confidence), both in [0, 1]."""
        ...

    async def warmup(self) -> None:
        """Optional: load models, open connections."""
        pass

    async def shutdown(self) -> None:
        pass
That’s it. Implementations live in src/polybot/plugins/<yourname>.py and are registered in polybot/plugins/__init__.py.
Step 1: build the plugin
We’ll build my_classifier.py, backed by a local scikit-learn model.
# src/polybot/plugins/my_classifier.py
from pathlib import Path

import joblib

from polybot.models import Market
from polybot.plugins.base import AIModelPlugin


class MyClassifierPlugin(AIModelPlugin):
    name = "my_classifier"

    def __init__(self, model_path: str):
        self.model_path = Path(model_path)
        self.model = None
        self.feature_extractor = None

    async def warmup(self) -> None:
        payload = joblib.load(self.model_path)
        self.model = payload["model"]
        self.feature_extractor = payload["features"]

    async def probability(self, market: Market) -> tuple[float, float]:
        features = self.feature_extractor(market)
        prob = float(self.model.predict_proba([features])[0][1])
        confidence = self._confidence_from_features(features)
        return prob, confidence

    def _confidence_from_features(self, features) -> float:
        # example: confidence shrinks if features fall outside the training distribution
        z_scores = features.z_scores()
        if max(abs(z) for z in z_scores) > 3:
            return 0.3
        return 0.8
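The plugin assumes the joblib payload ships a feature extractor whose output supports z_scores(). A minimal sketch of such an extractor (the feature names, training statistics, and Market attributes here are hypothetical, not part of polybot):

```python
from dataclasses import dataclass

# Hypothetical training statistics, baked in at export time; in practice
# you would save these alongside the model in the joblib payload.
TRAIN_MEAN = {"days_to_resolve": 30.0, "spread": 0.04, "volume": 50_000.0}
TRAIN_STD = {"days_to_resolve": 15.0, "spread": 0.02, "volume": 20_000.0}


@dataclass
class FeatureVector:
    values: dict[str, float]

    def as_list(self) -> list[float]:
        # Stable ordering so the model always sees features in the same slots.
        return [self.values[k] for k in sorted(self.values)]

    def z_scores(self) -> list[float]:
        # Distance of each feature from the training distribution,
        # in standard deviations; large values signal out-of-distribution input.
        return [
            (self.values[k] - TRAIN_MEAN[k]) / TRAIN_STD[k]
            for k in sorted(self.values)
        ]


def extract_features(market) -> FeatureVector:
    # `market` is assumed to expose these attributes; adapt the names
    # to whatever your Market model actually provides.
    return FeatureVector({
        "days_to_resolve": float(market.days_to_resolve),
        "spread": float(market.spread),
        "volume": float(market.volume),
    })
```

Whatever shape you choose, keep the extractor in the payload next to the model: the two must be versioned together, or a retrained model will silently read stale features.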
Step 2: register it
# src/polybot/plugins/__init__.py
from .my_classifier import MyClassifierPlugin  # new import; existing built-in imports omitted

REGISTRY = {
    "anthropic": AnthropicPlugin,
    "openai": OpenAIPlugin,
    "perplexity": PerplexityPlugin,
    "my_classifier": MyClassifierPlugin,
}
Step 3: enable it
polybot plugin enable my_classifier --model-path /srv/models/politics_v3.joblib
polybot plugin list
polybot plugin test my_classifier --market-id politics-iowa-caucus-2028
polybot plugin test calls probability() on one market and prints the result. Useful for debugging before you point a strategy at it.
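If you want the same check in a unit test rather than through the CLI, the harness is small. A sketch with a stand-in plugin (so it runs without a trained model on disk; FakePlugin and smoke_test are illustrative names, not polybot API):

```python
import asyncio


class FakePlugin:
    # Stand-in with the same surface as AIModelPlugin, used here so the
    # harness is runnable without a model file.
    name = "fake"

    async def warmup(self) -> None:
        pass

    async def probability(self, market) -> tuple[float, float]:
        return 0.62, 0.8


async def smoke_test(plugin, market) -> tuple[float, float]:
    # Mirrors what `polybot plugin test` does: warm up, score one market,
    # and enforce the interface contract.
    await plugin.warmup()
    prob, conf = await plugin.probability(market)
    assert 0.0 <= prob <= 1.0 and 0.0 <= conf <= 1.0
    return prob, conf


print(asyncio.run(smoke_test(FakePlugin(), market=None)))  # -> (0.62, 0.8)
```

Swap FakePlugin for MyClassifierPlugin (pointed at a real model file) and you have a regression test you can run in CI.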
Step 4: wire to a strategy
polybot strategy config ai_model --plugin my_classifier
polybot strategy shadow ai_model --enable
polybot start
Run for a week. Inspect the calibration report:
polybot strategy report ai_model --calibration --window 7d
You’ll get a chart of predicted probability vs. realised outcome bucket. A well-calibrated model plots near the diagonal. A miscalibrated one shows systematic over- or under-confidence — retrain before going live.
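The report's bucketing is easy to reproduce offline. A sketch, assuming you have logged (prediction, outcome) pairs from shadow mode; the function name and row shape are illustrative:

```python
def calibration_table(predictions, outcomes, n_buckets=10):
    """Bucket predictions and compare mean prediction to realised hit rate.

    predictions: model probabilities in [0, 1]
    outcomes: realised results, 1 if the market resolved YES, else 0
    Returns rows of (mean_prediction, hit_rate, count), one per non-empty bucket.
    """
    buckets = [[] for _ in range(n_buckets)]
    for p, y in zip(predictions, outcomes):
        # p == 1.0 would index past the end, hence the min().
        idx = min(int(p * n_buckets), n_buckets - 1)
        buckets[idx].append((p, y))
    rows = []
    for b in buckets:
        if not b:
            continue
        mean_pred = sum(p for p, _ in b) / len(b)
        hit_rate = sum(y for _, y in b) / len(b)
        rows.append((mean_pred, hit_rate, len(b)))
    return rows
```

A calibrated model has mean_prediction close to hit_rate in every bucket with a meaningful count; a systematic gap in one direction is the over- or under-confidence the report is warning you about.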
Advanced: LLM-backed plugin with caching
If you’re wrapping a remote LLM, two things matter: prompt caching and rate limiting. Here’s a sketch with Anthropic’s SDK:
from anthropic import AsyncAnthropic

from polybot.plugins.base import AIModelPlugin


class MyLLMPlugin(AIModelPlugin):
    name = "my_llm"

    def __init__(self, model="claude-sonnet-4-6"):
        self.client = AsyncAnthropic()
        self.model = model

    async def probability(self, market):
        system = [
            {
                "type": "text",
                "text": self._system_prompt(),
                "cache_control": {"type": "ephemeral"},
            }
        ]
        response = await self.client.messages.create(
            model=self.model,
            max_tokens=200,
            system=system,
            messages=[{"role": "user", "content": self._user_prompt(market)}],
        )
        data = self._parse(response)
        return data["probability"], data["confidence"]
Prompt caching is critical here — the system prompt doesn’t change per market, so cache_control on it saves 80–90% of the token cost. polybot’s built-in LLMPlugin does this; you should too.
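The sketch covers caching but not the second concern, rate limiting. A minimal client-side cap, making no assumptions about polybot's internals, is an asyncio.Semaphore wrapped around the call (RateLimited is an illustrative name):

```python
import asyncio


class RateLimited:
    # Caps concurrent in-flight calls to the wrapped plugin. This bounds
    # concurrency, not requests-per-minute; add a token bucket if your
    # provider enforces a strict RPM limit.
    def __init__(self, inner, max_concurrent: int = 4):
        self.inner = inner
        self._sem = asyncio.Semaphore(max_concurrent)

    async def probability(self, market):
        async with self._sem:
            return await self.inner.probability(market)
```

Because it only touches probability(), the wrapper composes with any plugin, local or remote, without the plugin knowing it is being throttled.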
Gotchas
- Probabilities outside [0, 1]. Clamp, don’t raise. A misbehaving model should fail closed (return 0.5, low confidence), not take down the strategy.
- Latency variance. If your model takes 10+ seconds on tail cases, mark them low-confidence and move on. polybot's ai_model strategy has a per-call timeout; honour it.
- Feature drift. Markets evolve. A model trained on 2024 data may miscalibrate on 2026 elections. Re-train on rolling windows; automate with a scheduled polybot plugin retrain hook you define.
- Token budgets. If the plugin is LLM-backed, add a cost metric. polybot's risk service enforces per-strategy token budgets when the plugin reports cost.
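The first gotcha, clamp and fail closed, is worth making concrete. A small sanitiser you can run on every result before returning it (the function name and the 0.1 fallback confidence are illustrative choices):

```python
def safe_result(prob, conf) -> tuple[float, float]:
    # Fail closed: anything non-numeric, NaN, or out of range collapses
    # to "no edge" (0.5) with low confidence, instead of raising and
    # taking down the strategy.
    try:
        prob = float(prob)
        conf = float(conf)
    except (TypeError, ValueError):
        return 0.5, 0.1
    if prob != prob or conf != conf:  # NaN never equals itself
        return 0.5, 0.1
    return min(max(prob, 0.0), 1.0), min(max(conf, 0.0), 1.0)
```

Call it as the last line of probability(); a model that emits 1.3 then trades as certain-YES is exactly the failure mode this guards against.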
What’s next
- Patch src/polybot/plugins/my_classifier.py into your fork; PR it to contrib if it's useful to others.
- Chain plugins: a cheap local classifier screens markets, a remote LLM scores the short list. polybot's ensemble plugin pattern shows how.
- Write a calibration eval harness — shadow performance over months is the ground truth.
Need an agent system built like this?
Cryptuon builds production AI agents, MCP integrations, and trading systems. polybot is our open-source showcase.