Agents & MCP
Writing a custom AI plugin for probability prediction
polybot's AIModelPlugin is 30 lines of interface. Here's how to wire up your own model — local, remote, or fine-tuned — and feed it into the ai_model strategy.
Published Apr 12, 2026
The built-in plugins (Anthropic, OpenAI, Perplexity) are useful, but the real power of polybot is that AIModelPlugin is a simple interface you can implement yourself. This guide builds one from scratch: a local classifier that predicts probabilities for political markets and plugs into the ai_model strategy.
The interface
# src/polybot/plugins/base.py
from abc import ABC, abstractmethod

from polybot.models import Market


class AIModelPlugin(ABC):
    name: str

    @abstractmethod
    async def probability(self, market: Market) -> tuple[float, float]:
        """Return (probability, confidence), both in [0, 1]."""
        ...

    async def warmup(self) -> None:
        """Optional: load models, open connections."""
        pass

    async def shutdown(self) -> None:
        pass
That’s it. Implementations live in src/polybot/plugins/<yourname>.py and are registered in polybot/plugins/__init__.py.
Step 1: build the plugin
We’ll build my_classifier.py, backed by a local scikit-learn model.
# src/polybot/plugins/my_classifier.py
from pathlib import Path

import joblib

from polybot.models import Market
from polybot.plugins.base import AIModelPlugin


class MyClassifierPlugin(AIModelPlugin):
    name = "my_classifier"

    def __init__(self, model_path: str):
        self.model_path = Path(model_path)
        self.model = None
        self.feature_extractor = None

    async def warmup(self) -> None:
        payload = joblib.load(self.model_path)
        self.model = payload["model"]
        self.feature_extractor = payload["features"]

    async def probability(self, market: Market) -> tuple[float, float]:
        features = self.feature_extractor(market)
        prob = float(self.model.predict_proba([features])[0][1])
        confidence = self._confidence_from_features(features)
        return prob, confidence

    def _confidence_from_features(self, features) -> float:
        # example: confidence shrinks if features fall outside the training distribution
        z_scores = features.z_scores()
        if max(abs(z) for z in z_scores) > 3:
            return 0.3
        return 0.8
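The plugin assumes the joblib payload ships a feature extractor whose output supports z_scores(). A minimal sketch of such an extractor (the feature names, training statistics, and Market attributes here are hypothetical, not part of polybot):

```python
from dataclasses import dataclass

# Hypothetical training statistics, baked in at export time; in practice
# you would save these alongside the model in the joblib payload.
TRAIN_MEAN = {"days_to_resolve": 30.0, "spread": 0.04, "volume": 50_000.0}
TRAIN_STD = {"days_to_resolve": 15.0, "spread": 0.02, "volume": 20_000.0}


@dataclass
class FeatureVector:
    values: dict[str, float]

    def as_list(self) -> list[float]:
        # Stable ordering so the model always sees features in the same slots.
        return [self.values[k] for k in sorted(self.values)]

    def z_scores(self) -> list[float]:
        # Distance of each feature from the training distribution,
        # in standard deviations; large values signal out-of-distribution input.
        return [
            (self.values[k] - TRAIN_MEAN[k]) / TRAIN_STD[k]
            for k in sorted(self.values)
        ]


def extract_features(market) -> FeatureVector:
    # `market` is assumed to expose these attributes; adapt the names
    # to whatever your Market model actually provides.
    return FeatureVector({
        "days_to_resolve": float(market.days_to_resolve),
        "spread": float(market.spread),
        "volume": float(market.volume),
    })
```

Whatever shape you choose, keep the extractor in the payload next to the model: the two must be versioned together, or a retrained model will silently read stale features.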
Step 2: register it
# src/polybot/plugins/__init__.py
from .my_classifier import MyClassifierPlugin  # new import; existing built-in imports omitted

REGISTRY = {
    "anthropic": AnthropicPlugin,
    "openai": OpenAIPlugin,
    "perplexity": PerplexityPlugin,
    "my_classifier": MyClassifierPlugin,
}
Step 3: enable it
polybot plugin enable my_classifier --model-path /srv/models/politics_v3.joblib
polybot plugin list
polybot plugin test my_classifier --market-id politics-iowa-caucus-2028
polybot plugin test calls probability() on one market and prints the result. Useful for debugging before you point a strategy at it.
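If you want the same check in a unit test rather than through the CLI, the harness is small. A sketch with a stand-in plugin (so it runs without a trained model on disk; FakePlugin and smoke_test are illustrative names, not polybot API):

```python
import asyncio


class FakePlugin:
    # Stand-in with the same surface as AIModelPlugin, used here so the
    # harness is runnable without a model file.
    name = "fake"

    async def warmup(self) -> None:
        pass

    async def probability(self, market) -> tuple[float, float]:
        return 0.62, 0.8


async def smoke_test(plugin, market) -> tuple[float, float]:
    # Mirrors what `polybot plugin test` does: warm up, score one market,
    # and enforce the interface contract.
    await plugin.warmup()
    prob, conf = await plugin.probability(market)
    assert 0.0 <= prob <= 1.0 and 0.0 <= conf <= 1.0
    return prob, conf


print(asyncio.run(smoke_test(FakePlugin(), market=None)))  # -> (0.62, 0.8)
```

Swap FakePlugin for MyClassifierPlugin (pointed at a real model file) and you have a regression test you can run in CI.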
Step 4: wire to a strategy
polybot strategy config ai_model --plugin my_classifier
polybot strategy shadow ai_model --enable
polybot start
Run for a week. Inspect the calibration report:
polybot strategy report ai_model --calibration --window 7d
You’ll get a chart of predicted probability vs. realised outcome bucket. A well-calibrated model plots near the diagonal. A miscalibrated one shows systematic over- or under-confidence — retrain before going live.
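The report's bucketing is easy to reproduce offline. A sketch, assuming you have logged (prediction, outcome) pairs from shadow mode; the function name and row shape are illustrative:

```python
def calibration_table(predictions, outcomes, n_buckets=10):
    """Bucket predictions and compare mean prediction to realised hit rate.

    predictions: model probabilities in [0, 1]
    outcomes: realised results, 1 if the market resolved YES, else 0
    Returns rows of (mean_prediction, hit_rate, count), one per non-empty bucket.
    """
    buckets = [[] for _ in range(n_buckets)]
    for p, y in zip(predictions, outcomes):
        # p == 1.0 would index past the end, hence the min().
        idx = min(int(p * n_buckets), n_buckets - 1)
        buckets[idx].append((p, y))
    rows = []
    for b in buckets:
        if not b:
            continue
        mean_pred = sum(p for p, _ in b) / len(b)
        hit_rate = sum(y for _, y in b) / len(b)
        rows.append((mean_pred, hit_rate, len(b)))
    return rows
```

A calibrated model has mean_prediction close to hit_rate in every bucket with a meaningful count; a systematic gap in one direction is the over- or under-confidence the report is warning you about.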
Advanced: LLM-backed plugin with caching
If you’re wrapping a remote LLM, two things matter: prompt caching and rate limiting. Here’s a sketch with Anthropic’s SDK:
from anthropic import AsyncAnthropic

from polybot.plugins.base import AIModelPlugin


class MyLLMPlugin(AIModelPlugin):
    name = "my_llm"

    def __init__(self, model="claude-sonnet-4-6"):
        self.client = AsyncAnthropic()
        self.model = model

    async def probability(self, market):
        system = [
            {
                "type": "text",
                "text": self._system_prompt(),
                "cache_control": {"type": "ephemeral"},
            }
        ]
        response = await self.client.messages.create(
            model=self.model,
            max_tokens=200,
            system=system,
            messages=[{"role": "user", "content": self._user_prompt(market)}],
        )
        data = self._parse(response)
        return data["probability"], data["confidence"]
Prompt caching is critical here — the system prompt doesn’t change per market, so cache_control on it saves 80–90% of the token cost. polybot’s built-in LLMPlugin does this; you should too.
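The sketch covers caching but not the second concern, rate limiting. A minimal client-side cap, making no assumptions about polybot's internals, is an asyncio.Semaphore wrapped around the call (RateLimited is an illustrative name):

```python
import asyncio


class RateLimited:
    # Caps concurrent in-flight calls to the wrapped plugin. This bounds
    # concurrency, not requests-per-minute; add a token bucket if your
    # provider enforces a strict RPM limit.
    def __init__(self, inner, max_concurrent: int = 4):
        self.inner = inner
        self._sem = asyncio.Semaphore(max_concurrent)

    async def probability(self, market):
        async with self._sem:
            return await self.inner.probability(market)
```

Because it only touches probability(), the wrapper composes with any plugin, local or remote, without the plugin knowing it is being throttled.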
Gotchas
- Probabilities outside [0, 1]. Clamp, don’t raise. A misbehaving model should fail closed (return 0.5, low confidence), not take down the strategy.
- Latency variance. If your model takes 10+ seconds on tail cases, mark them low-confidence and move on. polybot's ai_model strategy has a per-call timeout; honour it.
- Feature drift. Markets evolve. A model trained on 2024 data may miscalibrate on 2026 elections. Re-train on rolling windows; automate with a scheduled polybot plugin retrain hook you define.
- Token budgets. If the plugin is LLM-backed, add a cost metric. polybot's risk service enforces per-strategy token budgets when the plugin reports cost.
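The first gotcha, clamp and fail closed, is worth making concrete. A small sanitiser you can run on every result before returning it (the function name and the 0.1 fallback confidence are illustrative choices):

```python
def safe_result(prob, conf) -> tuple[float, float]:
    # Fail closed: anything non-numeric, NaN, or out of range collapses
    # to "no edge" (0.5) with low confidence, instead of raising and
    # taking down the strategy.
    try:
        prob = float(prob)
        conf = float(conf)
    except (TypeError, ValueError):
        return 0.5, 0.1
    if prob != prob or conf != conf:  # NaN never equals itself
        return 0.5, 0.1
    return min(max(prob, 0.0), 1.0), min(max(conf, 0.0), 1.0)
```

Call it as the last line of probability(); a model that emits 1.3 then trades as certain-YES is exactly the failure mode this guards against.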
What’s next
- Patch src/polybot/plugins/my_classifier.py into your fork; PR it to contrib if it's useful to others.
- Chain plugins: a cheap local classifier screens markets, a remote LLM scores the short list. polybot's ensemble plugin pattern shows how.
- Write a calibration eval harness — shadow performance over months is the ground truth.
Need an agent system built like this?
Cryptuon builds production AI agents, MCP integrations, and trading systems. polybot is our open-source showcase.