Models
Lantern uses capability-based routing to map abstract capability names to concrete LLM models. This means your agent code never hardcodes a vendor model name -- it specifies what kind of intelligence it needs, and the model router picks the best option at runtime.
Supported providers
Lantern supports the following LLM providers out of the box:
- Anthropic -- Claude Opus, Sonnet, Haiku
- OpenAI -- GPT-5, GPT-4o, GPT-4o-mini
- Google -- Gemini Ultra, Gemini Pro, Gemini Flash
Capability routing
Instead of specifying "gpt-4" or "claude-3-opus", you specify a capability:
| Capability | Description | Example models |
|---------------------|------------------------------------------|-----------------------------------|
| "auto" | Best model for each step (cost+quality) | Varies per step |
| "reasoning-large" | Maximum reasoning capability | Opus, GPT-5 |
| "reasoning-small" | Fast, cheap reasoning | Haiku, GPT-4o-mini, Gemini Flash |
| "code" | Optimized for code generation | Sonnet, GPT-4o |
| "vision" | Image understanding | Opus, GPT-4o, Gemini Ultra |
| "embedding" | Text embeddings | text-embedding-3, embed-v4 |
How "auto" works
When you set model: "auto", the smart router scores every available model and picks the best one. The scoring considers:
- Quality (40%) -- how capable the model is for complex reasoning
- Speed (30%) -- response latency
- Cost efficiency (30%) -- inverse of token pricing
Model scoring table
| Model | Provider | Quality | Speed | Cost (in/out per 1M) | Balanced Score |
|---|---|---|---|---|---|
| Claude Opus 4 | Anthropic | 10 | 4 | $15 / $75 | 53.4 |
| Claude Sonnet 4 | Anthropic | 9 | 7 | $3 / $15 | 62.7 |
| Claude Haiku 4 | Anthropic | 6 | 10 | $0.25 / $1.25 | 94.5 |
| GPT-4o | OpenAI | 8 | 8 | $2.50 / $10 | 63.9 |
| GPT-4o Mini | OpenAI | 5 | 10 | $0.15 / $0.60 | 155.3 |
Routing strategies
You can control the routing strategy via environment variable:
# Default — best balance of quality, speed, and cost
LANTERN_ROUTE_STRATEGY=balanced make run-api
# Cheapest available model (Haiku or GPT-4o-mini)
LANTERN_ROUTE_STRATEGY=cheap make run-api
# Highest quality regardless of cost (Opus 4 or GPT-4o)
LANTERN_ROUTE_STRATEGY=quality make run-api
# Fastest response time (Haiku or GPT-4o-mini)
LANTERN_ROUTE_STRATEGY=fast make run-api
| Strategy | Formula | Best for |
|---|---|---|
| balanced | Quality×4 + Speed×3 + CostEfficiency×3 | Production workloads |
| cheap | 100 / (costIn + costOut + 1) | High-volume, cost-sensitive |
| quality | Quality × 10 | Complex analysis, reasoning |
| fast | Speed × 10 | Real-time chat, low latency |
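
The four strategy formulas can be sketched as a small scoring function. This is an illustration only, not Lantern's router code: the `ModelProfile` type, the `score` and `pickModel` helpers, and the `costEfficiency` values are assumptions made for the example. In particular, this page does not define how cost efficiency is derived from token pricing, so the sketch takes it as a precomputed input.

```typescript
// Hypothetical types for illustration; not part of the Lantern SDK.
type Strategy = "balanced" | "cheap" | "quality" | "fast";

interface ModelProfile {
  name: string;
  quality: number;        // 1-10, as in the scoring table
  speed: number;          // 1-10, as in the scoring table
  costIn: number;         // USD per 1M input tokens
  costOut: number;        // USD per 1M output tokens
  costEfficiency: number; // assumed precomputed; its definition is not given on this page
}

// Apply the formula from the strategy table.
function score(m: ModelProfile, strategy: Strategy): number {
  switch (strategy) {
    case "balanced":
      return m.quality * 4 + m.speed * 3 + m.costEfficiency * 3;
    case "cheap":
      return 100 / (m.costIn + m.costOut + 1);
    case "quality":
      return m.quality * 10;
    case "fast":
      return m.speed * 10;
  }
}

// Return the highest-scoring model; on a tie the earlier entry wins.
function pickModel(models: ModelProfile[], strategy: Strategy): ModelProfile {
  if (models.length === 0) throw new Error("no models available");
  return models.reduce((best, m) =>
    score(m, strategy) > score(best, strategy) ? m : best
  );
}

// Quality, speed, and pricing come from the scoring table.
// The costEfficiency numbers are illustrative placeholders.
const models: ModelProfile[] = [
  { name: "Claude Opus 4", quality: 10, speed: 4, costIn: 15, costOut: 75, costEfficiency: 0.5 },
  { name: "Claude Sonnet 4", quality: 9, speed: 7, costIn: 3, costOut: 15, costEfficiency: 2 },
  { name: "Claude Haiku 4", quality: 6, speed: 10, costIn: 0.25, costOut: 1.25, costEfficiency: 13 },
];

console.log(pickModel(models, "quality").name); // Claude Opus 4
console.log(pickModel(models, "cheap").name);   // Claude Haiku 4
console.log(pickModel(models, "fast").name);    // Claude Haiku 4
```

With these inputs, the `cheap` score for Claude Haiku 4 is 100 / (0.25 + 1.25 + 1) = 40, against roughly 1.1 for Claude Opus 4, which is why the cheap strategy strongly favors small models.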
Adding API keys
To use a provider, add your API key in the dashboard:
- Navigate to Settings > Models
- Click the provider you want to configure
- Enter your API key
- Click Save -- the key is encrypted at rest
[Screenshot: Model provider settings with API key input]
lantern.secret/... reference form.
Multiple keys per provider
You can add multiple API keys for the same provider. The router will distribute requests across keys to avoid rate limits. You can also set a primary and fallback key.
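
One way to picture key distribution is a round-robin pool that prefers primary keys and uses fallback keys only when every primary key is unavailable. The sketch below is an assumption about how such a pool could behave; `ApiKey`, `KeyPool`, and `markRateLimited` are hypothetical names, and Lantern's actual rotation policy is not described on this page.

```typescript
// Hypothetical key pool for illustration; not Lantern's implementation.
interface ApiKey {
  id: string;
  role: "primary" | "fallback";
  rateLimitedUntil: number; // epoch ms; 0 means the key is available
}

class KeyPool {
  private cursor = 0;
  private keys: ApiKey[];

  constructor(keys: ApiKey[]) {
    this.keys = keys;
  }

  // Pick the next usable key: rotate across primaries first,
  // then across fallbacks if no primary is currently usable.
  next(now: number = Date.now()): ApiKey | undefined {
    for (const role of ["primary", "fallback"] as const) {
      const usable = this.keys.filter(
        (k) => k.role === role && k.rateLimitedUntil <= now
      );
      if (usable.length > 0) {
        const key = usable[this.cursor % usable.length];
        this.cursor++;
        return key;
      }
    }
    return undefined; // every key is rate limited
  }

  // Record that a key received a 429 and should rest until `untilMs`.
  markRateLimited(id: string, untilMs: number): void {
    const key = this.keys.find((k) => k.id === id);
    if (key) key.rateLimitedUntil = untilMs;
  }
}

const pool = new KeyPool([
  { id: "a", role: "primary", rateLimitedUntil: 0 },
  { id: "b", role: "primary", rateLimitedUntil: 0 },
  { id: "c", role: "fallback", rateLimitedUntil: 0 },
]);
```

Successive calls to `pool.next()` alternate between keys `a` and `b`; key `c` is returned only while both primaries are marked as rate limited.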
Failover behavior
The model router handles provider failures automatically:
- Latency spike -- if a provider's response time exceeds the P95 threshold, subsequent requests route to an alternative
- 5xx errors -- immediate retry on an alternative provider
- Rate limits (429) -- exponential backoff with automatic rotation to another key or provider
- Timeout -- configurable per-step timeout with failover
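
The four failure cases above can be read as a decision table. The sketch below encodes that reading as a pure function; the `Failure` and `Decision` types, the `decide` helper, and the 250 ms base delay are assumptions made for the example and are not taken from Lantern's code.

```typescript
// Hypothetical failure classification for illustration.
type Failure =
  | { kind: "http"; status: number }
  | { kind: "timeout" }
  | { kind: "latency"; observedMs: number; p95Ms: number };

interface Decision {
  retry: boolean;        // retry the current request
  switchTarget: boolean; // route to another key or provider
  delayMs: number;       // wait before the retry
}

// `attempt` is zero-based; `baseDelayMs` is an assumed default.
function decide(failure: Failure, attempt: number, baseDelayMs = 250): Decision {
  switch (failure.kind) {
    case "http":
      if (failure.status === 429) {
        // Rate limit: exponential backoff plus rotation.
        return { retry: true, switchTarget: true, delayMs: baseDelayMs * Math.pow(2, attempt) };
      }
      if (failure.status >= 500 && failure.status <= 599) {
        // Server error: immediate retry on an alternative provider.
        return { retry: true, switchTarget: true, delayMs: 0 };
      }
      // Other statuses are not covered by the failover rules above.
      return { retry: false, switchTarget: false, delayMs: 0 };
    case "timeout":
      // Per-step timeout elapsed: fail over without waiting.
      return { retry: true, switchTarget: true, delayMs: 0 };
    case "latency":
      // A latency spike affects subsequent requests, not the current one.
      return {
        retry: false,
        switchTarget: failure.observedMs > failure.p95Ms,
        delayMs: 0,
      };
  }
}

console.log(decide({ kind: "http", status: 429 }, 2).delayMs); // 1000
```

Keeping the decision separate from the network call makes the policy easy to unit test; a real router would also need a cap on attempts and on the backoff delay.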
Cost tracking
Every LLM call is metered and tracked. You can view per-agent and per-run cost breakdowns in the dashboard under Usage > Model costs. The router also provides recommendations for switching capabilities to save costs.
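
As a rough illustration of how a per-call cost follows from token counts and per-1M pricing, the sketch below prices individual calls and totals them per agent. The `Usage` record, the `PRICING` map, and both helper functions are hypothetical; the prices are copied from the model scoring table, and the dashboard's actual metering logic is not shown on this page.

```typescript
// Hypothetical metering types for illustration.
interface Pricing {
  inPerMillion: number;  // USD per 1M input tokens
  outPerMillion: number; // USD per 1M output tokens
}

interface Usage {
  agent: string;
  runId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Prices taken from the model scoring table.
const PRICING: Record<string, Pricing> = {
  "Claude Sonnet 4": { inPerMillion: 3, outPerMillion: 15 },
  "Claude Haiku 4": { inPerMillion: 0.25, outPerMillion: 1.25 },
};

// Cost of a single call in USD.
function callCost(u: Usage): number {
  const p = PRICING[u.model];
  if (!p) throw new Error(`no pricing for ${u.model}`);
  return (u.inputTokens * p.inPerMillion + u.outputTokens * p.outPerMillion) / 1e6;
}

// Sum call costs per agent.
function costByAgent(calls: Usage[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const c of calls) {
    totals.set(c.agent, (totals.get(c.agent) ?? 0) + callCost(c));
  }
  return totals;
}

const calls: Usage[] = [
  { agent: "triage", runId: "r1", model: "Claude Sonnet 4", inputTokens: 2000, outputTokens: 1000 },
  { agent: "triage", runId: "r2", model: "Claude Haiku 4", inputTokens: 4000, outputTokens: 2000 },
];
```

Here the Sonnet call costs (2000 × 3 + 1000 × 15) / 1,000,000 = $0.021 and the Haiku call costs (4000 × 0.25 + 2000 × 1.25) / 1,000,000 = $0.0035, so the same token volume is six times cheaper at half the token count on the smaller model.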
Overriding the router
If you need a specific model for a step (e.g., for compliance reasons), you can override the router in the SDK:
const result = await step("analyze", async () => {
  return ctx.llm.complete({
    messages: [...],
    capability: "reasoning-large",
    provider: "anthropic",  // Force Anthropic
    model: "claude-opus-4", // Force a specific model
  });
});
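
One plausible reading of the override is that `provider` and `model` narrow the set of candidates the router would otherwise consider for the capability. The sketch below models that reading; it is an assumption, not documented behavior. `Candidate`, `CANDIDATES`, and `resolveCandidates` are hypothetical names, and `"gpt-5"` is a placeholder identifier (only `"claude-opus-4"` appears on this page).

```typescript
// Hypothetical candidate registry for illustration.
interface Candidate {
  provider: string;
  model: string;
}

interface LlmRequest {
  capability: string;
  provider?: string; // optional override
  model?: string;    // optional override
}

const CANDIDATES: Record<string, Candidate[]> = {
  "reasoning-large": [
    { provider: "anthropic", model: "claude-opus-4" },
    { provider: "openai", model: "gpt-5" }, // placeholder identifier
  ],
};

// Treat overrides as filters on the capability's candidate list.
// An empty result means the override cannot be satisfied.
function resolveCandidates(req: LlmRequest): Candidate[] {
  const pool = CANDIDATES[req.capability] ?? [];
  return pool.filter(
    (c) =>
      (req.provider === undefined || c.provider === req.provider) &&
      (req.model === undefined || c.model === req.model)
  );
}

const forced = resolveCandidates({
  capability: "reasoning-large",
  provider: "anthropic",
  model: "claude-opus-4",
});
console.log(forced.length); // 1
```

Under this model, a request with no overrides keeps both candidates available for scoring, while a fully specified request leaves the router exactly one choice.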