Lantern Docs

Models

Lantern uses capability-based routing to map abstract capability names to concrete LLM models. This means your agent code never hardcodes a vendor model name -- it specifies what kind of intelligence it needs, and the model router picks the best option at runtime.

Supported providers

Lantern supports the following LLM providers out of the box:

  • Anthropic -- Claude Opus, Sonnet, Haiku
  • OpenAI -- GPT-5, GPT-4o, GPT-4o-mini
  • Google -- Gemini Ultra, Gemini Pro, Gemini Flash
Note: Additional providers (Mistral, Cohere, open-source models via vLLM) are on the roadmap. You can also add custom providers via the SDK.

Capability routing

Instead of specifying "gpt-4" or "claude-3-opus", you specify a capability:

| Capability          | Description                              | Example models                    |
|---------------------|------------------------------------------|-----------------------------------|
| "auto"              | Best model for each step (cost+quality)  | Varies per step                   |
| "reasoning-large"   | Maximum reasoning capability             | Opus, GPT-5                       |
| "reasoning-small"   | Fast, cheap reasoning                    | Haiku, GPT-4o-mini, Gemini Flash  |
| "code"              | Optimized for code generation            | Sonnet, GPT-4o                    |
| "vision"            | Image understanding                      | Opus, GPT-4o, Gemini Ultra        |
| "embedding"         | Text embeddings                          | text-embedding-3, embed-v4        |
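As a rough illustration, the capability names in the table can be modeled as a string union so that typos are caught at compile time. The `Capability` type, the `exampleModels` map, and the `isCapability` helper are hypothetical names for this sketch and are not part of the Lantern SDK; the model lists are copied from the table.

```typescript
// Hypothetical type: the capability names come from the table above.
type Capability =
  | "auto"
  | "reasoning-large"
  | "reasoning-small"
  | "code"
  | "vision"
  | "embedding";

// Example models per capability, copied from the table.
// "auto" has no fixed list because it varies per step.
const exampleModels: Record<Capability, string[]> = {
  "auto": [],
  "reasoning-large": ["Opus", "GPT-5"],
  "reasoning-small": ["Haiku", "GPT-4o-mini", "Gemini Flash"],
  "code": ["Sonnet", "GPT-4o"],
  "vision": ["Opus", "GPT-4o", "Gemini Ultra"],
  "embedding": ["text-embedding-3", "embed-v4"],
};

// Narrow an arbitrary string (for example, from a config file) to a known capability.
function isCapability(value: string): value is Capability {
  return Object.prototype.hasOwnProperty.call(exampleModels, value);
}
```

A vendor model name such as "gpt-4" fails the `isCapability` check, which is the point of routing by capability rather than by model.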

How "auto" works

When you set model: "auto", the smart router scores every available model and picks the best one. The scoring considers:

  • Quality (40%) — how capable the model is for complex reasoning
  • Speed (30%) — response latency
  • Cost efficiency (30%) — inverse of token pricing

Model scoring table

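| Model           | Provider  | Quality | Speed | Cost (in/out per 1M) | Balanced Score |
|-----------------|-----------|---------|-------|----------------------|----------------|
| Claude Opus 4   | Anthropic | 10      | 4     | $15 / $75            | 53.4           |
| Claude Sonnet 4 | Anthropic | 9       | 7     | $3 / $15             | 62.7           |
| Claude Haiku 4  | Anthropic | 6       | 10    | $0.25 / $1.25        | 94.5           |
| GPT-4o          | OpenAI    | 8       | 8     | $2.50 / $10          | 63.9           |
| GPT-4o Mini     | OpenAI    | 5       | 10    | $0.15 / $0.60        | 155.3          |

Note: The router only considers models whose provider API key you have configured. If you only have an Anthropic key, it picks from Claude models only.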
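The selection described above can be sketched as a filter followed by a maximum: drop models whose provider has no configured key, then take the highest balanced score. The scores below are copied from the scoring table; `ScoredModel`, `scoringTable`, and `pickBalanced` are hypothetical names for this sketch, and the real router may break ties or weigh other signals differently.

```typescript
interface ScoredModel {
  name: string;
  provider: string;
  balancedScore: number; // value taken from the scoring table
}

const scoringTable: ScoredModel[] = [
  { name: "Claude Opus 4", provider: "Anthropic", balancedScore: 53.4 },
  { name: "Claude Sonnet 4", provider: "Anthropic", balancedScore: 62.7 },
  { name: "Claude Haiku 4", provider: "Anthropic", balancedScore: 94.5 },
  { name: "GPT-4o", provider: "OpenAI", balancedScore: 63.9 },
  { name: "GPT-4o Mini", provider: "OpenAI", balancedScore: 155.3 },
];

// Return the best-scoring model among providers that have a key configured,
// or undefined when no provider is configured at all.
function pickBalanced(configuredProviders: string[]): ScoredModel | undefined {
  let best: ScoredModel | undefined = undefined;
  for (const model of scoringTable) {
    if (configuredProviders.indexOf(model.provider) === -1) continue;
    if (best === undefined || model.balancedScore > best.balancedScore) {
      best = model;
    }
  }
  return best;
}
```

With only an Anthropic key this picks Claude Haiku 4; with both Anthropic and OpenAI keys it picks GPT-4o Mini.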

Routing strategies

You can control the routing strategy via environment variable:

# Default — best balance of quality, speed, and cost
LANTERN_ROUTE_STRATEGY=balanced make run-api

# Cheapest available model (Haiku or GPT-4o-mini)
LANTERN_ROUTE_STRATEGY=cheap make run-api

# Highest quality regardless of cost (Opus 4 or GPT-4o)
LANTERN_ROUTE_STRATEGY=quality make run-api

# Fastest response time (Haiku or GPT-4o-mini)
LANTERN_ROUTE_STRATEGY=fast make run-api

| Strategy | Formula                                  | Best for                    |
|----------|------------------------------------------|-----------------------------|
| balanced | Quality×4 + Speed×3 + CostEfficiency×3   | Production workloads        |
| cheap    | 100 / (costIn + costOut + 1)             | High-volume, cost-sensitive |
| quality  | Quality × 10                             | Complex analysis, reasoning |
| fast     | Speed × 10                               | Real-time chat, low latency |

Tip: In practice, "auto" with balanced routing saves 40-60% on LLM costs compared to always using the most powerful model. Most agent steps are simple enough for a smaller, faster model.
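The four strategy formulas can be written as one scoring function. This is a minimal sketch, not the router's implementation: `Strategy`, `ModelStats`, and `score` are hypothetical names, and because this page does not define how CostEfficiency is scaled, it is passed in as a plain number and the example uses a placeholder value.

```typescript
type Strategy = "balanced" | "cheap" | "quality" | "fast";

interface ModelStats {
  quality: number;        // Quality column from the scoring table
  speed: number;          // Speed column from the scoring table
  costIn: number;         // USD per 1M input tokens
  costOut: number;        // USD per 1M output tokens
  costEfficiency: number; // ASSUMPTION: scale is not defined on this page
}

// Apply the formula listed for each strategy.
function score(strategy: Strategy, m: ModelStats): number {
  switch (strategy) {
    case "balanced":
      return m.quality * 4 + m.speed * 3 + m.costEfficiency * 3;
    case "cheap":
      return 100 / (m.costIn + m.costOut + 1);
    case "quality":
      return m.quality * 10;
    case "fast":
      return m.speed * 10;
  }
}

// Claude Haiku 4 figures from the scoring table; costEfficiency is a placeholder.
const haiku: ModelStats = {
  quality: 6,
  speed: 10,
  costIn: 0.25,
  costOut: 1.25,
  costEfficiency: 2,
};

const cheapScore = score("cheap", haiku);     // 100 / (0.25 + 1.25 + 1) = 40
const qualityScore = score("quality", haiku); // 6 * 10 = 60
const fastScore = score("fast", haiku);       // 10 * 10 = 100
```

Because the placeholder CostEfficiency is not the router's real value, the balanced result from this sketch should not be expected to match the Balanced Score column.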

Adding API keys

To use a provider, add your API key in the dashboard:

  1. Navigate to Settings > Models
  2. Click the provider you want to configure
  3. Enter your API key
  4. Click Save -- the key is encrypted at rest

[Screenshot: Model provider settings with API key input]

Warning: API keys are stored encrypted and never appear in logs, traces, or run state. They are resolved at execution time inside the microVM using the lantern.secret/... reference form.

Multiple keys per provider

You can add multiple API keys for the same provider. The router distributes requests across them to reduce the chance of hitting rate limits. You can also designate a primary key and a fallback key.
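One simple way to spread requests across keys is round-robin rotation. The sketch below assumes that policy purely for illustration; the page does not say which distribution scheme the router uses, and `makeKeyRotator` and the key labels are hypothetical.

```typescript
// Returns a function that hands out key labels in round-robin order.
// The labels stand in for stored key references; they are not real secrets.
function makeKeyRotator(keyLabels: string[]): () => string {
  if (keyLabels.length === 0) {
    throw new Error("at least one key is required");
  }
  let next = 0;
  return () => {
    const label = keyLabels[next % keyLabels.length];
    next += 1;
    return label;
  };
}

const nextKey = makeKeyRotator(["key-a", "key-b"]);
const firstThree = [nextKey(), nextKey(), nextKey()]; // key-a, key-b, key-a
```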

Failover behavior

The model router handles provider failures automatically:

  • Latency spike -- if a provider's response time exceeds the P95 threshold, subsequent requests route to an alternative
  • 5xx errors -- immediate retry on an alternative provider
  • Rate limits (429) -- exponential backoff with automatic rotation to another key or provider
  • Timeout -- configurable per-step timeout with failover
Note: Failover is transparent to the agent. The step receives the response regardless of which provider actually served it. The run trace shows which provider was used for observability.
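The failure handling listed above can be sketched as a small decision function. All names here (`Failure`, `Action`, `nextAction`) and the backoff constants are assumptions made for illustration; the router's actual thresholds and delays are not documented on this page.

```typescript
type Failure =
  | { kind: "latency-spike" }
  | { kind: "server-error"; status: number }
  | { kind: "rate-limit" }
  | { kind: "timeout" };

type Action =
  | { kind: "switch-provider" }
  | { kind: "backoff-and-rotate"; delayMs: number };

const BASE_DELAY_MS = 250;  // assumed starting delay
const MAX_DELAY_MS = 8000;  // assumed upper bound

// Exponential backoff: the delay doubles with each attempt, up to the cap.
function backoffDelay(attempt: number): number {
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * Math.pow(2, attempt));
}

// Map a failure to the next routing action, following the list above.
function nextAction(failure: Failure, attempt: number): Action {
  switch (failure.kind) {
    case "rate-limit":
      // 429: wait, then rotate to another key or provider.
      return { kind: "backoff-and-rotate", delayMs: backoffDelay(attempt) };
    case "server-error":
    case "latency-spike":
    case "timeout":
      // Route to an alternative provider without waiting.
      return { kind: "switch-provider" };
  }
}
```

Under these assumed constants, a third consecutive 429 (attempt index 2) waits 1000 ms before rotating, while a 5xx response switches provider immediately.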

Cost tracking

Every LLM call is metered and tracked. You can view per-agent and per-run cost breakdowns in the dashboard under Usage > Model costs. The router also provides recommendations for switching capabilities to save costs.
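For a sense of how a per-call cost follows from per-1M-token pricing, the arithmetic can be sketched as below. The `Pricing` type and `callCostUsd` function are hypothetical, and the dashboard may apply its own rounding or include charges not modeled here; the prices are the Claude Sonnet 4 figures from the scoring table.

```typescript
interface Pricing {
  inPerMillion: number;  // USD per 1M input tokens
  outPerMillion: number; // USD per 1M output tokens
}

// Cost of one call: tokens are billed in proportion to the per-1M price.
function callCostUsd(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * p.inPerMillion + (outputTokens / 1e6) * p.outPerMillion;
}

const sonnet: Pricing = { inPerMillion: 3, outPerMillion: 15 };

// 2,000 input tokens and 500 output tokens:
// 0.002 * $3 + 0.0005 * $15 = $0.006 + $0.0075, roughly $0.0135
const exampleCost = callCostUsd(sonnet, 2000, 500);
```

Summing this value over every call in a run gives a per-run total of the kind shown in the cost breakdown.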

Overriding the router

If you need a specific model for a step (e.g., for compliance reasons), you can override the router in the SDK:

const result = await step("analyze", async () => {
  return ctx.llm.complete({
    messages: [...],
    capability: "reasoning-large",
    provider: "anthropic",  // Force Anthropic
    model: "claude-opus-4", // Force a specific model
  });
});
Warning: Overriding the router bypasses failover and cost optimization. Use this only when you have a specific reason (e.g., regulatory requirements for a particular provider).