Models

Lantern uses capability-based routing to map abstract capability names to concrete LLM models. Your agent code never hardcodes a vendor model name. Instead, it specifies the kind of intelligence it needs, and the model router picks the best available model at runtime.

Supported providers

Lantern supports the following LLM providers out of the box:

Anthropic -- Claude Opus, Sonnet, Haiku
OpenAI -- GPT-5, GPT-4o, GPT-4o-mini
Google -- Gemini Ultra, Gemini Pro, Gemini Flash

Note: Additional providers (Mistral, Cohere, open-source models via vLLM) are on the roadmap. You can also add custom providers via the SDK.

Capability routing

Instead of specifying "gpt-4" or "claude-3-opus", you specify a capability:

| Capability          | Description                              | Example models                    |
|---------------------|------------------------------------------|-----------------------------------|
| "auto"              | Best model for each step (cost+quality)  | Varies per step                   |
| "reasoning-large"   | Maximum reasoning capability             | Opus, GPT-5                       |
| "reasoning-small"   | Fast, cheap reasoning                    | Haiku, GPT-4o-mini, Gemini Flash  |
| "code"              | Optimized for code generation            | Sonnet, GPT-4o                    |
| "vision"            | Image understanding                      | Opus, GPT-4o, Gemini Ultra        |
| "embedding"         | Text embeddings                          | text-embedding-3, embed-v4        |
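To make the table concrete, the capability names can be modeled as a closed string union so that a mistyped capability is caught at compile time. This is a minimal sketch, not Lantern SDK code: `Capability`, `EXAMPLE_MODELS`, and `isCapability` are hypothetical names, and the model lists are copied from the table above.

```typescript
// Hypothetical sketch: the capability names from the table as a closed union type.
type Capability =
  | "auto"
  | "reasoning-large"
  | "reasoning-small"
  | "code"
  | "vision"
  | "embedding";

// Example models per capability, copied from the table above.
// "auto" has no fixed list because the choice varies per step.
const EXAMPLE_MODELS: Record<Capability, string[]> = {
  "auto": [],
  "reasoning-large": ["Opus", "GPT-5"],
  "reasoning-small": ["Haiku", "GPT-4o-mini", "Gemini Flash"],
  "code": ["Sonnet", "GPT-4o"],
  "vision": ["Opus", "GPT-4o", "Gemini Ultra"],
  "embedding": ["text-embedding-3", "embed-v4"],
};

// Type guard for capability strings that arrive from config or user input.
function isCapability(value: string): value is Capability {
  return Object.prototype.hasOwnProperty.call(EXAMPLE_MODELS, value);
}
```

With a guard like this, a vendor model name such as "gpt-4" is rejected before it reaches the router, which keeps agent code on capability names only.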

How "auto" works

When you set model: "auto", the model router scores every available model and picks the highest-scoring one. Under the default balanced strategy, the score weighs:

Quality (40%) — how capable the model is for complex reasoning
Speed (30%) — response latency
Cost efficiency (30%) — inverse of token pricing

Model scoring table

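| Model           | Provider  | Quality | Speed | Cost (in/out per 1M tokens) | Balanced Score |
|-----------------|-----------|---------|-------|-----------------------------|----------------|
| Claude Opus 4   | Anthropic | 10      | 4     | $15 / $75                   | 53.4           |
| Claude Sonnet 4 | Anthropic | 9       | 7     | $3 / $15                    | 62.7           |
| Claude Haiku 4  | Anthropic | 6       | 10    | $0.25 / $1.25               | 94.5           |
| GPT-4o          | OpenAI    | 8       | 8     | $2.50 / $10                 | 63.9           |
| GPT-4o Mini     | OpenAI    | 5       | 10    | $0.15 / $0.60               | 155.3          |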

Note: The router only considers models whose provider API key you have configured. If you only have an Anthropic key, it picks from Claude models only.
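The filtering described in the note can be sketched as follows. This is an illustration only: `ScoredModel`, `SCORED_MODELS`, and `pickModel` are hypothetical names rather than router internals, and the balanced scores are copied from the scoring table.

```typescript
interface ScoredModel {
  name: string;
  provider: string;
  balancedScore: number;
}

// Balanced scores copied from the model scoring table.
const SCORED_MODELS: ScoredModel[] = [
  { name: "Claude Opus 4", provider: "Anthropic", balancedScore: 53.4 },
  { name: "Claude Sonnet 4", provider: "Anthropic", balancedScore: 62.7 },
  { name: "Claude Haiku 4", provider: "Anthropic", balancedScore: 94.5 },
  { name: "GPT-4o", provider: "OpenAI", balancedScore: 63.9 },
  { name: "GPT-4o Mini", provider: "OpenAI", balancedScore: 155.3 },
];

// Returns the highest-scoring model whose provider has a configured key,
// or undefined when no configured provider offers a model.
function pickModel(
  models: ScoredModel[],
  configuredProviders: string[],
): ScoredModel | undefined {
  let best: ScoredModel | undefined = undefined;
  for (const m of models) {
    // Skip models from providers without an API key.
    if (configuredProviders.indexOf(m.provider) === -1) continue;
    if (best === undefined || m.balancedScore > best.balancedScore) best = m;
  }
  return best;
}
```

Given the table's scores, `pickModel(SCORED_MODELS, ["Anthropic"])` selects Claude Haiku 4, while configuring both providers selects GPT-4o Mini.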

Routing strategies

You can control the routing strategy via environment variable:

# Default — best balance of quality, speed, and cost
LANTERN_ROUTE_STRATEGY=balanced make run-api

# Cheapest available model (Haiku or GPT-4o-mini)
LANTERN_ROUTE_STRATEGY=cheap make run-api

# Highest quality regardless of cost (Opus 4 or GPT-4o)
LANTERN_ROUTE_STRATEGY=quality make run-api

# Fastest response time (Haiku or GPT-4o-mini)
LANTERN_ROUTE_STRATEGY=fast make run-api

| Strategy   | Formula                                 | Best for                    |
|------------|-----------------------------------------|-----------------------------|
| `balanced` | Quality×4 + Speed×3 + CostEfficiency×3  | Production workloads        |
| `cheap`    | 100 / (costIn + costOut + 1)            | High-volume, cost-sensitive |
| `quality`  | Quality × 10                            | Complex analysis, reasoning |
| `fast`     | Speed × 10                              | Real-time chat, low latency |
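The four formulas above translate directly into a scoring function. The sketch below is an assumption-laden illustration, not the router's source: `ModelProfile`, `strategyScore`, and `parseStrategy` are hypothetical names, `costEfficiency` is taken as an input because its exact definition is not given here, and falling back to `balanced` for an unset or unrecognized value is an assumption based on `balanced` being the default.

```typescript
type Strategy = "balanced" | "cheap" | "quality" | "fast";

interface ModelProfile {
  quality: number;        // quality rating from the scoring table
  speed: number;          // speed rating from the scoring table
  costIn: number;         // USD per 1M input tokens
  costOut: number;        // USD per 1M output tokens
  costEfficiency: number; // supplied by the caller; definition assumed external
}

// Applies the formula listed for each strategy in the table above.
function strategyScore(m: ModelProfile, strategy: Strategy): number {
  switch (strategy) {
    case "balanced":
      return m.quality * 4 + m.speed * 3 + m.costEfficiency * 3;
    case "cheap":
      return 100 / (m.costIn + m.costOut + 1);
    case "quality":
      return m.quality * 10;
    case "fast":
      return m.speed * 10;
    default:
      throw new Error("unknown strategy");
  }
}

// Parses a LANTERN_ROUTE_STRATEGY value. Treating unset or unknown
// values as "balanced" is an assumption of this sketch.
function parseStrategy(value: string | undefined): Strategy {
  switch (value) {
    case "cheap":
    case "quality":
    case "fast":
    case "balanced":
      return value;
    default:
      return "balanced";
  }
}
```

For example, GPT-4o Mini at $0.15 / $0.60 scores 100 / 1.75, roughly 57.1, under `cheap`, while Claude Opus 4 at $15 / $75 scores 100 / 91, roughly 1.1.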

Tip: In practice, "auto" with balanced routing saves 40-60% on LLM costs compared to always using the most powerful model. Most agent steps are simple enough for a smaller, faster model.

Adding API keys

To use a provider, add your API key in the dashboard:

Navigate to Settings > Models
Click the provider you want to configure
Enter your API key
Click Save -- the key is encrypted at rest

[Screenshot: Model provider settings with API key input]

Warning: API keys are stored encrypted and never appear in logs, traces, or run state. They are resolved at execution time inside the microVM using the lantern.secret/... reference form.

Multiple keys per provider

You can add multiple API keys for the same provider. The router will distribute requests across keys to avoid rate limits. You can also set a primary and fallback key.
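One way to picture key distribution with a primary and fallback tier is a round-robin rotator that skips rate-limited keys. This is a sketch under assumptions: `ProviderKeys` and `KeyRotator` are hypothetical names, the key identifiers are placeholders, and the router's actual distribution policy may differ.

```typescript
interface ProviderKeys {
  primary: string[];  // key identifiers tried first
  fallback: string[]; // used only when every primary key is rate limited
}

class KeyRotator {
  private keys: ProviderKeys;
  private cursor: number = 0;
  private limited: Set<string> = new Set<string>();

  constructor(keys: ProviderKeys) {
    this.keys = keys;
  }

  // Record that a key hit a rate limit so later picks skip it.
  markRateLimited(key: string): void {
    this.limited.add(key);
  }

  // Make a key eligible again, for example after its limit window passes.
  clear(key: string): void {
    this.limited.delete(key);
  }

  // Round-robin over usable primary keys; fall back when none remain.
  next(): string | undefined {
    const usable = (list: string[]): string[] =>
      list.filter((k) => !this.limited.has(k));
    const primary = usable(this.keys.primary);
    const pool = primary.length > 0 ? primary : usable(this.keys.fallback);
    if (pool.length === 0) return undefined;
    const key = pool[this.cursor % pool.length];
    this.cursor += 1;
    return key;
  }
}
```

Spreading consecutive requests across the primary keys keeps each key's request rate lower, and the fallback tier is only touched once every primary key has been marked as rate limited.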

Failover behavior

The model router handles provider failures automatically:

Latency spike -- if a provider's response time exceeds the P95 threshold, subsequent requests route to an alternative
5xx errors -- immediate retry on an alternative provider
Rate limits (429) -- exponential backoff with automatic rotation to another key or provider
Timeout -- configurable per-step timeout with failover

Note: Failover is transparent to the agent. The step receives the response regardless of which provider actually served it. The run trace shows which provider was used for observability.
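The failure cases listed above can be summarized as a small decision function. The sketch is illustrative only: `Failure`, `FailoverAction`, `backoffDelayMs`, and `planFailover` are hypothetical names, and the 500 ms base delay and 30 s cap are assumed values, not documented router settings.

```typescript
type Failure =
  | { kind: "latency" }
  | { kind: "server-error"; status: number }
  | { kind: "rate-limit" }
  | { kind: "timeout" };

interface FailoverAction {
  delayMs: number;       // wait before the next attempt
  switchTarget: boolean; // route the next attempt to another key or provider
}

// Exponential backoff: base * 2^attempt, capped. Base and cap are assumed values.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Maps each failure kind to the behavior described in the list above.
function planFailover(failure: Failure, attempt: number): FailoverAction {
  switch (failure.kind) {
    case "rate-limit":
      // 429: back off exponentially and rotate to another key or provider.
      return { delayMs: backoffDelayMs(attempt), switchTarget: true };
    case "server-error":
    case "timeout":
    case "latency":
      // 5xx, timeout, latency spike: move to an alternative without waiting.
      return { delayMs: 0, switchTarget: true };
    default:
      throw new Error("unknown failure kind");
  }
}
```

Under these assumed values, a third consecutive 429 (attempt index 2) waits 2000 ms before the next try, while a 503 moves to an alternative provider immediately.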

Cost tracking

Every LLM call is metered and tracked. You can view per-agent and per-run cost breakdowns in the dashboard under Usage > Model costs. The router also provides recommendations for switching capabilities to save costs.
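As a rough illustration of how a per-run breakdown can be derived, the sketch below prices each call from its token counts and per-1M-token rates, then sums by run. `Pricing`, `LlmCall`, `callCostUsd`, and `costByRun` are hypothetical names, and the example rates come from the scoring table rather than from any billing API.

```typescript
interface Pricing {
  costIn: number;  // USD per 1M input tokens
  costOut: number; // USD per 1M output tokens
}

interface LlmCall {
  runId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Cost of one call: tokens / 1M multiplied by the per-1M rate, for each direction.
function callCostUsd(call: LlmCall, pricing: Pricing): number {
  return (
    (call.inputTokens / 1e6) * pricing.costIn +
    (call.outputTokens / 1e6) * pricing.costOut
  );
}

// Sums call costs per run. Throws if a call uses a model with no known price.
function costByRun(
  calls: LlmCall[],
  priceTable: Record<string, Pricing>,
): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const call of calls) {
    const pricing = priceTable[call.model];
    if (pricing === undefined) {
      throw new Error("no pricing for model: " + call.model);
    }
    const previous = totals[call.runId];
    totals[call.runId] = (previous === undefined ? 0 : previous) + callCostUsd(call, pricing);
  }
  return totals;
}
```

For instance, a Claude Sonnet 4 call with 1,000 input tokens and 500 output tokens at $3 / $15 costs 0.003 + 0.0075 = $0.0105.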

Overriding the router

If you need a specific model for a step (e.g., for compliance reasons), you can override the router in the SDK:

const result = await step("analyze", async () => {
  return ctx.llm.complete({
    messages: [...],
    capability: "reasoning-large",
    provider: "anthropic",  // Force Anthropic
    model: "claude-opus-4", // Force a specific model
  });
});

Warning: Overriding the router bypasses failover and cost optimization. Use this only when you have a specific reason (e.g., regulatory requirements for a particular provider).