14 KiB
Claude API — TypeScript
Installation
npm install @anthropic-ai/sdk
Client Initialization
import Anthropic from "@anthropic-ai/sdk";
// Default — resolves credentials from the environment:
// ANTHROPIC_API_KEY, or ANTHROPIC_AUTH_TOKEN, or an `ant auth login` profile.
// Prefer this for local dev; don't hardcode a key.
const client = new Anthropic();
// Explicit API key (only when you must inject a specific key)
const client = new Anthropic({ apiKey: "your-api-key" });
Basic Message Request
const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
messages: [{ role: "user", content: "What is the capital of France?" }],
});
// response.content is ContentBlock[] — a discriminated union. Narrow by .type
// before accessing .text (TypeScript will error on content[0].text without this).
for (const block of response.content) {
if (block.type === "text") {
console.log(block.text);
}
}
System Prompts
const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
system:
"You are a helpful coding assistant. Always provide examples in Python.",
messages: [{ role: "user", content: "How do I read a JSON file?" }],
});
Mid-conversation system messages (beta, model-gated)
For operator instructions that arrive mid-conversation (mode switches, injected state), append {role: "system", ...} to messages instead of editing top-level system — this preserves the cached prefix and carries operator authority. Must follow a user message; cannot be messages[0]. Unsupported models return a 400 (role 'system' is not supported on this model). See shared/prompt-caching.md for when to use this vs. top-level system.
// SDK types for role:"system" in messages are pending — pass the beta header
// directly until the SDK updates, then switch to client.beta.messages.create
// with betas: ["mid-conversation-system-2026-04-07"].
const response = await client.messages.create(
{
model: MODEL_ID, // must support mid-conversation system messages
max_tokens: 16000,
system: [
{ type: "text", text: STABLE_SYSTEM, cache_control: { type: "ephemeral" } },
],
messages: [
...history,
{ role: "user", content: userMessage },
// @ts-expect-error — role:"system" pending SDK types
{ role: "system", content: "Terse mode enabled — keep responses under 40 words." },
],
},
{ headers: { "anthropic-beta": "mid-conversation-system-2026-04-07" } },
);
Vision (Images)
URL
const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
messages: [
{
role: "user",
content: [
{
type: "image",
source: { type: "url", url: "https://example.com/image.png" },
},
{ type: "text", text: "Describe this image" },
],
},
],
});
Base64
import fs from "fs";
const imageData = fs.readFileSync("image.png").toString("base64");
const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
messages: [
{
role: "user",
content: [
{
type: "image",
source: { type: "base64", media_type: "image/png", data: imageData },
},
{ type: "text", text: "What's in this image?" },
],
},
],
});
Prompt Caching
Caching is a prefix match — any byte change anywhere in the prefix invalidates everything after it. For placement patterns, architectural guidance (frozen system prompt, deterministic tool order, where to put volatile content), and the silent-invalidator audit checklist, read shared/prompt-caching.md.
Automatic Caching (Recommended)
Use top-level cache_control to automatically cache the last cacheable block in the request:
const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
cache_control: { type: "ephemeral" }, // auto-caches the last cacheable block
system: "You are an expert on this large document...",
messages: [{ role: "user", content: "Summarize the key points" }],
});
Manual Cache Control
For fine-grained control, add cache_control to specific content blocks:
const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
system: [
{
type: "text",
text: "You are an expert on this large document...",
cache_control: { type: "ephemeral" }, // default TTL is 5 minutes
},
],
messages: [{ role: "user", content: "Summarize the key points" }],
});
// With explicit TTL (time-to-live)
const response2 = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
system: [
{
type: "text",
text: "You are an expert on this large document...",
cache_control: { type: "ephemeral", ttl: "1h" }, // 1 hour TTL
},
],
messages: [{ role: "user", content: "Summarize the key points" }],
});
Verifying Cache Hits
console.log(response.usage.cache_creation_input_tokens); // tokens written to cache (~1.25x cost)
console.log(response.usage.cache_read_input_tokens); // tokens served from cache (~0.1x cost)
console.log(response.usage.input_tokens); // uncached tokens (full cost)
If cache_read_input_tokens is zero across repeated identical-prefix requests, a silent invalidator is at work — Date.now() or a UUID in the system prompt, non-deterministic key ordering, or a varying tool set. See shared/prompt-caching.md for the full audit table.
Extended Thinking
Fable 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6: Use adaptive thinking.
budget_tokensis removed on Fable 5, Opus 4.8, and 4.7 (400 if sent); deprecated on Opus 4.6 and Sonnet 4.6. Older models: Usethinking: {type: "enabled", budget_tokens: N}(must be <max_tokens, min 1024).
// Fable 5 / Opus 4.8 / 4.7 / 4.6: adaptive thinking (recommended)
const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
thinking: { type: "adaptive", display: "summarized" }, // display opt-in: default is omitted (empty thinking text) on Fable 5 / Mythos 5 / Opus 4.8 / 4.7
output_config: { effort: "high" }, // low | medium | high | max
messages: [
{ role: "user", content: "Solve this math problem step by step..." },
],
});
for (const block of response.content) {
if (block.type === "thinking") {
console.log("Thinking:", block.thinking);
} else if (block.type === "text") {
console.log("Response:", block.text);
}
}
Error Handling
Use the SDK's typed exception classes — never check error messages with string matching:
import Anthropic from "@anthropic-ai/sdk";
try {
const response = await client.messages.create({...});
} catch (error) {
if (error instanceof Anthropic.BadRequestError) {
console.error("Bad request:", error.message);
} else if (error instanceof Anthropic.AuthenticationError) {
console.error("Invalid API key");
} else if (error instanceof Anthropic.RateLimitError) {
console.error("Rate limited - retry later");
} else if (error instanceof Anthropic.APIError) {
console.error(`API error ${error.status}:`, error.message);
}
}
All classes extend Anthropic.APIError with a typed status field. Check from most specific to least specific. See shared/error-codes.md for the full error code reference.
Multi-Turn Conversations
The API is stateless — send the full conversation history each time. Use Anthropic.MessageParam[] to type the messages array:
const messages: Anthropic.MessageParam[] = [
{ role: "user", content: "My name is Alice." },
{ role: "assistant", content: "Hello Alice! Nice to meet you." },
{ role: "user", content: "What's my name?" },
];
const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
messages: messages,
});
Rules:
- Consecutive same-role messages are allowed — the API combines them into a single turn
- First message must be
user - Use SDK types (
Anthropic.MessageParam,Anthropic.Message,Anthropic.Tool, etc.) for all API data structures — don't redefine equivalent interfaces
Compaction (long conversations)
Beta, Fable 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6. When conversations approach the 200K context window, compaction automatically summarizes earlier context server-side. The API returns a
compactionblock; you must pass it back on subsequent requests — appendresponse.content, not just the text.
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const messages: Anthropic.Beta.BetaMessageParam[] = [];
async function chat(userMessage: string): Promise<string> {
messages.push({ role: "user", content: userMessage });
const response = await client.beta.messages.create({
betas: ["compact-2026-01-12"],
model: "{{OPUS_ID}}",
max_tokens: 16000,
messages,
context_management: {
edits: [{ type: "compact_20260112" }],
},
});
// Append full content — compaction blocks must be preserved
messages.push({ role: "assistant", content: response.content });
const textBlock = response.content.find(
(b): b is Anthropic.Beta.BetaTextBlock => b.type === "text",
);
return textBlock?.text ?? "";
}
// Compaction triggers automatically when context grows large
console.log(await chat("Help me build a Python web scraper"));
console.log(await chat("Add support for JavaScript-rendered pages"));
console.log(await chat("Now add rate limiting and error handling"));
Stop Reasons
The stop_reason field in the response indicates why the model stopped generating:
| Value | Meaning |
|---|---|
end_turn |
Claude finished its response naturally |
max_tokens |
Hit the max_tokens limit — increase it or use streaming |
stop_sequence |
Hit a custom stop sequence |
tool_use |
Claude wants to call a tool — execute it and continue |
pause_turn |
Model paused and can be resumed (agentic flows) |
refusal |
Claude refused for safety reasons — check stop_details |
Structured Stop Details
When stop_reason is "refusal", the response includes a stop_details object with structured information about the refusal:
if (response.stop_reason === "refusal" && response.stop_details) {
console.log(`Category: ${response.stop_details.category}`); // e.g. "cyber", "bio", "reasoning_extraction", "frontier_llm", or null — see docs for the full set
console.log(`Explanation: ${response.stop_details.explanation}`);
}
Refusal Fallbacks ({{FABLE_NAME}}) — opt in by default
Fallbacks are opt-in: without them a refused request simply stops. Include the server-side fallbacks parameter in {{FABLE_ID}} code by default — on a policy decline the API re-runs the same request on the fallback model inside the same call. A decline before any output isn't billed (a mid-stream decline bills the streamed partial); the rescue bills at the fallback model's own rates, with cache repricing applied automatically.
const response = await client.beta.messages.create({
model: "{{FABLE_ID}}",
max_tokens: 16000,
betas: ["server-side-fallback-2026-06-01"],
fallbacks: [{ model: "{{OPUS_ID}}" }],
messages: [{ role: "user", content: "..." }],
});
// Switch points: one fallback block per model that ran and declined this turn
for (const block of response.content) {
if (block.type === "fallback") {
console.log(`${block.from.model} declined; ${block.to.model} continued`);
}
}
// Served-by signal — covers sticky turns, which carry no fallback block.
// Pair with stop_reason: the fallback model can itself refuse.
const fallbackRan = (response.usage.iterations ?? []).some(
(entry) => entry.type === "fallback_message",
);
if (fallbackRan && response.stop_reason !== "refusal") {
console.log(`Served by ${response.model}`);
}
A stop_reason: "refusal" on the final response means the whole chain refused. The header must be exactly server-side-fallback-2026-06-01; the parameter is rejected on the Batches API and unavailable on Amazon Bedrock, Vertex AI, and Microsoft Foundry — register the client-side betaRefusalFallbackMiddleware on the client there instead. Full semantics (sticky routing, billing, streaming, echoing fallback turns back): shared/model-migration.md → Migrating to {{FABLE_NAME}} → refusal stop reason.
Cost Optimization Strategies
1. Use Prompt Caching for Repeated Context
// Automatic caching (simplest — caches the last cacheable block)
const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
cache_control: { type: "ephemeral" },
system: largeDocumentText, // e.g., 50KB of context
messages: [{ role: "user", content: "Summarize the key points" }],
});
// First request: full cost
// Subsequent requests: ~90% cheaper for cached portion
2. Use Token Counting Before Requests
const countResponse = await client.messages.countTokens({
model: "{{OPUS_ID}}",
messages: messages,
system: system,
});
const estimatedInputCost = countResponse.input_tokens * 0.000005; // $5/1M tokens
console.log(`Estimated input cost: $${estimatedInputCost.toFixed(4)}`);