AI applications fail in ways that traditional software doesn't. Your database rarely decides to rate-limit you. Your file system doesn't return a 529 status code because it's overloaded. LLM APIs do both, and your production code needs to handle that gracefully.
This lesson covers Python's exception system, custom exception types, retry strategies, and the defensive programming patterns that prevent a transient API hiccup from cascading into a user-facing failure.
Python Exceptions vs C# Exceptions
The mental model is similar to C#, but the syntax and hierarchy differ. Python uses try/except/else/finally instead of try/catch/finally. Two differences are worth noting. A single except clause can catch several exception types listed in a tuple. The else block runs only when the try body raised nothing. Here is the same logic in C# and then in Python:
try
{
    var response = await client
        .Messages.CreateAsync(request);
    ProcessResponse(response);
}
catch (RateLimitException ex)
{
    logger.LogWarning("Rate limited: {msg}", ex.Message);
    await Task.Delay(1000);
}
catch (Exception ex)
{
    logger.LogError(ex, "Unexpected error");
    throw;
}
finally
{
    // always runs
    metrics.RecordApiCall();
}
import logging
import time
import anthropic
logger = logging.getLogger(__name__)
client = anthropic.Anthropic()
# params, process_response and metrics are placeholders, as in the C# version
try:
    response = client.messages.create(**params)
except anthropic.RateLimitError as e:
    logger.warning(f"Rate limited: {e}")
    time.sleep(1)
except anthropic.APIError as e:
    logger.error(f"API error: {e}")
    raise
except Exception as e:
    logger.error(f"Unexpected: {e}")
    raise
else:
    process_response(response)  # success path: runs only if NO exception was raised
finally:
    metrics.record_api_call()  # always runs
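You can watch the tuple form and the else/finally ordering without calling any API. The small stdlib-only sketch below uses a hypothetical read_timeout function. It records which blocks ran into an events list, which stands in for the metrics hook.

```python
from __future__ import annotations


def read_timeout(config: dict[str, str], key: str, events: list[str]) -> float | None:
    """Read a numeric setting, recording which try/except/else/finally blocks ran."""
    try:
        timeout = float(config[key])  # KeyError if missing, ValueError if not numeric
    except (KeyError, ValueError) as exc:  # one handler for several exception types
        events.append(f"except:{type(exc).__name__}")
        return None
    else:
        events.append("else")  # runs only if the try body raised nothing
        return timeout
    finally:
        events.append("finally")  # runs on every path, even after a return
```

Take read_timeout({"timeout": "soon"}, "timeout", events) as an example. It returns None and leaves events as ["except:ValueError", "finally"]. A valid value produces ["else", "finally"] instead.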
Anthropic SDK Exception Hierarchy
The Anthropic Python SDK has a structured exception hierarchy. Knowing which exceptions map to which conditions lets you handle them precisely:
| Exception | HTTP Status | Cause | Action |
|---|---|---|---|
| AuthenticationError | 401 | Invalid API key | Fail fast, alert ops |
| PermissionDeniedError | 403 | No access to model | Fail fast |
| NotFoundError | 404 | Model doesn't exist | Fail fast, fix config |
| RateLimitError | 429 | Quota exceeded | Retry with backoff |
| InternalServerError | 500 | Anthropic server error | Retry with backoff |
| APIConnectionError | N/A | Network failure | Retry with backoff |
| APIStatusError | 4xx/5xx | Base for status errors | Check status code |
import logging
import anthropic
logger = logging.getLogger(__name__)
def call_claude(prompt: str, client: anthropic.Anthropic) -> str:
    """Call Claude with proper exception handling."""
    try:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text
    except anthropic.AuthenticationError:
        # Don't retry: this is a configuration error. Re-raise instead of exiting
        # the process, so the caller (or a top-level handler) can fail fast and alert ops.
        logger.critical("Invalid API key; check ANTHROPIC_API_KEY")
        raise
    except anthropic.RateLimitError as e:
        logger.warning(f"Rate limited by Anthropic API: {e}")
        raise  # Let the retry layer handle this
    except anthropic.InternalServerError as e:
        logger.error(f"Anthropic server error (HTTP {e.status_code}): {e}")
        raise  # Retryable
    except anthropic.APIConnectionError as e:
        logger.error(f"Network error connecting to Anthropic: {e}")
        raise  # Retryable
    except anthropic.APIError as e:
        # Catch-all for other SDK errors
        logger.error(f"Anthropic API error: {e}")
        raise
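The table's Action column reduces to a small decision rule. The sketch below expresses that rule over plain status codes so it runs without the SDK. should_retry is a hypothetical helper, and None stands in for a connection failure, which has no status. The sketch treats every 5xx as retryable, including the 529 "overloaded" status from the start of this lesson. That is an assumption that goes beyond the table's 500 row. In real code you could pass it e.status_code from an anthropic.APIStatusError.

```python
from __future__ import annotations


def should_retry(status: int | None) -> bool:
    """Return True for transient failures worth retrying with backoff (hypothetical helper)."""
    if status is None:  # connection failure: no HTTP response at all
        return True
    if status == 429:  # rate limited
        return True
    if status >= 500:  # server-side errors (assumption: treat all 5xx as transient)
        return True
    return False  # 401/403/404 and other 4xx: fix the request or config instead
```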
Custom Exception Types
Define your own exception hierarchy for AI application-level errors. This lets calling code distinguish between infrastructure errors (API failures) and domain errors (invalid prompt, failed validation, exceeded budget).
# src/exceptions.py
class AIApplicationError(Exception):
    """Base exception for all application-level AI errors."""
    pass
class PromptValidationError(AIApplicationError):
    """Raised when a prompt fails validation before sending to the API."""
    def __init__(self, message: str, prompt_length: int):
        super().__init__(message)
        self.prompt_length = prompt_length
class ResponseParseError(AIApplicationError):
    """Raised when an LLM response can't be parsed into the expected structure."""
    def __init__(self, message: str, raw_response: str):
        super().__init__(message)
        self.raw_response = raw_response
class BudgetExceededError(AIApplicationError):
    """Raised when a request would exceed the configured token budget."""
    def __init__(self, estimated_cost: float, budget_limit: float):
        super().__init__(
            f"Estimated cost ${estimated_cost:.4f} exceeds budget ${budget_limit:.4f}"
        )
        self.estimated_cost = estimated_cost
        self.budget_limit = budget_limit
# Usage:
def validate_prompt(prompt: str, max_chars: int = 50_000) -> None:
    if not prompt.strip():
        raise PromptValidationError("Prompt cannot be empty", len(prompt))
    if len(prompt) > max_chars:
        raise PromptValidationError(
            f"Prompt too long: {len(prompt)} > {max_chars}",
            len(prompt),
        )
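ResponseParseError is most useful when it wraps the lower-level error that triggered it. The minimal sketch below assumes the model was asked to reply with a JSON object. parse_json_response is a hypothetical helper. The two exception classes are repeated from src/exceptions.py so the block runs on its own. The raise ... from exc form keeps the original json.JSONDecodeError available as __cause__ for logging.

```python
from __future__ import annotations

import json


class AIApplicationError(Exception):
    """Base exception for all application-level AI errors."""


class ResponseParseError(AIApplicationError):
    """Raised when an LLM response can't be parsed into the expected structure."""

    def __init__(self, message: str, raw_response: str):
        super().__init__(message)
        self.raw_response = raw_response


def parse_json_response(raw: str, required_keys: tuple[str, ...] = ()) -> dict:
    """Parse model output as a JSON object (hypothetical helper)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # "from exc" chains the decoder error (with its line/column) as __cause__
        raise ResponseParseError(f"Response is not valid JSON: {exc.msg}", raw) from exc
    if not isinstance(data, dict):
        raise ResponseParseError("Expected a JSON object", raw)
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ResponseParseError(f"Missing keys: {missing}", raw)
    return data
```

Callers can catch AIApplicationError to handle every domain error in one place. Infrastructure errors such as anthropic.APIError still propagate to the retry layer.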
Retry Logic with Exponential Backoff
Naive retries hammer a rate-limited API. The standard production approach is exponential backoff with jitter: wait longer after each failed attempt and add a small random offset. The tenacity library implements this declaratively:
pip install tenacity
import logging
import anthropic
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    wait_random,
    retry_if_exception_type,
    before_sleep_log,
)
logger = logging.getLogger(__name__)
# Decorator-based retry: clean and reusable
@retry(
    retry=retry_if_exception_type((
        anthropic.RateLimitError,
        anthropic.InternalServerError,
        anthropic.APIConnectionError,
    )),
    wait=wait_exponential(multiplier=1, min=2, max=60) + wait_random(0, 2),
    stop=stop_after_attempt(5),
    before_sleep=before_sleep_log(logger, logging.WARNING),
    reraise=True,  # re-raise the original exception after all retries fail
)
def call_with_retry(client: anthropic.Anthropic, **kwargs) -> anthropic.types.Message:
    """Call the Anthropic API with automatic retry on transient errors."""
    return client.messages.create(**kwargs)
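It helps to know how long that decorator actually waits. The sketch below assumes tenacity's wait_exponential gives the wait after the n-th failed attempt as multiplier * 2**(n - 1), clamped to [min, max]. Check the tenacity docs for your installed version. The sketch reproduces that arithmetic without the wait_random jitter. It also accounts for the Anthropic client's own retries: the SDK retries some errors itself (the max_retries client option, 2 by default). So each tenacity attempt can turn into several HTTP requests. Both helper names are hypothetical.

```python
def exponential_waits(attempts: int, multiplier: float = 1,
                      minimum: float = 2, maximum: float = 60) -> list[float]:
    """Deterministic part of each sleep when stopping after `attempts` tries (no jitter)."""
    waits = []
    for n in range(1, attempts):  # n = number of the attempt that just failed
        raw = multiplier * 2 ** (n - 1)
        waits.append(max(minimum, min(raw, maximum)))  # clamp to [minimum, maximum]
    return waits


def worst_case_requests(outer_attempts: int, sdk_max_retries: int = 2) -> int:
    """HTTP requests sent if every try fails and the SDK also retries inside each attempt."""
    return outer_attempts * (sdk_max_retries + 1)
```

With the decorator's settings, exponential_waits(5) gives [2, 2, 4, 8]. Because wait_random(0, 2) adds up to 2 seconds to each of the four sleeps, the decorator sleeps roughly 16 to 24 seconds in total before giving up. If the SDK keeps its default retries, that can mean up to 15 requests. Constructing the client as anthropic.Anthropic(max_retries=0) leaves retrying to one layer.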
Manual Retry Implementation (Without tenacity)
If you'd rather not add a dependency, here is the same pattern implemented by hand. It is also useful for understanding what happens under the hood:
import logging
import random
import time
import anthropic
logger = logging.getLogger(__name__)
def call_with_manual_retry(
    client: anthropic.Anthropic,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    **kwargs,
) -> anthropic.types.Message:
    """Exponential backoff with jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    retryable = (
        anthropic.RateLimitError,
        anthropic.InternalServerError,
        anthropic.APIConnectionError,
    )
    for attempt in range(max_attempts):
        try:
            return client.messages.create(**kwargs)
        except retryable as e:
            if attempt == max_attempts - 1:
                raise  # Last attempt: give up
            # Exponential backoff: 1s, 2s, 4s, 8s... (with the defaults), capped at max_delay
            delay = min(base_delay * (2 ** attempt), max_delay)
            # Jitter: a random 0-30% offset spreads out clients that failed together
            jitter = delay * random.uniform(0, 0.3)
            total_delay = delay + jitter
            logger.warning(
                f"Attempt {attempt + 1} failed ({type(e).__name__}). "
                f"Retrying in {total_delay:.1f}s..."
            )
            time.sleep(total_delay)
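This loop sleeps for real and calls the SDK directly, which makes it awkward to unit-test. One option is to inject the operation and the sleep function. The sketch below does that; it is a generalization rather than a drop-in replacement, and retry_call and its parameters are hypothetical names. A test can pass a list's append method as sleep and check the recorded delays without waiting.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_call(
    operation: Callable[[], T],
    retryable: tuple[type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying `retryable` errors with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: re-raise the last error
            delay = min(base_delay * (2 ** attempt), max_delay)
            sleep(delay + delay * random.uniform(0, 0.3))  # 0-30% jitter
    raise AssertionError("unreachable")  # the loop always returns or raises
```

With the SDK, the call would look something like this: retry_call(lambda: client.messages.create(**kwargs), (anthropic.RateLimitError, anthropic.InternalServerError, anthropic.APIConnectionError)).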
Anthropic's RateLimitError may include a Retry-After header that says how long to wait. Read it with e.response.headers.get("retry-after"). The value arrives as a string, normally a number of seconds, so convert it before sleeping. Prefer it over your computed backoff: it reflects the server's actual state and avoids waiting longer than necessary.
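The minimal sketch below turns that header string into a delay. It assumes you pass in e.response.headers.get("retry-after") from an APIStatusError. Connection errors carry no response, so pass None for them. parse_retry_after and next_delay are hypothetical names. The HTTP-date branch covers the alternative format that the HTTP specification allows for Retry-After; you may never see it from this API.

```python
from __future__ import annotations

import math
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value: str | None, now: datetime | None = None) -> float | None:
    """Convert a Retry-After value to seconds, or None if absent or unparseable."""
    if value is None:
        return None
    value = value.strip()
    try:
        seconds = float(value)  # the common form: delay in seconds, e.g. "30"
    except ValueError:
        pass
    else:
        return max(seconds, 0.0) if math.isfinite(seconds) else None
    try:
        when = parsedate_to_datetime(value)  # the HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:  # assume UTC when the date carries no zone
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return max((when - now).total_seconds(), 0.0)


def next_delay(attempt: int, retry_after: str | None, base_delay: float = 1.0,
               max_delay: float = 60.0, now: datetime | None = None) -> float:
    """Prefer the server's hint; otherwise use exponential backoff with 0-30% jitter."""
    hint = parse_retry_after(retry_after, now=now)
    if hint is not None:
        return hint
    delay = min(base_delay * (2 ** attempt), max_delay)
    return delay + delay * random.uniform(0, 0.3)
```

next_delay deliberately does not cap the server's hint at max_delay. Retrying earlier than the server asks would likely just earn another 429.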
The try/except/else/finally Pattern
Python's else clause on a try block is underused but valuable. It runs only when no exception was raised, which lets you separate the success path from the error path clearly:
# Uses anthropic, logger and call_with_retry from the snippets above
def process_ai_request(prompt: str, client: anthropic.Anthropic) -> dict:
    result = {}
    try:
        response = call_with_retry(
            client,
            model="claude-3-5-sonnet-20241022",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
    except anthropic.AuthenticationError:
        # Unrecoverable
        return {"error": "auth_error", "text": None}
    except anthropic.APIError as e:
        # Retries exhausted, or a non-retryable API error
        logger.error(f"API call failed: {e}")
        return {"error": "api_error", "text": None}
    else:
        # Only runs if no exception: clean separation of success logic
        result["text"] = response.content[0].text
        result["tokens"] = response.usage.output_tokens
        result["error"] = None
    finally:
        # Always runs, even after the early returns above: logging, metrics, cleanup
        logger.debug(f"Request finished for prompt: {prompt[:50]!r}")
    return result
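The practical benefit of else is that an exception raised inside it is not caught by the except clauses of the same try. The stdlib-only sketch below contrasts two hypothetical handlers. Each takes a fetch callable standing in for the API call and uses a broad except Exception handler. In the first, a bug in the success path (a wrong dictionary key) is silently mislabeled as an API error. In the second, the same bug surfaces as a KeyError.

```python
def process_inside_try(fetch) -> dict:
    try:
        payload = fetch()
        return {"error": None, "text": payload["text"]}  # a bug here is guarded too...
    except Exception:
        return {"error": "api_error", "text": None}  # ...and misreported as an API error


def process_with_else(fetch) -> dict:
    try:
        payload = fetch()  # only the risky call is guarded
    except Exception:
        return {"error": "api_error", "text": None}
    else:
        return {"error": None, "text": payload["text"]}  # a KeyError here propagates
```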
Context Managers for Resource Safety
Python's with statement (like C#'s using) ensures resources are released even if exceptions occur. For AI apps, this means database connections, file handles, and HTTP client sessions are always properly closed:
import anthropic
from pathlib import Path
from src.exceptions import AIApplicationError
# Context manager for file-safe prompt loading
def load_and_run_prompt(prompt_file: Path) -> str:
    try:
        with open(prompt_file, encoding="utf-8") as f:
            prompt = f.read()  # file auto-closed even if read() raises
    except FileNotFoundError as exc:
        raise AIApplicationError(f"Prompt file not found: {prompt_file}") from exc
    # The Anthropic client itself supports use as a context manager.
    # A client per call is shown for illustration; long-lived apps usually reuse one.
    with anthropic.Anthropic() as client:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
    return response.content[0].text  # client already closed by the with block
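You can write your own context managers for the same guarantee. The sketch below uses contextlib.contextmanager to replace the finally-based metrics call from the first example. track_api_call and the list-of-dicts metrics sink are hypothetical. The block's outcome and duration are recorded whether it succeeds or raises, and any exception still propagates to the caller.

```python
from __future__ import annotations

import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def track_api_call(metrics: list[dict], name: str) -> Iterator[None]:
    """Record duration and outcome of the wrapped block, even if it raises."""
    start = time.perf_counter()
    ok = False
    try:
        yield
        ok = True  # reached only if the with-block raised nothing
    finally:
        metrics.append({
            "name": name,
            "ok": ok,
            "seconds": time.perf_counter() - start,
        })
```

Usage would be with track_api_call(metrics, "messages.create"): followed by the API call inside the block.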
Defensive Programming: Validation Before the Call
The cheapest errors to handle are the ones you catch before making the API call. Validate inputs early and fail with informative errors rather than getting cryptic API responses:
import anthropic
from src.exceptions import PromptValidationError
# Supported models and their maximum output tokens
# (Claude 3 Opus and Haiku: 4096; Claude 3.5 Sonnet: 8192)
SUPPORTED_MODELS = {
    "claude-3-5-sonnet-20241022": 8192,
    "claude-3-opus-20240229": 4096,
    "claude-3-haiku-20240307": 4096,
}
def validated_create(
    client: anthropic.Anthropic,
    model: str,
    messages: list[dict],
    max_tokens: int,
    **kwargs,
) -> anthropic.types.Message:
    """Validates parameters before hitting the API."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Unknown model '{model}'. Choose from: {sorted(SUPPORTED_MODELS)}")
    if not messages:
        raise PromptValidationError("messages cannot be empty", 0)
    limit = SUPPORTED_MODELS[model]
    if not 1 <= max_tokens <= limit:
        raise ValueError(f"max_tokens for {model} must be 1-{limit}, got {max_tokens}")
    # Stricter than the API, which also accepts a final assistant message (prefill)
    last_role = messages[-1].get("role")
    if last_role != "user":
        raise PromptValidationError(
            f"Last message must have role 'user', got '{last_role}'",
            len(str(messages)),
        )
    return client.messages.create(
        model=model, messages=messages, max_tokens=max_tokens, **kwargs
    )
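BudgetExceededError from the custom hierarchy fits naturally into this pre-call stage. The minimal sketch below makes two assumptions: a rough 4-characters-per-token heuristic and per-million-token prices supplied by the caller. Both are hypothetical; use your provider's current pricing and a real token count where you have one. The estimate is worst case, because it assumes the model uses all of max_tokens. The exception classes are repeated so the block runs on its own.

```python
class AIApplicationError(Exception):
    """Base exception for all application-level AI errors."""


class BudgetExceededError(AIApplicationError):
    """Raised when a request would exceed the configured token budget."""

    def __init__(self, estimated_cost: float, budget_limit: float):
        super().__init__(
            f"Estimated cost ${estimated_cost:.4f} exceeds budget ${budget_limit:.4f}"
        )
        self.estimated_cost = estimated_cost
        self.budget_limit = budget_limit


CHARS_PER_TOKEN = 4  # hypothetical rough heuristic, not a tokenizer


def check_budget(prompt: str, max_tokens: int, input_price_per_mtok: float,
                 output_price_per_mtok: float, budget_limit: float) -> float:
    """Estimate worst-case cost before the call; raise if it exceeds the budget."""
    input_tokens = -(-len(prompt) // CHARS_PER_TOKEN)  # ceiling division
    estimated = (input_tokens * input_price_per_mtok
                 + max_tokens * output_price_per_mtok) / 1_000_000
    if estimated > budget_limit:
        raise BudgetExceededError(estimated, budget_limit)
    return estimated
```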
Key Takeaways
- Python uses try/except/else/finally; the else block runs only on success, keeping error and success paths clearly separated
- Catch specific exceptions first and broad ones last; never use a bare except: without re-raising
- Distinguish retryable errors (rate limits, server errors, network failures) from non-retryable ones (auth errors, validation failures)
- Use exponential backoff with jitter for retries; the tenacity library makes this a one-decorator solution
- Define a custom exception hierarchy rooted in your app's base exception to distinguish infrastructure errors from domain errors
- Validate early: check inputs before the API call to catch errors cheaply and provide useful error messages