Error Handling for AI Systems

Tech Buddy, June 12, 2026

AI applications fail in ways that traditional software doesn't. Your database rarely decides to rate-limit you. Your file system doesn't return a 529 status code because it's overloaded. But LLM APIs do all of this — and your production code needs to handle it gracefully.

This lesson covers Python's exception system, custom exception types, retry strategies, and the defensive programming patterns that prevent a transient API hiccup from cascading into a user-facing failure.

Python Exceptions vs C# Exceptions

The mental model is similar to C#, but the syntax and hierarchy differ. Python uses try/except/else/finally instead of try/catch/finally. The key difference: Python's except can catch multiple exception types in a tuple, and the else block runs only if no exception was raised.

C#
try
{
    var response = await client
        .Messages.CreateAsync(request);
    ProcessResponse(response);
}
catch (RateLimitException ex)
{
    logger.LogWarning("Rate limited: {msg}", ex.Message);
    await Task.Delay(1000);
}
catch (Exception ex)
{
    logger.LogError(ex, "Unexpected error");
    throw;
}
finally
{
    // always runs
    metrics.RecordApiCall();
}
Python
import time

import anthropic

# client, params, logger, and metrics are assumed to be defined elsewhere,
# mirroring the C# snippet above.
try:
    response = client.messages.create(**params)
except anthropic.RateLimitError as e:
    logger.warning(f"Rate limited: {e}")
    time.sleep(1)
except anthropic.APIError as e:
    logger.error(f"API error: {e}")
    raise
except Exception as e:
    logger.error(f"Unexpected: {e}")
    raise
else:
    # Runs only if the try body raised no exception
    process_response(response)  # success path
finally:
    metrics.record_api_call()   # always runs
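
To see the control flow in isolation, here is a minimal, self-contained sketch that runs without any API client. The `run` helper and the three small callables are hypothetical names invented for this illustration; `run` records which blocks execute for each outcome.

```python
def run(action):
    """Call `action` and record which try/except/else/finally blocks execute."""
    trace = []
    try:
        trace.append("try")
        action()
    except (ValueError, KeyError) as e:
        # One handler for several exception types: pass them as a tuple.
        trace.append(f"except {type(e).__name__}")
    else:
        # Reached only when the try body raised nothing.
        trace.append("else")
    finally:
        # Reached on every path.
        trace.append("finally")
    return trace


def ok():
    return None


def bad_value():
    raise ValueError("bad input")


def missing_key():
    return {}["missing"]


print(run(ok))           # ['try', 'else', 'finally']
print(run(bad_value))    # ['try', 'except ValueError', 'finally']
print(run(missing_key))  # ['try', 'except KeyError', 'finally']
```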

Anthropic SDK Exception Hierarchy

The Anthropic Python SDK has a structured exception hierarchy. Knowing which exceptions map to which conditions lets you handle them precisely:

| Exception             | HTTP Status | Cause                  | Action               |
|-----------------------|-------------|------------------------|----------------------|
| AuthenticationError   | 401         | Invalid API key        | Fail fast, alert ops |
| PermissionDeniedError | 403         | No access to model     | Fail fast            |
| NotFoundError         | 404         | Model doesn't exist    | Fail fast, fix config|
| RateLimitError        | 429         | Quota exceeded         | Retry with backoff   |
| InternalServerError   | >=500       | Anthropic server error | Retry with backoff   |
| APIConnectionError    | N/A         | Network failure        | Retry with backoff   |
| APIStatusError        | 4xx/5xx     | Base for status errors | Check status code    |

import anthropic
import logging

logger = logging.getLogger(__name__)

def call_claude(prompt: str, client: anthropic.Anthropic) -> str:
    """Call Claude with proper exception handling."""
    try:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text

    except anthropic.AuthenticationError:
        # Don't retry: this is a configuration error
        logger.critical("Invalid API key: check ANTHROPIC_API_KEY")
        raise SystemExit(1)

    except anthropic.RateLimitError as e:
        logger.warning(f"Rate limited by Anthropic API: {e}")
        raise  # Let the retry layer handle this

    except anthropic.InternalServerError as e:
        logger.error(f"Anthropic server error (HTTP {e.status_code}): {e}")
        raise  # Retryable

    except anthropic.APIConnectionError as e:
        logger.error(f"Network error connecting to Anthropic: {e}")
        raise  # Retryable

    except anthropic.APIError as e:
        # Catch-all for other SDK errors
        logger.error(f"Anthropic API error: {e}")
        raise
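
The "Action" column can also be encoded as a small lookup so the retry decision lives in one place. The sketch below is an illustration, not part of the SDK: `classify_status` and the `Action` enum are hypothetical names, the mapping follows the table above while treating every 5xx status as a server error (an assumption), and `None` stands in for a connection failure that never produced an HTTP status.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    FAIL_FAST = "fail_fast"
    RETRY = "retry_with_backoff"
    INSPECT = "check_status_code"


def classify_status(status_code: Optional[int]) -> Action:
    """Map an HTTP status (or None for a network failure) to an action."""
    if status_code is None:
        return Action.RETRY          # network failure, no HTTP response
    if status_code in (401, 403, 404):
        return Action.FAIL_FAST      # configuration or permission problem
    if status_code == 429 or status_code >= 500:
        return Action.RETRY          # rate limit or server-side error
    return Action.INSPECT            # other 4xx: decide case by case


print(classify_status(529).name)   # RETRY
print(classify_status(401).name)   # FAIL_FAST
print(classify_status(None).name)  # RETRY
```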

Custom Exception Types

Define your own exception hierarchy for AI application-level errors. This lets calling code distinguish between infrastructure errors (API failures) and domain errors (invalid prompt, failed validation, exceeded budget).

# src/exceptions.py

class AIApplicationError(Exception):
    """Base exception for all application-level AI errors."""


class PromptValidationError(AIApplicationError):
    """Raised when a prompt fails validation before sending to the API."""
    def __init__(self, message: str, prompt_length: int):
        super().__init__(message)
        self.prompt_length = prompt_length


class ResponseParseError(AIApplicationError):
    """Raised when an LLM response can't be parsed into the expected structure."""
    def __init__(self, message: str, raw_response: str):
        super().__init__(message)
        self.raw_response = raw_response


class BudgetExceededError(AIApplicationError):
    """Raised when a request would exceed the configured token budget."""
    def __init__(self, estimated_cost: float, budget_limit: float):
        super().__init__(
            f"Estimated cost ${estimated_cost:.4f} exceeds budget ${budget_limit:.4f}"
        )
        self.estimated_cost = estimated_cost
        self.budget_limit = budget_limit


# Usage:
def validate_prompt(prompt: str, max_chars: int = 50_000) -> None:
    if not prompt.strip():
        raise PromptValidationError("Prompt cannot be empty", len(prompt))
    if len(prompt) > max_chars:
        raise PromptValidationError(
            f"Prompt too long: {len(prompt)} > {max_chars}",
            len(prompt),
        )
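
As a rough sketch of how calling code might use this hierarchy, the handler below catches the specific domain error first and falls back to the base class. The classes are repeated in trimmed form so the snippet runs on its own; `handle_request` and the shape of the returned dictionary are hypothetical.

```python
class AIApplicationError(Exception):
    """Trimmed copy of the base class above, repeated so this runs standalone."""


class PromptValidationError(AIApplicationError):
    def __init__(self, message: str, prompt_length: int):
        super().__init__(message)
        self.prompt_length = prompt_length


def validate_prompt(prompt: str, max_chars: int = 50_000) -> None:
    if not prompt.strip():
        raise PromptValidationError("Prompt cannot be empty", len(prompt))
    if len(prompt) > max_chars:
        raise PromptValidationError(
            f"Prompt too long: {len(prompt)} > {max_chars}", len(prompt)
        )


def handle_request(prompt: str) -> dict:
    """Hypothetical entry point: turn domain errors into a structured reply."""
    try:
        validate_prompt(prompt, max_chars=20)
    except PromptValidationError as e:
        # Specific handler first: the extra attribute is available here.
        return {"ok": False, "reason": str(e), "length": e.prompt_length}
    except AIApplicationError as e:
        # Any other domain error still lands in an application-level branch.
        return {"ok": False, "reason": str(e), "length": None}
    return {"ok": True, "reason": None, "length": len(prompt)}


print(handle_request("   "))
# {'ok': False, 'reason': 'Prompt cannot be empty', 'length': 3}
print(handle_request("x" * 25))
# {'ok': False, 'reason': 'Prompt too long: 25 > 20', 'length': 25}
print(handle_request("hello"))
# {'ok': True, 'reason': None, 'length': 5}
```

Because SDK exceptions do not inherit from `AIApplicationError`, they pass straight through this handler and can be dealt with by the retry layer instead.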

Retry Logic with Exponential Backoff

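Naive retries hammer an API that is already rate-limiting you. The production standard is exponential backoff with jitter: the wait between attempts doubles each time, and a random offset keeps many clients from retrying in lockstep. Keep in mind that the Anthropic SDK also retries some failures on its own (controlled by the client's max_retries setting), so an outer retry layer multiplies the total number of attempts. The tenacity library implements the pattern as a decorator: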

pip install tenacity

import logging

import anthropic
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    wait_random,
    retry_if_exception_type,
    before_sleep_log,
)

logger = logging.getLogger(__name__)


# Decorator-based retry: clean and reusable
@retry(
    retry=retry_if_exception_type((
        anthropic.RateLimitError,
        anthropic.InternalServerError,
        anthropic.APIConnectionError,
    )),
    # Exponential wait clamped to 2-60s, plus 0-2s of random jitter
    wait=wait_exponential(multiplier=1, min=2, max=60) + wait_random(0, 2),
    stop=stop_after_attempt(5),
    before_sleep=before_sleep_log(logger, logging.WARNING),
    reraise=True,  # re-raise the original exception after all retries fail
)
def call_with_retry(client: anthropic.Anthropic, **kwargs) -> anthropic.Message:
    """Call the Anthropic API with automatic retry on transient errors."""
    return client.messages.create(**kwargs)

Manual Retry Implementation (Without tenacity)

If you'd rather not add a dependency, here's the pattern implemented manually — useful for understanding what's happening under the hood:

import logging
import random
import time

import anthropic

logger = logging.getLogger(__name__)

def call_with_manual_retry(
    client: anthropic.Anthropic,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    **kwargs,
) -> anthropic.Message:
    """Exponential backoff with jitter."""
    retryable = (
        anthropic.RateLimitError,
        anthropic.InternalServerError,
        anthropic.APIConnectionError,
    )

    for attempt in range(max_attempts):
        try:
            return client.messages.create(**kwargs)

        except retryable as e:
            if attempt == max_attempts - 1:
                raise  # Last attempt: give up

            # Exponential backoff: 1s, 2s, 4s, 8s... capped at max_delay
            delay = min(base_delay * (2 ** attempt), max_delay)
            # Jitter: random 0–30% offset prevents thundering herd
            jitter = delay * random.uniform(0, 0.3)
            total_delay = delay + jitter

            logger.warning(
                f"Attempt {attempt + 1} failed ({type(e).__name__}). "
                f"Retrying in {total_delay:.1f}s..."
            )
            time.sleep(total_delay)
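
One way to sanity-check the backoff arithmetic is to compute the schedule without sleeping. The helper below is a sketch using the same defaults as the function above; `backoff_schedule` is a hypothetical name, and jitter is left out so the numbers are deterministic.

```python
def backoff_schedule(
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
) -> list[float]:
    """Delays (without jitter) slept between attempts; the last attempt has none."""
    return [
        min(base_delay * (2 ** attempt), max_delay)
        for attempt in range(max_attempts - 1)
    ]


print(backoff_schedule())                # [1.0, 2.0, 4.0, 8.0]
print(backoff_schedule(max_attempts=9))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
print(sum(backoff_schedule()))           # 15.0
```

With five attempts, the worst case adds about 15 seconds of waiting before jitter, and up to 30% more with it. That total is worth comparing against any request timeout further up the stack.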

The try/except/else/finally Pattern

Python's else clause on a try block is underused but valuable. It runs only when no exception was raised, which lets you separate the success path from the error path clearly:

def process_ai_request(prompt: str, client: anthropic.Anthropic) -> dict:
    result = {}

    try:
        response = call_with_retry(
            client,
            model="claude-3-5-sonnet-20241022",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
    except anthropic.AuthenticationError:
        # Unrecoverable
        return {"error": "auth_error", "text": None}
    except anthropic.APIError as e:
        logger.error(f"All retries failed: {e}")
        return {"error": "api_error", "text": None}
    else:
        # Only runs if no exception: clean separation of success logic
        result["text"] = response.content[0].text
        result["tokens"] = response.usage.output_tokens
        result["error"] = None
    finally:
        # Always runs: logging, metrics, cleanup
        logger.debug(f"Request completed for prompt[:50]: {prompt[:50]}...")

    return result
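
A detail worth checking in isolation: `finally` still runs when an `except` branch returns early, as the two error branches above do. The sketch below uses a hypothetical `fetch` function and an in-memory `events` list in place of a logger so the order is visible.

```python
events = []


def fetch(fail: bool) -> dict:
    """Hypothetical stand-in for an API call wrapper."""
    try:
        if fail:
            raise ConnectionError("simulated network failure")
    except ConnectionError:
        events.append("except")
        # finally executes before this return hands control back to the caller
        return {"error": "api_error", "text": None}
    else:
        events.append("else")
        result = {"error": None, "text": "ok"}
    finally:
        events.append("finally")
    return result


print(fetch(fail=True), events)
# {'error': 'api_error', 'text': None} ['except', 'finally']
events.clear()
print(fetch(fail=False), events)
# {'error': None, 'text': 'ok'} ['else', 'finally']
```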

Context Managers for Resource Safety

Python's with statement (like C#'s using) ensures resources are released even if exceptions occur. For AI apps, this means database connections, file handles, and HTTP client sessions are always properly closed:

from pathlib import Path

import anthropic

from src.exceptions import AIApplicationError

def load_and_run_prompt(prompt_file: Path) -> str:
    # Context manager for file-safe prompt loading
    try:
        with open(prompt_file, encoding="utf-8") as f:
            prompt = f.read()  # file auto-closed even if read() raises
    except FileNotFoundError as e:
        # Chain the original error so the traceback keeps the root cause
        raise AIApplicationError(f"Prompt file not found: {prompt_file}") from e

    # The Anthropic client itself supports use as a context manager
    with anthropic.Anthropic() as client:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )

    return response.content[0].text  # client already closed at this point
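
The same guarantee is available for your own resources through `contextlib.contextmanager`. As a sketch, the hypothetical `track_call` helper below records the outcome and duration of a block of code whether or not it raises; the `records` list stands in for a real metrics backend.

```python
import time
from contextlib import contextmanager

records = []  # stand-in for a metrics backend


@contextmanager
def track_call(name: str):
    """Record outcome and duration of the wrapped block, even on failure."""
    start = time.perf_counter()
    outcome = "ok"
    try:
        yield
    except Exception:
        outcome = "error"
        raise  # never swallow the caller's exception
    finally:
        records.append(
            {"name": name, "outcome": outcome, "seconds": time.perf_counter() - start}
        )


with track_call("summarize"):
    pass  # the API call would go here

try:
    with track_call("classify"):
        raise TimeoutError("simulated failure")
except TimeoutError:
    pass

print([(r["name"], r["outcome"]) for r in records])
# [('summarize', 'ok'), ('classify', 'error')]
```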

Defensive Programming: Validation Before the Call

The cheapest errors to handle are the ones you catch before making the API call. Validate inputs early and fail with informative errors rather than getting cryptic API responses:

import anthropic

from src.exceptions import PromptValidationError

SUPPORTED_MODELS = {
    "claude-3-5-sonnet-20241022",
    "claude-3-opus-20240229",
    "claude-3-haiku-20240307",
}

def validated_create(
    client: anthropic.Anthropic,
    model: str,
    messages: list[dict],
    max_tokens: int,
    **kwargs,
) -> anthropic.Message:
    """Validates parameters before hitting the API."""

    if model not in SUPPORTED_MODELS:
        # sorted() keeps the error message stable between runs
        raise ValueError(
            f"Unknown model '{model}'. Choose from: {sorted(SUPPORTED_MODELS)}"
        )

    if not messages:
        raise PromptValidationError("messages cannot be empty", 0)

    if max_tokens < 1 or max_tokens > 8192:
        raise ValueError(f"max_tokens must be 1–8192, got {max_tokens}")

    last_role = messages[-1].get("role")
    if last_role != "user":
        raise PromptValidationError(
            f"Last message must have role 'user', got '{last_role}'",
            len(str(messages)),
        )

    return client.messages.create(
        model=model, messages=messages, max_tokens=max_tokens, **kwargs
    )
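
Budget checks fit the same pre-flight slot. The sketch below shows one way `BudgetExceededError` from the custom exceptions section might be raised before a call. Everything numeric here is an assumption for illustration: the four-characters-per-token estimate is a rough heuristic, the per-million-token prices are placeholders rather than actual pricing, and `estimate_cost` and `check_budget` are hypothetical helpers.

```python
class BudgetExceededError(Exception):
    """Trimmed copy of the class defined earlier, repeated so this runs standalone."""

    def __init__(self, estimated_cost: float, budget_limit: float):
        super().__init__(
            f"Estimated cost ${estimated_cost:.4f} exceeds budget ${budget_limit:.4f}"
        )
        self.estimated_cost = estimated_cost
        self.budget_limit = budget_limit


# Placeholder prices in USD per million tokens (assumed, not real pricing).
INPUT_PRICE_PER_MTOK = 3.0
OUTPUT_PRICE_PER_MTOK = 15.0


def estimate_cost(prompt: str, max_tokens: int) -> float:
    """Upper-bound estimate: ~4 chars per input token, full output allowance used."""
    input_tokens = len(prompt) / 4
    return (
        input_tokens * INPUT_PRICE_PER_MTOK + max_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


def check_budget(prompt: str, max_tokens: int, budget_limit: float) -> float:
    """Raise before the API call if the estimate exceeds the budget."""
    cost = estimate_cost(prompt, max_tokens)
    if cost > budget_limit:
        raise BudgetExceededError(cost, budget_limit)
    return cost


print(f"{check_budget('x' * 4000, max_tokens=1000, budget_limit=0.05):.4f}")  # 0.0180

try:
    check_budget("x" * 4000, max_tokens=1000, budget_limit=0.01)
except BudgetExceededError as e:
    print(e)  # Estimated cost $0.0180 exceeds budget $0.0100
```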

Key Takeaways

  • Python uses try/except/else/finally — the else block only runs on success, keeping error and success paths clearly separated
  • Catch specific exceptions first, broad ones last — never use bare except: without re-raising
  • Distinguish retryable errors (rate limits, server errors, network) from non-retryable (auth errors, validation failures)
  • Use exponential backoff with jitter for retries — the tenacity library makes this a one-decorator solution
  • Define a custom exception hierarchy rooted in your app's base exception to distinguish infrastructure from domain errors
  • Validate early — check inputs before the API call to catch errors cheaply and provide useful error messages