● Complete Learning Series Python & AI Engineering for .NET Developers

Error Handling for AI Systems

Tech Buddy June 12, 2026 3 min read

AI applications fail in ways that traditional software doesn't. Your database rarely decides to rate-limit you. Your file system doesn't return a 529 status code because it's overloaded. But LLM APIs do all of this — and your production code needs to handle it gracefully.

This lesson covers Python's exception system, custom exception types, retry strategies, and the defensive programming patterns that prevent a transient API hiccup from cascading into a user-facing failure.

Python Exceptions vs C# Exceptions

The mental model is similar to C#, but the syntax and hierarchy differ. Python uses try/except/else/finally instead of try/catch/finally. Two differences matter most: a single except clause can catch several exception types listed in a tuple, and the else block runs only if no exception was raised.

C#

try
{
    var response = await client
        .Messages.CreateAsync(request);
    ProcessResponse(response);
}
catch (RateLimitException ex)
{
    logger.LogWarning("Rate limited: {msg}", ex.Message);
    await Task.Delay(1000);
}
catch (Exception ex)
{
    logger.LogError(ex, "Unexpected error");
    throw;
}
finally
{
    // always runs
    metrics.RecordApiCall();
}

Python

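import logging
import time

import openai
from openai import OpenAI

logger = logging.getLogger(__name__)
client = OpenAI()  # reads OPENAI_API_KEY from the environment
# `params`, `metrics`, and `process_response` are assumed to be defined elsewhere.

try:
    response = client.chat.completions.create(**params)
except openai.RateLimitError as e:
    logger.warning(f"Rate limited: {e}")
    time.sleep(1)
except openai.APIError as e:
    logger.error(f"API error: {e}")
    raise
except Exception as e:
    logger.error(f"Unexpected: {e}")
    raise
else:
    # Runs only if NO exception was raised: the success path
    process_response(response)
finally:
    metrics.record_api_call()   # always runs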
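
To see the tuple catch and the else block in isolation, here is a minimal sketch that needs only the standard library. The `parse_score` function and its inputs are hypothetical and not part of the lesson's code; it simply records which blocks ran for each input.

```python
def parse_score(raw: str) -> str:
    """Hypothetical helper: report which try/except/else/finally blocks ran."""
    steps = []
    try:
        value = 100 / int(raw)  # may raise ValueError or ZeroDivisionError
    except (ValueError, ZeroDivisionError) as exc:
        # One clause handles both types, much like a C# exception filter
        # such as `when (ex is A or B)`.
        steps.append(f"except:{type(exc).__name__}")
    else:
        # Runs only if the try body raised nothing.
        steps.append(f"else:{value:.1f}")
    finally:
        # Runs on every path.
        steps.append("finally")
    return " -> ".join(steps)


print(parse_score("4"))    # else:25.0 -> finally
print(parse_score("0"))    # except:ZeroDivisionError -> finally
print(parse_score("abc"))  # except:ValueError -> finally
```

Keeping only the risky call inside `try` and moving the follow-up work into `else` means an unexpected error in the follow-up work is not mistaken for a failure of the risky call.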

OpenAI SDK Exception Hierarchy

The OpenAI Python SDK has a structured exception hierarchy. Knowing which exceptions map to which conditions lets you handle them precisely:

Exception	HTTP Status	Cause	Action
`AuthenticationError`	401	Invalid API key	Fail fast, alert ops
`PermissionDeniedError`	403	No access to model	Fail fast
`NotFoundError`	404	Model doesn't exist	Fail fast, fix config
`RateLimitError`	429	Quota exceeded	Retry with backoff
`InternalServerError`	5xx	OpenAI server error	Retry with backoff
`APIConnectionError`	N/A	Network failure	Retry with backoff
`APIStatusError`	4xx/5xx	Base for status errors	Check status code

import logging

import openai
from openai import OpenAI

logger = logging.getLogger(__name__)

def call_claude(prompt: str, client: OpenAI) -> str:
    """Call Claude with proper exception handling."""
    try:
        # Assumes the client targets an OpenAI-compatible endpoint that serves this model ID.
        response = client.chat.completions.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    except openai.AuthenticationError:
        # Don't retry: this is a configuration error
        logger.critical("Invalid API key, check OPENAI_API_KEY")
        raise SystemExit(1)

    except openai.RateLimitError as e:
        logger.warning(f"Rate limited by OpenAI API: {e}")
        raise  # Let the retry layer handle this

    except openai.InternalServerError as e:
        logger.error(f"OpenAI server error (HTTP {e.status_code}): {e}")
        raise  # Retryable

    except openai.APIConnectionError as e:
        logger.error(f"Network error connecting to OpenAI: {e}")
        raise  # Retryable

    except openai.APIError as e:
        # Catch-all for other SDK errors
        logger.error(f"OpenAI API error: {e}")
        raise
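
The Action column of the table above can also be written down as a small decision function. This is a sketch only: `action_for_status` and its `"retry"` / `"fail_fast"` labels are hypothetical names, and the function works on a plain status code so it runs without the SDK installed.

```python
from typing import Optional


def action_for_status(status_code: Optional[int]) -> str:
    """Hypothetical helper mirroring the table's Action column."""
    if status_code is None:
        # No HTTP response at all, e.g. a network failure: worth retrying.
        return "retry"
    if status_code == 429 or status_code >= 500:
        # Rate limits and server-side errors are usually transient.
        return "retry"
    # Remaining 4xx codes (401, 403, 404, ...) point to a config or request bug.
    return "fail_fast"


for code in (401, 404, 429, 500, None):
    print(code, action_for_status(code))
```

In an except block you could feed it `getattr(e, "status_code", None)`, since connection errors carry no status code.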

Custom Exception Types

Define your own exception hierarchy for AI application-level errors. This lets calling code distinguish between infrastructure errors (API failures) and domain errors (invalid prompt, failed validation, exceeded budget).

# src/exceptions.py

class AIApplicationError(Exception):
    """Base exception for all application-level AI errors."""
    pass


class PromptValidationError(AIApplicationError):
    """Raised when a prompt fails validation before sending to the API."""
    def __init__(self, message: str, prompt_length: int):
        super().__init__(message)
        self.prompt_length = prompt_length


class ResponseParseError(AIApplicationError):
    """Raised when an LLM response can't be parsed into the expected structure."""
    def __init__(self, message: str, raw_response: str):
        super().__init__(message)
        self.raw_response = raw_response


class BudgetExceededError(AIApplicationError):
    """Raised when a request would exceed the configured token budget."""
    def __init__(self, estimated_cost: float, budget_limit: float):
        super().__init__(
            f"Estimated cost ${estimated_cost:.4f} exceeds budget ${budget_limit:.4f}"
        )
        self.estimated_cost = estimated_cost
        self.budget_limit = budget_limit


# Usage:
def validate_prompt(prompt: str, max_chars: int = 50_000) -> None:
    if not prompt.strip():
        raise PromptValidationError("Prompt cannot be empty", len(prompt))
    if len(prompt) > max_chars:
        raise PromptValidationError(
            f"Prompt too long: {len(prompt)} > {max_chars}",
            len(prompt),
        )
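
Here is one way calling code might take advantage of that hierarchy. The sketch repeats trimmed copies of two of the classes so it runs on its own; `handle_request` and the status numbers it returns are hypothetical, chosen only to show a specific handler placed before the base-class handler.

```python
class AIApplicationError(Exception):
    """Trimmed copy of the lesson's base class."""


class PromptValidationError(AIApplicationError):
    def __init__(self, message: str, prompt_length: int):
        super().__init__(message)
        self.prompt_length = prompt_length


def validate_prompt(prompt: str, max_chars: int = 50_000) -> None:
    if not prompt.strip():
        raise PromptValidationError("Prompt cannot be empty", len(prompt))
    if len(prompt) > max_chars:
        raise PromptValidationError(
            f"Prompt too long: {len(prompt)} > {max_chars}", len(prompt)
        )


def handle_request(prompt: str) -> dict:
    """Hypothetical caller: map domain errors to response payloads."""
    try:
        validate_prompt(prompt)
    except PromptValidationError as exc:
        # Most specific handler first; the extra attribute travels with the error.
        return {"status": 400, "error": str(exc), "prompt_length": exc.prompt_length}
    except AIApplicationError as exc:
        # Any other domain error defined under the same base class.
        return {"status": 422, "error": str(exc)}
    return {"status": 200, "error": None}


print(handle_request("   "))
print(handle_request("Summarize this document."))
```

Infrastructure errors from the SDK are deliberately not caught here, so they keep propagating to whatever retry layer sits above.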

Retry Logic with Exponential Backoff

Naive retries hammer a rate-limited API. Exponential backoff with jitter — increasing wait time between attempts, with a random offset — is the production standard. The tenacity library implements this elegantly:

pip install tenacity

import logging

import openai
from openai import OpenAI
from openai.types.chat import ChatCompletion
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    wait_random,
    retry_if_exception_type,
    before_sleep_log,
)

logger = logging.getLogger(__name__)


# Decorator-based retry: clean and reusable
@retry(
    retry=retry_if_exception_type((
        openai.RateLimitError,
        openai.InternalServerError,
        openai.APIConnectionError,
    )),
    wait=wait_exponential(multiplier=1, min=2, max=60) + wait_random(0, 2),
    stop=stop_after_attempt(5),
    before_sleep=before_sleep_log(logger, logging.WARNING),
    reraise=True,  # re-raise the original exception after all retries fail
)
def call_with_retry(client: OpenAI, **kwargs) -> ChatCompletion:
    """Call the OpenAI API with automatic retry on transient errors."""
    return client.chat.completions.create(**kwargs)

Manual Retry Implementation (Without tenacity)

If you'd rather not add a dependency, here's the pattern implemented manually — useful for understanding what's happening under the hood:

import logging
import random
import time

import openai
from openai import OpenAI
from openai.types.chat import ChatCompletion

logger = logging.getLogger(__name__)


def call_with_manual_retry(
    client: OpenAI,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    **kwargs,
) -> ChatCompletion:
    """Exponential backoff with jitter."""
    retryable = (
        openai.RateLimitError,
        openai.InternalServerError,
        openai.APIConnectionError,
    )

    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)

        except retryable as e:
            if attempt == max_attempts - 1:
                raise  # Last attempt: give up

            # Exponential backoff: 1s, 2s, 4s, 8s... capped at max_delay
            delay = min(base_delay * (2 ** attempt), max_delay)
            # Jitter: a random 0–30% offset prevents a thundering herd
            jitter = delay * random.uniform(0, 0.3)
            total_delay = delay + jitter

            logger.warning(
                f"Attempt {attempt + 1} failed ({type(e).__name__}). "
                f"Retrying in {total_delay:.1f}s..."
            )
            time.sleep(total_delay)
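
To get a feel for the numbers, the same arithmetic can be run without any API calls. The sketch below uses the defaults from the manual implementation above; `backoff_schedule` and `worst_case_wait` are hypothetical helper names introduced only for this illustration.

```python
def backoff_schedule(
    max_attempts: int = 5, base_delay: float = 1.0, max_delay: float = 60.0
) -> list[float]:
    """Delays (without jitter) slept between attempts; the last attempt never sleeps."""
    return [
        min(base_delay * (2 ** attempt), max_delay)
        for attempt in range(max_attempts - 1)
    ]


def worst_case_wait(schedule: list[float], jitter_fraction: float = 0.3) -> float:
    """Upper bound on total sleep time if every jitter draw hits its maximum."""
    return sum(schedule) * (1 + jitter_fraction)


print(backoff_schedule())                 # [1.0, 2.0, 4.0, 8.0]
print(backoff_schedule(max_attempts=9))   # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
print(round(worst_case_wait(backoff_schedule()), 1))
```

With five attempts the caller sleeps at most about 20 seconds in total, which is worth comparing against any request timeout your own service has to honor.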

🕒 Check Retry-After Headers

OpenAI's RateLimitError may include a Retry-After header specifying exactly how long to wait. Access it via e.response.headers.get("retry-after") and use that value instead of your computed backoff — it's more precise and avoids unnecessary waiting.
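
A minimal sketch of that idea follows. `delay_from_headers` is a hypothetical helper, not an SDK function; it assumes the header carries a number of seconds and falls back to the computed backoff when the header is missing or not numeric (the HTTP-date form of Retry-After is not parsed here). It takes any mapping, so a plain dict with a lowercase key works for testing.

```python
from typing import Mapping


def delay_from_headers(
    headers: Mapping[str, str],
    computed_backoff: float,
    max_delay: float = 60.0,
) -> float:
    """Prefer the server's Retry-After hint over our own backoff (hypothetical helper)."""
    raw = headers.get("retry-after")
    if raw is None:
        return computed_backoff
    try:
        seconds = float(raw)
    except ValueError:
        # Not a plain number (possibly an HTTP date): keep our own estimate.
        return computed_backoff
    if not seconds >= 0:
        # Rejects negative values and NaN.
        return computed_backoff
    # Cap the hint so a very large value cannot stall the caller indefinitely.
    return min(seconds, max_delay)


print(delay_from_headers({"retry-after": "7"}, computed_backoff=2.0))   # 7.0
print(delay_from_headers({}, computed_backoff=2.0))                     # 2.0
print(delay_from_headers({"retry-after": "soon"}, computed_backoff=2.0))  # 2.0
```

Inside the retry loop's except block, this could be called as `delay_from_headers(e.response.headers, delay)` for errors that carry a response.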

The try/except/else/finally Pattern

Python's else clause on a try block is underused but valuable. It runs only when no exception was raised, which lets you separate the success path from the error path clearly:

import openai
from openai import OpenAI

# Assumes `call_with_retry` and `logger` from the retry example above.

def process_ai_request(prompt: str, client: OpenAI) -> dict:
    result = {}

    try:
        response = call_with_retry(
            client,
            model="claude-3-5-sonnet-20241022",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
    except openai.AuthenticationError:
        # Unrecoverable
        return {"error": "auth_error", "text": None}
    except openai.APIError as e:
        logger.error(f"All retries failed: {e}")
        return {"error": "api_error", "text": None}
    else:
        # Only runs if no exception: clean separation of success logic
        result["text"] = response.choices[0].message.content
        result["tokens"] = response.usage.completion_tokens
        result["error"] = None
    finally:
        # Always runs: logging, metrics, cleanup
        logger.debug(f"Request completed for prompt: {prompt[:50]}...")

    return result

Context Managers for Resource Safety

Python's with statement (like C#'s using) ensures resources are released even if exceptions occur. For AI apps, this means database connections, file handles, and HTTP client sessions are always properly closed:

from pathlib import Path

from openai import OpenAI

from src.exceptions import AIApplicationError

# Context manager for file-safe prompt loading
def load_and_run_prompt(prompt_file: Path) -> str:
    try:
        with open(prompt_file, encoding="utf-8") as f:
            prompt = f.read()  # file auto-closed even if read() raises
    except FileNotFoundError as exc:
        # Chain the original error so the traceback keeps the root cause
        raise AIApplicationError(f"Prompt file not found: {prompt_file}") from exc

    # The OpenAI client itself supports use as a context manager
    with OpenAI() as local_client:
        response = local_client.chat.completions.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )

    return response.choices[0].message.content  # client already closed here
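
You can also write your own context manager when cleanup logic repeats around every API call. The sketch below uses `contextlib.contextmanager` from the standard library; `timed_call` and the list used as a metrics sink are hypothetical stand-ins for whatever metrics client your application has.

```python
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def timed_call(label: str, sink: list) -> Iterator[None]:
    """Record how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield  # the body of the `with` block runs here
    finally:
        # Equivalent to a C# `finally`: runs on success and on failure.
        sink.append((label, time.perf_counter() - start))


records: list = []

with timed_call("ok", records):
    sum(range(1000))  # placeholder for an API call

try:
    with timed_call("failing", records):
        raise RuntimeError("simulated API failure")
except RuntimeError:
    pass  # the exception still propagates out of the with block

print([label for label, _ in records])  # ['ok', 'failing']
```

Because the timing is recorded in `finally`, failed calls show up in the metrics alongside successful ones.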

Defensive Programming: Validation Before the Call

The cheapest errors to handle are the ones you catch before making the API call. Validate inputs early and fail with informative errors rather than getting cryptic API responses:

from openai import OpenAI
from openai.types.chat import ChatCompletion

from src.exceptions import PromptValidationError

SUPPORTED_MODELS = {
    "claude-3-5-sonnet-20241022",
    "gpt-4o-2024-08-06",
    "gpt-4o-mini-2024-07-18",
}

def validated_create(
    client: OpenAI,
    model: str,
    messages: list[dict],
    max_tokens: int,
    **kwargs,
) -> ChatCompletion:
    """Validates parameters before hitting the API."""

    if model not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unknown model '{model}'. Choose from: {sorted(SUPPORTED_MODELS)}"
        )

    if not messages:
        raise PromptValidationError("messages cannot be empty", 0)

    if max_tokens < 1 or max_tokens > 8192:
        raise ValueError(f"max_tokens must be 1–8192, got {max_tokens}")

    last_role = messages[-1].get("role")
    if last_role != "user":
        raise PromptValidationError(
            f"Last message must have role 'user', got '{last_role}'",
            len(str(messages)),
        )

    return client.chat.completions.create(
        model=model, messages=messages, max_tokens=max_tokens, **kwargs
    )
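
A budget check fits the same validate-before-the-call idea and gives the `BudgetExceededError` defined earlier a job. Treat this as a rough sketch under loud assumptions: `check_budget` is a hypothetical helper, the four-characters-per-token estimate is a crude heuristic rather than a real tokenizer, message content is assumed to be plain strings, and the single `price_per_1k_tokens` value is a placeholder, not real pricing.

```python
class BudgetExceededError(Exception):
    """Trimmed copy of the lesson's exception so this sketch runs on its own."""

    def __init__(self, estimated_cost: float, budget_limit: float):
        super().__init__(
            f"Estimated cost ${estimated_cost:.4f} exceeds budget ${budget_limit:.4f}"
        )
        self.estimated_cost = estimated_cost
        self.budget_limit = budget_limit


def check_budget(
    messages: list[dict],
    max_tokens: int,
    budget_limit: float,
    price_per_1k_tokens: float,  # placeholder value supplied by the caller
) -> float:
    """Estimate the worst-case cost of a request and raise if it exceeds the budget."""
    prompt_chars = sum(len(m.get("content", "")) for m in messages)
    est_prompt_tokens = prompt_chars / 4  # assumption: roughly 4 characters per token
    # Worst case: the model uses the full max_tokens allowance for its reply.
    est_cost = (est_prompt_tokens + max_tokens) / 1000 * price_per_1k_tokens
    if est_cost > budget_limit:
        raise BudgetExceededError(est_cost, budget_limit)
    return est_cost


messages = [{"role": "user", "content": "x" * 4000}]
print(check_budget(messages, max_tokens=1000, budget_limit=0.05, price_per_1k_tokens=0.01))

try:
    check_budget(messages, max_tokens=1000, budget_limit=0.01, price_per_1k_tokens=0.01)
except BudgetExceededError as exc:
    print(exc)
```

A production version would count tokens with the provider's tokenizer and price input and output tokens separately.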

Key Takeaways

Python uses try/except/else/finally — the else block only runs on success, keeping error and success paths clearly separated
Catch specific exceptions first, broad ones last — never use bare except: without re-raising
Distinguish retryable errors (rate limits, server errors, network) from non-retryable (auth errors, validation failures)
Use exponential backoff with jitter for retries — the tenacity library makes this a one-decorator solution
Define a custom exception hierarchy rooted in your app's base exception to distinguish infrastructure from domain errors
Validate early — check inputs before the API call to catch errors cheaply and provide useful error messages