Error Handling and Retry Mechanisms

Applications that call AI APIs need to handle failures such as timeouts, dropped connections, rate limits, and server errors. This guide covers strategies for classifying those errors and for retrying intelligently when a retry is likely to help.

Smart Retry Strategy

import time
import random
from typing import Callable, Any
import requests

class SmartRetryHandler:
    def __init__(self, max_retries=3, base_delay=1.0):
        self.max_retries = max_retries
        self.base_delay = base_delay
    
    def exponential_backoff_with_jitter(self, attempt):
        """Exponential backoff + random jitter"""
        delay = self.base_delay * (2 ** attempt)
        jitter = random.uniform(0, delay * 0.1)
        return delay + jitter
    
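    def should_retry(self, exception, attempt):
        """Determine if retry should be attempted"""
        # Check exception types and status codes rather than matching substrings
        # of the message, which can misfire (e.g. "502" appearing inside a URL).
        # Network errors and timeouts -> retry
        if isinstance(exception, (requests.exceptions.Timeout,
                                  requests.exceptions.ConnectionError,
                                  TimeoutError, ConnectionError)):
            return True
        # Rate limiting (429) and transient server errors (502/503/504) -> retry
        if isinstance(exception, requests.exceptions.HTTPError):
            response = exception.response
            return (response is not None
                    and response.status_code in (429, 502, 503, 504))
        return False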
    
    def retry(self, func: Callable, *args, **kwargs) -> Any:
        """Execute function call with retry"""
        last_exception = None
        
        for attempt in range(self.max_retries + 1):
            try:
                return func(*args, **kwargs)
                
            except Exception as e:
                last_exception = e
                
                if attempt == self.max_retries:
                    break
                
                if not self.should_retry(e, attempt):
                    break
                
                delay = self.exponential_backoff_with_jitter(attempt)
                print(f"Attempt {attempt + 1} failed: {e}")
                print(f"Retrying in {delay:.2f} seconds...")
                time.sleep(delay)
        
        raise last_exception

# Usage example
retry_handler = SmartRetryHandler(max_retries=3, base_delay=1.0)

def api_call():
    # Real network call; raises on timeouts, connection errors, or HTTP error statuses
    response = requests.post(
        'https://ai.machinefi.com/v1/chat/completions',
        headers={'Authorization': 'Bearer your-api-key'},
        json={
            'model': 'gpt-3.5-turbo',
            'messages': [{'role': 'user', 'content': 'Hello'}]
        },
        timeout=10
    )
    response.raise_for_status()
    return response.json()

try:
    result = retry_handler.retry(api_call)
    print(f"Success: {result}")
except Exception as e:
    print(f"Final failure: {e}")

Comprehensive Error Classification
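
One way to classify errors is to map each HTTP status code to a handling category before deciding what to do with it. The sketch below is illustrative only: ErrorCategory and classify_status are hypothetical names, and the retryable set (502, 503, 504, plus 429 for rate limiting) simply mirrors the codes used by SmartRetryHandler above. Adjust the mapping to whatever your API actually returns.

```python
from enum import Enum
from typing import Optional


class ErrorCategory(Enum):
    RETRYABLE = "retryable"        # transient: retry with backoff
    RATE_LIMITED = "rate_limited"  # retry, but wait longer first
    AUTH = "auth"                  # fix credentials; retrying will not help
    CLIENT = "client"              # fix the request; retrying will not help
    UNKNOWN = "unknown"            # not covered by this sketch


def classify_status(status_code: Optional[int]) -> ErrorCategory:
    """Map an HTTP status code (None = no response received) to a category."""
    if status_code is None:
        # No response at all: timeout, DNS failure, connection reset
        return ErrorCategory.RETRYABLE
    if status_code == 429:
        return ErrorCategory.RATE_LIMITED
    if status_code in (401, 403):
        return ErrorCategory.AUTH
    if status_code in (502, 503, 504):
        return ErrorCategory.RETRYABLE
    if 400 <= status_code < 500:
        return ErrorCategory.CLIENT
    return ErrorCategory.UNKNOWN
```

Keeping the mapping in one function means the retry logic, logging, and alerting can all share the same definition of what counts as transient.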

Advanced Retry Mechanisms

Circuit Breaker Pattern
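
A circuit breaker stops sending requests after repeated failures and only lets a trial request through once a cool-down period has passed. The following is a minimal sketch of that idea, not a production implementation: CircuitBreaker, CircuitOpenError, failure_threshold, and recovery_timeout are names chosen for illustration, and the class is not thread-safe.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock  # injectable so tests can fake time
        self.failure_count = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable, *args, **kwargs) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.recovery_timeout:
                # Still cooling down: fail fast without calling the service
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, let this one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            # A failed trial call, or too many consecutive failures,
            # opens (or re-opens) the circuit
            if (self.opened_at is not None
                    or self.failure_count >= self.failure_threshold):
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count
        self.failure_count = 0
        self.opened_at = None
        return result
```

A breaker like this can wrap the retry handler's call, for example breaker.call(retry_handler.retry, api_call), so that a sustained outage is detected once rather than on every request.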

Adaptive Retry with Success Rate Tracking
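
An adaptive policy can track the recent success rate and scale retries back when the service looks unhealthy. The sketch below is one possible shape for this: AdaptiveRetryPolicy and its thresholds (0.2 and 0.5) are assumptions picked for illustration, not values taken from this guide, and would need tuning against real traffic.

```python
from collections import deque


class AdaptiveRetryPolicy:
    def __init__(self, max_retries: int = 3, base_delay: float = 1.0,
                 window_size: int = 20):
        self.max_retries = max_retries
        self.base_delay = base_delay
        # Sliding window of recent outcomes: True = success, False = failure
        self.outcomes = deque(maxlen=window_size)

    def record(self, success: bool) -> None:
        """Record the outcome of one request."""
        self.outcomes.append(success)

    def success_rate(self) -> float:
        # With no history yet, assume the service is healthy
        if not self.outcomes:
            return 1.0
        return sum(self.outcomes) / len(self.outcomes)

    def current_max_retries(self) -> int:
        rate = self.success_rate()
        if rate < 0.2:
            return 0  # service looks down: do not add retry load
        if rate < 0.5:
            return max(1, self.max_retries // 2)
        return self.max_retries

    def current_base_delay(self) -> float:
        # Stretch the delay as the success rate drops: 1x at 100%, 2x at 0%
        return self.base_delay * (2.0 - self.success_rate())
```

The caller would invoke record() after every request and read current_max_retries() and current_base_delay() before starting a retry loop.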

Error Recovery Strategies

Graceful Degradation
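
Graceful degradation usually means trying progressively cheaper options (a backup model, a cached answer, a static message) instead of surfacing a raw error. A minimal sketch follows, assuming each option is a zero-argument callable; call_with_fallbacks and the provider names in the usage note are hypothetical.

```python
from typing import Any, Callable, Sequence, Tuple


def call_with_fallbacks(providers: Sequence[Tuple[str, Callable[[], Any]]],
                        default: Any) -> Tuple[str, Any]:
    """Try each (name, callable) in order; return the first result that succeeds.

    Returns a (source_name, result) pair so the caller can tell whether the
    answer came from the primary path or from a fallback.
    """
    for name, func in providers:
        try:
            return name, func()
        except Exception as exc:
            # In real code, log this and consider catching only expected errors
            print(f"{name} failed: {exc}")
    # Every provider failed: fall back to the static default
    return "default", default
```

For example, call_with_fallbacks([("primary", call_primary), ("cache", read_cache)], "Service is busy, please try again later.") would return the cached value when the primary call raises, and the static message only if both fail.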

Monitoring and Logging
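
For monitoring, it helps to wrap each call so that its duration and error type are logged and counted in one place, with credentials masked before anything is written. The sketch below uses only the standard library; monitored_call, redact, error_counts, and the "api_client" logger name are illustrative choices, and a real deployment would likely export the counts to a metrics system instead of an in-process Counter.

```python
import logging
import time
from collections import Counter
from typing import Any, Callable

logger = logging.getLogger("api_client")
error_counts = Counter()  # error type name -> number of occurrences


def redact(headers: dict) -> dict:
    """Return a copy of headers with the Authorization value masked."""
    return {k: ("***" if k.lower() == "authorization" else v)
            for k, v in headers.items()}


def monitored_call(name: str, func: Callable[[], Any]) -> Any:
    """Run func, logging its duration and counting failures by exception type."""
    start = time.monotonic()
    try:
        result = func()
    except Exception as exc:
        error_counts[type(exc).__name__] += 1
        logger.warning("call=%s status=error type=%s duration=%.3fs",
                       name, type(exc).__name__, time.monotonic() - start)
        raise  # re-raise so retry logic upstream still sees the failure
    logger.info("call=%s status=ok duration=%.3fs",
                name, time.monotonic() - start)
    return result
```

Alert thresholds can then be defined on error_counts (or its metrics-system equivalent) rather than on individual log lines.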

Best Practices for Error Handling

  1. Classify Errors Properly: Different error types require different handling strategies

  2. Implement Exponential Backoff: Avoid overwhelming the server during outages

  3. Use Circuit Breakers: Fail fast when the service is consistently unavailable

  4. Plan for Graceful Degradation: Provide fallback options when possible

  5. Monitor and Alert: Track error rates and set up appropriate alerts

  6. Log Comprehensively: Capture enough detail for debugging without logging sensitive data

  7. Test Error Scenarios: Regularly test your error handling in staging environments

  8. Set Reasonable Timeouts: Balance between allowing enough time and failing fast

  9. Respect Rate Limits: Implement proper rate limiting to avoid 429 errors

  10. Document Error Responses: Keep clear documentation of how different errors are handled
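
As an illustration of item 9, a client can honor the standard HTTP Retry-After header when a 429 response includes one. Whether this particular API sends that header is an assumption; the sketch handles only the delay-in-seconds form and falls back to the caller's own backoff delay otherwise. delay_from_retry_after and max_delay are hypothetical names.

```python
from typing import Mapping


def delay_from_retry_after(headers: Mapping[str, str], fallback: float,
                           max_delay: float = 60.0) -> float:
    """Return the wait time suggested by Retry-After, or fallback if unusable."""
    value = headers.get("Retry-After")
    if value is None:
        return fallback
    try:
        seconds = float(int(value))
    except ValueError:
        # The HTTP-date form of Retry-After is not handled in this sketch
        return fallback
    # Clamp to [0, max_delay] so a bad header cannot stall the client
    return min(max(seconds, 0.0), max_delay)
```

With the requests library, response.headers is a case-insensitive mapping, so it can be passed in directly; a plain dict would need the exact "Retry-After" key.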