## Streaming Response Handling
Streaming responses let users see AI replies sooner by displaying content as it is generated, rather than waiting for the complete response to arrive.
### Python Streaming Response Example
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://ai.machinefi.com/v1"
)

def stream_with_callback(prompt, on_content=None):
    """
    Stream response handling with callback function support.
    """
    stream = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    full_content = ""
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            content = chunk.choices[0].delta.content
            full_content += content
            # Call the callback to process each chunk as it arrives
            if on_content:
                on_content(content)
    return full_content

# Usage example
def print_content(content):
    print(content, end="", flush=True)

# Stream output to the console
result = stream_with_callback("Write a Python quicksort algorithm", print_content)
```

### JavaScript Streaming Response
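A minimal JavaScript sketch mirroring the Python example above, assuming the official `openai` npm package (v4+) and the same `ai.machinefi.com` endpoint:

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-api-key",
  baseURL: "https://ai.machinefi.com/v1",
});

// Stream a chat completion, invoking onContent for each chunk
async function streamWithCallback(prompt, onContent) {
  const stream = await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  let fullContent = "";
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      fullContent += content;
      if (onContent) onContent(content);
    }
  }
  return fullContent;
}

// Usage example: stream output to the console
const result = await streamWithCallback(
  "Write a Python quicksort algorithm",
  (content) => process.stdout.write(content)
);
```

The `for await...of` loop consumes chunks the same way the Python iterator does, and `process.stdout.write` avoids inserting newlines between chunks.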
### Benefits of Streaming Responses
- **Faster perceived response time**: Users see content immediately as it is generated
- **Better user experience**: Reduces waiting time and provides real-time feedback
- **Interruptible responses**: Generation can be stopped early if needed (see the sketch after this list)
- **Ideal for chat applications**: Well suited to conversational interfaces and real-time generation scenarios
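To illustrate interruptibility, here is a minimal sketch reusing the `client` from the Python example above; the `max_chars` budget is a hypothetical parameter introduced only for this example.

```python
def stream_until_limit(prompt, max_chars=200):
    """Stop reading the stream early once roughly max_chars have arrived."""
    stream = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    collected = ""
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            collected += content
            if len(collected) >= max_chars:
                # Stop consuming chunks; the rest of the generation is abandoned
                break
    return collected
```

Breaking out of the loop simply stops consuming chunks, so no further tokens are processed on the client side.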
### Implementation Considerations
- **Error Handling**: Implement proper error handling for network interruptions during streaming
- **Rate Limiting**: Account for provider rate limits when processing streaming chunks
- **Memory Management**: For very long responses, consider chunking and buffering strategies
- **UI Updates**: Ensure smooth UI updates without blocking the main thread
- **Cancellation**: Provide a way to cancel ongoing streams when needed (see the combined error-handling and cancellation sketch after this list)
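The sketch below combines the error-handling and cancellation points, again reusing `client` from above. The `cancel_event` flag is a hypothetical mechanism for this example rather than part of the SDK; `APIError` is the base exception exported by the openai Python package and also covers connection errors raised mid-stream.

```python
import threading

from openai import APIError

def stream_with_cancellation(prompt, cancel_event):
    """Stream a completion, tolerating network errors and honoring cancellation."""
    collected = ""
    try:
        stream = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            stream=True
        )
        for chunk in stream:
            # cancel_event is a hypothetical external flag checked between chunks
            if cancel_event.is_set():
                break
            content = chunk.choices[0].delta.content
            if content:
                collected += content
    except APIError as exc:
        # A dropped connection mid-stream surfaces here; keep the partial output
        print(f"\n[stream interrupted: {exc}]")
    return collected
```

Another thread (for example, a UI handler responding to a stop button) can call `cancel_event.set()`, and the loop exits at the next chunk boundary with whatever content has been collected so far.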

