## Streaming Response Handling
Streaming responses let users see AI replies sooner by displaying content as it is generated, rather than waiting for the complete response to arrive.
### Python Streaming Response Example
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://ai.machinefi.com/v1"
)

def stream_with_callback(prompt, on_content=None):
    """
    Stream response handling with callback function support.
    """
    stream = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    full_content = ""
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            content = chunk.choices[0].delta.content
            full_content += content
            # Call the callback to process each chunk as it arrives
            if on_content:
                on_content(content)
    return full_content

# Usage example
def print_content(content):
    print(content, end="", flush=True)

# Stream output to the console
result = stream_with_callback("Write a Python quicksort algorithm", print_content)
```

### JavaScript Streaming Response
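A minimal JavaScript sketch mirroring the Python example above, assuming the official `openai` npm package (v4+) and the same `ai.machinefi.com` endpoint:

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-api-key",
  baseURL: "https://ai.machinefi.com/v1",
});

// Stream a chat completion, invoking onContent for each chunk
async function streamWithCallback(prompt, onContent) {
  const stream = await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  let fullContent = "";
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      fullContent += content;
      if (onContent) onContent(content);
    }
  }
  return fullContent;
}

// Usage example: stream output to the console
const result = await streamWithCallback(
  "Write a Python quicksort algorithm",
  (content) => process.stdout.write(content)
);
```

The `for await...of` loop consumes chunks the same way the Python iterator does, and `process.stdout.write` avoids inserting newlines between chunks.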
### Benefits of Streaming Responses
- **Faster perceived response time**: Users see content immediately as it is generated
- **Better user experience**: Reduces waiting time and provides real-time feedback
- **Interruptible responses**: Generation can be stopped early if needed (see the sketch after this list)
- **Ideal for chat applications**: Well suited to conversational interfaces and real-time generation scenarios
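To illustrate interruptibility, here is a minimal sketch reusing the `client` from the Python example above; the `max_chars` budget is a hypothetical parameter introduced only for this example.

```python
def stream_until_limit(prompt, max_chars=200):
    """Stop reading the stream early once roughly max_chars have arrived."""
    stream = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    collected = ""
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            collected += content
            if len(collected) >= max_chars:
                # Stop consuming chunks; the rest of the generation is abandoned
                break
    return collected
```

Breaking out of the loop simply stops consuming chunks, so no further tokens are processed on the client side.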
### Implementation Considerations
- **Error Handling**: Implement proper error handling for network interruptions during streaming
- **Rate Limiting**: Account for provider rate limits when processing streaming chunks
- **Memory Management**: For very long responses, consider chunking and buffering strategies
- **UI Updates**: Ensure smooth UI updates without blocking the main thread
- **Cancellation**: Provide a way to cancel ongoing streams when needed (see the combined error-handling and cancellation sketch after this list)
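The sketch below combines the error-handling and cancellation points, again reusing `client` from above. The `cancel_event` flag is a hypothetical mechanism for this example rather than part of the SDK; `APIError` is the base exception exported by the openai Python package and also covers connection errors raised mid-stream.

```python
import threading

from openai import APIError

def stream_with_cancellation(prompt, cancel_event):
    """Stream a completion, tolerating network errors and honoring cancellation."""
    collected = ""
    try:
        stream = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            stream=True
        )
        for chunk in stream:
            # cancel_event is a hypothetical external flag checked between chunks
            if cancel_event.is_set():
                break
            content = chunk.choices[0].delta.content
            if content:
                collected += content
    except APIError as exc:
        # A dropped connection mid-stream surfaces here; keep the partial output
        print(f"\n[stream interrupted: {exc}]")
    return collected
```

Another thread (for example, a UI handler responding to a stop button) can call `cancel_event.set()`, and the loop exits at the next chunk boundary with whatever content has been collected so far.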

