Rate Limits & Quotas

Understanding IIO API limits, headers, and best practices

Tier Limits

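Tier       | Requests/Month | Requests/Min | Max Tokens/Request | Models          | Price
-----------|----------------|--------------|--------------------|-----------------|---------
Free       | 1,000          | 10           | 2,048              | qwen2.5:7b      | €0
Pro        | 50,000         | 100          | 8,192              | All models      | €29/mo
Enterprise | Unlimited      | 500+         | 128,000            | All + dedicated | €149+/mo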

No per-token charges on any tier; each tier is billed as a flat monthly fee. All inference runs on Hetzner infrastructure in DE/FI (Germany and Finland).
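
One way to stay under the per-minute limits in the tier table is to throttle on the client side. The following is a minimal sketch and not part of the IIO API: `MinuteThrottle` and its `clock`/`sleep` parameters are hypothetical names introduced for illustration, and the sketch assumes the server counts requests over a rolling 60-second window, which this page does not specify.

```python
import time
from collections import deque


class MinuteThrottle:
    """Client-side sliding-window limiter: at most `per_minute` calls per 60 s."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.per_minute = per_minute
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._sent = deque()   # timestamps of calls inside the current window

    def wait_time(self):
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Drop timestamps that have left the 60-second window.
        while self._sent and now - self._sent[0] >= 60.0:
            self._sent.popleft()
        if len(self._sent) < self.per_minute:
            return 0.0
        return 60.0 - (now - self._sent[0])

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self._sleep(delay)
        self._sent.append(self._clock())


# Free tier: 10 requests/min (value taken from the tier table).
throttle = MinuteThrottle(per_minute=10)
# Call throttle.acquire() immediately before each API request.
```

Throttling locally only smooths out bursts; the monthly quota still applies, so a 429 handler remains necessary.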

Rate Limit Headers

Every response includes these headers:

Header                | Description
----------------------|-----------------------------------------
X-RateLimit-Limit     | Your monthly request limit
X-RateLimit-Remaining | Remaining requests this month
X-RateLimit-Reset     | Unix timestamp when limit resets
X-RateLimit-Burst     | Remaining burst requests this minute
Retry-After           | Seconds to wait when rate limited (429)

# Check your limits
curl -I https://api.iio.space/v1/models \
  -H "Authorization: Bearer YOUR_KEY"

# Response headers:
# X-RateLimit-Limit: 50000
# X-RateLimit-Remaining: 49876
# X-RateLimit-Reset: 1748736000
# X-RateLimit-Burst: 98
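
The same headers can be read programmatically. The sketch below parses them into a small record; `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical helper names, and the input is assumed to be any mapping of header names to string values, as most HTTP clients provide. The sample values are the ones shown in the curl output above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class RateLimitStatus:
    limit: int             # monthly request limit
    remaining: int         # requests left this month
    reset_at: datetime     # when the limit resets (UTC)
    burst_remaining: int   # burst requests left this minute


def parse_rate_limit_headers(headers):
    """Build a RateLimitStatus from a mapping of response headers."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    h = {k.lower(): v for k, v in headers.items()}
    return RateLimitStatus(
        limit=int(h["x-ratelimit-limit"]),
        remaining=int(h["x-ratelimit-remaining"]),
        reset_at=datetime.fromtimestamp(int(h["x-ratelimit-reset"]), tz=timezone.utc),
        burst_remaining=int(h["x-ratelimit-burst"]),
    )


sample = {
    "X-RateLimit-Limit": "50000",
    "X-RateLimit-Remaining": "49876",
    "X-RateLimit-Reset": "1748736000",
    "X-RateLimit-Burst": "98",
}
status = parse_rate_limit_headers(sample)
print(status.limit - status.remaining)  # requests used so far: 124
print(status.reset_at.isoformat())      # 2025-06-01T00:00:00+00:00
```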

Handling Rate Limit Errors (429)

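import random
import time

def _is_rate_limited(exc):
    """True if the exception looks like an HTTP 429."""
    # Some clients expose the HTTP status as `status_code`; otherwise
    # fall back to looking for "429" in the error message.
    return getattr(exc, "status_code", None) == 429 or "429" in str(exc)

def call_with_backoff(client, max_retries=3, **kwargs):
    """Retry a chat completion with exponential backoff on rate limit errors."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except Exception as e:
            # Re-raise anything that is not a 429, and the final failed attempt.
            if not _is_rate_limited(e) or attempt == max_retries - 1:
                raise
            # Wait 1s, 2s, 4s, ... plus up to 1s of jitter to spread out retries.
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Waiting {wait:.1f}s...")
            time.sleep(wait)
    # Only reached when max_retries <= 0 (the loop body never ran).
    raise RuntimeError("Max retries exceeded")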
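
Because 429 responses carry a Retry-After header, the wait time can follow the server's hint instead of a fixed schedule. The sketch below shows one way to combine the two; `retry_delay` and its `cap` parameter are hypothetical, and how the header value is obtained from an exception depends on the client library in use, so it is left to the caller.

```python
import random


def retry_delay(attempt, retry_after=None, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based).

    Prefers the server's Retry-After value (in seconds) when it is usable,
    otherwise falls back to exponential backoff with jitter.
    """
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            # Not a plain number of seconds (HTTP also allows a date here);
            # fall through to the backoff schedule.
            pass
    return min(cap, (2 ** attempt) + random.uniform(0, 1))


print(retry_delay(0, retry_after="5"))    # 5.0
print(retry_delay(3, retry_after="120"))  # 60.0 (capped)
```

The cap keeps a single misbehaving response from stalling a worker for an unbounded time; 60 seconds is an arbitrary choice for illustration.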

Best Practices

Practice                             | Why
-------------------------------------|------------------------------------------------------------
Cache responses where possible       | Reduce quota usage for identical queries
Use streaming for long responses     | Better UX, same quota cost
Choose the smallest sufficient model | llama3.2:3b is 3x faster than qwen2.5:7b for simple tasks
Batch requests where possible        | Reduce per-request overhead
Monitor X-RateLimit-Remaining        | Avoid unexpected 429 errors
Set max_tokens explicitly            | Prevent unexpectedly long/costly responses

⚠️ Free tier requests that exceed the limits return HTTP 429. Upgrading to Pro gives 50x the monthly request quota and 10x the per-minute rate.
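
The caching practice can be sketched as a small in-memory wrapper that keys each response on the request parameters, so identical queries are sent only once. `ResponseCache` and `fetch` are hypothetical names: `fetch` stands for whatever function actually calls the API, and the request parameters are assumed to be JSON-serializable.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed on the full set of request parameters."""

    def __init__(self, fetch):
        self._fetch = fetch   # function that performs the real API call
        self._store = {}
        self.misses = 0       # number of requests actually sent

    def get(self, **request):
        # Sort keys so that argument order does not change the cache key.
        payload = json.dumps(request, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(payload).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._fetch(**request)
        return self._store[key]
```

A cache like this only makes sense when a repeated answer is acceptable, for example with a fixed prompt and deterministic sampling settings; it has no expiry or size limit, which a longer-running service would probably want to add.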