Rate Limits & Quotas
Understanding IIO API limits, headers, and best practices
Tier Limits
| Tier | Requests/Month | Requests/Min | Max Tokens/Request | Models | Price |
| Free | 1,000 | 10 | 2,048 | qwen2.5:7b | €0 |
| Pro | 50,000 | 100 | 8,192 | All models | €29/Mo |
| Enterprise | Unlimited | 500+ | 128,000 | All + dedicated | €149+/Mo |
No tier has per-token charges; each is billed as a flat monthly fee. All inference runs on servers hosted at Hetzner in Germany and Finland (DE/FI).
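To see how the per-minute and monthly limits interact, the short sketch below computes how long each tier's monthly quota would last at its full per-minute rate. The figures are taken from the tier table above; the `TIERS` dict and the `minutes_to_exhaust` helper are illustrative names, not part of the IIO API.

```python
# Tier limits copied from the tier table (Enterprise omitted: unlimited monthly quota).
TIERS = {
    "Free": {"per_month": 1_000, "per_min": 10},
    "Pro": {"per_month": 50_000, "per_min": 100},
}


def minutes_to_exhaust(per_month: int, per_min: int) -> float:
    """Minutes of sustained max-rate traffic before the monthly quota runs out."""
    return per_month / per_min


for name, limits in TIERS.items():
    minutes = minutes_to_exhaust(**limits)
    print(f"{name}: {minutes:.0f} min (~{minutes / 60:.1f} h) at full rate")
# Free: 100 min (~1.7 h) at full rate
# Pro: 500 min (~8.3 h) at full rate
```

In practice the per-minute limit caps bursts, while the monthly quota is the binding constraint for sustained traffic.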
Rate Limit Headers
Every response includes the X-RateLimit-* headers below; Retry-After applies when a request is rate limited (429):
| Header | Description |
| X-RateLimit-Limit | Your monthly request limit |
| X-RateLimit-Remaining | Remaining requests this month |
| X-RateLimit-Reset | Unix timestamp when limit resets |
| X-RateLimit-Burst | Remaining burst requests this minute |
| Retry-After | Seconds to wait when rate limited (429) |
# Check your limits
curl -I https://api.iio.space/v1/models \
  -H "Authorization: Bearer YOUR_KEY"
# Response headers:
# X-RateLimit-Limit: 50000
# X-RateLimit-Remaining: 49876
# X-RateLimit-Reset: 1748736000
# X-RateLimit-Burst: 98
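A minimal Python sketch of reading these headers follows. It assumes the headers arrive as a plain string-to-string mapping (most HTTP clients expose one) and that X-RateLimit-Reset is a UTC timestamp; `summarize_limits` is a hypothetical helper, not part of any IIO SDK.

```python
from datetime import datetime, timezone


def summarize_limits(headers: dict) -> dict:
    """Turn the rate limit headers into numbers and a reset datetime."""
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    # Assumption: the reset value is seconds since the Unix epoch, in UTC.
    reset_at = datetime.fromtimestamp(int(headers["X-RateLimit-Reset"]), tz=timezone.utc)
    return {
        "used": limit - remaining,
        "remaining": remaining,
        "burst_remaining": int(headers["X-RateLimit-Burst"]),
        "resets_at": reset_at,
    }


# Header values from the sample response above.
sample = {
    "X-RateLimit-Limit": "50000",
    "X-RateLimit-Remaining": "49876",
    "X-RateLimit-Reset": "1748736000",
    "X-RateLimit-Burst": "98",
}
info = summarize_limits(sample)
print(info["used"], info["resets_at"].isoformat())
# 124 2025-06-01T00:00:00+00:00
```

HTTP header names are case-insensitive, so with a plain dict this lookup assumes the exact casing shown; a client library's own headers object usually handles that for you.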
Handling Rate Limit Errors (429)
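import random
import time


def call_with_backoff(client, max_retries=3, **kwargs):
    """Call chat completions, retrying 429 errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except Exception as e:
            # Prefer a status_code attribute if the client library exposes one;
            # otherwise fall back to matching "429" in the error message.
            is_rate_limited = getattr(e, "status_code", None) == 429 or "429" in str(e)
            if not is_rate_limited or attempt == max_retries - 1:
                raise
            # Wait roughly 1s, 2s, 4s, ... plus up to 1s of random jitter.
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Waiting {wait:.1f}s...")
            time.sleep(wait)
    # Only reached when max_retries is 0 or negative.
    raise RuntimeError("Max retries exceeded")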
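Because a 429 response carries a Retry-After header (see the header table above), a retry loop can prefer the server's hint over a computed delay. The sketch below covers only the delay calculation; `retry_delay` and its `cap` parameter are hypothetical names, and how you read the header value depends on your client library.

```python
import random
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None, cap: float = 60.0) -> float:
    """Seconds to sleep before the next retry.

    Uses the Retry-After value (in seconds) when it is present and valid;
    otherwise falls back to exponential backoff with jitter.
    """
    if retry_after is not None:
        try:
            return min(max(float(retry_after), 0.0), cap)
        except ValueError:
            pass  # Malformed header: fall through to computed backoff.
    return min((2 ** attempt) + random.uniform(0, 1), cap)


print(retry_delay(0, retry_after="12"))  # 12.0
print(retry_delay(2))                    # between 4.0 and 5.0
```

The cap keeps a single wait bounded, whether the delay comes from the header or from the backoff formula.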
Best Practices
| Practice | Why |
| Cache responses where possible | Reduce quota usage for identical queries |
| Use streaming for long responses | Better UX, same quota cost |
| Choose the smallest sufficient model | llama3.2:3b is 3x faster than qwen2.5:7b for simple tasks |
| Batch requests where possible | Reduce per-request overhead |
| Monitor X-RateLimit-Remaining | Avoid unexpected 429 errors |
| Set max_tokens explicitly | Prevent unexpectedly long, slow responses and stay within your tier's per-request token limit |
⚠️ Free tier requests that exceed either limit return 429. Upgrade to Pro for 50x the monthly quota (50,000 vs. 1,000 requests) and 10x the per-minute rate (100 vs. 10).