Rate Limits & Caching
Understand API rate limits, caching behavior, and how to optimize your FetchPrompt API usage.
FetchPrompt includes rate limiting to ensure fair usage and caching to minimize latency. Understanding these mechanisms helps you build efficient integrations.
Rate limits
API calls are rate-limited per organization per month. All API keys belonging to the same organization share a single monthly quota.
| Plan | Monthly Limit |
|---|---|
| Free | 30,000 calls/month |
| Pro (coming soon) | 300,000 calls/month |
| Business (coming soon) | 1,500,000 calls/month |
| Enterprise (coming soon) | Unlimited |
How rate limiting works
- The counter tracks total API calls across all prompts, all environments, and all API keys within an organization.
- The counter resets 30 days after your organization was created, and every 30 days thereafter.
- Each successful API response (including `304 Not Modified`) counts as one call.
Rate limit headers
Every API response includes rate limit headers:
| Header | Description | Example |
|---|---|---|
| `X-RateLimit-Limit` | Maximum calls allowed per month | `30000` |
| `X-RateLimit-Remaining` | Calls remaining this month | `28742` |
| `X-RateLimit-Reset` | Unix timestamp when the limit resets | `1708992000` |
When the limit is exceeded
If your organization exceeds the monthly limit, the API returns:
```
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 30000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708992000

{
  "error": "Rate limit exceeded"
}
```

Monitoring usage
You can monitor your current usage in two places:
- API Keys page — The usage tab shows this month's call count, monthly limit, and remaining calls.
- Rate limit headers — Every API response includes `X-RateLimit-Remaining`, so your application can track usage programmatically.
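Taken together, the status code and the rate limit headers give a client everything it needs to react both before and after hitting the limit. A minimal sketch in TypeScript (the `checkRateLimit` helper, its return shape, and the low-watermark threshold are illustrative choices, not part of any FetchPrompt SDK):

```typescript
// Classify a FetchPrompt response by its rate limit state.
// Note: real fetch() Headers are case-insensitive; this sketch uses a plain
// Map and expects lowercase keys for simplicity.
type RateLimitState =
  | { kind: "ok"; remaining: number }
  | { kind: "low"; remaining: number }
  | { kind: "exceeded"; resetAt: Date };

function checkRateLimit(
  status: number,
  headers: Map<string, string>,
  lowWatermark = 1000 // warn when fewer calls than this remain
): RateLimitState {
  if (status === 429) {
    // X-RateLimit-Reset is a Unix timestamp in seconds.
    const reset = parseInt(headers.get("x-ratelimit-reset") ?? "0", 10);
    return { kind: "exceeded", resetAt: new Date(reset * 1000) };
  }
  const remaining = parseInt(headers.get("x-ratelimit-remaining") ?? "0", 10);
  return remaining < lowWatermark
    ? { kind: "low", remaining }
    : { kind: "ok", remaining };
}
```

Your application can then branch on `kind`: log a warning on `low`, and pause or fall back to a locally bundled prompt until `resetAt` on `exceeded`.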
Caching
FetchPrompt uses a two-layer caching strategy to minimize latency.
Server-side cache (Redis)
When a prompt is fetched via the API:
- FetchPrompt first checks a Redis cache for the prompt content.
- If found (cache hit), the cached content is returned immediately.
- If not found (cache miss), the prompt is fetched from the database and written to the cache.
Cache entries have a 60-second TTL (time-to-live). This means:
- After you update a prompt in the dashboard, the API may serve the previous version for up to 60 seconds.
- After 60 seconds, the cache expires and the next request fetches fresh data from the database.
API key validation results are also cached with a 5-minute TTL for performance.
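The read-through-with-TTL pattern described above is also useful on the client side. A generic sketch (illustrative only, not FetchPrompt's actual server code):

```typescript
// A read-through cache: get() returns a fresh cached value when one exists,
// otherwise calls the loader and caches the result for `ttlMs` milliseconds.
type Entry<T> = { value: T; expiresAt: number };

class TtlCache<T> {
  private store = new Map<string, Entry<T>>();

  constructor(private ttlMs: number) {}

  async get(key: string, loader: () => Promise<T>): Promise<T> {
    const hit = this.store.get(key);
    if (hit && hit.expiresAt > Date.now()) {
      return hit.value; // cache hit: serve without touching the loader
    }
    const value = await loader(); // cache miss: load from the source of truth
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}
```

With `new TtlCache<string>(60_000)` and a loader that calls the FetchPrompt API, your application reproduces the same 60-second freshness window the server uses.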
Client-side cache (ETag)
Both the `GET` and `POST` `/api/v1/prompts/{slug}` endpoints support ETag-based conditional requests:
- Every response includes an `ETag` header (a hash of the rendered content).
- On subsequent requests, include the ETag in an `If-None-Match` header.
- If the content hasn't changed, the API returns `304 Not Modified` with no body, saving bandwidth.
```bash
# First request — get the ETag
curl -i -H "Authorization: Bearer fp_prod_xxx" \
  https://www.fetchprompt.com/api/v1/prompts/my-prompt

# Response includes:
# ETag: "a1b2c3d4e5f67890"
# Cache-Control: public, max-age=60

# Subsequent request — use the ETag
curl -H "Authorization: Bearer fp_prod_xxx" \
  -H 'If-None-Match: "a1b2c3d4e5f67890"' \
  https://www.fetchprompt.com/api/v1/prompts/my-prompt

# Returns 304 Not Modified if content hasn't changed
```

The `Cache-Control` header is set to `public, max-age=60`, which allows intermediate caches (CDNs, proxies) to cache the response for 60 seconds.
Cache invalidation
When you update a prompt through the dashboard:
- The server-side Redis cache for that prompt is immediately invalidated.
- However, if a cached version was already served to a client within the 60-second TTL window, that client may continue using the stale version until it expires or makes a new request.
Optimizing API usage
1. Cache on your side
If your application serves the same prompt to many users, cache the prompt content in your application for a reasonable duration:
```typescript
let cachedPrompt: { content: string; etag: string } | null = null;

async function getPrompt(slug: string) {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${process.env.FETCHPROMPT_API_KEY}`,
  };

  // Use ETag for conditional fetch
  if (cachedPrompt?.etag) {
    headers["If-None-Match"] = cachedPrompt.etag;
  }

  const response = await fetch(
    `https://www.fetchprompt.com/api/v1/prompts/${slug}`,
    { headers }
  );

  if (response.status === 304) {
    return cachedPrompt!.content; // Content hasn't changed
  }

  const data = await response.json();
  cachedPrompt = {
    content: data.content,
    etag: response.headers.get("etag") || "",
  };
  return data.content;
}
```

2. Fetch prompts at startup
For prompts that don't change frequently, fetch them once at application startup and refresh on a schedule:
```typescript
// Fetch prompt once at startup (use `let` so the refresh can reassign it)
let systemPrompt = await getPrompt("system-instructions");

// Refresh every 5 minutes
setInterval(async () => {
  systemPrompt = await getPrompt("system-instructions");
}, 5 * 60 * 1000);
```

3. Monitor rate limit headers
Proactively check the `X-RateLimit-Remaining` header to avoid hitting the limit:
```typescript
const response = await fetch(url, { headers });

const remaining = parseInt(
  response.headers.get("X-RateLimit-Remaining") || "0",
  10
);
if (remaining < 1000) {
  console.warn(`FetchPrompt rate limit running low: ${remaining} calls remaining`);
}
```

4. Use POST for variable-heavy prompts
Both GET and POST support ETag caching. Choose the right method based on your use case:
- `GET` — Fewer variables, simple values, variables passed as query parameters
- `POST` — Many variables, complex values, no URL encoding needed
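As a sketch of the `POST` variant, here is a helper that sends variables as a JSON object in the request body. The `{ variables: ... }` body shape is an assumption for illustration (confirm the exact request format in the FetchPrompt API reference), and the injectable `fetchImpl` parameter exists only so the helper can be exercised without network access:

```typescript
// POST the prompt endpoint with variables in the JSON body instead of the
// query string. NOTE: the `variables` body key is assumed, not confirmed
// by this page — check the API reference for the real request format.
async function renderPrompt(
  slug: string,
  variables: Record<string, unknown>,
  fetchImpl: typeof fetch = fetch // injectable for testing
): Promise<string> {
  const response = await fetchImpl(
    `https://www.fetchprompt.com/api/v1/prompts/${slug}`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.FETCHPROMPT_API_KEY}`,
        "Content-Type": "application/json",
      },
      // Complex values need no URL encoding when sent as JSON.
      body: JSON.stringify({ variables }),
    }
  );
  if (!response.ok) {
    throw new Error(`FetchPrompt error: ${response.status}`);
  }
  const data = await response.json();
  return data.content;
}
```

Because the variables travel in the body, nested objects and long strings pass through untouched, where a `GET` query string would need per-value URL encoding.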