Rate Limits & Caching

Understand API rate limits, caching behavior, and how to optimize your FetchPrompt API usage.

FetchPrompt includes rate limiting to ensure fair usage and caching to minimize latency. Understanding these mechanisms helps you build efficient integrations.

Rate limits

API calls are rate-limited per organization per month. All API keys belonging to the same organization share a single monthly quota.

Plan                        Monthly Limit
Free                        30,000 calls/month
Pro (coming soon)           300,000 calls/month
Business (coming soon)      1,500,000 calls/month
Enterprise (coming soon)    Unlimited

How rate limiting works

  • The counter tracks total API calls across all prompts, all environments, and all API keys within an organization.
  • The counter resets 30 days after your organization was created, and every 30 days thereafter.
  • Each successful API response (including 304 Not Modified) counts as one call.

Rate limit headers

Every API response includes rate limit headers:

Header                   Description                            Example
X-RateLimit-Limit        Maximum calls allowed per month        30000
X-RateLimit-Remaining    Calls remaining this month             28742
X-RateLimit-Reset        Unix timestamp when the limit resets   1708992000

When the limit is exceeded

If your organization exceeds the monthly limit, the API returns:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 30000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708992000

{
  "error": "Rate limit exceeded"
}
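A client can detect the 429 status and surface when the quota resets. A minimal sketch (the header names come from the table above; `fetchPrompt` is a hypothetical wrapper, and the `content` field in the response body is assumed from the examples later in this page):

```typescript
// Fetch a prompt and fail with a useful message when the monthly quota is exhausted.
async function fetchPrompt(url: string, apiKey: string): Promise<string> {
  const response = await fetch(url, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });

  if (response.status === 429) {
    // X-RateLimit-Reset is a Unix timestamp in seconds.
    const resetUnix = Number(response.headers.get("X-RateLimit-Reset") ?? 0);
    const resetAt = new Date(resetUnix * 1000);
    throw new Error(`Rate limit exceeded; resets at ${resetAt.toISOString()}`);
  }

  const data = await response.json();
  return data.content;
}
```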

Monitoring usage

You can monitor your current usage in two places:

  1. API Keys page — The usage tab shows this month's call count, monthly limit, and remaining calls.
  2. Rate limit headers — Every API response includes X-RateLimit-Remaining so your application can track usage programmatically.

Caching

FetchPrompt uses a two-layer caching strategy to minimize latency.

Server-side cache (Redis)

When a prompt is fetched via the API:

  1. FetchPrompt first checks a Redis cache for the prompt content.
  2. If found (cache hit), the cached content is returned immediately.
  3. If not found (cache miss), the prompt is fetched from the database and written to the cache.

Cache entries have a 60-second TTL (time-to-live). This means:

  • After you update a prompt in the dashboard, the API may serve the previous version for up to 60 seconds.
  • After 60 seconds, the cache expires and the next request fetches fresh data from the database.

API key validation results are also cached with a 5-minute TTL for performance.
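The flow above is the classic cache-aside pattern. The sketch below illustrates it with hypothetical function names and a generic store interface; it is not FetchPrompt's actual server code:

```typescript
// Cache-aside: check the cache first, fall back to the database on a miss,
// then write the result back to the cache with a short TTL.
const TTL_SECONDS = 60;

type Store = {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
};

async function getPromptContent(
  slug: string,
  cache: Store,
  loadFromDb: (slug: string) => Promise<string>
): Promise<string> {
  const cached = await cache.get(`prompt:${slug}`);
  if (cached !== null) return cached; // cache hit: skip the database entirely

  const content = await loadFromDb(slug); // cache miss: query the database
  await cache.set(`prompt:${slug}`, content, TTL_SECONDS);
  return content;
}
```

With a 60-second TTL, a hot prompt costs at most one database read per minute regardless of request volume.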

Client-side cache (ETag)

Both the GET and POST /api/v1/prompts/{slug} endpoints support ETag-based conditional requests:

  1. Every response includes an ETag header (a hash of the rendered content).
  2. On subsequent requests, include the ETag in an If-None-Match header.
  3. If the content hasn't changed, the API returns 304 Not Modified with no body, saving bandwidth.

# First request — get the ETag
curl -i -H "Authorization: Bearer fp_prod_xxx" \
  https://www.fetchprompt.com/api/v1/prompts/my-prompt

# Response includes:
# ETag: "a1b2c3d4e5f67890"
# Cache-Control: public, max-age=60

# Subsequent request — use the ETag
curl -H "Authorization: Bearer fp_prod_xxx" \
  -H 'If-None-Match: "a1b2c3d4e5f67890"' \
  https://www.fetchprompt.com/api/v1/prompts/my-prompt

# Returns 304 Not Modified if content hasn't changed

The Cache-Control header is set to public, max-age=60, which allows intermediate caches (CDNs, proxies) to cache the response for 60 seconds.

Cache invalidation

When you update a prompt through the dashboard:

  • The server-side Redis cache for that prompt is immediately invalidated.
  • However, if a cached version was already served to a client within the 60-second TTL window, that client may continue using the stale version until it expires or makes a new request.

Optimizing API usage

1. Cache on your side

If your application serves the same prompt to many users, cache the prompt content in your application for a reasonable duration:

const promptCache = new Map<string, { content: string; etag: string }>();

async function getPrompt(slug: string) {
  const cached = promptCache.get(slug);
  const headers: Record<string, string> = {
    Authorization: `Bearer ${process.env.FETCHPROMPT_API_KEY}`,
  };

  // Use the ETag for a conditional fetch
  if (cached?.etag) {
    headers["If-None-Match"] = cached.etag;
  }

  const response = await fetch(
    `https://www.fetchprompt.com/api/v1/prompts/${slug}`,
    { headers }
  );

  if (response.status === 304) {
    return cached!.content; // Content hasn't changed
  }

  const data = await response.json();
  promptCache.set(slug, {
    content: data.content,
    etag: response.headers.get("etag") ?? "",
  });
  return data.content;
}

2. Fetch prompts at startup

For prompts that don't change frequently, fetch them once at application startup and refresh on a schedule:

// Fetch the prompt once at startup
let systemPrompt = await getPrompt("system-instructions");

// Refresh every 5 minutes
setInterval(async () => {
  systemPrompt = await getPrompt("system-instructions");
}, 5 * 60 * 1000);

3. Monitor rate limit headers

Proactively check the X-RateLimit-Remaining header to avoid hitting the limit:

const response = await fetch(url, { headers });
const remaining = parseInt(response.headers.get("X-RateLimit-Remaining") ?? "0", 10);

if (remaining < 1000) {
  console.warn(`FetchPrompt rate limit running low: ${remaining} calls remaining`);
}

4. Use POST for variable-heavy prompts

Both GET and POST support ETag caching. Choose the right method based on your use case:

  • GET — Fewer variables, simple values, variables passed as query parameters
  • POST — Many variables, complex values, no URL encoding needed
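For example, a POST call that sends variables in the JSON body instead of the query string. The `variables` field shape in the request body is an assumption here, as is the `content` field in the response; check the API reference for the exact schema:

```typescript
// POST the variables as a JSON body, avoiding URL encoding entirely.
async function renderPrompt(
  slug: string,
  variables: Record<string, unknown>
): Promise<string> {
  const response = await fetch(
    `https://www.fetchprompt.com/api/v1/prompts/${slug}`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.FETCHPROMPT_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ variables }),
    }
  );
  const data = await response.json();
  return data.content;
}
```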