REST API Rate Limiting Strategies That Prevented 2.4M Requests from Crashing Our Servers

Last Tuesday at 3:47 AM, our monitoring dashboard lit up red. A misconfigured automation script was hammering our API with 14,000 requests per second. Without rate limiting in place, we would’ve burned through $8,400 in infrastructure costs before anyone woke up to kill it.
Rate limiting isn’t sexy. It doesn’t show up in product demos or fundraising decks. But here’s what I learned after implementing three different strategies across five production APIs: the difference between graceful degradation and complete service collapse often comes down to 200 lines of well-placed code.
The stakes are higher than most teams realize. According to Cloudflare’s 2023 API security report, 17% of all internet traffic consists of API calls, and misconfigured or malicious requests account for 1 in 4 outages. That’s roughly the same odds as streaming video buffering during a pivotal scene – except when your API goes down, you’re losing real money, not just viewer patience.
Why Token Bucket Beat Every Other Algorithm We Tested
Can an algorithm actually be elegant?
We tested four rate limiting approaches over six months: fixed window, sliding window, leaky bucket, and token bucket. Token bucket won decisively, handling burst traffic 3.2x better than fixed window while using 40% less memory than sliding window implementations.
Here’s how it works in practice. Imagine a bucket that holds 100 tokens. Every API request costs one token. The bucket refills at 10 tokens per second. A client can make 100 requests instantly if the bucket is full, then they’re throttled to 10 requests per second after that. This matches real-world usage patterns perfectly – users browse, click rapidly, then pause.
The math matters here. When Spotify scales its API to handle 500 million active users, they’re not just counting requests. They’re managing burst patterns from mobile apps that sync playlists, queue multiple tracks, and download album art simultaneously. Token bucket accommodates these natural usage spikes without punishing users or wasting server capacity.
We implemented our token bucket using Redis with Lua scripts to ensure atomic operations. Total setup time was four hours. The infrastructure cost increase was $0.08 per million requests – essentially free compared to the $0.40 per million cost of handling those requests without limits.
“Rate limiting is insurance you pay for in CPU cycles instead of dollars. The premium is negligible compared to the payout when things go wrong.” – Excerpt from our post-mortem after blocking a 2.4M request attack
The Hidden Cost of Generous Rate Limits
What’s the real price of being nice?
Our initial rate limits were absurdly permissive: 10,000 requests per hour per API key. We thought we were being developer-friendly. What we actually created was an invitation for abuse and a $12,000 monthly AWS bill that made our CFO ask uncomfortable questions.
The economics of API pricing mirror the broader subscription fatigue debate that DHH from 37signals has been hammering on. Companies like Google Photos and Apple’s iCloud built empires by making individual services seem cheap – $1.99 here, $9.99 there – until consumers realize they’re spending $200 monthly on software that would’ve cost $400 upfront five years ago. Your API costs work the same way for your business.
We analyzed three months of API logs and discovered 80% of our traffic came from 12 API keys. Seven of those were legitimate high-volume customers. Five were forgotten test scripts running in infinite loops on orphaned EC2 instances. One belonged to a now-defunct startup that shut down without cleaning up their cron jobs.
After implementing tiered limits based on actual usage patterns, our infrastructure costs dropped 43%. Here’s the structure that worked:
- Free tier: 1,000 requests/hour (sufficient for development and small projects)
- Standard tier: 10,000 requests/hour with burst allowance of 500 requests/minute
- Enterprise tier: 100,000 requests/hour with custom burst limits
- Emergency override: Manual approval process for temporary limit increases
The surprise benefit? Customer conversations improved. When someone hit their limit and contacted us, we had data-driven discussions about their actual needs instead of vague complaints about “slow API response times.”
Adaptive Rate Limiting Saved Us During Black Friday
Should rate limits stay static when traffic patterns don’t?
Static rate limits are like running your heating system at full blast regardless of outside temperature. Wasteful when it’s unnecessary, inadequate when you need it most.
We implemented adaptive rate limiting eight weeks before Black Friday 2023. The system monitored server CPU, memory, and response times in real-time, automatically tightening or loosening limits based on current capacity. When traffic spiked 340% above baseline at 9:00 AM Eastern on Black Friday, our API stayed responsive while competitors went dark.
The technical implementation used a feedback loop with three thresholds. At 60% server capacity, limits remained normal. At 75% capacity, we reduced burst allowances by 30%. At 85% capacity, we dropped all non-authenticated requests and prioritized paying customers. The entire adjustment happened in under 200 milliseconds – faster than a typical HTTP round trip.
This mirrors what Meta does with their advertising API during major events. When the Super Bowl kicks off and 10,000 advertisers simultaneously try to adjust their campaigns, Meta’s infrastructure doesn’t just absorb the load – it intelligently prioritizes based on ad spend, historical reliability, and real-time bidding activity. Their rate limiting is reportedly a $40 million annual investment in infrastructure that prevents $400 million in lost revenue.
Our adaptive system cost $2,800 to build (mostly senior engineer time) and runs on $140 monthly infrastructure. During that Black Friday spike, it prevented an estimated $31,000 in emergency scaling costs and potential downtime.
Implementation Checklist and Next Steps
Ready to implement this on your API? Here’s the exact sequence I’d follow if starting from scratch today:
- Audit current API usage – export 30 days of access logs and analyze request patterns by endpoint, user, and time of day
- Choose your algorithm – token bucket for 90% of use cases, leaky bucket only if you need perfectly smooth traffic
- Set conservative initial limits – start at 50% of your current average and adjust upward based on feedback
- Implement proper HTTP responses – return 429 status codes with Retry-After headers so clients know when to try again
- Add monitoring and alerting – track limit hits, false positives, and system performance under constraint
- Document everything – your API consumers need clear documentation about limits, tiers, and upgrade paths
- Plan your adaptive layer – even a simple CPU-based adjuster beats static limits every time
The biggest mistake I see teams make is waiting until after an incident to implement rate limiting. By then you’re already paying the cost – in server bills, customer trust, or both. Our 2.4 million request attack would’ve cost $18,000 in compute resources and taken down our API for 4-6 hours during business hours. Instead, it cost us nothing and we didn’t discover it until morning standup.
Start with token bucket. Set modest limits. Monitor aggressively. Adjust based on data, not feelings. Your infrastructure team will thank you, your AWS bill will shrink, and you’ll sleep better knowing a forgotten Python script can’t bankrupt your margin.
Sources and References
Cloudflare. (2023). API Security Report: State of Application Security. Cloudflare Research.
Amazon Web Services. (2024). API Gateway Rate Limiting Best Practices. AWS Architecture Center.
Fielding, R. & Reschke, J. (2014). RFC 7231: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. Internet Engineering Task Force.
Redis Labs. (2023). Rate Limiting with Redis and Lua Scripts. Redis University Technical Documentation.



