Crawl Budget Optimisation
Learn what crawl budget is, why it matters, and how to optimise it so Google spends time crawling your most important pages instead of wasting resources.
Crawl budget is the number of pages Google will crawl on your site within a given time period. For small websites (under 1,000 pages), crawl budget is rarely a concern — Google has more than enough capacity. For larger sites (10,000+ pages), crawl budget optimisation becomes important to ensure Google spends its limited crawl resources on your most valuable pages.
- Crawl budget is the number of pages Googlebot will crawl on your site within a time window.
- It matters most for large sites (10,000+ pages) — small sites rarely have crawl budget issues.
- Crawl budget is determined by crawl rate limit (how fast Google crawls without overloading your server) and crawl demand (how important Google considers your content).
- Wasted crawl budget means Google spends time on low-value pages (parameter URLs, duplicate content, error pages) instead of important ones.
- Optimise by removing low-value pages from Google's scope, improving site speed, and maintaining a clean sitemap.
If you want the full breakdown, continue below.
How Crawl Budget Works
Crawl Rate Limit
Google limits how fast it crawls your site to avoid overloading your server. The rate limit is based on:
- Server capacity — if your server responds slowly or returns errors, Google reduces crawl rate
- Server health — 5xx errors cause Google to slow down or stop crawling temporarily
- Crawl rate settings — Google adjusts crawl rate automatically based on how your server responds; the Search Console tool for requesting a lower crawl rate was deprecated in early 2024
Crawl Demand
How much Google wants to crawl your site. Factors include:
- Popularity — pages with more backlinks and traffic get crawled more frequently
- Staleness — pages that change frequently get re-crawled more often
- New content — freshly published pages trigger crawl demand
- Sitemap signals — pages listed in sitemaps with recent lastmod dates signal crawl demand
The Combined Budget
Your effective crawl budget is the lower of your crawl rate limit and your crawl demand. If Google wants to crawl 10,000 pages per day but your server can only handle 5,000 requests per day, your effective crawl budget is 5,000.
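The relationship can be sketched as a simple minimum. The figures below are hypothetical, purely for illustration:

```python
# Effective crawl budget is capped by whichever constraint is lower.
crawl_rate_limit = 5_000   # pages/day the server can comfortably serve
crawl_demand = 10_000      # pages/day Google wants to crawl

effective_budget = min(crawl_rate_limit, crawl_demand)
print(effective_budget)  # 5000
```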
When Crawl Budget Matters
Crawl budget is primarily a concern for:
- Sites with 10,000+ pages — enough pages that Google cannot crawl everything in one session
- E-commerce sites — product pages, filter combinations, and variant URLs multiply quickly
- Sites with many parameter URLs — tracking parameters, sort/filter options create thousands of duplicates
- News/publishing sites — frequent content publication requires timely crawling
- Sites with slow servers — poor server performance reduces the rate at which Google can crawl
For small to medium websites (under 5,000 pages) with reasonable server performance, crawl budget is almost never an issue.
What Wastes Crawl Budget
Duplicate Content
Multiple URLs serving the same content force Google to crawl the same page repeatedly:
- URL parameter variants (sorting, filtering, tracking)
- Session IDs in URLs
- www/non-www, HTTP/HTTPS, trailing slash variants
- Print-friendly versions of pages
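To see how these variants collapse into one canonical form, here is a minimal sketch of a URL normaliser. The parameter names and rules (force HTTPS, drop `www.`, strip trailing slashes and tracking parameters) are assumptions — adapt them to your own site's canonicalisation policy:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of parameters that create duplicate URLs.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "fbclid"}

def canonicalise(url: str) -> str:
    scheme, netloc, path, query, _ = urlsplit(url)
    scheme = "https"                               # force HTTPS
    netloc = netloc.lower().removeprefix("www.")   # drop www variant
    if path != "/" and path.endswith("/"):
        path = path.rstrip("/")                    # drop trailing slash
    kept = [(k, v) for k, v in parse_qsl(query) if k not in TRACKING_PARAMS]
    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))

print(canonicalise("http://www.example.com/shop/?utm_source=x&sort=price"))
# https://example.com/shop?sort=price
```

Running such a normaliser over a crawl export quickly reveals how many distinct URLs resolve to the same canonical page.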
Low-Value Pages
Pages that exist but provide minimal value:
- Thin content pages with a few sentences
- Tag and category archive pages with no unique content
- Old, outdated pages that are no longer relevant
- Internal search results pages
Redirect Chains
Each redirect hop costs a crawl. A chain of 3 redirects means Google uses 4 crawl requests to reach the final page.
Soft 404 Errors
Pages that return a 200 status code but display an error message. Google must crawl and process them before determining they are useless.
Infinite Crawl Spaces
Dynamic URLs that generate infinite combinations:
- Calendar widgets with clickable date navigation
- Faceted search with unlimited filter combinations
- Session-based URLs that generate new URLs per visit
How to Optimise Crawl Budget
1. Remove or Block Low-Value URLs
- Disallow parameter URLs in robots.txt
- Add noindex to thin content pages
- Block infinite crawl spaces (calendar pages, faceted navigation)
- Use canonical tags to consolidate variations
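As a sketch, robots.txt rules for blocking low-value URL patterns might look like this (all paths and parameter names here are placeholders, not recommendations for your site):

```
# robots.txt — example rules only; adapt paths to your URL structure
User-agent: *
# Block internal search results and infinite calendar pages
Disallow: /search
Disallow: /calendar/
# Block common session/sort parameters (Google supports * wildcards)
Disallow: /*?*sessionid=
Disallow: /*?*sort=
```

Note that robots.txt and noindex are mutually exclusive per URL: a page blocked in robots.txt is never fetched, so Google cannot see its noindex tag. Choose one mechanism for each URL pattern.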
2. Fix Redirect Chains
Ensure every redirect leads directly to the final destination — no chains:
- ✅ Page A → Page C (one hop)
- ❌ Page A → Page B → Page C → Page D (three hops)
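The fix can be sketched as flattening a redirect map so every source points straight at its final destination. This is an illustrative helper, not a specific tool's API:

```python
# Hypothetical helper: given a mapping of redirect source -> target,
# rewrite every source to point directly at its final destination,
# so each URL resolves in a single hop.
def flatten_redirects(redirects: dict[str, str]) -> dict[str, str]:
    def final(url: str, seen: set[str]) -> str:
        while url in redirects:
            if url in seen:  # guard against redirect loops
                raise ValueError(f"redirect loop at {url}")
            seen.add(url)
            url = redirects[url]
        return url
    return {src: final(dst, {src}) for src, dst in redirects.items()}

chains = {"/a": "/b", "/b": "/c", "/c": "/d"}
print(flatten_redirects(chains))  # {'/a': '/d', '/b': '/d', '/c': '/d'}
```

After flattening, the rewritten rules can be applied in your server or CMS redirect configuration.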
3. Maintain a Clean Sitemap
Your sitemap should only contain canonical, indexable URLs:
- Remove 404 and redirect URLs
- Remove noindex pages
- Update lastmod dates only when content genuinely changes
- Use sitemap index files to organise large URL sets
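For reference, a minimal sitemap entry follows the sitemaps.org protocol; the URL and date below are placeholders:

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Only canonical, indexable, 200-status URLs belong here -->
  <url>
    <loc>https://example.com/products/widget</loc>
    <!-- Update lastmod only on genuine content changes -->
    <lastmod>2024-05-01</lastmod>
  </url>
</urlset>
```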
4. Improve Server Response Time
Faster server response means Google can crawl more pages in the same time window:
- Upgrade hosting if server is slow
- Implement server-side caching
- Use a CDN for static resources
- Optimise database queries
5. Fix Errors
Reduce the volume of error pages Google encounters:
- Fix 5xx server errors (these cause Google to reduce crawl rate)
- Fix or redirect 404 pages that still receive crawl attention
- Resolve soft 404s by returning proper status codes
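Soft 404s can often be caught in a crawl export with a simple heuristic: flag pages that return HTTP 200 but whose body reads like an error page. The error phrases below are examples, not an exhaustive list:

```python
# Hypothetical heuristic for spotting soft 404s in a crawl export.
ERROR_PHRASES = ("page not found", "no results found", "this product is unavailable")

def looks_like_soft_404(status: int, body: str) -> bool:
    return status == 200 and any(p in body.lower() for p in ERROR_PHRASES)

pages = [
    ("/widget", 200, "<h1>Widget</h1> Great product."),
    ("/gone", 200, "<h1>Page Not Found</h1>"),
    ("/missing", 404, "<h1>Page Not Found</h1>"),  # a real 404, not a soft one
]
soft = [url for url, status, body in pages if looks_like_soft_404(status, body)]
print(soft)  # ['/gone']
```

Flagged pages should either return a real 404/410 status or be replaced with genuine content.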
6. Improve Internal Linking
A well-connected site allows Google to discover important pages efficiently:
- Link to important pages from high-authority pages
- Eliminate orphaned pages
- Maintain a shallow architecture (important pages reachable within three clicks of the homepage)
7. Use HTTP/2
HTTP/2 allows multiple simultaneous requests over a single connection, enabling Google to crawl more efficiently.
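On nginx, for example, HTTP/2 is enabled on a TLS server block with a single directive (this uses the `http2` directive from nginx 1.25.1+; older versions use `listen 443 ssl http2;` instead):

```
server {
    listen 443 ssl;
    http2 on;
    server_name example.com;
    # ssl_certificate / ssl_certificate_key directives go here
}
```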
Monitoring Crawl Budget
Google Search Console — Crawl Stats
Navigate to Settings → Crawl Stats in Google Search Console:
- Total crawl requests — how many pages Google crawled per day
- Download size — total data downloaded during crawling
- Average response time — how fast your server responds to Googlebot
- Response codes — breakdown of 200, 301, 404, 5xx responses
- Crawl purpose — refresh (re-crawling existing pages) vs discovery (finding new pages)
What to Look For
- Declining crawl rate — may indicate server problems or quality issues
- High error rate — too many 404s or 5xx errors waste budget
- Slow response times — servers taking over 500ms per request limit crawl capacity
- Refresh vs discovery ratio — if Google is mostly re-crawling old pages, new content discovery suffers
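Beyond Search Console, your own server access logs show exactly what Googlebot fetches. As a sketch, assuming logs in the common combined format, you can count the response codes served to Googlebot to spot wasted requests (note that serious log auditing should also verify Googlebot IPs via reverse DNS, since the user-agent string can be spoofed):

```python
import re
from collections import Counter

# Matches the request and status fields of a combined-format log line.
LOG_LINE = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[^"]+" (?P<status>\d{3})')

def googlebot_status_counts(lines):
    counts = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue
        m = LOG_LINE.search(line)
        if m:
            counts[m.group("status")] += 1
    return counts

sample = [
    '66.249.66.1 - - [01/May/2024] "GET /products HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [01/May/2024] "GET /old-page HTTP/1.1" 404 128 "-" "Googlebot/2.1"',
    '203.0.113.9 - - [01/May/2024] "GET /products HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(googlebot_status_counts(sample))  # Counter({'200': 1, '404': 1})
```

A high share of 404 or 5xx responses here is a direct measure of crawl budget being spent on error pages.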
Key Takeaways
- Crawl budget matters primarily for large sites (10,000+ pages).
- It is determined by crawl rate limit (server capacity) and crawl demand (content importance).
- Duplicates, low-value pages, redirect chains, and errors waste crawl budget.
- Optimise by cleaning your sitemap, fixing redirects, improving server speed, and blocking low-value URLs.
- Monitor crawl stats in Google Search Console Settings.
Quick Crawl Budget Checklist
- Parameter URLs blocked or canonicalised
- Redirect chains resolved (max one hop)
- XML sitemap contains only canonical, indexable URLs
- No soft 404 errors
- Server response time under 500ms
- 5xx errors resolved
- Low-value pages blocked or removed
- Infinite crawl spaces blocked in robots.txt
- Internal linking structure ensures important pages are easily discoverable
- Crawl Stats monitored monthly in Google Search Console
Tools & Resources (Coming Soon)
- Crawl Budget Analyzer (Coming soon)
- SEO Audit Tool (Coming soon)
- Redirect Chain Checker (Coming soon)