What Is Crawl Budget and How to Optimise It | Linki

Written by Linki | May 3, 2026 3:00:00 AM

Crawl budget is one of the most consequential technical SEO concepts for large or rapidly growing websites. When Googlebot allocates a finite number of page visits to your site, every wasted crawl on a low-value URL is a missed opportunity to get a high-value page indexed, refreshed, or ranked. For smaller sites, crawl budget rarely constrains performance. For sites with tens of thousands of URLs, it can be the bottleneck between publishing content and seeing it rank.

This guide explains what crawl budget is, how Google calculates it, how to check it, and how to optimise it, with a focus on the internal linking decisions that most directly influence what Googlebot chooses to crawl.

Definition

Crawl budget is "the set of URLs that Google can and wants to crawl" on a given website within a given time period. It is determined by two factors: crawl capacity (how fast and often Googlebot can crawl without overloading the server) and crawl demand (how much Googlebot wants to crawl specific URLs based on perceived importance and freshness). This definition comes directly from Google's developer documentation.

Who needs to worry about crawl budget?

Google has been explicit about this. Its documentation states that crawl budget management matters mainly for sites with more than 1 million pages whose content changes moderately often (about once a week), or sites with at least 10,000 pages that update daily.[1] If your site has fewer than 1,000 pages and updates infrequently, crawl budget is unlikely to be a limiting factor for your SEO.

That said, understanding crawl budget is useful for any site, because the practices that optimise it (clean URL structure, fast server response, minimal duplicate content, strong internal linking) are also SEO best practices that improve performance regardless of site size.

Site size | Crawl budget priority | Typical impact
Under 1,000 pages | Low | Rarely a constraint
1,000–10,000 pages | Medium | Worth auditing; some waste likely
10,000+ pages with frequent updates | High | Direct impact on indexing speed and rankings

How Google calculates crawl budget

Crawl budget results from two signals that Google evaluates for every website: how much its crawlers can fetch without straining the server, and how much they want to fetch.

Crawl capacity limit

This is the maximum rate at which Googlebot will crawl your site without causing server degradation. It is determined by your server's response time, error rate, and stability. Fast, reliable servers get a higher crawl capacity; slow or frequently erroring servers see Googlebot back off. You have direct control over this through hosting quality, CDN configuration, image optimisation, and server-side performance work. See: Core Web Vitals and server performance.

Crawl demand

This is Google's assessment of how much it wants to crawl each URL on your site. High demand pages are those with strong signals of importance and freshness: they receive many internal and external links, they have been recently updated, and they generate click traffic in Search. Low demand pages are those with few links, thin content, no external authority, and no update history. Demand is the factor most directly influenced by internal linking.

~60% of the internet is estimated to be duplicate content, wasting crawl budget at scale. Source: Gary Illyes (Google), via Ahrefs

Gary Illyes of Google has stated that "Google's crawling process is highly focused on removing duplication because 60% of the internet is duplicate."[2] Duplicate content is one of the biggest crawl budget drains. Every time Googlebot visits a parameter-generated duplicate of a page it has already seen, that is a crawl wasted on content with no unique value.
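One way to see how many crawled URLs are parameter-generated duplicates is to normalise each URL and group the variants that collapse onto the same address. The sketch below is a minimal illustration, assuming a hypothetical `IGNORED_PARAMS` set; which parameters are safe to ignore depends entirely on your own site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters never change the page content on this site.
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign", "sort"}


def normalise(url: str) -> str:
    """Drop ignored query parameters and the fragment, then sort what remains."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def group_duplicates(urls):
    """Map each normalised URL to the crawled variants that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    # Only groups with more than one variant represent duplicate crawls.
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


# Hypothetical URLs taken from a crawl export or server log.
crawled = [
    "https://example.com/shoes?sort=price",
    "https://example.com/shoes?sessionid=abc123",
    "https://example.com/shoes",
    "https://example.com/shoes?colour=red",
]
duplicates = group_duplicates(crawled)
for canonical, variants in duplicates.items():
    print(f"{canonical}: {len(variants)} crawled variants")
```

Here three of the four URLs collapse onto the same page, while the `colour=red` variant is kept separate because that parameter is not in the ignore list.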

How to check your crawl budget in Google Search Console

GSC provides two reports that give you a practical picture of your crawl budget situation.

Crawl Stats report

Navigate to Settings (gear icon) in GSC, then "Crawl stats". This report shows the total number of requests from Googlebot over the past 90 days, broken down by response code, file type, and purpose. Key metrics to monitor:

  • Total crawl requests: How many requests Googlebot makes per day. Compare this to your total indexed pages. If Googlebot is crawling 500 pages daily and you have 50,000 indexed pages, at that rate it takes at least 100 days to recrawl every page. Fresh content updates may take weeks to be discovered.
  • Download size: Large average page sizes consume more crawl capacity. Oversized pages (due to heavy JavaScript, large images, unoptimised CSS) slow the crawl.
  • Response time: Googlebot's crawl rate correlates with server response time. Average response times above 500ms are worth investigating.

Indexing report (Coverage)

The Page indexing report (formerly Index Coverage; under "Indexing > Pages" in GSC) shows you how many pages are indexed vs excluded, and why excluded pages were not indexed. High counts of "Crawled, currently not indexed" can indicate crawl budget waste on low-quality pages. High counts of "Discovered, currently not indexed" suggest Googlebot has found the URL but has not prioritised crawling it, often because of insufficient internal link signals.

Crawl budget optimisation tactics

The most effective optimisations either raise crawl capacity (a faster, more reliable server) or cut the crawl wasted on low-value URLs, so that more of the budget reaches the pages that matter.

1. Block or noindex low-value URL types

Identify URL categories that provide no unique value to searchers and should not consume crawl budget:

  • Pagination pages (especially deep pages: page 50 of a blog archive)
  • Faceted navigation parameter combinations
  • Session ID parameters
  • Thin or auto-generated content pages
  • Staging or testing pages accidentally exposed to Googlebot

Use robots.txt to disallow crawling of URL patterns that should never be accessed by crawlers. Use noindex meta tags for pages that users can access but should not be indexed. Use canonical tags to consolidate parameter variants. See: canonical tags explained.
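Before deploying robots.txt rules, it helps to check that they block what you intend and nothing more. The sketch below uses Python's standard-library `urllib.robotparser`; the rules and paths are hypothetical placeholders. The standard-library parser matches rules by path prefix, so the example avoids wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the paths are placeholders, not recommendations for a real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staging/
Disallow: /search
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


for url in (
    "https://example.com/staging/new-home",
    "https://example.com/search?q=shoes",
    "https://example.com/blog/crawl-budget",
):
    print(url, "->", "crawlable" if is_crawlable(url) else "blocked")
```

Keep in mind that a URL blocked in robots.txt is never fetched, so Googlebot cannot read a noindex tag on it. Choose one mechanism per URL type rather than stacking both.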

10k+ pages with frequent updates: the threshold at which Google recommends active crawl budget management. Source: Google Developers documentation

2. Fix redirect chains

Each redirect hop Googlebot follows uses crawl capacity and time. In a chain of 301 redirects (A to B to C), Googlebot spends two requests on redirects before it reaches the final destination. Audit for redirect chains in your site crawl, and update all links (internal and in sitemaps) to point directly to the final destination URL. See: fixing broken internal links and redirect chains.
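The audit can be sketched offline once you have a redirect map from a site crawl. The example below is a minimal illustration, assuming a hypothetical `redirects` dictionary of source URL to redirect target; it resolves each linked URL to its final destination and lists the links that should be updated.

```python
def resolve(url, redirects, max_hops=10):
    """Follow url through the redirect map and return (final_url, hops)."""
    hops = 0
    seen = {url}
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError(f"redirect loop or chain over {max_hops} hops at {url}")
        seen.add(url)
    return url, hops


def links_to_update(internal_links, redirects):
    """Map every linked URL that redirects to the final URL it should point to."""
    fixes = {}
    for target in internal_links:
        final_url, hops = resolve(target, redirects)
        if hops > 0:
            fixes[target] = final_url
    return fixes


# Hypothetical crawl data: /a redirects to /b, which redirects to /c.
redirects = {"/a": "/b", "/b": "/c"}
internal_links = ["/a", "/b", "/c"]

print(resolve("/a", redirects))
print(links_to_update(internal_links, redirects))
```

Links to `/a` and `/b` should both be rewritten to `/c`; the link that already targets `/c` needs no change.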

3. Remove soft 404s and server errors

Soft 404s (pages that return a 200 HTTP status but display a "not found" or empty content response) are particularly damaging. Googlebot wastes a crawl, receives no useful content, and may suppress indexing of other pages based on the quality signal. Audit for soft 404s in GSC's Pages report under "Not found (404)" and "Soft 404" categories.
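GSC remains the authoritative source for what Google classifies as a soft 404, but a rough pre-check on your own crawl data can surface candidates earlier. The sketch below is a heuristic only: the phrase list and the `MIN_TEXT_LENGTH` cut-off are assumptions to tune for your own templates, not rules Google publishes.

```python
# Assumptions: phrases your "not found" templates use, and a minimum length
# below which a page is treated as effectively empty.
NOT_FOUND_PHRASES = ("page not found", "no results found", "nothing was found")
MIN_TEXT_LENGTH = 200


def looks_like_soft_404(status_code: int, body_text: str) -> bool:
    """Flag pages that return 200 but read like an error or an empty page."""
    if status_code != 200:
        return False  # a real error status is not a *soft* 404
    text = " ".join(body_text.split()).lower()
    if any(phrase in text for phrase in NOT_FOUND_PHRASES):
        return True
    return len(text) < MIN_TEXT_LENGTH


# Hypothetical (status code, visible text) pairs from a site crawl.
samples = {
    "/missing-product": (200, "Sorry, page not found."),
    "/gone": (404, "Page not found"),
    "/guide": (200, "word " * 100),
}
for path, (status, text) in samples.items():
    print(path, looks_like_soft_404(status, text))
```

A long article that happens to quote one of the phrases would also be flagged, so treat the output as a review list rather than a verdict.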

4. Improve server response time

Googlebot adjusts its crawl rate based on server health. Pages that consistently take longer than 500ms to respond signal a strained server. Work with your hosting provider or implement CDN caching, browser caching, and image compression to bring average response times below 200ms for important pages.

5. Optimise internal linking to increase crawl demand

This is where crawl budget optimisation intersects most directly with internal link architecture. Internal links are the primary mechanism through which Googlebot discovers pages and judges their relative importance. Pages that receive many internal links are crawled more frequently. Pages that receive few or no links are deprioritised or missed entirely.

Key actions:

  • Identify and fix orphan pages (pages with 0 internal inlinks). These pages generate no crawl demand signal from the link graph. See: how to identify pages with too few internal links.
  • Add contextual links from high-authority hub pages to important but under-linked content.
  • Ensure your XML sitemap only includes URLs you want indexed, and that all sitemap URLs are internally linked from at least one other crawlable page.
  • Remove internal links pointing to noindex or disallowed pages. These waste crawl capacity and can confuse crawl signals.
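The orphan-page check above can be sketched from two inputs: the URLs in your sitemap and an internal link graph from a site crawl. The data structures here are hypothetical (a dictionary of source page to the pages it links to), and the code is a minimal illustration rather than a full crawler.

```python
from collections import Counter


def inlink_counts(link_graph):
    """Count internal inlinks per URL from a {source: [targets]} mapping."""
    counts = Counter()
    for source, targets in link_graph.items():
        for target in set(targets):  # count each linking page once
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts


def find_orphans(sitemap_urls, link_graph):
    """Return sitemap URLs that no other crawled page links to."""
    counts = inlink_counts(link_graph)
    return sorted(url for url in sitemap_urls if counts[url] == 0)


# Hypothetical crawl output.
link_graph = {
    "/": ["/blog/", "/pricing"],
    "/blog/": ["/blog/crawl-budget", "/"],
    "/blog/crawl-budget": ["/blog/"],
}
sitemap_urls = ["/", "/blog/", "/pricing", "/blog/crawl-budget", "/blog/old-post"]

print(find_orphans(sitemap_urls, link_graph))
```

In this example `/blog/old-post` appears in the sitemap but receives no internal links, so it is the page to link from a relevant hub. The same `inlink_counts` output can be sorted to find near-orphans with only one or two inlinks.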

6. Submit an accurate XML sitemap

Your sitemap tells Googlebot which pages exist and (optionally) when they were last updated. Ensure your sitemap contains only canonical, indexable, 200-status URLs. Remove noindex pages, redirect URLs, and parameter variants from the sitemap. A sitemap that accurately represents your best content steers crawl budget towards those pages.
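The filtering rule can be sketched as a small script that builds the sitemap from crawl data. The page records and their field names (`status`, `noindex`, `canonical`, `lastmod`) are hypothetical; adapt them to whatever your crawler exports.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_candidates(pages):
    """Keep pages that return 200, are indexable, and are their own canonical."""
    return [
        page
        for page in pages
        if page["status"] == 200
        and not page["noindex"]
        and page["canonical"] == page["url"]
    ]


def build_sitemap(pages):
    """Serialise the filtered pages as a sitemap XML string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in sitemap_candidates(pages):
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["url"]
        if page.get("lastmod"):
            ET.SubElement(url, "lastmod").text = page["lastmod"]
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical crawl export: one clean page, a parameter variant,
# a redirecting URL, and a noindex page.
pages = [
    {"url": "https://example.com/guide", "status": 200, "noindex": False,
     "canonical": "https://example.com/guide", "lastmod": "2026-05-03"},
    {"url": "https://example.com/guide?ref=nav", "status": 200, "noindex": False,
     "canonical": "https://example.com/guide"},
    {"url": "https://example.com/old-guide", "status": 301, "noindex": False,
     "canonical": "https://example.com/old-guide"},
    {"url": "https://example.com/thanks", "status": 200, "noindex": True,
     "canonical": "https://example.com/thanks"},
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Only the first record survives the filter; the parameter variant, the redirect, and the noindex page are left out.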

How to calculate your crawl score

A simple diagnostic metric is the crawl score: the ratio of indexed pages to daily crawl requests.

Crawl score = Indexed pages / Daily Googlebot requests

A score of 1-3 is considered healthy: Googlebot is visiting each page roughly every 1-3 days. A score above 10 means Googlebot is crawling infrequently relative to your indexed pages, and freshness signals are slow to update.[3]

Example: A site with 5,000 indexed pages and 1,500 daily crawl requests has a crawl score of 3.3 (at the upper edge of the healthy range). A site with 20,000 indexed pages and 800 daily crawls has a score of 25 (problematic). The second site should aggressively reduce low-value URLs and improve internal linking to raise crawl demand for its most important pages.
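The calculation is simple enough to script against figures exported from GSC. In the sketch below, the function names are illustrative, and the "borderline" label for scores between 3 and 10 is an assumption, since the thresholds above only name the healthy and problematic ends of the scale.

```python
def crawl_score(indexed_pages: int, daily_requests: float) -> float:
    """Crawl score = indexed pages / daily Googlebot requests."""
    if daily_requests <= 0:
        raise ValueError("daily_requests must be positive")
    return indexed_pages / daily_requests


def interpret(score: float) -> str:
    """Label a crawl score using the thresholds described above."""
    if score <= 3:
        return "healthy"
    if score > 10:
        return "infrequent"
    return "borderline"  # assumption: no label is given for scores from 3 to 10


# The two sites from the example above.
for indexed, daily in ((5_000, 1_500), (20_000, 800)):
    score = crawl_score(indexed, daily)
    print(f"{indexed} indexed / {daily} daily requests = {score:.1f}")
```

Use an average of daily requests over several weeks rather than a single day, because crawl volume fluctuates from day to day.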

Linki and crawl budget optimisation

The fastest way to improve crawl demand for your best pages is to fix your internal link architecture. Linki analyses your complete internal link graph to identify:

  • Orphan and near-orphan pages generating no crawl demand signal
  • Internal links pointing to 301 redirect chains (wasted crawl hops)
  • High-value pages that are under-linked relative to their business or organic importance
  • Pages that link to noindex or disallowed URLs (sending crawlers into dead ends)

By fixing these issues, you concentrate Googlebot's crawl capacity on the pages that matter most, accelerating indexing of new content and improving recrawl frequency for your most important existing pages.

Frequently asked questions

What is crawl budget in SEO?

Crawl budget is the number of URLs Googlebot will crawl on your site within a given period. Google defines it as "the set of URLs that Google can and wants to crawl." It is determined by crawl capacity (how fast Googlebot can crawl without overloading your server) together with crawl demand (how much Google wants to crawl specific URLs based on their importance and freshness signals).

How do I check crawl budget in Google Search Console?

In GSC, go to Settings (gear icon) and click "Crawl stats". This report shows total Googlebot requests over 90 days, broken down by response code and file type. To check crawl demand signals, use the "Pages" report under "Indexing" to identify "Discovered, currently not indexed" pages, which indicate URLs Googlebot has found but not prioritised for crawling.

What is a good crawl budget score?

A crawl score of 1-3 (indexed pages divided by daily crawl requests) is considered healthy, indicating Googlebot revisits each page roughly every 1-3 days. A score above 10 suggests Googlebot is crawling infrequently, and freshness signals may be slow to update. Sites with scores above 10 should focus on reducing low-value URLs and strengthening internal links to high-priority pages.

Does internal linking affect crawl budget?

Yes, significantly. Internal links are the primary mechanism through which Googlebot discovers pages and judges their crawl priority. Pages with many internal inlinks receive higher crawl demand and are revisited more frequently. Pages with zero or very few internal links (orphan and near-orphan pages) generate minimal crawl demand and may be crawled rarely or missed entirely. Improving internal link distribution is one of the most direct ways to optimise crawl budget for large sites.

Can crawl budget affect rankings?

Indirectly, yes. Crawl budget does not feed into ranking algorithms itself, but pages that are crawled infrequently see delayed indexing of updates, slower discovery of new content, and weaker freshness signals. For sites publishing time-sensitive content or frequent updates, poor crawl budget management can mean slower ranking improvements.