An XML sitemap is one of the simplest tools in technical SEO. It tells search engines which pages exist on your site, when they were last updated, and how to find them. Yet most guides stop at creation and submission, missing the optimisation work that actually moves the needle.
This guide covers everything: building a valid sitemap, submitting it to Google Search Console and Bing Webmaster Tools, optimising its contents, and integrating it with your internal linking strategy to maximise crawl efficiency.
Definition
An XML sitemap is a structured file in XML format that lists the URLs on a website along with optional metadata (last modification date, change frequency, priority). It acts as a roadmap for search engine crawlers, helping them discover and prioritise pages for indexing, particularly on large sites or those with poor internal linking.
Search engines discover pages primarily through links. But links are not perfect: new pages may not yet have incoming links, deep pages may be linked infrequently, and orphan pages have no links at all. The XML sitemap fills these gaps by providing a direct inventory of your site's pages to crawlers.
A basic sitemap entry looks like this:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/about/</loc>
    <lastmod>2026-03-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```
The <loc> tag is the only required element. Everything else is optional metadata that crawlers may or may not use.
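For instance, an entry stripped to required elements only is still a valid sitemap:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/about/</loc>
  </url>
</urlset>
```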
50,000
maximum URLs per sitemap file, with a 50MB uncompressed file size limit
Source: Google Developers
The evidence for sitemap efficacy is compelling. In a 2009 experiment, Moz found that submitting a sitemap cut the time between publishing a page and Googlebot crawling it from 1,375 minutes to 14 minutes.[1] That is a roughly hundredfold improvement in discovery speed.
Sitemaps are particularly valuable in these scenarios:

- Large sites, where deep pages may otherwise be crawled infrequently
- New sites whose pages have few or no inbound links yet
- Sites with orphan pages or weak internal linking that crawlers cannot easily follow
83% of the top 100 US websites pass basic sitemap tests,[2] yet 23% of all websites still lack a properly referenced sitemap.[3] There is a clear adoption gap among smaller sites that stand to benefit most.
Most content management systems generate sitemaps automatically. Here is how to create one across the most common platforms.
WordPress powers 34.5% of all websites,[4] so for a large share of site owners this step is already handled. The Yoast SEO plugin automatically generates a dynamic sitemap at /sitemap_index.xml, and the Rank Math plugin does the same. Both exclude password-protected and noindexed pages by default, which is the correct behaviour.
For non-CMS sites or custom builds, Screaming Frog's SEO Spider can generate an XML sitemap after crawling your site. After running a crawl, go to Sitemaps in the top menu to generate and export. This approach is also useful for auditing what your existing sitemap includes versus what it should include.
Small sites can create a sitemap manually using a text editor. Tools like xml-sitemaps.com generate one automatically by crawling up to 500 URLs on a free plan.
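If you prefer to script it, a basic generator needs only a few lines. The sketch below is a minimal example using Python's standard library; the page inventory is a placeholder you would replace with your own URLs and real modification dates.

```python
from xml.etree import ElementTree as ET

# Placeholder inventory: map each indexable URL to its true last-modified
# date, or to None if you cannot track modifications reliably.
PAGES = {
    "https://example.com/": "2026-03-01",
    "https://example.com/about/": "2026-03-15",
    "https://example.com/blog/first-post/": None,
}

XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build a minimal <urlset> with one <url> entry per page."""
    urlset = ET.Element("urlset", xmlns=XMLNS)
    for loc, lastmod in pages.items():
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:  # omit <lastmod> entirely rather than fake a date
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.ElementTree(urlset)

tree = build_sitemap(PAGES)
ET.indent(tree)  # pretty-print; requires Python 3.9+
tree.write("sitemap.xml", encoding="UTF-8", xml_declaration=True)
```

Note that the sketch omits `<lastmod>` where no accurate date exists, rather than stamping every URL with today's date; the reason is covered in the optimisation section below.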
For large sites exceeding 50,000 URLs or 50MB, use a sitemap index file that references multiple child sitemaps:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2026-03-31</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-03-31</lastmod>
  </sitemap>
</sitemapindex>
```
To submit in Google Search Console, select your property and navigate to Indexing, then Sitemaps, in the left sidebar. Enter the path to your sitemap (e.g. sitemap.xml or sitemap_index.xml) and click Submit. GSC shows the submission date, status, and a count of discovered and indexed URLs. If the status shows an error, click through for details: common issues include URLs returning 404s, invalid XML, or blocked resources. Bing Webmaster Tools works the same way: submit the sitemap URL under its Sitemaps section.
Also reference your sitemap in your robots.txt with a Sitemap: directive. This means every crawler that reads robots.txt also discovers your sitemap automatically, without requiring manual submission to each search engine.
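For example, assuming your sitemap index sits at the site root:

```
# robots.txt at https://example.com/robots.txt
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap_index.xml
```

The Sitemap: directive must use an absolute URL, and you can list more than one if your site has multiple sitemaps.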
Not all sitemap tags carry equal weight. Here is what Google actually uses, based on official guidance and expert input.
"Webmasters are doing a horrible job keeping [lastmod] accurate."
Gary Illyes, Google, via Ahrefs
Google only trusts <lastmod> if it is "consistently and verifiably accurate."[5] If you set all pages to today's date to game freshness signals, Google ignores it. Set it only when a page genuinely changes, and update it accurately whenever content is modified.
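Per the Sitemaps protocol, <lastmod> takes a W3C Datetime value, so both a bare date and a full timestamp with timezone are valid:

```xml
<lastmod>2026-03-15</lastmod>                  <!-- date only -->
<lastmod>2026-03-15T09:30:00+00:00</lastmod>   <!-- full timestamp -->
```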
"Priority and change frequency doesn't really play that much of a role with Sitemaps anymore."
John Mueller, Google, via Ahrefs
Do not spend time manually calibrating <priority> values. Include them for completeness if your CMS generates them automatically, but they will not materially influence which pages Google crawls first.
Only include pages in your sitemap that you actively want indexed. Exclude:

- Pages carrying a noindex directive
- Pages whose canonical tag points to a different URL
- Redirected URLs (list the final destination instead)
- Admin, login, and other utility pages
- Thin or duplicate content pages
- URL parameter variants
34.5%
of all websites run on WordPress, which auto-generates XML sitemaps via SEO plugins
Source: Ahrefs
83%
of the top 100 US websites pass basic sitemap validation tests
Source: SEO Site Checkup
A sitemap tells crawlers where pages are. Internal links tell crawlers how important they are and how they relate to each other. Both signals work together, and understanding their interplay is key to a strong site architecture.
A common scenario: a site has 500 product pages in its XML sitemap, but 80 of them have zero internal links pointing to them. These orphan pages appear in the sitemap, so Googlebot can discover them. But without any internal links, they carry no link equity and signal low importance. They may be crawled but rank poorly because the site's own structure does not endorse them.
The solution is two-directional:

- Add internal links to pages that appear in the sitemap but have none, so your site structure endorses the pages you want ranked.
- Add to the sitemap any indexable pages that internal links reach but the sitemap omits, so crawlers receive a complete inventory.
Tools like Linki analyse your internal link structure against your indexed pages, identifying orphan pages and link equity gaps that sitemap data alone cannot reveal. For a full picture of your site architecture, combining sitemap analysis with an internal link audit is the approach professionals use.
For the next step in building a technically sound site, see our guide to robots.txt configuration and how it complements your sitemap strategy. Our technical SEO beginner's guide covers both in the context of a complete technical foundation.
| Error | Why It Matters | Fix |
|---|---|---|
| Noindex pages included | Contradictory signals confuse crawlers | Exclude any page with a noindex directive |
| Redirected URLs | Wastes crawl budget on non-canonical URLs | Use only final destination URLs in sitemap |
| Blocked by robots.txt | Crawler cannot access the listed page | Remove from sitemap or update robots.txt |
| 404 pages included | Sends crawlers to dead ends | Remove deleted pages or restore them |
| Non-canonical URLs | Dilutes indexing signals across duplicate versions | Use only the canonical version in the sitemap |
| HTTP instead of HTTPS | Mismatches the served version if site uses HTTPS | Update all sitemap entries to use HTTPS URLs |
Google Search Console's Coverage report flags these issues automatically. After submitting your sitemap, monitor this report weekly for the first month to catch and resolve any errors quickly.
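You can also catch most of these errors before submission. The sketch below is a minimal audit, assuming Python with the third-party requests library; it fetches a sitemap and flags any entry that redirects or returns an error status.

```python
from xml.etree import ElementTree as ET

import requests  # third-party: pip install requests

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def audit_sitemap(sitemap_url):
    """Fetch a sitemap and print entries that do not return HTTP 200."""
    response = requests.get(sitemap_url, timeout=10)
    root = ET.fromstring(response.content)
    for loc in root.findall("sm:url/sm:loc", NS):
        url = loc.text.strip()
        # allow_redirects=False so 301/302s are reported, not silently
        # followed; some servers reject HEAD, so fall back to GET if needed.
        status = requests.head(url, allow_redirects=False, timeout=10).status_code
        if status != 200:
            print(f"{status}  {url}")

audit_sitemap("https://example.com/sitemap.xml")  # placeholder URL
```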
Frequently asked questions
What is an XML sitemap?
An XML sitemap is a structured file that lists the URLs on your website, along with optional metadata such as the last modification date. It follows the Sitemaps Protocol standard and is submitted to search engines via Google Search Console and Bing Webmaster Tools. Its primary purpose is to help crawlers discover and prioritise your pages for indexing, which is particularly useful for large sites, new sites with few external links, and pages that cannot easily be reached through internal links alone.
How do I submit a sitemap to Google Search Console?
Log in to Google Search Console and select your property. Navigate to Indexing, then Sitemaps, in the left sidebar. In the "Add a new sitemap" field, enter the URL path to your sitemap (e.g. sitemap.xml). Click Submit. Google will then attempt to fetch and process your sitemap. The Sitemaps section will show the submission date, status, and URL counts within a few hours of processing.
Do XML sitemaps improve SEO?
An XML sitemap does not directly improve rankings, but it improves crawl efficiency, which enables faster and more complete indexing of your content. A Moz experiment found that submitting a sitemap cut the time between publishing a page and Googlebot crawling it from 1,375 minutes to 14. The real SEO benefit is ensuring your important pages are discovered and indexed promptly, rather than waiting for crawlers to find them through link following alone.
Which pages should a sitemap include?
Include all pages you want indexed: blog posts, product pages, category pages, landing pages, and service pages. Exclude pages with noindex directives, canonical tags pointing to different URLs, redirected pages (use the final destination URL instead), admin and login pages, thin or duplicate content pages, and URL parameter variants. The sitemap should be a curated list of your best, crawlable, indexable pages.
How large can a sitemap be?
Each sitemap file can contain a maximum of 50,000 URLs and must not exceed 50MB uncompressed. For larger sites, use a sitemap index file that references multiple child sitemap files. A sitemap index is subject to the same limits (50,000 entries, 50MB), and Google supports sitemap index files that themselves reference other sitemap index files.
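To make those limits concrete, here is a minimal Python sketch that splits a large URL inventory into 50,000-entry child sitemaps and writes a matching index file; the domain and file names are placeholders.

```python
from xml.etree import ElementTree as ET

XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"
LIMIT = 50_000                # protocol maximum URLs per sitemap file
BASE = "https://example.com"  # placeholder domain

def write_chunked_sitemaps(urls):
    """Split urls into 50,000-entry child sitemaps plus an index file."""
    index = ET.Element("sitemapindex", xmlns=XMLNS)
    for i in range(0, len(urls), LIMIT):
        name = f"sitemap-{i // LIMIT + 1}.xml"
        urlset = ET.Element("urlset", xmlns=XMLNS)
        for loc in urls[i:i + LIMIT]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = loc
        ET.ElementTree(urlset).write(name, encoding="UTF-8", xml_declaration=True)
        # Register the child sitemap in the index under its public URL
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = f"{BASE}/{name}"
    ET.ElementTree(index).write("sitemap_index.xml", encoding="UTF-8", xml_declaration=True)

# 120,000 dummy URLs produce three child sitemaps plus the index
write_chunked_sitemaps([f"{BASE}/page-{n}/" for n in range(120_000)])
```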
Sources