Canonical URL Resolution

Before any path-specific fetches, the scanner follows the root URL and records the final destination after all redirects. Specifically, it records the scheme://host of the terminal response. All subsequent fetches - robots.txt, llms.txt, sitemap.xml, etc. - use this canonical base.

Why this matters

Without canonical URL resolution, a domain like example.co.uk that redirects to example.com would cause a problem: fetching example.co.uk/robots.txt would follow the redirect chain to example.com/ (the homepage), not example.com/robots.txt.

By resolving the canonical base first, all path-specific fetches are constructed against the correct terminal host.

Step in execution order

Canonical URL resolution is Step 0 - sequential and blocking. No other fetches begin until the canonical base is known.

See Execution Order for the full scan sequence.