Overview
Pagination splits long lists across numbered URLs; faceted navigation lets users filter those lists by attributes like color, size, or price. Both produce URLs that crawlers must decide whether to index. Get the rules right and Google can reach and index every product or post page in a deep catalog. Get them wrong and crawl budget burns on filter combinations while the long-tail listings stay unindexed.
Make every paginated page self-canonical
Each numbered page declares its own URL as canonical. Google announced in 2019 that it no longer uses rel="next" and rel="prev" as an indexing signal, so for Google the tags are inert. The current rule is one canonical per page, pointing to itself:
```html
<!-- on /blog/page/3 -->
<link rel="canonical" href="https://example.com/blog/page/3" />
```

Pointing every paginated page back at /blog/ (page 1) is the most common error. It looks tidy, but it throws away every long-tail ranking signal that lives on pages 2 through N. Posts linked only from page 7 stop being reachable for Google as ranking targets, even though users can still click through to them. Self-canonical pagination keeps each listing page eligible to rank for the queries that match the items it shows.
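Here is a minimal sketch of the self-canonical rule. It assumes the /blog/page/N URL scheme from the example above, and it assumes page 1 is served at /blog/ rather than /blog/page/1. The helper names canonical_url and canonical_tag are hypothetical and do not come from any framework.

```python
from html import escape


def canonical_url(base: str, page: int) -> str:
    """Return the self-referencing canonical URL for one page of a listing.

    Assumes page 1 is served at the listing root (base + "/") and later
    pages at base + "/page/N", matching the /blog/page/3 example.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    if page == 1:
        return f"{base}/"
    return f"{base}/page/{page}"


def canonical_tag(base: str, page: int) -> str:
    # Each page points at itself, never back at page 1.
    href = escape(canonical_url(base, page), quote=True)
    return f'<link rel="canonical" href="{href}" />'


print(canonical_tag("https://example.com/blog", 3))
# <link rel="canonical" href="https://example.com/blog/page/3" />
```

The only branch that matters is the page-number check. No page other than page 1 ever receives page 1's URL, which is exactly what the "canonical to page 1" error gets wrong.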
Keep deep pages reachable in two or three clicks
Crawlers discover pages by following links, and pages many clicks from the homepage tend to be crawled less often. A post on page 47 of a list that only links to the next page is 47 clicks deep, and Google is likely to treat it as low priority. Two patterns fix this.
- Numbered pagination with first, last, and page-jump links. Render the first page, the last page, a small window of neighbors around the current page, and jump links at fixed intervals (every tenth page, say). Together these keep any page a few clicks from page 1 instead of N clicks.
- Hub pages by date, tag, or category. A /blog/2025 page links to every post from 2025; one hub per year flattens the depth of every post on the site.
See site-architecture for the broader rule that the most valuable pages should sit within three clicks of the homepage, and internal-linking for the link-equity side of the same problem.
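To see why jump links matter, the sketch below models a 100-page listing as a link graph. It then measures click depth from page 1 with a breadth-first search. The link rules are illustrative assumptions: two neighbors on each side, a jump link to every tenth page, plus first and last. The function names are also hypothetical.

```python
from collections import deque
from collections.abc import Callable


def next_only_links(page: int, last: int) -> set[int]:
    # Naive pagination: each page links only to the following page.
    return {page + 1} if page < last else set()


def jump_links(page: int, last: int, window: int = 2, step: int = 10) -> set[int]:
    # First and last page, a window of neighbors, and every `step`-th page.
    links = {1, last}
    links.update(range(max(1, page - window), min(last, page + window) + 1))
    links.update(range(step, last + 1, step))
    links.discard(page)
    return links


def click_depths(last: int, links_for: Callable[[int, int], set[int]]) -> dict[int, int]:
    # Breadth-first search from page 1: depth = minimum clicks to reach a page.
    depth = {1: 0}
    queue = deque([1])
    while queue:
        page = queue.popleft()
        for target in links_for(page, last):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


naive = click_depths(100, next_only_links)
jumps = click_depths(100, jump_links)
print(naive[47], jumps[47])  # 46 3
print(max(jumps.values()))   # 4
```

With next-only links, page 47 sits 46 clicks below page 1. With the assumed jump links, no page is more than four clicks away. A wider neighbor window or denser jump links would lower that maximum further, so the exact numbers depend on the parameters chosen here.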
Use noindex,follow for parameter-heavy facet URLs
Faceted navigation that filters on color, size, or price produces URL variants like /shoes?color=red&size=10. Most variants do not deserve their own index entry; they are slices of the parent category. Mark them noindex,follow:
```html
<!-- on /shoes?color=red&size=10 -->
<meta name="robots" content="noindex,follow" />
<link rel="canonical" href="https://example.com/shoes" />
```

noindex keeps the filter URL out of the index. follow lets crawlers walk product links from it. The canonical points to the unfiltered category so any inbound link equity flows there. A small number of high-value facets (a brand, a top category) can be promoted to indexable landing pages with their own content, but the default for filter combinations is noindex.
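Here is a minimal sketch of that default as server-side logic. It rests on two assumptions. First, the site uses a fixed set of filter parameter names (FILTER_PARAMS is hypothetical). Second, pagination uses path segments rather than query parameters, so stripping the query string always yields the unfiltered category URL. Promoting selected facets to indexable landing pages is not covered.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit

# Hypothetical: the query parameters this site treats as facet filters.
FILTER_PARAMS = {"color", "size", "price", "material"}


def facet_head_tags(url: str) -> list[str]:
    """Return the robots meta and canonical tags for a category or facet URL."""
    parts = urlsplit(url)
    filters = [name for name, _ in parse_qsl(parts.query) if name in FILTER_PARAMS]
    # Canonical always points at the unfiltered category: same path, no query.
    category = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # "index,follow" matches the default behavior; it is emitted for clarity.
    robots = "noindex,follow" if filters else "index,follow"
    return [
        f'<meta name="robots" content="{robots}" />',
        f'<link rel="canonical" href="{category}" />',
    ]


for tag in facet_head_tags("https://example.com/shoes?color=red&size=10"):
    print(tag)
```

Run on the filtered URL, this produces the same two tags as the example above. On the bare category URL, it produces an indexable page with a self-referencing canonical.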
Block infinite-combination facets in robots.txt
Some facets multiply into so many URL combinations that noindex cannot save crawl budget on them: Google still has to fetch each URL to read the meta tag. For those, block them in robots.txt:
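```
User-agent: *
# A parameter can appear first (?) or later (&) in the query string,
# so each blocked parameter needs both patterns.
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?price=
Disallow: /*&price=
```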
Use this only for parameters that never produce a useful landing page. Once a URL is blocked, Google cannot fetch it, so it never sees any canonical or noindex on that URL. The robots.txt rule stays the only signal until you remove it. See crawl-budget for the broader principle that crawl budget should be spent on pages that can rank.
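A Disallow pattern matches the URL path and query as literal text. That means a rule like /*?sort= only catches sort when it is the first parameter. The sketch below is a simplified matcher for the * wildcard and $ end anchor that Google documents for robots.txt. It ignores Allow rules and longest-match precedence, so treat it as a quick sanity check, not a replacement for Google's parser. The function names are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern[str]:
    # '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, rules: list[str]) -> bool:
    # Rules match from the start of the path, so use re.match, not re.search.
    return any(pattern_to_regex(rule).match(path_and_query) for rule in rules)


first_only = ["/*?sort="]
both_positions = ["/*?sort=", "/*&sort="]

url = "/shoes?color=red&sort=price"
print(is_disallowed(url, first_only))      # False: sort is not the first parameter
print(is_disallowed(url, both_positions))  # True
```

Running candidate rules against a sample of real facet URLs from server logs is a cheap way to catch patterns that block too little or too much before they go live.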
Render real URLs even when the UI is infinite scroll
Infinite scroll feels modern and breaks pagination if implemented naively. The fix is progressive enhancement: render normal numbered pagination URLs in the initial HTML response, then let JavaScript take over for users with JS enabled.
| Pattern | Crawlable | User experience |
|---|---|---|
| Numbered pagination only | Yes | One reload per page |
| Infinite scroll, no URLs | No | Smooth but unindexable past page 1 |
| Numbered pagination upgraded to infinite scroll by JS | Yes | Smooth, every page still has a URL |
The third pattern is the only one that satisfies both crawlers and users. Each page has a real URL Google can fetch; the JS layer adds the scroll-to-load behavior on top. See javascript-seo for the broader rule that critical content must exist in the initial HTML response.
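The crawlable baseline in the third pattern is ordinary server-rendered HTML. The sketch below assumes the /blog/page/N URL scheme and a hypothetical render_listing helper. It emits the items for one page plus plain anchor links to the previous and next pages. A JavaScript layer can intercept those links to load the next batch in place, but the links work without it.

```python
from html import escape


def page_url(page: int) -> str:
    # Assumed URL scheme: page 1 at /blog/, later pages at /blog/page/N.
    return "/blog/" if page == 1 else f"/blog/page/{page}"


def render_listing(items: list[str], page: int, last: int) -> str:
    """Server-render one listing page with real pagination links."""
    rows = "\n".join(f"  <li>{escape(title)}</li>" for title in items)
    nav = []
    if page > 1:
        nav.append(f'<a href="{page_url(page - 1)}">Previous</a>')
    if page < last:
        nav.append(f'<a href="{page_url(page + 1)}">Next</a>')
    # These <nav> links are what crawlers follow; JS may enhance them later.
    return f"<ul>\n{rows}\n</ul>\n<nav>{' '.join(nav)}</nav>"


print(render_listing(["Post A", "Post B"], page=3, last=10))
```

Previous and next links alone keep every page reachable but still deep. In practice they would be combined with the first, last, and page-jump links described above.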
Common errors
- rel="canonical" from page 2+ back to page 1. Long-tail traffic dies because Google treats pages 2 through N as duplicates of page 1.
- Infinite scroll with no fallback URLs. Anything past the first viewport is invisible to crawlers.
- Faceted nav generating millions of indexable filter URLs. Crawl budget burns on ?color=red&size=10&material=leather&sort=price permutations while real product pages stay unindexed.
- Blocking faceted URLs in robots.txt and also declaring noindex. Google cannot read the noindex through a robots block; pick one signal.
- Mixing pagination with session IDs in the URL. Every session produces a fresh set of “pages” from Google’s perspective.