Site Architecture
Site architecture is the information design layer of SEO: how URLs are structured, how pages group into sections, and how navigation exposes them. Good architecture makes every other optimization more effective; bad architecture quietly caps how far a site can rank regardless of content quality.
URL design
URLs are read by users (in SERPs and link previews), by engines (as a relevance hint), and by you (in analytics forever). Design them once, correctly:
```text
GOOD  https://example.com/guides/technical-seo/crawl-budget
      lowercase · hyphens · hierarchical · descriptive · stable

BAD   https://example.com/index.php?id=4302&cat=7&ref=nav
      opaque · parameterized · fragile
```

Rules that age well:
- Lowercase, hyphens, ASCII. Mixed case creates duplicate-content URLs on case-sensitive servers.
- Reflect hierarchy (/guides/technical-seo/...). It feeds breadcrumbs, lets you measure sections, and communicates structure.
- Keep slugs short and descriptive; skip filler words.
- No dates in evergreen slugs. /blog/2019/05/post looks stale forever and forces a URL change if you republish.
- Stability is the prime directive. Every URL change requires a redirect and risks losing some link equity. Choose patterns you can keep for a decade.
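The rules above can be enforced when a slug is first generated. The sketch below turns a title into a lowercase, hyphenated, ASCII-only slug and prefixes it with its section path. The filler-word list and the `slugify`/`buildPath` names are hypothetical. Treat them as a starting point, not a standard.

```typescript
// Hypothetical filler-word list; tune it for your own content and language.
const FILLER_WORDS = new Set(["a", "an", "the", "and", "or", "of", "to", "in", "for", "on", "with"]);

// Turns a page title into a lowercase, hyphenated, ASCII-only slug.
function slugify(title: string): string {
  const ascii = title
    .normalize("NFKD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, ""); // drop the combining marks ("é" becomes "e")
  const words = ascii
    .toLowerCase()
    .split(/[^a-z0-9]+/) // any run of non-alphanumerics acts as one separator
    .filter((w) => w.length > 0 && !FILLER_WORDS.has(w));
  return words.join("-");
}

// Builds a hierarchical path from section slugs plus the page's own slug.
function buildPath(sections: string[], title: string): string {
  return "/" + [...sections, slugify(title)].join("/");
}

console.log(buildPath(["guides", "technical-seo"], "Crawl Budget: A Guide to the Basics"));
```

Generate the slug once and store it. Regenerating it whenever a title is edited would silently change the URL and break the stability rule.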
The hierarchy and click depth
Architect as a pyramid: homepage → section hubs → detail pages. Two structural rules:
- Keep important pages within 3 clicks of the homepage. Crawl frequency and internal link authority both tend to decay with depth.
- Flat beats deep, until hubs lose meaning. A thousand pages linked straight from the homepage is not architecture; group them into meaningful hubs that can themselves rank for category terms (your pillar pages).
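Click depth is a shortest-path measurement, so you can check the 3-click rule with a breadth-first search over your internal link graph. The sketch below assumes the graph is a map from page path to the paths it links to, for example built from a crawl export. The data shape and function names are illustrative.

```typescript
// Internal link graph: page path -> paths it links to (assumed shape).
type LinkGraph = Map<string, string[]>;

// Breadth-first search from the homepage. BFS reaches each page first along a
// shortest path, so the level where it is found is its minimum click depth.
function clickDepths(graph: LinkGraph, home = "/"): Map<string, number> {
  const depth = new Map<string, number>([[home, 0]]);
  const queue: string[] = [home];
  for (let i = 0; i < queue.length; i++) {
    const page = queue[i];
    const d = depth.get(page)!;
    for (const target of graph.get(page) ?? []) {
      if (!depth.has(target)) {
        depth.set(target, d + 1);
        queue.push(target);
      }
    }
  }
  return depth;
}

// Pages deeper than maxDepth, plus pages BFS never reached (orphans).
function tooDeep(graph: LinkGraph, allPages: string[], maxDepth = 3): string[] {
  const depth = clickDepths(graph);
  return allPages.filter((p) => (depth.get(p) ?? Infinity) > maxDepth);
}
```

Passing the full page list, for example from your XML sitemap, matters. Without it, orphaned pages never appear in the graph and would go unreported.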
Navigation should expose the hierarchy: primary nav for sections, breadcrumbs everywhere (BreadcrumbList schema), contextual links within content, and an HTML sitemap or hub indexes for long-tail discovery.
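Because the URLs already mirror the hierarchy, BreadcrumbList markup can be derived from the path. The sketch below follows the schema.org BreadcrumbList/ListItem structure (`itemListElement`, `position`, `name`, `item`). The `labelFromSlug` helper is a hypothetical stand-in; a real site would usually look up each hub's actual title.

```typescript
interface ListItem {
  "@type": "ListItem";
  position: number;
  name: string;
  item: string;
}

interface BreadcrumbList {
  "@context": "https://schema.org";
  "@type": "BreadcrumbList";
  itemListElement: ListItem[];
}

// Hypothetical label helper: "technical-seo" -> "Technical Seo".
function labelFromSlug(slug: string): string {
  return slug
    .split("-")
    .map((w) => w.charAt(0).toUpperCase() + w.slice(1))
    .join(" ");
}

// Emits Home plus one ListItem per path segment, so the trail matches the URL.
function breadcrumbJsonLd(origin: string, path: string): BreadcrumbList {
  const segments = path.split("/").filter((s) => s.length > 0);
  const items: ListItem[] = [{ "@type": "ListItem", position: 1, name: "Home", item: `${origin}/` }];
  segments.forEach((seg, i) => {
    items.push({
      "@type": "ListItem",
      position: i + 2,
      name: labelFromSlug(seg),
      item: `${origin}/${segments.slice(0, i + 1).join("/")}`,
    });
  });
  return { "@context": "https://schema.org", "@type": "BreadcrumbList", itemListElement: items };
}
```

The result is typically serialized with `JSON.stringify` into a `<script type="application/ld+json">` tag on the page.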
Pagination
For lists spanning multiple pages (/blog?page=2 or /blog/page/2):
- Give each page a unique, self-canonical URL. Don't canonicalize page 2+ to page 1 - that asks Google to ignore everything past page one, orphaning deep items.
- Link pages sequentially with plain anchors (prev/next plus numbered links).
- Prefer "load more" backed by real URLs over infinite scroll that exists only in JS - items unreachable without scripting are unreachable for the crawler's first wave.
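A minimal sketch of the pagination rules follows. Each page canonicalizes to itself, and prev/next plus numbered links are plain URLs that can be rendered as `<a href>` anchors. The URL scheme is an assumption: page 1 is the bare list URL and later pages use ?page=N. The function names are illustrative.

```typescript
interface PaginationLinks {
  canonical: string; // each page is self-canonical, never pointed at page 1
  prev: string | null;
  next: string | null;
  numbered: string[]; // hrefs for every page, rendered as plain anchors
}

// Assumed scheme: page 1 is the bare list URL, later pages use ?page=N.
function pageUrl(base: string, page: number): string {
  return page === 1 ? base : `${base}?page=${page}`;
}

function paginationLinks(base: string, page: number, totalPages: number): PaginationLinks {
  if (!Number.isInteger(page) || page < 1 || page > totalPages) {
    throw new RangeError(`page ${page} is outside 1..${totalPages}`);
  }
  const numbered: string[] = [];
  for (let p = 1; p <= totalPages; p++) numbered.push(pageUrl(base, p));
  return {
    canonical: pageUrl(base, page),
    prev: page > 1 ? pageUrl(base, page - 1) : null,
    next: page < totalPages ? pageUrl(base, page + 1) : null,
    numbered,
  };
}
```

For very long lists you would likely render only a window of numbered links around the current page. Every page URL stays reachable through the sequential prev/next chain.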
Faceted navigation: the crawl trap
Filters and sorts (?color=red&size=m&sort=price) can mint millions of URL combinations from one category - the classic crawl budget catastrophe for e-commerce. The standard containment strategy:
| Facet type | Treatment |
|---|---|
| High-demand combos ("red running shoes") | Indexable landing pages with clean URLs, in sitemap |
| Everything else | Canonical to the base category, or Disallow patterns in robots.txt (not both on the same URL: a disallowed URL is never fetched, so its canonical is never seen) |
| Sorts, view options, session params | Never indexable; exclude from internal links where possible |
Decide which facet combinations have real search demand (your keyword map answers this) and let only those exist as crawlable URLs.
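The containment table can be expressed as a classifier that maps each faceted URL to one of its three treatments. This is a sketch under assumptions. The parameter names in `NEVER_INDEX` and the combinations in `INDEXABLE_COMBOS` are hypothetical; in practice the allowlist would come from your keyword map.

```typescript
type FacetDecision = "indexable" | "canonicalize" | "exclude";

// Assumed parameter names for sorts, view options, and session/tracking state.
const NEVER_INDEX = ["sort", "view", "sessionid", "ref"];

// Hypothetical allowlist of facet combinations with real search demand.
// Each key is the URL's "name=value" pairs, sorted and joined with "&".
const INDEXABLE_COMBOS = new Set(["color=red", "color=red&type=running"]);

function classifyFacetUrl(url: string): FacetDecision {
  const params = new URL(url).searchParams;
  // Sorts, views, and session params are never indexable, whatever else is set.
  if (NEVER_INDEX.some((name) => params.has(name))) return "exclude";

  const pairs: string[] = [];
  params.forEach((value, name) => {
    pairs.push(`${name}=${value}`);
  });
  if (pairs.length === 0) return "indexable"; // the base category page itself

  // Sorting makes ?type=running&color=red and ?color=red&type=running one key.
  const key = pairs.sort().join("&");
  return INDEXABLE_COMBOS.has(key) ? "indexable" : "canonicalize";
}
```

Sorting the pairs before lookup is the key design choice. Without it, every permutation of the same filters would count as a distinct URL, which is exactly the combinatorial explosion the table is meant to contain. A full setup would also rewrite the "indexable" combinations to the clean landing-page URLs the table describes.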
Subdomain vs. subdirectory
The eternal question: blog.example.com vs example.com/blog. Practical guidance: prefer subdirectories for content that should reinforce the main site's authority - engines treat subdomains as more loosely connected, and consolidation usually measures better. Use subdomains for genuinely separate applications (app., status., docs. for a different product).
Internationalization (hreflang)
If you serve multiple languages/regions, hreflang tells engines which variant to show whom:
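```typescript
// Next.js App Router Metadata API (e.g. in a layout or page file).
import type { Metadata } from "next";

export const metadata: Metadata = {
  alternates: {
    // Rendered as <link rel="alternate" hreflang="..."> tags.
    // Every variant must emit this same full list, including itself.
    languages: {
      "en-US": "https://example.com/en",
      "de-DE": "https://example.com/de",
      "x-default": "https://example.com/en",
    },
  },
};
```

Rules: annotations must be bidirectional (every variant lists the full set of variants, including itself), each variant must be indexable, and x-default covers everyone unmatched. Half-implemented hreflang is the most common international SEO defect; validate it with a crawler.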
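A crawler-side check for the first rule can be sketched as follows. It assumes crawl output that records, for each page URL, the hreflang annotations found on it (language code mapped to target URL). That data shape is hypothetical. The check covers only self-reference and return links. It does not verify indexability or language-code validity.

```typescript
// Hypothetical crawl output: page URL -> { hreflang code: target URL }.
type HreflangMap = Map<string, Record<string, string>>;

// Returns readable problems; an empty array means the cluster is consistent.
function validateHreflang(pages: HreflangMap): string[] {
  const problems: string[] = [];
  pages.forEach((annotations, pageUrl) => {
    // Self-reference: each variant must list itself.
    if (!Object.values(annotations).includes(pageUrl)) {
      problems.push(`${pageUrl} does not reference itself`);
    }
    // Bidirectionality: every page this one points to must point back.
    for (const [lang, target] of Object.entries(annotations)) {
      if (target === pageUrl) continue;
      const back = pages.get(target);
      if (back === undefined) {
        problems.push(`${pageUrl} -> ${target} (${lang}): target was not crawled`);
      } else if (!Object.values(back).includes(pageUrl)) {
        problems.push(`${pageUrl} -> ${target} (${lang}): no return link`);
      }
    }
  });
  return problems;
}
```

The missing-return-link case is the typical half-implementation. It happens when one locale's template emits the full list and another emits only its own URL.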
