Site Architecture

Site architecture is the information design layer of SEO: how URLs are structured, how pages group into sections, and how navigation exposes them. Good architecture makes every other optimization more effective; bad architecture quietly caps how high a site can rank, regardless of content quality.

URL design

URLs are read by users (in SERPs and link previews), by engines (as a relevance hint), and by you (in analytics forever). Design them once, correctly:

GOOD  https://example.com/guides/technical-seo/crawl-budget
      lowercase · hyphens · hierarchical · descriptive · stable
 
BAD   https://example.com/index.php?id=4302&cat=7&ref=nav
      opaque · parameterized · fragile

Rules that age well:

Lowercase, hyphens, ASCII. URL paths are case-sensitive, so /Guides and /guides are two URLs: duplicate content where the server answers both, a broken link where it does not.
Reflect hierarchy (/guides/technical-seo/...) - it feeds breadcrumbs, lets you measure sections, and communicates structure.
Keep slugs short and descriptive; skip filler words.
No dates in evergreen slugs - /blog/2019/05/post looks stale forever and breaks on republish.
Stability is the prime directive. Every URL change requires a redirect and risks losing some equity. Choose patterns you can keep for a decade.
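
The slug rules above can be sketched as a small helper. This is a minimal illustration, not a definitive implementation: `toSlug`, `toPath`, and the `FILLER_WORDS` list are hypothetical names and values chosen for the example.

```typescript
// Hypothetical filler-word list; tune it to your own content.
const FILLER_WORDS = new Set(["a", "an", "the", "of", "and", "or", "to", "for", "in", "on"]);

/** Turn a page title into a lowercase, hyphenated, ASCII-only slug. */
function toSlug(title: string): string {
  const words = title
    .normalize("NFKD")               // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")     // any run of other characters becomes one separator
    .trim()
    .split(" ")
    .filter((w) => w.length > 0);

  const kept = words.filter((w) => !FILLER_WORDS.has(w));
  // Fall back to all words if the title was nothing but filler.
  return (kept.length > 0 ? kept : words).join("-");
}

/** Join hierarchical segments into a path such as /guides/technical-seo/crawl-budget. */
function toPath(...segments: string[]): string {
  return "/" + segments.map(toSlug).filter((s) => s.length > 0).join("/");
}
```

Generate the slug once, when the page is first published, and store it. Recomputing it from a later title edit would change the URL and defeat the stability rule.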

The hierarchy and click depth

Architect as a pyramid: homepage → section hubs → detail pages. Two structural rules:

Keep important pages within 3 clicks of the homepage. Crawl frequency and authority both decay with depth.
Flat beats deep, until hubs lose meaning. A thousand pages linked straight from the homepage is not architecture; group them into meaningful hubs that can themselves rank for category terms (your pillar pages).
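
Click depth is the shortest link path from the homepage, so a breadth-first search over the internal link graph measures it. The sketch below assumes you already have that graph (for example from a crawler export); `LinkGraph`, `clickDepths`, and `tooDeep` are hypothetical names.

```typescript
type LinkGraph = Record<string, string[]>; // page URL -> URLs it links to

/** Breadth-first search from the homepage: minimum clicks needed to reach each page. */
function clickDepths(graph: LinkGraph, home: string): Map<string, number> {
  const depths = new Map<string, number>([[home, 0]]);
  const queue: string[] = [home];
  for (let i = 0; i < queue.length; i++) {
    const page = queue[i];
    const depth = depths.get(page) ?? 0;
    for (const target of graph[page] ?? []) {
      if (!depths.has(target)) {
        depths.set(target, depth + 1); // first visit is the shortest path in BFS
        queue.push(target);
      }
    }
  }
  return depths;
}

/** Pages deeper than maxDepth, plus pages never reached from the homepage (orphans). */
function tooDeep(graph: LinkGraph, home: string, maxDepth = 3): string[] {
  const depths = clickDepths(graph, home);
  return Object.keys(graph).filter((page) => (depths.get(page) ?? Infinity) > maxDepth);
}
```

Orphans are reported alongside deep pages because both have the same cause: no short path of internal links leads to them.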

Navigation should expose the hierarchy: primary nav for sections, breadcrumbs everywhere (BreadcrumbList schema), contextual links within content, and an HTML sitemap or hub indexes for long-tail discovery.
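
Because the URL already encodes the hierarchy, BreadcrumbList markup can be derived from the path. A minimal sketch follows; `breadcrumbJsonLd` is a hypothetical helper, and deriving labels from slugs is an assumption made to keep the example short (a real site would use page titles).

```typescript
interface BreadcrumbItem {
  "@type": "ListItem";
  position: number;
  name: string;
  item: string;
}

/** Build schema.org BreadcrumbList JSON-LD from a hierarchical URL path. */
function breadcrumbJsonLd(origin: string, path: string) {
  const segments = path.split("/").filter((s) => s.length > 0);
  const itemListElement = segments.map((segment, i): BreadcrumbItem => ({
    "@type": "ListItem",
    position: i + 1,
    // Assumption: label derived from the slug; swap in the real page title.
    name: segment.replace(/-/g, " "),
    // Each crumb links to the path up to and including this segment.
    item: origin + "/" + segments.slice(0, i + 1).join("/"),
  }));
  return { "@context": "https://schema.org", "@type": "BreadcrumbList", itemListElement };
}
```

The result is meant to be serialized with `JSON.stringify` into a `<script type="application/ld+json">` tag on the page.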

Pagination

For lists spanning multiple pages (/blog?page=2 or /blog/page/2):

Give each page a unique, self-canonical URL. Don't canonicalize page 2+ to page 1 - that asks Google to ignore everything past page one, orphaning deep items.
Link pages sequentially with plain anchors (prev/next plus numbered links).
Prefer "load more" backed by real URLs over infinite scroll that exists only in JS - items unreachable without scripting are invisible to the initial HTML crawl and wait on a later rendering pass, which does not scroll or click.
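
The pagination rules above can be sketched as one function that returns the self-canonical URL plus the prev, next, and numbered links to render as plain anchors. The names (`PageLinks`, `listPageUrl`, `paginationLinks`) are hypothetical, and putting page 1 at the bare list URL is an assumption of this example.

```typescript
interface PageLinks {
  canonical: string;   // always the page's own URL
  prev: string | null;
  next: string | null;
  numbered: string[];  // href for every page, in order
}

// Assumption: page 1 lives at the bare list URL, later pages at ?page=N.
function listPageUrl(base: string, n: number): string {
  return n <= 1 ? base : `${base}?page=${n}`;
}

function paginationLinks(base: string, current: number, totalPages: number): PageLinks {
  const total = Math.max(1, Math.trunc(totalPages));
  // Clamp out-of-range page numbers instead of emitting links to pages that do not exist.
  const page = Math.min(Math.max(1, Math.trunc(current)), total);
  const numbered: string[] = [];
  for (let n = 1; n <= total; n++) numbered.push(listPageUrl(base, n));
  return {
    canonical: listPageUrl(base, page),
    prev: page > 1 ? listPageUrl(base, page - 1) : null,
    next: page < total ? listPageUrl(base, page + 1) : null,
    numbered,
  };
}
```

For example, `paginationLinks("https://example.com/blog", 2, 3)` gives page 2 a canonical of `https://example.com/blog?page=2`, not the page 1 URL.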

Faceted navigation

Filters and sorts (?color=red&size=m&sort=price) can mint millions of URL combinations from one category - the classic crawl budget catastrophe for e-commerce. The standard containment strategy:

Facet type	Treatment
High-demand combos ("red running shoes")	Indexable landing pages with clean URLs, in sitemap
Everything else	Canonical to the base category; where crawl volume itself is the problem, `Disallow` patterns in robots.txt instead (a blocked URL's canonical tag is never fetched)
Sorts, view options, session params	Never indexable; exclude from internal links where possible

Decide which facet combinations have real search demand (your keyword map answers this) and let only those exist as crawlable URLs.
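
One way to apply that decision in code is a single function that classifies every faceted URL. The sketch below is illustrative only: the parameter names, the `LANDING_PAGES` allowlist, and the `/running-shoes/red` landing path are hypothetical placeholders for whatever your keyword map produces.

```typescript
type Treatment = "index" | "canonical-to-base" | "never-index";

// Hypothetical data: parameter names and the demand allowlist are placeholders.
const NON_CONTENT_PARAMS = new Set(["sort", "view", "sessionid"]);
const LANDING_PAGES = new Map<string, string>([
  ["/running-shoes?color=red", "/running-shoes/red"],
]);

function facetTreatment(rawUrl: string): { treatment: Treatment; canonical: string } {
  const url = new URL(rawUrl);
  const base = url.origin + url.pathname;
  const keys: string[] = [];
  const pairs: string[] = [];
  url.searchParams.forEach((value, key) => {
    keys.push(key);
    pairs.push(`${key}=${value}`);
  });

  // The unfiltered category page is always indexable.
  if (keys.length === 0) return { treatment: "index", canonical: base };
  // Sorts, view options, and session params never get indexed.
  if (keys.some((k) => NON_CONTENT_PARAMS.has(k))) {
    return { treatment: "never-index", canonical: base };
  }
  // Sort the pairs so ?size=m&color=red and ?color=red&size=m share one key.
  const combo = `${url.pathname}?${pairs.sort().join("&")}`;
  const landing = LANDING_PAGES.get(combo);
  return landing !== undefined
    ? { treatment: "index", canonical: url.origin + landing }
    : { treatment: "canonical-to-base", canonical: base };
}
```

Keeping the allowlist as data means a new high-demand combination becomes an edit to the keyword map, not a code change.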

Subdomain vs. subdirectory

The eternal question: blog.example.com vs example.com/blog. Practical guidance: prefer subdirectories for content that should reinforce the main site's authority - engines treat subdomains as more loosely connected, and consolidation usually measures better. Use subdomains for genuinely separate applications (app., status., docs. for a different product).

Internationalization (hreflang)

If you serve multiple languages/regions, hreflang tells engines which variant to show whom:

app/[lang]/layout.tsx (metadata excerpt)

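import type { Metadata } from "next";

// Every locale layout emits this same full set, so each variant
// lists all variants, itself included.
export const metadata: Metadata = {
  alternates: {
    languages: {
      "en-US": "https://example.com/en",
      "de-DE": "https://example.com/de",
      "x-default": "https://example.com/en", // fallback for unmatched users
    },
  },
};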

Rules: annotations must be bidirectional (each variant lists every variant, itself included), each variant must be indexable, and x-default covers everyone unmatched. Half-implemented hreflang is the most common international SEO defect - validate with a crawler.
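
The bidirectional rule is mechanical enough to check in a few lines once a crawler has extracted each page's annotations. A minimal sketch, assuming the crawl output has been shaped into the hypothetical `HreflangMap` below; `hreflangProblems` is likewise an illustrative name, and indexability is not checked here.

```typescript
// page URL -> the hreflang annotations found on that page (language code -> URL)
type HreflangMap = Record<string, Record<string, string>>;

/** Report missing self-references and annotations that are not reciprocated. */
function hreflangProblems(pages: HreflangMap): string[] {
  const problems: string[] = [];
  for (const [page, annotations] of Object.entries(pages)) {
    const targets = new Set(Object.values(annotations));
    if (!targets.has(page)) problems.push(`${page}: missing self-reference`);
    targets.forEach((target) => {
      if (target === page) return;
      const back = pages[target];
      if (back === undefined) {
        // The variant was never crawled, so the return link cannot be confirmed.
        problems.push(`${page}: links to ${target}, which was not crawled`);
      } else if (!Object.values(back).includes(page)) {
        problems.push(`${target}: no return link to ${page}`);
      }
    });
  }
  return problems;
}
```

An empty result means every variant that was crawled lists itself and is listed back by every variant it points to.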

Next: Core Web Vitals - the performance metrics that are also ranking signals.