
The Modern Technical SEO Checklist for 2026

Technical SEO did not get easier in 2026. It got more layered. Crawl, render, index, performance, schema, internal linking, and AI-crawler readiness now have to work as one system, or none of them work at all.

TL;DR
  • Technical SEO in 2026 is no longer a tidy back-office task. It is the foundation for both classical search and AI retrieval, and the two now share more signals than they have at any point in the last decade.
  • The non-negotiables: clean crawl paths, predictable rendering, full indexation of cite-worthy pages, Core Web Vitals on the right side of "Good", and Organization + Article + FAQ schema applied with intent.
  • Internal linking is now an expertise signal, not a navigational nicety. Hub-and-spoke architectures still work, but only when the hubs are genuinely cite-worthy and the spokes earn their place.
  • AI crawlers (GPTBot, PerplexityBot, ClaudeBot, Google-Extended, Bytespider) need to be allowed, observable, and served the same content humans see. Bot management defaults are the silent killer of AI visibility in 2026.
  • The checklist below is the same one we run on every audit at horatos.ai. Work through it once a quarter and you will catch 90% of the technical issues that quietly cap a site's growth.
01 / 08
Context

Why technical SEO still matters in an AI-driven world.

Every six months for the last three years, someone publishes a confident article arguing that technical SEO is dead. Every six months we audit a brand that lost half its traffic because of a quietly mis-configured noindex, a render-blocked component, or a Cloudflare bot rule that locked GPTBot out of the entire site. Technical SEO is not dead. It became the substrate that everything else, including AI visibility, sits on top of.

Here is the simple reason. Both Google's ranking systems and the LLM retrieval pipelines that feed ChatGPT, Perplexity, Gemini, and Claude depend on three things: that they can reach your pages, that they can render them faithfully, and that they can understand them as discrete, structured entities. When any one of those three breaks, classical SEO ranking signals weaken and AI citation rate collapses. They share more infrastructure in 2026 than they have at any point in the last decade.

Below is the checklist we work through on every horatos.ai technical engagement. It is opinionated, sequenced, and field-tested. Each item below has, at some point in the last 18 months, been the single change that unlocked a meaningful step change in traffic for a client. None of it is theoretical.

02 / 08
Layer 01

Crawl and discovery, the invisible bottleneck.

Crawl is where most technical SEO problems begin and where almost no one looks. If a search engine or AI crawler cannot fetch a page in a reasonable number of hops, it does not matter how good the content on that page is. The page does not exist for that system.

  1. robots.txt is permissive and explicit. Allow Googlebot, Bingbot, and the major AI crawlers (GPTBot, PerplexityBot, ClaudeBot, Google-Extended, Applebot-Extended, Amazonbot, Bytespider) by name; a minimal sketch follows this list. A default-deny robots.txt is the most common reason a site is invisible inside ChatGPT and Perplexity in 2026, and it is almost always inherited from a CDN or platform default no one ever questioned.
  2. sitemap.xml is current and complete. Every cite-worthy URL listed once. Stale entries, parameterised URLs, and pagination junk removed. lastmod dates are real, not pinned to the build timestamp; search engines now use lastmod as a genuine freshness signal again.
  3. Canonical strategy is consistent. One canonical per page. No conflicting canonicals between the HTML <link rel="canonical"> tag, HTTP headers, and the sitemap. Mixed canonical signals are among the top three reasons we see good content fail to rank.
  4. Internal links reach every important page within three clicks of the homepage. If your most cite-worthy article is four or more hops deep, it will be crawled less often, ranked lower, and cited less in AI answers.
  5. Crawl budget is not wasted on garbage. Faceted-search URLs, calendar pages, and infinite parameter combinations all need to be either disallowed in robots.txt or served with noindex, follow. Search Console's "Crawled, currently not indexed" report is the most under-read view in modern SEO.
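A minimal robots.txt sketch along those lines; the crawler list, the disallow rules, and the sitemap URL are placeholders to adapt, not a drop-in file:

    # Explicitly allow the search and AI crawlers you care about, one group per bot.
    User-agent: Googlebot
    Allow: /

    User-agent: GPTBot
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    # Everything else: allowed, but kept away from faceted-search junk.
    # Note: the named bots above match their own group and skip this one,
    # so repeat any Disallow rules there if they should apply to those bots too.
    User-agent: *
    Allow: /
    Disallow: /search

    Sitemap: https://www.example.com/sitemap.xml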

The simplest crawl audit you can run yourself: open Google Search Console, go to Settings > Crawl Stats, and check the trend line. If crawl requests have been declining for the last 60 days while you have been publishing, something is wrong. Investigate before you publish another piece of content.

03 / 08
Layer 02

Render and indexation, what bots actually see.

Crawl is reaching the page. Render is whether the bot can read what is on it. In 2026, with most production stacks built around React, Vue, or Svelte hydration, render is the layer where most modern sites quietly bleed visibility.

  1. Server-render or pre-render every page that needs to rank or be cited. Googlebot can render JavaScript, but it does so on a delay and at a cost. Most AI crawlers (Perplexity, Claude, several enterprise retrieval systems) do not render JavaScript at all. If your content only appears after hydration, those systems see an empty shell.
  2. Critical content is in the initial HTML. Headings, body copy, schema, internal links, primary CTAs. Test by viewing source (not Inspect Element) and confirming the words you want to rank for are visibly present.
  3. Indexation matches your strategy. Use Google Search Console's "Indexing > Pages" report to confirm that every page you want indexed shows as "Indexed", and that every excluded page is excluded for the right reason. We routinely find sites where 30–60% of cite-worthy pages are excluded as "Duplicate without user-selected canonical", almost always a fixable canonical-tag issue.
  4. HTTP status codes are accurate. 200 for live content, 301 for permanent redirects, 404 for genuinely removed pages, 410 for content that should drop out of the index quickly, 503 only for actual maintenance. We have seen sites accidentally serve a 200 status with an empty body on five-figure-traffic pages; every signal the search engine receives says "this is a real page, just empty", which is the worst possible message.
  5. Hreflang is correct or absent. Bad hreflang causes more harm than no hreflang. If your site is Singapore-only, do not implement hreflang at all. If you serve multiple regions, audit your hreflang pairings monthly so return-tag and language-code errors are caught early.

A useful exercise: fetch your most important page with curl -A "Mozilla/5.0 (compatible; PerplexityBot)" and read the raw HTML. If you cannot find your hero copy, your H2s, and your structured data in that response, AI retrieval cannot find it either.
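A minimal version of that check, assuming a Unix shell with curl and grep; the URL, the user-agent string, and the hero phrase are placeholders to swap for your own:

    # Confirm the page returns a 200 rather than an error or a soft-404 shell.
    curl -s -o /dev/null -w "%{http_code}\n" \
      -A "Mozilla/5.0 (compatible; PerplexityBot)" \
      https://www.example.com/your-most-important-page/

    # Fetch the raw, pre-hydration HTML the way a non-rendering crawler would,
    # and count the lines containing the copy, H2s, and JSON-LD you expect.
    curl -s -A "Mozilla/5.0 (compatible; PerplexityBot)" \
      https://www.example.com/your-most-important-page/ \
      | grep -icE "your hero phrase|<h2|application/ld[+]json"

If the second command returns zero, the content only exists after hydration and non-rendering crawlers never see it.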

04 / 08
Layer 03

Performance and Core Web Vitals.

Core Web Vitals are not a tiebreaker on their own, but they are a multiplier on every other signal. In the competitive sets we monitor, a site with strong content but a 600ms INP loses, every time, to a site with merely good content and a 150ms INP. Targets we hold our clients to in 2026:

  • LCP under 2.0s at the 75th percentile of mobile field data, on the page templates that account for >60% of organic landings.
  • INP under 200ms at the 75th percentile. (INP fully replaced FID in March 2024; the field metric is what matters now, not lab Lighthouse scores.)
  • CLS under 0.05. The official threshold is 0.1, but in practice every site we have ever shipped that hit 0.05 outranked similar sites that sat at 0.08.
  • Time to First Byte under 600ms for the geographies you actually serve. Cloudflare-cached HTML, Vercel edge SSR, or a properly tuned origin all get you there.

One nuance most teams miss: Lighthouse scores in your CI pipeline are a development tool, not a ranking input. The number Google actually uses is field data from real Chrome users, surfaced in CrUX and the PageSpeed Insights "Origin" view. Optimise lab metrics if you want to ship faster, but always validate against field data before declaring a performance win.
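One way to pull that field data on a schedule is the CrUX API. A sketch, assuming the v1 queryRecord endpoint, a Google API key in CRUX_API_KEY, and a placeholder origin; check the current API documentation before wiring this into anything:

    curl -s "https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=$CRUX_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
            "origin": "https://www.example.com",
            "formFactor": "PHONE",
            "metrics": ["largest_contentful_paint",
                        "interaction_to_next_paint",
                        "cumulative_layout_shift"]
          }'
    # The response contains field distributions and 75th-percentile values,
    # the same numbers behind the PageSpeed Insights origin summary.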

05 / 08
Layer 04

Structured data, applied with intent.

Structured data is the single highest-leverage technical task on this checklist in 2026. Schema.org markup is now consumed by both Google's ranking systems and every major LLM retrieval pipeline. A page with clean Article + Organization + Breadcrumb + FAQPage schema is between two and four times more likely to be cited verbatim in a Perplexity or Gemini answer than the same page without schema, in our internal testing.

Apply, in this order (a trimmed JSON-LD sketch follows the list):

  1. Organization schema on every page (in the global JSON-LD graph), with consistent name, url, logo, sameAs, address, contactPoint. This is the single strongest entity signal you can send.
  2. Article schema on every blog post, with real datePublished, dateModified, author (linked to a real Person schema with jobTitle and sameAs to LinkedIn), and wordCount. The author Person schema is a 2024–2026 ranking input that few sites bother with, yet it is doing real work.
  3. Breadcrumb schema matching the visible breadcrumb on the page.
  4. FAQPage schema on pages that genuinely answer recurring questions. Do not abuse this; Google has been demoting sites that use FAQPage schema decoratively since late 2023.
  5. Product, LocalBusiness, Course, Recipe, HowTo as relevant. Match the schema to the actual content, not the other way around.
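A trimmed sketch of the first two items combined into a single JSON-LD graph; every name, URL, date, and author detail below is a placeholder, not a drop-in block:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@graph": [
        {
          "@type": "Organization",
          "@id": "https://www.example.com/#org",
          "name": "Example Co",
          "url": "https://www.example.com/",
          "logo": "https://www.example.com/logo.png",
          "sameAs": ["https://www.linkedin.com/company/example-co"]
        },
        {
          "@type": "Article",
          "headline": "Example article headline",
          "datePublished": "2026-01-10",
          "dateModified": "2026-02-02",
          "wordCount": 1800,
          "publisher": { "@id": "https://www.example.com/#org" },
          "author": {
            "@type": "Person",
            "name": "Jane Doe",
            "jobTitle": "Head of SEO",
            "sameAs": ["https://www.linkedin.com/in/janedoe"]
          }
        }
      ]
    }
    </script>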

Validate every change with the Schema.org Markup Validator and Google's Rich Results Test. A schema error on a high-traffic page can quietly remove rich results for the entire template; we have seen this cost mid-five-figure organic clicks per month on e-commerce sites.

06 / 08
Layer 05

Internal linking, the new expertise signal.

Internal linking used to be about navigation. In 2026 it is about expertise demonstration. The pattern of how your pages reference each other tells search engines and LLMs which pages are central to your topic and which are peripheral. Three patterns we use on every horatos engagement:

  1. Hub-and-spoke for every meaningful topic cluster. One cornerstone page (the hub) that comprehensively covers the topic, surrounded by 5–15 supporting pages (the spokes) that go deep on sub-topics, all linking to the hub with descriptive, varied anchor text. The hub links back to every spoke. This pattern is older than Google's topic authority systems, but it is also exactly what those systems reward.
  2. Anchor text describes the destination, not the link. "Read our guide to local SEO in Singapore" beats "click here". Vary the anchor text naturally; never use the exact same phrase twice in the same paragraph.
  3. Link from new pages back to your strongest existing pages, every time. Every new article should reinforce the authority of two to four cornerstone pieces; never publish in isolation.

The anti-pattern: site-wide footer links to 30 pages with the same anchor text. Search engines have devalued these for years, and LLMs effectively ignore them. Concentrated, contextual, in-content links are doing the real work.

07 / 08
Layer 06

AI-crawler readiness, the new technical layer.

This is the layer most 2024 technical SEO checklists do not include and the layer that, in 2026, makes the largest difference to a brand's visibility inside AI-generated answers.

  1. llms.txt at the root. A short, clear summary of your site, your services, and your most cite-worthy URLs, written in Markdown for LLM consumption; a short sketch follows this list. We have seen this single file improve LLM citation rate within four to six weeks of publication on every site we have run the experiment on.
  2. Bot allowlists, not blocklists. Cloudflare Bot Fight Mode, Vercel's default bot rules, and most security WAFs ship with rules that quietly block AI crawlers. Audit your access logs monthly for GPTBot, PerplexityBot, ClaudeBot, Google-Extended, Applebot-Extended, Bytespider, and Amazonbot. If you do not see them in your logs, you are not in their training or retrieval set.
  3. Stable URLs, no JavaScript-only navigation. AI crawlers tend to ingest static HTML and treat client-side router transitions as "the same page". Make every cite-worthy page a real URL with a real <a href> link to it from somewhere reachable on the site.
  4. Author transparency. Real author pages with real names, real bios, real LinkedIn links, real other-site bylines. LLMs cite content under named experts at materially higher rates than anonymous content. If your blog still publishes under "Admin", "Marketing Team", or no author at all, fix that this quarter.
  5. Consistent entity signals everywhere. The brand name spelled the same way on every page, in every schema block, in every press mention, on every social profile, on Wikidata where applicable. Entity disambiguation is what allows LLMs to know that the company being asked about is unambiguously you.
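One possible shape for the llms.txt file from item 1, loosely following the emerging llms.txt convention of a title, a short summary, and link lists; every name and URL below is a placeholder:

    # Example Co

    > Example Co is a Singapore-based B2B software company. The links below are
    > the pages most worth reading and citing about what we do.

    ## Services
    - [Technical SEO audits](https://www.example.com/services/technical-seo/): scope, process, and pricing.

    ## Guides
    - [The modern technical SEO checklist](https://www.example.com/insights/technical-seo-checklist/): the quarterly checklist we run on every engagement.

And a quick way to run the monthly log audit from item 2, assuming a standard access log at a path like /var/log/nginx/access.log:

    grep -oE "GPTBot|PerplexityBot|ClaudeBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot" \
      /var/log/nginx/access.log | sort | uniq -c | sort -rn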
08 / 08
Final word

A technical SEO checklist is not a one-off project.

The teams that win on technical SEO in 2026 do not run a giant audit, fix everything, and then move on for two years. They run a quarterly version of the checklist above, fix the three to five things that have drifted since the last cycle, and keep moving. Technical SEO entropy is real. CDN configurations change, build tools update, third-party scripts get added, schema specs evolve. Without a quarterly rhythm, even a perfect setup degrades inside twelve months.

If you want a second pair of eyes on yours, this is exactly the work we do on every SEO engagement at horatos.ai. We will run the full checklist against your site, prioritise the fixes by traffic impact, and tell you honestly which items will move the needle and which are not worth the cycle. No commitment, no pressure.
