# Preparing SEO for a Next.js 13 site
## The Context
While working on my personal website, I wanted Google to index my individual articles better. I had no SEO set up on the site before this.
## The Solution
There are three main steps:

1. Add a `/robots.txt` endpoint to your site, which tells search engine crawlers which URLs they can access.
2. Add a `/sitemap.xml` endpoint to your site, which lists the exact pages available for a search engine to crawl. This can be done dynamically by:
   - adding a new API endpoint that fetches some data and returns XML, and
   - rewriting requests for `/sitemap.xml` to the `/api/sitemap` endpoint in your `next.config.js`.
3. Go to Google Search Console and submit your sitemap to be indexed by Google.
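To make the target format concrete before diving in: a sitemap is an XML `<urlset>` containing one `<url>` entry per page, each with a `<loc>`. A minimal sketch of generating one (the domain and paths here are placeholders, not my site's real constants):

```typescript
// Minimal sketch of the sitemap XML format (placeholder domain and paths).
const DOMAIN = "https://example.com";

const entry = (path: string): string =>
  `<url><loc>${DOMAIN}${path}</loc></url>`;

const buildSitemap = (paths: string[]): string =>
  `<?xml version="1.0" encoding="UTF-8"?>\n` +
  `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">` +
  paths.map(entry).join("") + // join("") to avoid commas between entries
  `</urlset>`;

console.log(buildSitemap(["/", "/blog"]));
```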
## 1. robots.txt
With Next.js v13.3.0 I can add an `/app/robots.ts` file, which Next parses and serves as a `robots.txt` file:

```ts
import { MetadataRoute } from "next";

import { DOMAIN } from "./constants";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: "*",
      allow: "/",
      disallow: "/private/", // any paths you don't want to be indexed
    },
    sitemap: `${DOMAIN}/sitemap.xml`,
  };
}
```
Output:

```
User-Agent: *
Allow: /
Disallow: /private/

Sitemap: https://jameshw.dev/sitemap.xml
```
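For intuition on how the object maps to that text output, here's a rough sketch of the serialization. This is an illustration only, not Next's actual implementation, and it handles just the single-rule shape used above:

```typescript
// Illustrative only: mimics how Next serializes the robots object above.
type RobotsRules = { userAgent: string; allow: string; disallow: string };
type Robots = { rules: RobotsRules; sitemap: string };

function serializeRobots({ rules, sitemap }: Robots): string {
  return [
    `User-Agent: ${rules.userAgent}`,
    `Allow: ${rules.allow}`,
    `Disallow: ${rules.disallow}`,
    "",
    `Sitemap: ${sitemap}`,
  ].join("\n");
}

console.log(
  serializeRobots({
    rules: { userAgent: "*", allow: "/", disallow: "/private/" },
    sitemap: "https://jameshw.dev/sitemap.xml",
  })
);
```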
## 2. sitemap.xml
I can then add the sitemap by creating an API endpoint at `/pages/api/sitemap.ts`:

```ts
/* eslint-disable @typescript-eslint/restrict-template-expressions */
import type { NextApiRequest, NextApiResponse } from "next";

import { serverSideCmsClient } from "api/services/cms/cms.client";
import { isArticle, isJournalEntry } from "types/guards";
import { DOMAIN, PATHS } from "app/constants";

const getSitemapRoute = (path: string) => {
  return `
  <url>
    <loc>${DOMAIN}${path}</loc>
    <lastmod>${new Date().toISOString().split("T")[0]}</lastmod>
  </url>`;
};

export default async function handler(_: NextApiRequest, res: NextApiResponse) {
  res.statusCode = 200;
  res.setHeader("Content-Type", "text/xml");
  // Instructing the Vercel edge to cache the file
  res.setHeader("Cache-control", "stale-while-revalidate, s-maxage=3600");

  const articles = await serverSideCmsClient.getDatabaseEntries(
    process.env.BLOG_DB_ID,
    isArticle
  );
  const journals = await serverSideCmsClient.getDatabaseEntries(
    process.env.JOURNAL_DB_ID,
    isJournalEntry
  );

  // Note: .join("") matters here. Interpolating an array directly into the
  // template would insert commas between entries and produce invalid XML.
  res.send(`<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${Object.values(PATHS)
    .map((path) => getSitemapRoute(path))
    .join("")}
${articles
    .map(({ slug, published }) =>
      getSitemapRoute(`${PATHS.BLOG}/${published}/${slug}`)
    )
    .join("")}
${journals
    .map(({ slug, date }) => getSitemapRoute(`${PATHS.JOURNAL}/${date}/${slug}`))
    .join("")}
</urlset>`);
}
```
I have one set of static URLs stored in the `PATHS` constant and two sets of dynamic routes: one for my journal and one for my blog. I want the sitemap to update automatically whenever I publish a new blog post or journal entry.

Next I need to point requests for `/sitemap.xml` at the new endpoint I've created. I can do this in `next.config.js`:

```js
/** @type {import('next').NextConfig} */
const nextConfig = {
  // ...
  async rewrites() {
    return [
      {
        source: "/sitemap.xml",
        destination: "/api/sitemap",
      },
    ];
  },
};

module.exports = nextConfig;
```
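Worth noting: the same v13.3.0 release also added an `app/sitemap.ts` metadata route that serves `/sitemap.xml` directly, with no API endpoint or rewrite needed. A sketch of that alternative (in a real project the function would be the file's default export, typed as `MetadataRoute.Sitemap` from `next`; the local type and paths below are stand-ins):

```typescript
// Sketch of Next 13.3+'s app/sitemap.ts metadata route. The local type
// mirrors the shape of MetadataRoute.Sitemap; the paths are placeholders.
type SitemapEntry = { url: string; lastModified?: Date };

const DOMAIN = "https://jameshw.dev";

// In app/sitemap.ts this would be `export default function sitemap()`.
function sitemap(): SitemapEntry[] {
  const staticPaths = ["", "/blog", "/journal"]; // placeholder paths
  return staticPaths.map((path) => ({
    url: `${DOMAIN}${path}`,
    lastModified: new Date(),
  }));
}

console.log(sitemap());
```

I kept the API-route approach since my entries come from a CMS, but the metadata route can also be declared `async` and fetch data, so either would work.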
## 3. Google Search Console
Now I need to tell Google to index my site:

1. Go to the Google Search Console. I signed in with the same Google account with which I had registered the domain (I bought it through Google).
2. Type in the location of the sitemap (https://jameshw.dev/sitemap.xml) and submit.
3. Click through to check that the sitemap was parsed successfully. Initially, my date formats were invalid.

After a few days, Google will have indexed the site!
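The invalid dates I hit were in `<lastmod>`: the sitemap protocol expects W3C Datetime format, e.g. `YYYY-MM-DD`. A quick sketch of the formatting the endpoint above uses:

```typescript
// <lastmod> expects W3C Datetime, e.g. YYYY-MM-DD. toISOString() yields
// "2023-05-04T12:34:56.000Z", so taking everything before the "T" works.
const lastmod = (d: Date): string => d.toISOString().split("T")[0];

console.log(lastmod(new Date("2023-05-04T12:34:56Z"))); // "2023-05-04"
```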
## The Result
Now when a bot crawls my website, it first requests `/robots.txt` and finds the rules shown above, which tell it where to find the sitemap. Navigating to `/sitemap.xml` then returns the generated XML listing every static and dynamic page on the site.