June 10, 2026

SEO infrastructure in Umbraco: sitemap, robots.txt and legacy redirects

A pragmatic Umbraco SEO infrastructure setup for sitemap.xml, robots.txt, legacy redirects, and migration-friendly routing.

Not every SEO improvement is a title tag, a schema.org block or a content optimization.

Some of the most useful SEO work is infrastructure:

  • a sitemap that reflects the current content tree;
  • a robots.txt file that points crawlers to the right sitemap and hides backoffice paths;
  • a redirect layer for old URLs after migrations, rewrites or editorial reorganizations.

These pieces are not glamorous. They are also easy to forget because they sit below the visible page.

But on an Umbraco site, they are worth treating as first-class application code.

This article walks through a pragmatic setup based on a real Umbraco project:

  • dynamic sitemap.xml generated from published content;
  • robots.txt served by middleware;
  • legacy redirects managed from Umbraco content and cached in memory;
  • a small rewrite that maps /sitemap.xml to an Umbraco-rendered sitemap page.

The examples are simplified and use English property names, but the structure is the same as the production code.

The shape of the solution

The project has three small pieces (sitemap, robots.txt and redirects), spread across five files:

Controllers/SitemapController.cs
Services/SitemapService.cs
Views/Sitemap/sitemapxml.cshtml

Middlewares/RobotsTxtMiddleware.cs
Middlewares/LegacyRedirectMiddleware.cs

And they are wired into the ASP.NET Core pipeline:

app.UseMiddleware<LegacyRedirectMiddleware>();
app.UseMiddleware<RobotsTxtMiddleware>();

app.UseRewriter(new RewriteOptions()
    .AddRewrite(@"^sitemap\.xml$", "sitemapxml", skipRemainingRules: true));

The order is intentional.

Legacy redirects run early because old URLs should be handled before the request reaches Umbraco routing. robots.txt is also handled before the normal page pipeline because it is not a content page. The sitemap is still rendered by Umbraco, but /sitemap.xml is rewritten to the internal sitemap route.
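
One detail this wiring depends on: both middleware classes implement IMiddleware, and ASP.NET Core resolves that kind of middleware from the dependency injection container. A minimal sketch of the registration, assuming it is not already done elsewhere in the project (the transient lifetime is my assumption, not taken from the production code):

```csharp
// Program.cs (fragment). Assumes the two middleware classes from this article
// are in scope; the transient lifetime is an illustrative choice.
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

// IMiddleware implementations are resolved from the container.
// Without these registrations, UseMiddleware<T>() throws because
// the middleware type cannot be resolved.
builder.Services.AddTransient<LegacyRedirectMiddleware>();
builder.Services.AddTransient<RobotsTxtMiddleware>();

// IMemoryCache is a constructor dependency of LegacyRedirectMiddleware.
// Calling this is harmless if the cache is already registered.
builder.Services.AddMemoryCache();
```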

Dynamic sitemap.xml

The sitemap is generated from the current published content tree.

The controller is deliberately thin:

public class SitemapController(
    ILogger<RenderController> logger,
    ICompositeViewEngine compositeViewEngine,
    IUmbracoContextAccessor umbracoContextAccessor,
    ISitemapService sitemapService)
    : RenderController(logger, compositeViewEngine, umbracoContextAccessor)
{
    public override IActionResult Index()
    {
        var culture = CurrentPage?.Cultures.FirstOrDefault().Key ?? "";
        var root = CurrentPage!.Root();
        var host = Request.Host.Value;
        var hostWithScheme = $"https://{host}";

        var elements = sitemapService.GetSitemapElements(root, culture, hostWithScheme);

        return View("~/Views/Sitemap/sitemapxml.cshtml", elements);
    }
}

The controller asks the service for sitemap entries and passes them to a Razor view that renders XML.

The sitemap entry model is tiny:

public record SitemapElement(
    string Loc,
    string LastMod,
    string? ChangeFreq,
    string? Priority);

That is enough for a standard XML sitemap:

<url>
  <loc>https://www.example.com/my-page/</loc>
  <lastmod>2026-06-07T10:30:00+00:00</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.5</priority>
</url>
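
The lastmod value is a W3C datetime with an explicit UTC offset. A minimal sketch of producing that string from a DateTime, assuming a hypothetical helper named SitemapDates.ToLastMod (not part of the production code), looks like this:

```csharp
using System;
using System.Globalization;

public static class SitemapDates
{
    // Hypothetical helper: formats a timestamp as a W3C datetime
    // that carries the value's actual UTC offset.
    public static string ToLastMod(DateTime updateDate)
    {
        // DateTimeOffset uses a zero offset when Kind is Utc, and the
        // server's local offset when Kind is Local or Unspecified.
        var withOffset = new DateTimeOffset(updateDate);

        return withOffset.ToString(
            "yyyy-MM-dd'T'HH:mm:sszzz",
            CultureInfo.InvariantCulture);
    }
}
```

For a UTC input such as new DateTime(2026, 6, 7, 10, 30, 0, DateTimeKind.Utc), this returns 2026-06-07T10:30:00+00:00. For local timestamps the offset follows the server's time zone instead of being fixed.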

Let editors control sitemap inclusion

The sitemap service walks the tree, but it does not blindly include everything.

Each page can expose a boolean field such as:

showInSitemap

The service checks that field before producing an entry:

private static SitemapElement? GetSiteMapUrlEntry(
    IPublishedContent node,
    string currentCulture,
    string hostWithScheme)
{
    if (node.HasProperty("showInSitemap") && node.Value<bool>("showInSitemap"))
    {
        var changeFreq = node.Value(
            "searchEngineChangeFrequency",
            fallback: Fallback.To(Fallback.Ancestors, Fallback.DefaultValue),
            defaultValue: "monthly");

        var priority = node.HasValue("searchEngineRelativePriority")
            ? node.Value<string>("searchEngineRelativePriority")
            : "0.5";

        var url = node.Url(mode: UrlMode.Absolute, culture: currentCulture);

        if (url.StartsWith("/"))
        {
            url = hostWithScheme + url;
        }

        if (url != "#")
        {
            return new SitemapElement(
                url,
                $"{node.UpdateDate:s}+00:00",
                changeFreq,
                priority);
        }
    }

    return null;
}

There are a few useful details here.

showInSitemap makes inclusion an editorial decision. The sitemap is not just “every page Umbraco can see”.

searchEngineChangeFrequency can fall back to ancestor values. That lets editors set a default on a parent node and override only where necessary.

searchEngineRelativePriority is page-specific. In this project it does not fall back to ancestors because relative priority is meant to describe this specific page.

The service also protects against non-routable pages by ignoring #.

Recursing through children

The service starts from the site root and walks children:

public List<SitemapElement?> GetSitemapElements(
    IPublishedContent siteHomePage,
    string culture,
    string hostWithScheme)
{
    var response = new List<SitemapElement?>
    {
        GetSiteMapUrlEntry(siteHomePage, culture, hostWithScheme)
    };

    response.AddRange(GetSiteMapUrlEntriesForChildren(
        siteHomePage,
        maxSiteMapDepth: 100,
        culture,
        hostWithScheme));

    return response.Where(entry => entry is not null).ToList();
}

The child traversal can also respect a field such as:

hideFromXmlSitemap

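In this example the flag only controls recursion: the traversal skips a page's subtree when every child is hidden. Whether an individual page gets an entry is still decided by showInSitemap: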

private static List<SitemapElement?> GetSiteMapUrlEntriesForChildren(
    IPublishedContent parentPage,
    int maxSiteMapDepth,
    string currentCulture,
    string hostWithScheme)
{
    var response = new List<SitemapElement?>();

    foreach (var page in parentPage.Children())
    {
        response.Add(GetSiteMapUrlEntry(page, currentCulture, hostWithScheme));

        if (page.Level < maxSiteMapDepth &&
            page.Children().Any(child => !child.Value<bool>("hideFromXmlSitemap")))
        {
            response.AddRange(GetSiteMapUrlEntriesForChildren(
                page,
                maxSiteMapDepth,
                currentCulture,
                hostWithScheme));
        }
    }

    return response;
}

In a large site I would revisit this and add caching or an index. For a small or medium editorial site, this is often enough and has the advantage of being easy to reason about.
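
If caching becomes necessary, one possible shape is a thin wrapper around IMemoryCache. This is a sketch under assumptions: the SitemapCache name, the key format and the 10-minute lifetime are all hypothetical, and the key includes the host because the entries contain absolute URLs.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Caching.Memory;

// Hypothetical wrapper; not part of the production code.
public sealed class SitemapCache(IMemoryCache cache)
{
    private static string Key(string culture, string host) => $"sitemap:{host}:{culture}";

    // Returns the cached entries, or builds and stores them on a miss.
    public IReadOnlyList<T> GetOrBuild<T>(
        string culture,
        string host,
        Func<IReadOnlyList<T>> build)
    {
        return cache.GetOrCreate(Key(culture, host), entry =>
        {
            // Time-based expiry keeps the sketch simple (assumed value).
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10);
            return build();
        })!;
    }

    // Call this when content changes, for example from a publish handler.
    public void Invalidate(string culture, string host) =>
        cache.Remove(Key(culture, host));
}
```

The controller would then wrap the existing service call in GetOrBuild, so the tree is only walked on a cache miss.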

Rendering XML with Razor

The sitemap view is intentionally boring:

@inherits UmbracoViewPage<List<SitemapElement>>
@{
    Layout = null;
    Context.Response.ContentType = "text/xml";
}
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    @foreach (var element in Model)
    {
        <url>
            <loc>@(element.Loc)</loc>
            <lastmod>@(element.LastMod)</lastmod>
            <changefreq>@(element.ChangeFreq)</changefreq>
            <priority>@(element.Priority)</priority>
        </url>
    }
</urlset>

This is not a normal page view. It has no layout and sets the response content type to XML.

Then the rewrite rule exposes it at the URL crawlers expect:

app.UseRewriter(new RewriteOptions()
    .AddRewrite(@"^sitemap\.xml$", "sitemapxml", skipRemainingRules: true));

So the Umbraco content route can be something internal like /sitemapxml, while the public URL remains:

/sitemap.xml
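
One detail in the rewrite pattern is easy to miss: the rewriter treats it as a regular expression, so a bare dot matches any character and only an escaped dot matches a literal one. A small check, using a hypothetical helper name, shows the difference:

```csharp
using System.Text.RegularExpressions;

public static class RewritePatternCheck
{
    // Hypothetical helper: reports whether a request path (without the
    // leading slash) matches a rewrite pattern.
    public static bool Matches(string pattern, string path) =>
        Regex.IsMatch(path, pattern);
}

// RewritePatternCheck.Matches(@"^sitemap.xml$",  "sitemap-xml")  is true
// RewritePatternCheck.Matches(@"^sitemap\.xml$", "sitemap-xml")  is false
// RewritePatternCheck.Matches(@"^sitemap\.xml$", "sitemap.xml")  is true
```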

robots.txt as middleware

The project serves robots.txt with middleware instead of a static file.

public class RobotsTxtMiddleware : IMiddleware
{
    public async Task InvokeAsync(HttpContext context, RequestDelegate next)
    {
        if (context.Request.Path.Value == "/robots.txt")
        {
            var host = context.Request.Host.Value;
            var hostWithScheme = $"https://{host}";

            context.Response.ContentType = "text/plain";
            await context.Response.WriteAsync(BuildResponseText(hostWithScheme));

            return;
        }

        await next(context);
    }

    private static string BuildResponseText(string host)
    {
        return $"""
                Sitemap: {host}/sitemap.xml
                User-agent: *
                Disallow: /error.html
                Disallow: /app_data/
                Disallow: /app_plugins/
                Disallow: /bin/
                Disallow: /config/
                Disallow: /data/
                Disallow: /umbraco/
                Disallow: /views/
                """;
    }
}

The main benefit is that the sitemap host is generated at runtime:

Sitemap: https://www.example.com/sitemap.xml

This is useful when the same application can run on different hosts or environments.

If your production site sits behind a reverse proxy, make sure forwarded headers are configured before this middleware. Otherwise Request.Host and scheme-related values can reflect the internal proxy request instead of the public URL.

In this project, forwarded headers are configured during application startup before project middleware runs.
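
For reference, a typical forwarded headers setup looks roughly like the fragment below. It is a sketch, not the project's actual configuration: which headers to accept, and from which proxies, depends on your hosting.

```csharp
// Program.cs (fragment). The header selection is an assumption.
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.HttpOverrides;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

builder.Services.Configure<ForwardedHeadersOptions>(options =>
{
    options.ForwardedHeaders =
        ForwardedHeaders.XForwardedFor |
        ForwardedHeaders.XForwardedProto |
        ForwardedHeaders.XForwardedHost;

    // By default only loopback proxies are trusted. When the proxy runs
    // on another machine, add its address to options.KnownProxies.
});

var app = builder.Build();

// Must run before any middleware that reads Request.Host or Request.Scheme.
app.UseForwardedHeaders();

app.Run();
```

With forwarded headers applied, context.Request.Scheme reflects the public scheme, so the host prefix could be built as $"{context.Request.Scheme}://{context.Request.Host.Value}" rather than assuming https.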

Testing robots.txt

robots.txt is small enough to test directly.

The test checks two things:

  1. /robots.txt writes the expected response and does not call the next middleware.
  2. any other path continues through the pipeline.

Example:

[Fact]
public async Task InvokeAsync_ForRobotsPath_WritesRobotsFileAndSkipsNext()
{
    var middleware = new RobotsTxtMiddleware();
    var context = new DefaultHttpContext();
    context.Request.Path = "/robots.txt";
    context.Request.Host = new HostString("example.com");
    context.Response.Body = new MemoryStream();
    var nextCalled = false;

    await middleware.InvokeAsync(context, _ =>
    {
        nextCalled = true;
        return Task.CompletedTask;
    });

    context.Response.Body.Position = 0;
    using var reader = new StreamReader(context.Response.Body);
    var body = await reader.ReadToEndAsync();

    nextCalled.Should().BeFalse();
    context.Response.ContentType.Should().Be("text/plain");
    body.Should().Contain("Sitemap: https://example.com/sitemap.xml");
    body.Should().Contain("User-agent: *");
    body.Should().Contain("Disallow: /umbraco/");
}

This is not a complicated test, but it prevents accidental regressions in a file search engines will actually request.
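
The second case, where any other path continues through the pipeline, could be covered along these lines. This is a sketch in the same xUnit and FluentAssertions style; the test class name and the /some-page path are placeholders.

```csharp
using System.IO;
using System.Threading.Tasks;
using FluentAssertions;
using Microsoft.AspNetCore.Http;
using Xunit;

public class RobotsTxtMiddlewarePassThroughTests
{
    [Fact]
    public async Task InvokeAsync_ForOtherPath_CallsNextAndWritesNothing()
    {
        var middleware = new RobotsTxtMiddleware();
        var context = new DefaultHttpContext();
        context.Request.Path = "/some-page"; // placeholder path
        context.Response.Body = new MemoryStream();
        var nextCalled = false;

        await middleware.InvokeAsync(context, _ =>
        {
            nextCalled = true;
            return Task.CompletedTask;
        });

        // The middleware should hand the request on without touching the response.
        nextCalled.Should().BeTrue();
        context.Response.Body.Length.Should().Be(0);
        context.Response.ContentType.Should().BeNull();
    }
}
```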

Legacy redirects from Umbraco content

The third piece is a redirect middleware.

During migrations or URL cleanups, there are often old URLs that must keep working. You can hardcode them, use a package, or put them in server configuration.

For this project, I wanted editors to maintain a simple redirect map in Umbraco content.

The field stores one redirect per line:

https://www.example.com/old-page;https://www.example.com/new-page
https://www.example.com/old-section/old-post;https://www.example.com/new-post

The middleware reads that field, normalizes the URLs and redirects matching requests:

public class LegacyRedirectMiddleware(
    IMemoryCache cache,
    ILoggerFactory loggerFactory,
    IVariationContextAccessor variationContextAccessor,
    IUmbracoContextFactory umbracoContextFactory)
    : IMiddleware
{
    private readonly ILogger _logger = loggerFactory.CreateLogger<LegacyRedirectMiddleware>();

    public async Task InvokeAsync(HttpContext context, RequestDelegate next)
    {
        var redirects = GetRedirectDictionary(context);

        if (redirects.Count > 0)
        {
            var currentUrl = CleanUrl(context.Request.GetDisplayUrl().Split('?')[0]);

            if (redirects.TryGetValue(currentUrl, out var redirectUrl))
            {
                _logger.LogInformation(
                    "Redirecting {CurrentUrl} to {RedirectUrl}",
                    currentUrl,
                    redirectUrl);

                context.Response.Redirect(redirectUrl, permanent: true);
                return;
            }
        }

        await next(context);
    }
}

The redirect dictionary is cached:

private Dictionary<string, string> GetRedirectDictionary(HttpContext context)
{
    const string key = "LegacyRedirectDictionary";

    if (cache.TryGetValue<Dictionary<string, string>>(key, out var cachedValue) &&
        cachedValue is not null)
    {
        return cachedValue;
    }

    variationContextAccessor.VariationContext ??= new VariationContext("en-US");
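    // EnsureUmbracoContext returns a disposable reference; keep it alive while reading content.
    using var umbracoContextReference = umbracoContextFactory.EnsureUmbracoContext();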

    var configurationService = context.RequestServices.GetRequiredService<IConfigurationService>();

    var redirects = configurationService.GetCurrentConfiguration().RedirectMap?
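        // Split on either newline style: Environment.NewLine depends on the host OS, not on the stored text.
        .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)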
        .Where(line => !string.IsNullOrWhiteSpace(line))
        .Select(line => line.Split(';'))
        .Where(parts => parts.Length == 2)
        .GroupBy(parts => CleanUrl(parts[0].Trim()))
        .ToDictionary(
            group => group.Key,
            group => CleanUrl(group.First()[1].Trim()))
        ?? new Dictionary<string, string>();

    return cache.Set(key, redirects, TimeSpan.FromMinutes(30));
}

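This is deliberately simple. Editors can paste a spreadsheet-like list into a field, and the site turns it into permanent redirects. Note that CleanUrl is applied to the target as well as the source, so destinations are lowercased and given a trailing slash. That suits page URLs, but not case-sensitive paths or file URLs such as PDFs.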

The URL normalization avoids common mismatches:

private static string CleanUrl(string url)
{
    if (string.IsNullOrWhiteSpace(url))
    {
        return string.Empty;
    }

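    // Invariant casing avoids culture-specific surprises (for example the Turkish dotless i).
    url = url.ToLowerInvariant();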

    if (!url.EndsWith('/'))
    {
        url = $"{url}/";
    }

    url = url.Replace("http://", "https://");

    return url;
}

So these can be treated as the same source URL:

http://www.example.com/Old-Page
https://www.example.com/old-page/

Why not just use IIS, Nginx or Cloudflare redirects?

You can.

For high-volume, infrastructure-level redirects, server or CDN rules may be a better fit.

But there are cases where Umbraco-managed redirects are useful:

  • editors need to add or fix redirects without a deployment;
  • redirects are part of a content migration;
  • the redirect list is small or medium-sized;
  • the site already has a configuration document for SEO/editorial settings;
  • you want the redirects versioned with content operations rather than app configuration.

The tradeoff is performance and governance.

The middleware uses a 30-minute memory cache, so it does not read Umbraco content on every request. But it still runs in the application pipeline. If you have thousands of redirects or very high traffic, use a stronger lookup strategy or move the redirects closer to the edge.

A few practical details

There are a few implementation details worth calling out.

First, decide whether query strings are part of redirect matching.

This implementation removes them:

context.Request.GetDisplayUrl().Split('?')[0]

That means:

/old-page?utm_source=newsletter

matches the same redirect as:

/old-page

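For most legacy URL redirects that is what I want. Note that the query string is dropped entirely: it is ignored for matching and is not appended to the redirect target.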

Second, decide what to do with duplicate source URLs.

This implementation groups by normalized source URL and keeps the first value:

.GroupBy(parts => CleanUrl(parts[0].Trim()))
.ToDictionary(
    group => group.Key,
    group => CleanUrl(group.First()[1].Trim()))

That avoids throwing at runtime, but it should probably be paired with an editorial validation report if redirects become business-critical.
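
Such a validation report could start as a small pure function over the raw field text. The sketch below is hypothetical (RedirectMapValidator is not part of the production code) and takes the normalization function as a parameter so it can reuse the same rules as CleanUrl. It flags malformed rows, duplicate sources and rows that redirect to themselves; multi-hop loops are not detected.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the production code.
public static class RedirectMapValidator
{
    // Returns human-readable problems found in the raw redirect field.
    public static List<string> Validate(string rawMap, Func<string, string> normalize)
    {
        var problems = new List<string>();
        var seenSources = new HashSet<string>();
        var lines = rawMap.Split(
            new[] { '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                continue;
            }

            var parts = line.Split(';');
            if (parts.Length != 2)
            {
                problems.Add($"Malformed row: {line.Trim()}");
                continue;
            }

            var source = normalize(parts[0].Trim());
            var target = normalize(parts[1].Trim());

            // A row whose normalized source equals its target would loop forever.
            if (source == target)
            {
                problems.Add($"Self-redirect: {source}");
            }

            // HashSet.Add returns false when the source was already seen.
            if (!seenSources.Add(source))
            {
                problems.Add($"Duplicate source: {source}");
            }
        }

        return problems;
    }
}
```

The result could be shown to editors in the backoffice or written to the log when the redirect cache is rebuilt.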

Third, be careful with culture.

The code sets a default variation context before reading configuration:

variationContextAccessor.VariationContext ??= new VariationContext("en-US");

In a multilingual site, you may want redirect maps per culture or per domain instead of one global map.

What I would monitor

For this kind of SEO infrastructure, I would monitor:

  • requests to /robots.txt;
  • requests to /sitemap.xml;
  • sitemap entry count;
  • sitemap generation errors;
  • redirect hits by source URL;
  • redirect loops;
  • 404s after migrations;
  • old URLs that continue receiving traffic.

Logging redirect hits is useful during migrations:

_logger.LogInformation(
    "Redirecting {CurrentUrl} to {RedirectUrl}",
    currentUrl,
    redirectUrl);

You do not need to log this forever at high volume, but during a migration it can show which old URLs still matter.

What I would improve next

The current setup is intentionally pragmatic. It works well for a small or medium content site.

If the site grows, I would improve it in a few places:

  • cache the generated sitemap, invalidating it on publish/unpublish;
  • validate redirect rows in the backoffice;
  • detect duplicate redirect sources;
  • detect redirect loops;
  • support domain-specific redirect maps;
  • use DateTimeOffset for sitemap lastmod;
  • make the sitemap scheme rely on forwarded headers instead of hardcoding https;
  • add integration tests for /sitemap.xml and redirect behavior.

Those are refinements, not a different architecture.

The important part is that sitemap, robots and redirects are explicit application features instead of forgotten files or one-off server rules.

Final thought

SEO infrastructure is usually small code.

But small code can carry a lot of operational weight.

In this setup:

  • Umbraco content decides what belongs in the sitemap;
  • middleware serves a predictable robots.txt;
  • editors can maintain legacy redirects without a deployment;
  • the application keeps the behavior testable and visible.

That is the kind of SEO work that rarely gets attention, but it prevents the boring problems that cost search visibility after a migration.