Structured Data in Umbraco - Stefano Maffeis

Structured data is one of those SEO tasks that often starts as a small snippet in a template and quietly becomes technical debt.

That is what happened in one of my Umbraco projects.

At first the site only needed a bit of JSON-LD for blog posts and service pages. A couple of Razor partials looked like enough. The content model already had the fields: title, meta description, Open Graph image, author, tags, categories. So the first implementation was straightforward: render a <script type="application/ld+json"> block directly from the Razor view.

It worked.

Until it didn’t.

After a while, the site had more content, more custom editorial fields, more SEO-specific data, and a few handcrafted JSON-LD blocks added from the backoffice. At that point the question was no longer “can we output schema.org markup?” The real question became:

Can we trust the structured data we are publishing?

This article is a case study of how I cleaned that up in an Umbraco 17 project:

replacing hand-built JSON strings with serialized JSON-LD objects;
moving blog posts from generic Article to BlogPosting;
adding BreadcrumbList, WebSite and WebPage markup;
avoiding duplicated Service entities;
separating code problems from editorial content problems;
validating the result by crawling the local site.

The examples are from a real project, but the same approach applies to most Umbraco sites.

The starting point

The project already had JSON-LD partials under:

Views/StructuredData/

The initial implementation emitted structured data from Razor views such as:

Views/StructuredData/BlogPost.cshtml
Views/StructuredData/GenericPage.cshtml

The idea was reasonable: each page template includes the structured data partial that belongs to that page type.

For example:

<partial name="StructuredData/BlogPost" />

The problem was how the JSON was generated.

The old version built JSON directly in the view by mixing Razor interpolation and JSON syntax. That is tempting because it looks simple:

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "@Model.Title",
  "description": "@Model.MetaDescription"
}

But this is fragile.

If an editor enters quotes, newlines, HTML fragments, unexpected empty values or text copied from a rich text editor, the generated JSON can become invalid or semantically dirty. The page can still render fine, but the structured data can silently degrade.

That was the first lesson: JSON-LD is not HTML. It should be generated as JSON, not assembled as a string.
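To make the failure concrete, here is a minimal sketch that compares the two approaches outside Razor. The names InterpolationDemo, BuildByHand, BuildSerialized and IsValidJson are invented for this illustration and are not from the project. BuildByHand pastes the raw value between quotes, which is what happens whenever the value reaches the output unencoded. With Razor's default HTML encoding the quote would come out as &quot; instead, which keeps the JSON parseable but puts an HTML entity into the headline.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class InterpolationDemo
{
    // Mimics hand-built JSON: the value is pasted between quotes as-is.
    public static string BuildByHand(string headline) =>
        "{ \"@type\": \"Article\", \"headline\": \"" + headline + "\" }";

    // Lets System.Text.Json take care of quoting and escaping.
    public static string BuildSerialized(string headline) =>
        JsonSerializer.Serialize(new Dictionary<string, object?>
        {
            ["@type"] = "Article",
            ["headline"] = headline
        });

    // True when the text parses as JSON, false when the parser rejects it.
    public static bool IsValidJson(string json)
    {
        try
        {
            using var document = JsonDocument.Parse(json);
            return true;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

For a headline such as The "best" guide, the hand-built string no longer parses, while the serialized one does.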

The audit

Before changing code, I did a quick audit of the structured data already emitted by the site.

The important questions were:

Which templates emit JSON-LD?
Which schema.org types are used?
Are the properties supported by the visible content?
Are there empty values?
Are there hardcoded values?
Are dates valid ISO 8601 values?
Are image URLs absolute and representative?
Is the same entity described multiple times in conflicting ways?
Is any JSON-LD coming from free editorial HTML rather than code?

The last point turned out to be important.

Some structured data was generated by Razor. Some other structured data was stored by editors inside custom HTML fields. Those two sources need different fixes. If a page has invalid JSON-LD because of a broken custom HTML block, hiding it with more Razor code is the wrong fix. The content has to be corrected at the source.

The audit found several typical issues.

Problem 1: JSON-LD built by hand

The old Razor partials were building JSON with interpolation.

That creates three classes of bugs:

JSON syntax bugs, such as unescaped quotes or line breaks;
dirty values, such as empty strings or HTML fragments;
semantic bugs, such as objects that exist only because the template always prints them.

For example, an empty publisher is worse than no publisher:

{
  "publisher": {
    "@type": "Organization",
    "name": "",
    "logo": {
      "url": ""
    }
  }
}

This says: “I know the publisher and the logo, but they are empty.”

That is not useful structured data. It is noise.

Problem 2: `Article` was too generic

The blog posts were marked as:

{
  "@type": "Article"
}

Google supports Article, NewsArticle and BlogPosting for article pages. For a normal blog post, BlogPosting is the more precise type.

The Google documentation for article structured data is here:

https://developers.google.com/search/docs/appearance/structured-data/article

So the post schema changed from:

"@type": "Article"

to:

"@type": "BlogPosting"

This is not a magic ranking trick. It is just a better description of the content.

Problem 3: hardcoded author data

The old JSON-LD used a static author name.

The page model already had an author relation, and the page itself rendered an author box. That means the structured data should follow the content, not a hardcoded fallback.

The new logic became:

Use the related author when present.
Build the name from author fields.
Fall back only when no author is available.

In Razor:

var author = Model.Author as Author;
var authorName = author is not null
    ? string.Join(" ", new[] { author.FirstName, author.LastName }.Where(x => !string.IsNullOrWhiteSpace(x)))
    : "Default Author";

The schema then uses that value:

("author", JsonLdSerializer.Object(
    ("@type", "Person"),
    ("name", authorName)
))

The rule is simple: structured data should describe what the page actually says.

Problem 4: dates without timezone

The previous markup emitted dates in a plain date format, with no timezone offset.

For article markup, dates are more useful when they are ISO 8601 values with timezone information. The implementation now serializes DateTimeOffset values:

("datePublished", new DateTimeOffset(Model.PublishDate)),
("dateModified", new DateTimeOffset(Model.UpdateDate))

The serializer outputs values like:

2026-05-31T12:30:00.0000000+02:00
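One detail worth knowing here: new DateTimeOffset(DateTime) takes the offset from the DateTime's Kind. A UTC value gets +00:00, while a Local or Unspecified value gets the offset of the machine running the code. The sketch below shows that behavior next to a possible alternative that pins the offset to an explicit time zone. SchemaDates, FromDateTime and InZone are hypothetical names, and whether the project needs the second variant depends on how the dates are stored, which I am only assuming.

```csharp
using System;

public static class SchemaDates
{
    // Same conversion as in the partial: the offset comes from the value's Kind
    // (UTC gives +00:00, Local and Unspecified give the server's local offset).
    public static string FromDateTime(DateTime value) =>
        new DateTimeOffset(value).ToString("O");

    // Hypothetical alternative: treat the value as wall-clock time in a known zone,
    // so the output does not depend on the server's time zone setting.
    public static string InZone(DateTime value, TimeZoneInfo zone)
    {
        var wallClock = DateTime.SpecifyKind(value, DateTimeKind.Unspecified);
        return new DateTimeOffset(wallClock, zone.GetUtcOffset(wallClock)).ToString("O");
    }
}
```

If the server and the editors share a time zone, the simpler conversion is enough. The explicit variant only matters when the site is hosted somewhere else.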

Problem 5: service markup on every generic page

This was the most interesting modeling issue.

The old generic page schema emitted Service markup for every generic page.

That sounds useful for an agency or professional website, but it is only correct if every generic page really is a service page. On the real site, generic pages included different kinds of content: service pages, informational pages, landing pages and utility pages.

So automatic Service markup was removed.

Some service pages already had rich editorial JSON-LD stored in content fields. Generating another Service entity in Razor would have created duplicates or conflicting information.

The better solution was:

add generic, non-conflicting markup in code;
keep specific service entities where the content explicitly defines them;
fix bad editorial JSON-LD in the backoffice instead of masking it in templates.

A small JSON-LD serializer

The main refactor was to stop writing JSON as text.

I added a small helper:

using System.Collections;
using System.Text.Json;
using System.Text.Json.Serialization;

namespace MyProject.Services.StructuredData;

public static class JsonLdSerializer
{
    // Null properties are skipped, so cleaned-out values never reach the output.
    private static readonly JsonSerializerOptions Options = new()
    {
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
    };

    // Builds one JSON-LD object from (key, value) pairs; nest calls for nested objects.
    public static Dictionary<string, object?> Object(params (string Key, object? Value)[] properties)
    {
        return properties.ToDictionary(property => property.Key, property => property.Value);
    }

    // Cleans the graph first, then lets System.Text.Json handle quoting and escaping.
    public static string Serialize(Dictionary<string, object?> graph)
    {
        return JsonSerializer.Serialize(CleanValue(graph), Options);
    }

    // Normalizes a single value: blank strings become null, dates become ISO 8601 text.
    private static object? CleanValue(object? value)
    {
        return value switch
        {
            null => null,
            string text => string.IsNullOrWhiteSpace(text) ? null : text.Trim(),
            DateTime dateTime => dateTime.ToString("O"),
            DateTimeOffset dateTimeOffset => dateTimeOffset.ToString("O"),
            IDictionary<string, object?> dictionary => CleanDictionary(dictionary),
            IEnumerable enumerable when value is not string => CleanEnumerable(enumerable),
            _ => value
        };
    }

    // CleanDictionary and CleanEnumerable belong to the full implementation
    // and are not shown in this excerpt.
}

The helper does three things:

serializes JSON safely with System.Text.Json;
removes empty strings and empty nested objects;
formats dates consistently.

The full implementation also cleans dictionaries and arrays recursively.
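For readers who want to see the recursive part, the following is a minimal sketch of how such cleanup could look. It is not the project's code. JsonLdCleaner is a hypothetical standalone class, and the rule that an object left with nothing but "@type" counts as empty is my assumption, inferred from the test shown later (a publisher with only a type and an empty name disappears, while an "@id" reference survives).

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class JsonLdCleaner
{
    // Returns a cleaned copy of the value, or null when nothing meaningful is left.
    public static object? Clean(object? value) => value switch
    {
        null => null,
        string text => string.IsNullOrWhiteSpace(text) ? null : text.Trim(),
        IDictionary<string, object?> dictionary => CleanDictionary(dictionary),
        IEnumerable enumerable => CleanEnumerable(enumerable),
        _ => value
    };

    private static Dictionary<string, object?>? CleanDictionary(IDictionary<string, object?> source)
    {
        var cleaned = new Dictionary<string, object?>();
        foreach (var pair in source)
        {
            var cleanedValue = Clean(pair.Value);
            if (cleanedValue is not null)
            {
                cleaned[pair.Key] = cleanedValue;
            }
        }

        // Assumption: an object that only carries "@type" describes nothing and is dropped.
        return cleaned.Keys.Any(key => key != "@type") ? cleaned : null;
    }

    private static List<object>? CleanEnumerable(IEnumerable source)
    {
        var cleaned = source
            .Cast<object?>()
            .Select(item => Clean(item))
            .Where(item => item is not null)
            .Select(item => item!)
            .ToList();

        // An array with no surviving items is removed as well.
        return cleaned.Count > 0 ? cleaned : null;
    }
}
```

The dictionary case has to come before the generic enumerable case, because a dictionary is also enumerable and would otherwise be flattened into a list of key/value pairs.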

The important part is not the helper itself. The important part is the boundary: Razor builds a structured object, and System.Text.Json renders JSON.

The new `BlogPosting` partial

The blog post partial now builds a graph object:

var schema = JsonLdSerializer.Object(
    ("@context", "https://schema.org"),
    ("@type", "BlogPosting"),
    ("mainEntityOfPage", JsonLdSerializer.Object(
        ("@type", "WebPage"),
        ("@id", canonicalUrl)
    )),
    ("headline", Model.Title),
    ("description", description),
    ("image", imageUrl),
    ("author", JsonLdSerializer.Object(
        ("@type", "Person"),
        ("name", authorName)
    )),
    ("publisher", JsonLdSerializer.Object(
        ("@id", primaryEntityId)
    )),
    ("datePublished", new DateTimeOffset(Model.PublishDate)),
    ("dateModified", new DateTimeOffset(Model.UpdateDate)),
    ("keywords", string.Join(", ", tags.Select(x => x.Title))),
    ("articleSection", string.Join(", ", categories.Select(x => x.Title))),
    ("url", canonicalUrl)
);

Then the view prints it:

@Html.Raw("<script type=\"application/ld+json\">")
@Html.Raw(JsonLdSerializer.Serialize(schema))
@Html.Raw("</script>")
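Printing the serializer output through Html.Raw is only safe if a value can never close the script element early. With its default options, System.Text.Json escapes the characters <, > and &, so a headline that contains </script> stays inside the JSON string. The sketch below checks that property. ScriptSafety and its methods are names made up for this example, and the guarantee no longer holds if the options switch to JavaScriptEncoder.UnsafeRelaxedJsonEscaping.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class ScriptSafety
{
    // Serializes with default options, which escape HTML-sensitive characters.
    public static string SerializeHeadline(string headline) =>
        JsonSerializer.Serialize(new Dictionary<string, object?> { ["headline"] = headline });

    // Parses the JSON again to show that the original text is preserved.
    public static string ReadHeadlineBack(string json)
    {
        using var document = JsonDocument.Parse(json);
        return document.RootElement.GetProperty("headline").GetString() ?? string.Empty;
    }
}
```

The serialized text contains no literal angle brackets, yet parsing it returns the original headline unchanged.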

The image selection also became explicit:

use the blog post preview image;
otherwise use the Open Graph image;
otherwise use a site fallback.

var schemaImage = Model.PreviewImage ?? Model.OpenGraphImage;
var imageUrl = schemaImage is not null
    ? ToAbsoluteUrl(ComputedImageService.Get(schemaImage.Id).Url)
    : ToAbsoluteUrl("/content/images/default-og-image.png");

Again, this is not about adding more fields. It is about making the data truthful and predictable.

Adding breadcrumbs globally

The next improvement was BreadcrumbList.

Google’s documentation for breadcrumb structured data is here:

https://developers.google.com/search/docs/appearance/structured-data/breadcrumb

In Umbraco, the breadcrumb can be derived from the content tree:

var pages = Model
    .AncestorsOrSelf()
    .OrderBy(page => page.Level)
    .Select(page => new
    {
        Name = page.Name,
        Url = ToAbsoluteUrl(page.Url())
    })
    .Where(page => !string.IsNullOrWhiteSpace(page.Name) && !string.IsNullOrWhiteSpace(page.Url))
    .ToList();

Then map it to ListItem objects:

var itemListElements = pages
    .Select((page, index) => JsonLdSerializer.Object(
        ("@type", "ListItem"),
        ("position", index + 1),
        ("name", page.Name),
        ("item", page.Url)
    ))
    .ToList();

And emit it only when the trail has at least two items (here, schema is the BreadcrumbList object that wraps itemListElements):

@if (itemListElements.Count >= 2)
{
    @Html.Raw("<script type=\"application/ld+json\">")
    @Html.Raw(JsonLdSerializer.Serialize(schema))
    @Html.Raw("</script>")
}
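The snippets above leave out the step that wraps the list items into the BreadcrumbList object. As a self-contained sketch of the whole flow, without Umbraco or the project's helper: BreadcrumbSketch and Build are hypothetical names, and the input is a plain list of name/URL pairs standing in for the ancestor pages.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class BreadcrumbSketch
{
    // Returns the JSON-LD text, or null when the trail is too short to be useful.
    public static string? Build(IReadOnlyList<(string Name, string Url)> pages)
    {
        if (pages.Count < 2)
        {
            return null;
        }

        // One ListItem per page, numbered from 1 in trail order.
        var items = pages
            .Select((page, index) => new Dictionary<string, object?>
            {
                ["@type"] = "ListItem",
                ["position"] = index + 1,
                ["name"] = page.Name,
                ["item"] = page.Url
            })
            .ToList();

        // The wrapper object that carries the items.
        var schema = new Dictionary<string, object?>
        {
            ["@context"] = "https://schema.org",
            ["@type"] = "BreadcrumbList",
            ["itemListElement"] = items
        };

        return JsonSerializer.Serialize(schema);
    }
}
```

Returning null for a single-item trail mirrors the Count >= 2 check: a breadcrumb that only contains the homepage adds nothing.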

The partial is included from the master template, so internal pages get consistent breadcrumbs without every template duplicating logic.

Adding homepage `WebSite` and `WebPage`

The homepage now emits a small graph:

var schema = JsonLdSerializer.Object(
    ("@context", "https://schema.org"),
    ("@graph", new[]
    {
        JsonLdSerializer.Object(
            ("@type", "WebSite"),
            ("@id", websiteId),
            ("url", siteUrl),
            ("name", "Example Site"),
            ("publisher", JsonLdSerializer.Object(
                ("@id", primaryEntityId)
            ))
        ),
        JsonLdSerializer.Object(
            ("@type", "WebPage"),
            ("@id", canonicalUrl),
            ("url", canonicalUrl),
            ("name", Model.SeoTitle),
            ("description", Model.MetaDescription),
            ("isPartOf", JsonLdSerializer.Object(
                ("@id", websiteId)
            )),
            ("about", JsonLdSerializer.Object(
                ("@id", primaryEntityId)
            ))
        )
    })
);

One deliberate choice was to reference the primary site entity by @id instead of redefining it.

In this project, some pages already had editorial JSON-LD for a ProfessionalService entity. Duplicating that entity from code would have made the graph noisier and harder to maintain.

So the generated code uses references:

{
  "@id": "https://example.com/"
}

rather than creating another Organization, Person or ProfessionalService automatically.

What about FAQ schema?

This is a good example of why structured data should not be treated as a checklist.

The site has FAQ-style content. It would be easy to add FAQPage.

But Google changed its FAQ rich result behavior. As of May 7, 2026, FAQ rich results no longer appear in Google Search, and Google is removing the related FAQ reporting and testing support.

The current FAQ structured data documentation is here:

https://developers.google.com/search/docs/appearance/structured-data/faqpage

That does not mean FAQPage is invalid schema.org markup. It means I would not prioritize it as a Google rich-result tactic.

My decision was:

do not add FAQ markup just because the content contains questions;
consider it only where it helps describe the page semantically;
do not sell it as a visual SERP enhancement.

Testing the serializer

Because the serializer became the boundary between Razor and JSON-LD, it deserved tests.

The tests cover the behavior that actually matters:

empty values are removed;
nested empty objects are removed;
text is escaped as valid JSON;
dates are serialized as ISO 8601;
@id references survive cleanup;
@graph arrays survive cleanup.

Example:

[Fact]
public void Serialize_RemovesEmptyValuesAndEmptyNestedObjects()
{
    var graph = JsonLdSerializer.Object(
        ("@context", "https://schema.org"),
        ("@type", "BlogPosting"),
        ("headline", "  A title  "),
        ("description", ""),
        ("publisher", JsonLdSerializer.Object(
            ("@type", "Organization"),
            ("name", "")
        )),
        ("keywords", new[] { "seo", "", "schema" })
    );

    var json = JsonLdSerializer.Serialize(graph);
    using var document = JsonDocument.Parse(json);
    var root = document.RootElement;

    root.GetProperty("@context").GetString().Should().Be("https://schema.org");
    root.GetProperty("headline").GetString().Should().Be("A title");
    root.TryGetProperty("description", out _).Should().BeFalse();
    root.TryGetProperty("publisher", out _).Should().BeFalse();
    root.GetProperty("keywords").EnumerateArray().Select(x => x.GetString()).Should().Equal("seo", "schema");
}

This is a small test, but it protects against the most common regression: accidentally reintroducing empty or invalid JSON-LD.

Crawling the site

Unit tests are useful, but they do not answer the whole question.

The real question is what the published pages emit.

So I crawled the local Umbraco site and extracted every <script type="application/ld+json"> block from the rendered HTML.

The crawl covered:

257 published documents read from the local Umbraco Management API;
199 resolved local URLs;
homepage;
blog listing;
146 blog posts;
category and tag pages;
32 generic pages.
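The extraction step can be sketched in a few lines. This is not the crawler I used, only an illustration of the idea: JsonLdExtractor is a hypothetical class, the regular expression is a shortcut that a real HTML parser would replace, and fetching the pages (URL list, HTTP client) is left out on purpose.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.RegularExpressions;

public static class JsonLdExtractor
{
    // Matches <script type="application/ld+json"> blocks and captures their content.
    private static readonly Regex ScriptBlock = new(
        "<script[^>]*type\\s*=\\s*[\"']application/ld\\+json[\"'][^>]*>(?<json>.*?)</script>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns every JSON-LD block found in the HTML, with a flag telling whether it parses.
    public static List<(string Json, bool IsValid)> Extract(string html)
    {
        var results = new List<(string Json, bool IsValid)>();
        foreach (Match match in ScriptBlock.Matches(html))
        {
            var json = match.Groups["json"].Value.Trim();
            results.Add((json, IsValidJson(json)));
        }

        return results;
    }

    private static bool IsValidJson(string json)
    {
        try
        {
            using var document = JsonDocument.Parse(json);
            return true;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

Because it runs on rendered HTML, this check sees Razor-generated and editor-supplied blocks alike, which is the point of crawling instead of reading templates.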

The result was useful because it separated three categories:

Pages where generated Razor JSON-LD was correct.
Pages where no specific schema was needed.
Pages where editorial CustomHtml contained broken or stale JSON-LD.

That last category included real content problems:

empty @id values;
empty telephone fields;
placeholder logo URLs;
service URLs pointing to the wrong page;
one invalid JSON-LD block caused by an extra bracket/comma.
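Empty values such as a blank @id or telephone are easy to detect mechanically once a block parses. A possible sketch, with EmptyValueFinder as an invented name and a simple dotted path format chosen only for readability:

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class EmptyValueFinder
{
    // Returns the path of every string value that is empty or whitespace.
    public static List<string> Find(string json)
    {
        using var document = JsonDocument.Parse(json);
        var paths = new List<string>();
        Walk(document.RootElement, "$", paths);
        return paths;
    }

    // Depth-first walk over objects and arrays, recording blank strings.
    private static void Walk(JsonElement element, string path, List<string> paths)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                foreach (var property in element.EnumerateObject())
                {
                    Walk(property.Value, path + "." + property.Name, paths);
                }
                break;
            case JsonValueKind.Array:
                var index = 0;
                foreach (var item in element.EnumerateArray())
                {
                    Walk(item, path + "[" + index++ + "]", paths);
                }
                break;
            case JsonValueKind.String:
                if (string.IsNullOrWhiteSpace(element.GetString()))
                {
                    paths.Add(path);
                }
                break;
        }
    }
}
```

Placeholder logo URLs and service URLs that point to the wrong page cannot be caught this way. Those still need a human, or project-specific rules, to review.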

Those were not Razor bugs. They were editorial data bugs.

So I fixed them in the backoffice/API layer, then reran the focused validation.

That was the second lesson: validating structured data only at template level is not enough when editors can inject JSON-LD from content fields.

A practical checklist

This is the checklist I would use before adding or refactoring structured data in an Umbraco site.

List every template and partial that emits JSON-LD.
List every content field that can contain raw HTML or JSON-LD.
Prefer JSON serialization over string interpolation.
Remove empty values instead of emitting empty properties.
Do not emit a schema type unless the visible page content supports it.
Prefer specific types, such as BlogPosting, when they match the content.
Use stable @id values to connect entities.
Use references instead of redefining the same entity multiple times.
Make image URLs absolute.
Serialize dates as ISO 8601 values.
Add tests around the JSON generation boundary.
Crawl rendered pages, not only source templates.
Treat editorial JSON-LD as content that can go stale.

What changed in the project

The final structure is simple:

Services/StructuredData/JsonLdSerializer.cs
Views/StructuredData/BlogPost.cshtml
Views/StructuredData/Breadcrumb.cshtml
Views/StructuredData/Homepage.cshtml
tests/.../JsonLdSerializerTests.cs

The site now emits:

BlogPosting on blog posts;
BreadcrumbList on internal pages;
WebSite and WebPage on the homepage;
references to the primary site entity instead of duplicated publisher objects.

It no longer emits:

empty publisher objects;
automatic Service markup on every generic page;
hand-built JSON strings in Razor;
schema.org data that cannot be supported by page content.

Final thought

Structured data is not just a block of SEO markup.

It is part of the content model of the site.

In a CMS like Umbraco, that matters because data can come from Razor templates, generated models, relations, custom property editors, rich text fields, raw HTML blocks and editorial configuration.

The durable solution was not to add more schema. It was to make structured data boring:

generated through one safe serialization path;
tested where it can break;
validated against rendered pages;
corrected at the right layer when the problem is editorial.

That is less glamorous than chasing another rich result, but it is the kind of change that makes a site easier to trust.

References

Google Search Central, intro to structured data: https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data
Google Search Central, Article structured data: https://developers.google.com/search/docs/appearance/structured-data/article
Google Search Central, Breadcrumb structured data: https://developers.google.com/search/docs/appearance/structured-data/breadcrumb
Google Search Central, FAQ structured data: https://developers.google.com/search/docs/appearance/structured-data/faqpage
schema.org, BlogPosting: https://schema.org/BlogPosting
schema.org, BreadcrumbList: https://schema.org/BreadcrumbList
schema.org, WebSite: https://schema.org/WebSite