Client WebAssembly Websites Are Held Back by Web Crawlers

Opinion piece. Take it with a grain of salt: I'm not an expert in WebAssembly or crawlers, but this is my experience with them.

I love the idea of WebAssembly and how it lets me write C# in the browser via Blazor. More specifically: a client-side-only Blazor SPA, where you get the benefits of C# running entirely in the client while being served from free static site hosts. Being able to do this client-side processing using NuGet packages is wild to me, and it makes a lot of business sense for established customers already on a Wasm platform.

Issues

However, if you are using client-side Wasm for anything that needs to be crawled by a web crawler (GoogleBot/BingBot/etc) in order to be found or useful, you'll run into problems. These bots use the initially loaded HTML, plus some JavaScript processing, to identify what to index on a page. This is where Wasm SPAs fall behind. A WebAssembly SPA serves a simple page which then downloads the binaries that run the website; often you'll see this as a one-off loading screen before reaching your desired page. That works fine for you and me - plenty of websites load before they're usable - so what's the problem?

The problem is that crawlers don't run WebAssembly, despite it being an official web standard. The crawler won't load your binaries, meaning it will only see your loading screen. If you're a:

  • Blogger, none of your article content will be indexed
  • Tool creator, your website may be marked as under construction
  • Sales website, no information about your product will be indexed

All of these are SEO problems. Sure, you can add <meta> tags to your initial index.html for keywords, title, description, etc to get indexed, but it's the actual content people search for, and that's what's being missed. Google has a (now deprecated) article on dynamic rendering. On top of the SEO problems we can bring in Google AdSense, which seems to use GoogleBot to determine whether a website is within AdSense terms.
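As a minimal sketch, these fallback tags in index.html are the kind of thing a crawler can read without running any Wasm (all values here are placeholders, not from a real site):

```html
<head>
  <meta charset="utf-8" />
  <!-- Static fallback metadata visible to crawlers before the Wasm loads -->
  <title>My Blazor App - Widget Comparison Tool</title>
  <meta name="description" content="Compare widgets side by side in your browser." />
  <!-- Open Graph tags for link previews -->
  <meta property="og:title" content="My Blazor App - Widget Comparison Tool" />
  <meta property="og:description" content="Compare widgets side by side in your browser." />
</head>
```

The limitation is exactly as described above: these tags are static, so they can't describe per-page content in a single-page app.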

A Wasm loading screen is not within AdSense terms.

Solutions?

I'll specifically be talking about Blazor, but I suspect these will be roughly applicable to other Wasm frameworks. To solve the crawler problems we can:

  1. Prerender at build
  2. Use a service to return your page prerendered to a crawler
  3. Enable prerendering
  4. Use a different technology

Let's quickly look through each option.

Prerender at build

This is my preferred option, and I'll be talking about BlazorWasmPreRendering.Build by jsakamoto. The project's About section describes it succinctly:

When you publish your Blazor Wasm app, this package pre-renders and saves the app as static HTML files in your public folder.

The user will still see a loading screen as the Wasm is downloaded, but the rendered HTML is actually sent too. We can easily see this by checking out the network inspector in the browser on the test website for BlazorWasmPreRendering.Build:

The comments such as <!-- %%-PRERENDERING-BEGIN-%% --> show what is now present due to the prerendering build step. This helps us as the crawler can now see some information about the page.
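For reference, a minimal sketch of wiring this up in the Blazor Wasm client's .csproj - the version number is a placeholder, so check NuGet for the current one:

```xml
<!-- In the Blazor Wasm client's .csproj -->
<ItemGroup>
  <!-- Prerenders pages to static HTML as part of `dotnet publish` -->
  <PackageReference Include="BlazorWasmPreRendering.Build" Version="x.y.z" />
</ItemGroup>
```

Once the package is referenced, a normal publish should produce the prerendered HTML alongside the usual static output, with no separate tooling step.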

Use a service

Luke Parker, in the post Blazor WASM SEO - You have a broken website according to Google!, talks about prerender.io, which serves already-prerendered content to known web crawlers. Read that post to learn more about it. Another writeup, Blazor WebAssembly Client Side SEO Pre-rendering by Baskaran Govindaras, covers their experience with the same approach.

Enable prerendering

If you are in a position to move away from a static site host serving client-only Wasm, Blazor has an option to enable prerendering, which renders the requested page server-side and sends that with the initial response, meaning your page will appear as expected to a crawler. Note that it may not yet be interactive for the user, as the full download of the binaries still needs to happen. Read Enabling prerendering for Blazor WebAssembly apps by Andrew Lock for more.
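As a sketch of what this looks like in an ASP.NET Core-hosted setup: in the host page (e.g. _Host.cshtml), assuming the standard App root component, the render mode is what changes:

```razor
@* Server renders the component's HTML into the response first; *@
@* the Wasm app then downloads and takes over interactivity. *@
<component type="typeof(App)" render-mode="WebAssemblyPrerendered" />
```

Swapping render-mode back to WebAssembly would restore the plain loading-screen behaviour, which is a handy way to compare what a crawler sees in each mode.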

Use a different technology

If prerendering at build, using a service to intercept crawlers, or moving to a server-based architecture doesn't work for you, then you might be out of options with respect to web crawlers, and it might be time to move to a JavaScript SPA, or to something more traditional such as regular ASP.NET Core for a .NET developer.

Still Not Ideal

I wish we didn't have to resort to workarounds. This is no shade to those who've developed the above solutions out of necessity - it's great we have options, and they're useful for more than just this complaint of mine - but I feel we shouldn't need workarounds in the first place. As WebAssembly moves forward past 1.0, I'm hoping crawlers begin to adopt the standard as browsers have.

Header image pattern from https://heropatterns.com/