
Google’s Gary Illyes continues to warn about URL parameter issues

Google’s Gary Illyes recently highlighted a recurring SEO problem on LinkedIn, echoing concerns he raised earlier on Google’s Search Off the Record podcast.

The problem? URL parameters make it harder for search engines to index your web pages.

This problem is especially difficult for large websites and online stores. When different parameters are added to the URL, it can result in multiple unique web addresses that all lead to the same content.

This can hinder search engines, reducing their ability to crawl and index sites properly.

URL Parameter Puzzle

In both the podcast and the LinkedIn post, Illyes explains that URLs can contain an infinite number of parameters, each of which can create a distinct URL, even if they all point to the same content.

Illyes writes:

“An interesting quirk of URLs is that you can add an infinite (I call it BS) number of URL parameters to the URL path, and in doing so essentially create new resources. The new URLs don’t even have to map to different content on the server, each new URL can simply serve the same content as the URL without parameters, but they are all distinct URLs. A good example of this is a cache-breaking URL parameter in JavaScript references: it doesn’t change the content, but forces a cache refresh.”

He gave an example showing how a simple URL like “/path/file” can expand to “/path/file?param1=a” and “/path/file?param1=a&param2=b”, all potentially serving identical content.

“Each one is a different URL, but they all have the same content,” Illyes noted.
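
To make the combinatorics concrete, here is a minimal Python sketch (the parameter names are hypothetical) of how quickly a handful of optional parameters multiplies the URL space; reordering or repeating parameters would inflate the count even further.

  # Hypothetical sketch: four optional query parameters on one path
  # already produce 16 distinct URLs that could all serve the same page.
  from itertools import combinations

  base = "/path/file"
  params = ["param1=a", "param2=b", "sort=price", "sessionid=xyz"]  # hypothetical names

  urls = {base}
  for r in range(1, len(params) + 1):
      for combo in combinations(params, r):
          urls.add(base + "?" + "&".join(combo))

  print(len(urls))  # 16 distinct URLs, one piece of content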

Accidental URL Expansion and Its Consequences

Sometimes search engines find and attempt to crawl pages that don’t actually exist on your site, which Illyes calls “fake URLs.”

These can arise from things like poorly coded relative links. What starts as a normal-sized site with around 1,000 pages can balloon into a million phantom URLs.

This explosion of fake URLs can cause serious problems. Search engine robots can hit your servers hard as they try to crawl all these non-existent pages.

This can overwhelm your server’s resources and potentially cause your site to crash. It also wastes the search engine’s crawl budget on useless pages instead of your real content.

Ultimately, your pages may not be crawled and indexed properly, which can negatively impact your position in search results.

Illyes states:

“Sometimes you can accidentally create these new fake URLs, blowing up your URL space from a nice 1000 URLs to a blazing million, exciting robots that in turn suddenly attack your servers, melting pipes and whistles left and right. Bad relative links are one relatively common cause. But robotstxt is your friend in this case.”
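
As a rough illustration of the robots.txt approach Illyes points to, rules like the following tell crawlers to skip parameterized URLs. The directives are standard robots.txt syntax, but the specific patterns and parameter names are assumptions that would need to be adapted to a given site, and blocking every query string is too blunt for sites whose parameters do change the content.

  User-agent: *
  # Block every URL that carries a query string
  Disallow: /*?
  # Or, less aggressively, block only specific parameters
  Disallow: /*?*sessionid=
  Disallow: /*?*sort=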

E-commerce sites most affected

The LinkedIn post did not specifically mention online stores, but the podcast discussion made clear that this is a particularly pressing issue for e-commerce platforms.

These websites typically use URL parameters to track, filter, and sort products.

As a result, you may see several different URLs pointing to the same product page, with each URL variant showing available color options, sizes, or information about the customer’s location.
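
For example, a single product page might end up reachable at several addresses like these (hypothetical URLs), each one a distinct URL to a crawler even though the underlying product is the same:

  https://example.com/shop/widget
  https://example.com/shop/widget?color=blue
  https://example.com/shop/widget?color=blue&size=m
  https://example.com/shop/widget?size=m&color=blue&utm_source=newsletter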

Mitigating the problem

Illyes consistently recommends using the robots.txt file to manage this problem.

In the podcast, Illyes pointed out possible solutions such as:

  • Creating systems that detect duplicate URLs (a rough sketch of one approach follows this list)
  • Better ways for site owners to tell search engines about their URL structure
  • Smarter use of robots.txt to guide search engine bots
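
As a sketch of the first idea, and assuming you already know which of your parameters are tracking- or display-only (the names below are hypothetical), duplicate detection can be as simple as normalizing each URL by dropping ignorable parameters and sorting the rest, then grouping URLs that collapse to the same key:

  # Minimal duplicate-URL detection sketch: strip ignorable parameters,
  # sort the rest, and group URLs that collapse to the same normalized form.
  from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
  from collections import defaultdict

  IGNORED = {"utm_source", "utm_medium", "sessionid", "sort"}  # hypothetical list

  def normalize(url: str) -> str:
      parts = urlsplit(url)
      kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED)
      return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

  groups = defaultdict(list)
  for url in [
      "https://example.com/shop/widget?color=blue&utm_source=newsletter",
      "https://example.com/shop/widget?utm_source=ad&color=blue",
      "https://example.com/shop/widget?color=red",
  ]:
      groups[normalize(url)].append(url)

  for key, dupes in groups.items():
      print(key, "<-", dupes)  # the first two URLs collapse to the same key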

Deprecated URL Parameters Tool

In the podcast discussion, Illyes touched on Google’s previous attempts to solve this problem, including the now-discontinued URL Parameters tool in Search Console.

This tool allowed websites to indicate which parameters were important and which could be ignored.

When asked on LinkedIn about the possibility of bringing the tool back, Illyes expressed skepticism about its practical effectiveness.

“Theoretically yes, in practice no,” he said, explaining that the tool suffered from the same problems as the robots.txt file, namely that “people couldn’t figure out how to manage their own parameters.”

Consequences for SEO and web development

This ongoing discussion at Google has several implications for SEO and website development:

  1. Crawl budget: For large sites, managing URL parameters helps conserve crawl budget, ensuring that important pages get crawled and indexed.
  2. Site architecture: Developers may need to rethink how they structure URLs, especially for large e-commerce sites with many product variants.
  3. Faceted navigation: E-commerce sites using faceted navigation should be aware of how it affects URL structure and indexability.
  4. Canonical tags: Canonical tags help Google understand which version of a URL should be treated as the primary one (see the example after this list).
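
As an example of the canonical-tag approach, a parameterized product URL can declare its preferred version with a standard rel="canonical" link element in the page’s head (the URLs are hypothetical):

  <!-- served on https://example.com/shop/widget?color=blue&utm_source=newsletter -->
  <link rel="canonical" href="https://example.com/shop/widget">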

Why is this important?

Google is discussing URL parameters across multiple channels, which signals genuine concern about search quality.

For industry experts, staying up to date with these technical aspects is crucial to maintaining visibility in search results.

While Google works on solutions, proactive URL management and clear crawler guidance are recommended.