
Websites often contain pages with information that their owners don’t want indexed by Google. There are several ways to prevent Google’s search bots from indexing individual pages, and in the past one of the most popular was to include a noindex line in the robots.txt file. Google recently announced that it will end support for noindex in robots.txt files, and publishers affected by the change have until September to prepare.
Much to the surprise of SEO marketers and website owners, Google published a Tweet last week stating that it would stop obeying unsupported rules in robots.txt, including the noindex directive.
According to media reports, the Tweet stated, “Today, we’re saying goodbye to undocumented and unsupported rules in robots.txt… If you were relying on these rules, learn about your options in our blog post.”
A robots.txt file contains information that the website publisher wants search engine crawlers to know. These files work well enough, but the protocol is more loosely defined than other systems for relaying information to search bots. The noindex command that people have used in robots.txt files is a good example: it was never an officially supported use of a robots.txt file, but Google obeyed the command anyway.
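Because the rule was never documented, its exact form varied, but it was typically written like a Disallow line; a hypothetical example (the path is only an illustration) looks like this:

    User-agent: *
    Noindex: /old-page/

It is this kind of unofficial directive that Google will stop honoring.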
Last week, Google announced that it would stop obeying “unsupported rules in robots.txt.” The company plans to make its production robots.txt parser open source, and part of its preparation for the switch is an effort to standardize the way people use robots.txt files before the parser is released.
In the post, the company wrote, “In the interest of maintaining a healthy ecosystem and preparing for potential future open source releases, we’re retiring all code that handles unsupported and unpublished rules (such as noindex) on September 1, 2019.”
The end of noindex support in robots.txt doesn’t mean there’s no way to tell Google and other search engines not to index a page. Google’s official blog for webmasters lists five ways to control indexing, and websites affected by the new policy can switch to one of these alternative methods.
Two of the methods are similar to the way noindex worked in robots.txt files. First, you can place the noindex directive where it is still supported: Googlebot will obey a noindex rule it finds in a page’s robots meta tag. And if a site really wants to keep using the robots.txt file to control what ends up in search, it can use the disallow rule instead, as shown below.
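As a rough illustration, the meta tag version sits in a page’s HTML head, while the disallow rule goes in robots.txt (the /private/ path is just a placeholder):

    <meta name="robots" content="noindex">

    User-agent: *
    Disallow: /private/

Keep in mind that disallow blocks crawling rather than indexing, so it usually keeps a page’s content out of results but isn’t an exact substitute for noindex.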
Website publishers can use the 404 and 410 HTTP status codes to achieve the same result. Both codes tell Google that a page doesn’t exist, so nothing the bot sees there will be kept in the index.
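In practice, this simply means the server answers requests for the removed URL with one of these status lines (a minimal example of the raw HTTP response):

    HTTP/1.1 404 Not Found

    HTTP/1.1 410 Gone

Google notes that such pages are dropped from the index once they have been crawled and processed.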
Also, since Googlebot can’t see password-protected content, putting a page behind a login is another quick way to prevent Google, or any unauthorized visitor, from seeing content you don’t want indexed. Finally, webmasters can use the Search Console Remove URL tool to temporarily keep individual pages from showing up in search.
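How you add password protection depends on your server. As one example, HTTP Basic Auth on an Apache server can be configured in an .htaccess file along these lines (the .htpasswd path is only a placeholder):

    AuthType Basic
    AuthName "Private area"
    AuthUserFile /path/to/.htpasswd
    Require valid-user

Any method that forces a login before the content loads will keep Googlebot out.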
Websites only have a couple of months to get everything in order. If you’re not sure which method your site uses to control indexing, have someone check the site to make sure you don’t have unwanted pages showing up in search.
For more recent news about changes and updates to Google, read this article on Google’s plan for 3D advertising assets.