How to prevent Google from referencing a web page?

Yes, you read that correctly. How do you prevent Google from referencing a web page? The question may surprise you. Even more so coming from us, Redacteur.com, a content writing platform whose mission is to provide you with quality, SEO-optimized texts for… well, good SEO for your web pages!

SEO is the key to getting a website to appear at the top of search results. But sometimes it may be necessary to block certain content from being indexed.

In what cases? And how do you do it? Here is everything you need to know to prevent Google from referencing a web page, along with the mistakes to avoid. Because our objective, above all, is for you to have a high-performing, efficient website.

How to block Google's access to pages on your site?

There are several methods to block Googlebot and other robots from accessing your pages. What are they, and what are their limits?

Use the robots.txt file

The robots.txt file of a website is used to guide search engine robots. The instructions in this file tell them whether or not the pages of a website should be crawled. To specify that a page should not be crawled, you use the “Disallow” directive.
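To give an idea, a minimal robots.txt might look like this (the paths are purely hypothetical examples):

User-agent: *
Disallow: /private-page.html
Disallow: /drafts/

The file must be placed at the root of the site (for example https://www.example.com/robots.txt) to be taken into account.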

The robots.txt file has limitations, however. It is mainly useful for avoiding overloading your server with crawler requests, but it does not prevent a page from being indexed 100% of the time. Robots treat the rules in robots.txt as guidelines, not mandatory commands, and some bots may not follow them.

Additionally, your URL may be linked from elsewhere on the internet. If search engines detect that link, they can still index the page. If you want your page to stop appearing in search results, or if you need to protect sensitive information, you should choose other, more effective methods.

Use the noindex meta tag

Meta tags are HTML elements that provide information about a web page. They are placed in the “head” section of the HTML page.

The “noindex” meta tag tells Google not to index the page. This solution does not require any particular technical skills. You just need to add the following line of code:

<meta name="robots" content="noindex">
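To show where it goes, here is a minimal sketch of the “head” section of a page (the title is just a placeholder):

<head>
  <title>Example page</title>
  <meta name="robots" content="noindex">
</head>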

For this instruction to be seen, the page must not be blocked by the robots.txt file: Google has to be able to crawl the page in order to read the tag. You can speed up the deindexing process from Search Console by asking Google to recrawl the page.

But be careful, this method also has its limits: the “noindex” meta tag does not prevent robots from crawling the page. It only prohibits them from including it in search results.

Other meta tags can be added to reinforce deindexing: “noimageindex” tells robots not to index the images on the page, “noarchive” asks them not to keep a cached copy of the page, and “nosnippet” prevents a snippet from being displayed in the search results.
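These directives can be combined in a single tag, separated by commas, as in this hypothetical example:

<meta name="robots" content="noindex, noimageindex, noarchive, nosnippet">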

WordPress users who do not have access to the code of their pages can use a plugin, such as the Yoast plugin, to deindex their content. Simply answer “No” to the question “Allow search engines to display content in search results?”.

Protect access to certain pages with htaccess

To prevent Google from accessing and referencing pages, protecting them with a password is the most effective method.

You block access to the pages in question by requiring a password to display them, for example via the .htaccess file.

One option is to edit the .htaccess configuration file. Used by Apache servers, this file applies rules to directories. For example, it allows you to protect content with a password. It is also essential for redirecting quality backlinks from an old page to a new one. However, it is a sensitive file to handle: an error can make the entire website inaccessible.
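As an illustration, password protection on an Apache server typically combines a few lines in the .htaccess file with a separate .htpasswd file holding the authorized accounts (the paths below are hypothetical):

AuthType Basic
AuthName "Restricted area"
AuthUserFile /home/example/.htpasswd
Require valid-user

The .htpasswd file is usually generated with the htpasswd utility and should be stored outside the public web directory.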

To make access and navigation easier for your visitors and limit the risk of mishandling the .htaccess file, you can also create a private area accessible with a username and password.

Is it possible to make pages completely inaccessible?

The most radical solution to make your web pages inaccessible on Google is to… delete them. This process can be lengthy: it is not just about removing the page from your website, but about removing its URL from search engines. If you only delete the page from your site, you will create a 404 error, which Googlebot and other robots do not appreciate. When you delete a page, you should redirect its URL with a 301 redirect. Please note that even if the page disappears from Google, it may still be accessible in web archives.
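On an Apache server, such a 301 redirect can be declared in the .htaccess file, for example like this (both URLs are hypothetical):

Redirect 301 /old-page.html https://www.example.com/new-page.html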

Google offers a URL removal tool to help deindex pages. However, the block is only temporary, limited to 180 days.

To permanently remove a URL, you must use the Obsolete Content Removal Tool. Before submitting the request form, make sure you have already done one of the following:

  • delete the page from your website
  • or block access to its content via a password or use the “noindex” meta tag.

But above all, the page must not be blocked via the robots.txt method, otherwise Google cannot see that it has been removed or marked “noindex”.

If the request is successful, your page will be permanently removed from Google.

Mistakes to avoid when you want to prevent Google from referencing pages

You have followed the rules to deindex your web page. However, it continues to appear in Google search results. For many content creators, it’s a dream. For you, it's a nightmare. Remember, no method is 100% effective. But maybe you made a mistake?

Forgetting to remove links that point to the deindexed page

“Link juice” may be the reason the page you wish to deindex keeps being referenced. In digital marketing, the more relevant and trustworthy a page is, the more “juice” it has. And the more “juice” it has, the more it appeals to search engines.

However, a well-referenced page shares its “juice” with the other pages it is linked to via hyperlinks. The algorithm considers that a page recommended by a quality page must itself be relevant.

And this is where all your deindexing work can be ruined if you fail to remove the internal links that point to the page you want to deindex. That page will continue to benefit from the quality “juice” of the pages linking to it, and therefore… to be referenced by search engines.

Do you want to prevent Googlebot and other robots from indexing your page? You should try to remove all internal links from your website that point to the page to be deindexed.

Ideally, you would also identify the backlinks that point to your page and request their removal, but this can be more difficult.

If, for navigation reasons, you wish to deindex a page but keep the links that point to it (a legal notice page, for example), you can give those links the “nofollow” attribute to limit the transmission of SEO juice.
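In practice, a link carrying the “nofollow” attribute looks like this (the URL and anchor text are placeholders):

<a href="/legal-notice" rel="nofollow">Legal notice</a>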

Forgetting to remove outbound links from the deindexed page

Does your deindexed page contain links pointing to other internal or external content? The principle of “link juice” works in the same way: the original page benefits from the “juice” of the pages it points to. And you rarely choose your links by chance: they point to quality pages or recognized authority sites with plenty of “juice”. So if you have neglected to remove the links from your deindexed content, it may continue to be referenced by Google via the pages those links lead to.

To keep your deindexed page as discreet as possible on the web, you must remember to delete all the links it contains pointing to other content.

If you do not want to remove these outgoing links, you can add the “nofollow” attribute to them to tell the search engine that you do not want to pass “SEO juice” to the target pages.

Targeting the wrong page

Don't target the wrong page when you edit the robots.txt file or the meta tags: obvious, you might say. However, when a website has a lot of content, it is easy to get lost in the directories. And inserting a “noindex” meta tag in the “head” of your home page would be disastrous for the SEO of your website. A little common-sense reminder helps avoid mistakes.

There are several methods to prevent Google from referencing a page, but none of them is necessarily 100% effective: blocking or completely removing a URL is never guaranteed. Each method has its limitations, and errors and oversights happen, so take your time when you want to get pages deindexed from Google.

Why do you want to prevent Google from referencing a page?

Block poor quality pages

Older pages on your website may be outdated, may have lost their relevance, or may offer content similar to that of other pages.

These pages no longer correspond to the image you wish to give of yourself and your activity. They harm the user experience and are penalized by Google. By blocking certain pages, you direct your visitors and search engines to your high value-added content.

Guarantee data confidentiality

You may wish to limit access to certain content.

This is the case if you offer privileged content to certain customers who have subscribed to a premium plan.

This is also the case if you use your website to communicate with your partners in private spaces.

Finally, this is the case for form submission pages through which Internet users send you personal information.

Manage crawl traffic (crawl budget)

Blocking Google from referencing certain web pages helps manage crawl traffic. This can prevent your server from being overwhelmed by requests from search engine robots, and it keeps robots from indexing unnecessary content.
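For example, a robots.txt file could keep robots away from low-value sections such as internal search results or cart pages (the paths are hypothetical):

User-agent: *
Disallow: /search/
Disallow: /cart/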

This is all the more important since the Google Helpful Content update, which penalizes so-called “zombie” pages: pages that provide no added value to Internet users.

Our tip for not having to deindex your internet pages

There is another way to compensate for obsolete content with low added value: update the texts, strengthen their quality and optimize their SEO.

Reworking or replacing existing content strengthens the performance and quality of your website while avoiding the tedious work of deindexing web pages.

