How to deindex a web page?

Do you have low-quality or useless web pages that waste your crawl budget and harm your SEO? Are you looking for a clean, efficient method to deindex a web page?

Here are several tips for removing your web pages from search engine indexes!



Why deindex a web page?

While low-quality web pages harm the user experience, they also send a negative signal to search engine algorithms. To avoid this, certain pages of your website should no longer be indexed.

The problem is that you can't remove a URL from the web with a snap of your fingers. You must follow a specific process, starting with deindexing the page from search engines.

Several scenarios may require you to deindex a web page:

  • You have pages that contain internal duplicate content;
  • Some pages were indexed in error;
  • You have unnecessary pages that are harming your SEO;
  • Some pages pose a legal problem.

What is an indexed page?

For a web page to appear in search engine results, the crawlers that visit and analyze websites must first add the page to their index.

A page can only be indexed if:

  • It can be crawled by robots, meaning it is not blocked by the robots.txt file;
  • It is indexable, meaning it meets all the technical indexing criteria.

But be careful: an indexable page is not necessarily indexed by Google! The search engine may well decide not to add it to its index for various reasons (insufficient page quality, overall site quality, etc.).

Additionally, Google assigns a crawl budget to each site. This budget corresponds to the number of pages that the indexing robot (Googlebot for Google) will explore. It is defined based on several crawling criteria such as page loading speed, page depth, content quality, etc.

So, to make the most of your crawl budget and/or remove unwanted pages from search results, deindexing is sometimes necessary…


Robots.txt: to prevent indexing

The robots.txt file, as its name suggests, gives instructions about your site to crawlers: it tells them which pages they may or may not crawl. The robots.txt file will therefore not help you deindex a page that is already indexed; it can only help keep an uncrawled page out of the index. Note that a page blocked by robots.txt can still end up indexed if other sites link to it.

If you want to prevent new web pages or a new site from being indexed, here are the steps to follow:

  • Check for the robots.txt file at the root of your server by typing your domain name followed by /robots.txt in your browser (it is usually generated automatically by your CMS when you create your site): https://yoursite.com/robots.txt
  • If the file does not appear, you will need to create it. Warning: follow Google's instructions carefully when creating your robots.txt file.

Then, three directives are available:

  • User-agent, to specify which crawler the following rules apply to;
  • Allow, to permit crawling of a page;
  • Disallow, to prohibit it.
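Before uploading, you can check how a robots.txt file will be interpreted using Python's standard urllib.robotparser module. Here is a minimal sketch; the file contents and URLs are hypothetical examples:

```python
import urllib.robotparser

# A hypothetical robots.txt blocking one directory for all crawlers
robots_txt = """User-agent: *
Disallow: /private/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# Ask the parser what a generic crawler ("*") may fetch
print(parser.can_fetch("*", "https://yoursite.com/private/page.html"))  # False
print(parser.can_fetch("*", "https://yoursite.com/public/page.html"))   # True
```

This mirrors the logic crawlers apply: the Disallow rule blocks anything under /private/, and everything else stays crawlable by default.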

Once your file is finished, upload it to your website. How you upload it will depend on your server and the architecture of your website. If you encounter any difficulty, consult your host's documentation or contact them.

When your file is uploaded, test it with the Search Console testing tool to make sure it is publicly accessible and correctly written.

The “noindex” tag: Google’s preferred method

The “noindex” meta tag is the best way to prevent a web page from showing up in Google search results. When Googlebot next crawls your site, the “noindex” tag will tell it not to index the page, or to remove it from the index if it is already there.

Not only is it the method preferred by Google, webmasters, and developers, but it is also the easiest to implement, because it does not require any particular technical knowledge.

  • On each HTML page that you want to exclude from the SERP, add the following code in the <head>: <meta name="robots" content="noindex">.
  • If you also don't want crawlers to follow the links on the page, add this directive instead: <meta name="robots" content="noindex,nofollow">.
  • To ask robots not to index an image: <meta name="robots" content="noimageindex">.
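To verify that the tag is actually present in a page's HTML, a quick check with Python's standard html.parser might look like this (the sample HTML is illustrative):

```python
from html.parser import HTMLParser

class NoindexChecker(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.directives += [d.strip().lower() for d in a.get("content", "").split(",")]

html_page = '<html><head><meta name="robots" content="noindex,nofollow"></head><body></body></html>'
checker = NoindexChecker()
checker.feed(html_page)
print("noindex" in checker.directives)  # True
```

Running this against each page you meant to exclude is a simple way to catch a tag that was forgotten or placed outside the <head>.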

Make sure your robots.txt file does not block access to the pages in question!

For your deindexing request to take effect, you must wait for the crawlers to revisit the page. However, you can speed up the process from Search Console by sending Google a crawl request.

The X-Robots-Tag noindex HTTP header to deindex files without source code


Some files (PDFs, images, Word documents, etc.) contain no HTML source code, so you cannot add a meta tag to them. For these, only the X-Robots-Tag noindex HTTP header can give a deindexing instruction to the robots.

Please note: this technique requires some expertise, and if you do not have the necessary knowledge, it is better to call on a developer. Improper handling can cause serious malfunctions on your site!

Start by editing your .htaccess file and adding the following directives:

To deindex all your PDFs:

<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex"
</Files>

To deindex all your image files:

<Files ~ "\.(png|jpe?g|gif)$">
Header set X-Robots-Tag "noindex"
</Files>
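If your site serves files from application code rather than Apache, the same file-extension logic can be sketched in Python. The regexes mirror the two .htaccess rules above; the paths are hypothetical:

```python
import re

# Which paths should carry an X-Robots-Tag: noindex response header
NOINDEX_PATTERNS = [
    re.compile(r"\.pdf$", re.I),            # all PDFs
    re.compile(r"\.(png|jpe?g|gif)$", re.I) # all image files
]

def extra_headers(path: str) -> dict:
    """Return the extra response headers to attach for a given URL path."""
    if any(p.search(path) for p in NOINDEX_PATTERNS):
        return {"X-Robots-Tag": "noindex"}
    return {}

print(extra_headers("/docs/report.pdf"))  # {'X-Robots-Tag': 'noindex'}
print(extra_headers("/index.html"))       # {}
```

Whatever the server stack, the key point is the same: the directive travels in the HTTP response header, not in the file itself.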

HTTP 404 and HTTP 410 codes to deindex deleted pages

When you delete web pages, they are not immediately deindexed. If nothing replaces them, return one of these two HTTP status codes to confirm the deletion to Google:

  • Code 404 (not found): the resource does not exist.
  • Code 410 (gone): the resource does not exist and will not be replaced.
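The choice between the two status codes can be sketched as a small decision function, using hypothetical path sets:

```python
# Hypothetical examples of live pages and deliberately removed pages
EXISTING = {"/", "/contact.html"}
DELETED = {"/old-page.html"}  # removed for good, never coming back

def status_for(path: str) -> int:
    """Pick the HTTP status code a server should return for a path."""
    if path in EXISTING:
        return 200  # normal page
    if path in DELETED:
        return 410  # Gone: tells Google the removal is intentional and permanent
    return 404      # Not Found: unknown resource

print(status_for("/old-page.html"))  # 410
print(status_for("/missing.html"))   # 404
```

A 410 is the stronger signal for deliberately deleted pages, since it states the resource is gone for good rather than merely missing.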

The canonical tag to deindex similar content

If you have identical or very similar content, you risk being penalized for duplicate content. By adding a canonical tag, you tell Google that only the designated page should be taken into account.

Add this code to the <head> of every duplicate page on your website, pointing to the URL of your main page: <link rel="canonical" href="https://yoursite.com/page-example/" />.

Deindex a page without losing its backlinks


Backlinks are essential for your SEO, and it would be a shame to lose quality inbound links!

To avoid losing your backlinks, set up a 301 redirect from the old page to the new one. Note that this permanent redirect is configured in the .htaccess file on your server!

For a page:
RedirectPermanent /repertoire/page-a-rediriger.html http://www.example.net/repertoire/page-de-destination.html

For a directory:
RedirectPermanent /repertoire http://www.nom-de-domaine.com/repertoire-de-destination

For a domain:
RedirectPermanent / http://www.nom-de-domaine.com/
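The same permanent-redirect logic can be expressed as a lookup, for illustration, reusing the hypothetical URLs from the .htaccess rules above:

```python
# Map of old paths to their new permanent locations (hypothetical URLs)
REDIRECTS = {
    "/repertoire/page-a-rediriger.html":
        "http://www.example.net/repertoire/page-de-destination.html",
}

def resolve(path: str):
    """Return (status, location): 301 with the target if redirected, else 200."""
    if path in REDIRECTS:
        return 301, REDIRECTS[path]
    return 200, path

print(resolve("/repertoire/page-a-rediriger.html")[0])  # 301
```

The 301 status is what preserves the backlink value: crawlers following an inbound link are sent to the new page and transfer the old page's signals to it.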

Deindex a web page with WordPress

Since WordPress users do not have direct access to the <head> of their pages, the simplest solution is to install the Yoast SEO plugin.

Once the installation is complete, open the page to deindex, then in the Yoast tab, click “Advanced”. Select “No” for the question “Allow search engines to display this content in search results?”. Save, and that's it!

Our tip for deindexing a page

You now know the techniques to deindex a web page. If you have many pages to deindex and want to speed up the process, create a sitemap file listing all the URLs to be deindexed and declare the file in Search Console.
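As a sketch, such a sitemap can be generated with Python's standard library; the listed URLs are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical URLs awaiting deindexing, to be revisited quickly by crawlers
urls = ["https://yoursite.com/old-1.html", "https://yoursite.com/old-2.html"]

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
urlset = ET.Element("urlset", xmlns=NS)
for u in urls:
    loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
    loc.text = u

sitemap_xml = ET.tostring(urlset, encoding="unicode")
print(sitemap_xml)
```

Save the output as a .xml file, upload it to your server, and declare it in Search Console so that the pages carrying your noindex tags or 410 codes are recrawled sooner.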
