How To Stop Google From Crawling My Site

So here is the scenario. You just launched your website and are afraid it might get crawled by search engines. Don’t panic; I am here to help. I have done a lot of research on this and have consulted with experts on this matter. To make it easy for you, I have written this tutorial to explain how to stop Google from crawling a WordPress site using four proven methods. You can apply some of them or use all of them. It is entirely your choice.

If your site is already live, don’t worry. It can take 24 to 48 hours, and often longer, for search engines to crawl and index a new website. That is usually enough time to implement the methods below and keep your site out of search results.

Why Stop Google From Crawling a Site?

As a web developer, I am sharing my personal experience of why I do not want search engines to crawl and index a site.

Whenever I build a new site, my primary objective is to stop the bots from crawling and indexing it. Why? Well, the site is usually in the development phase. So, during this time, the client wants full access to check the site design and functionality and provide relevant feedback.

This phase may take a few days or even months. During this period, search engines may crawl and index the website. Because of this, I get a lot of complaints from clients telling me that their dev site is appearing in Google search results.

Why is this bad?

  1. A dev site appearing in Google search results means users may find the website.
  2. Users may see broken functionality, dummy text, poor loading time, etc.
  3. The company will lose its users, customers, and trust. It will harm the company’s reputation.
  4. There could be some sensitive data that a company may not want to appear in search results.

4 Ways To Stop Google From Crawling

Here are four ways to stop Google from crawling a WordPress website.

1.) Discourage Search Engines From Indexing This Site

Let us begin with the WordPress built-in method to stop search engines from indexing a site.

To hide your website from search engines, log in to your WordPress admin panel and navigate to Settings → Reading → check the “Discourage search engines from indexing this site” check box at the bottom.

It is a built-in feature of WordPress, but it is not reliable and here is why. If you look closely, there is a note: “It is up to search engines to honour this request”.

[Image: How to discourage search engines from indexing this site in WordPress settings]

I am confused as to why WordPress even has this feature if it is not 100% effective. It feels as if they wanted to make their users happy but didn’t put enough effort into making it happen.

Nevertheless, even though it is not enough, it is important and should be implemented.

2.) Add the Disallow Rule in the Robots.txt File Manually

A robots.txt file tells search engine crawlers which parts of your website they may crawl. In this file, you can add a “Disallow: /” rule below the “User-agent: *” line.

The rule tells search bots that they do not have permission to crawl any part of your site.
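
If you want to double-check the rule before uploading it, here is a minimal Python sketch that runs it through the standard library’s robots.txt parser. The `example.com` URLs and the `is_crawl_allowed` helper are placeholders I made up for illustration, and the parser only approximates how a well-behaved crawler reads the file; it is not Google’s own implementation.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule set described above: every crawler, every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain, not a real dev site.
    for path in ("/", "/about-us/", "/wp-content/uploads/logo.png"):
        url = "https://example.com" + path
        # Prints False for every path, because "Disallow: /" matches them all.
        print(path, is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

Keep in mind that this only tells you what a crawler that respects robots.txt would do. It cannot force a bot to obey the file.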

I believe this is what the “Discourage search engines from indexing this site” option in method 1 should do: automatically update the robots.txt file and add this rule.

Not everyone is a WordPress wiz, but don’t worry. I will explain step-by-step how to update the file from the server and by using a plugin.

Edit robots.txt file from the server

  1. Before you do anything, create a backup of your entire site.
  2. Access your server, open the file manager, and find the robots.txt file in your site’s root folder.
  3. Download the robots.txt file or create a duplicate. If something goes wrong, you will have a backup.
  4. Edit the robots.txt file and add the “Disallow: /” rule below the “User-agent: *” line.

Edit the robots.txt using an All-in-one SEO Plugin

You can also use SEO plugins to prevent search engines from indexing the site. One of my favorite plugins is All-in-one SEO.

  1. Install and activate the All-in-One SEO plugin.
  2. Go to All-in-One SEO → Tools, and you will land on the Robots.txt editor page.
  3. Select the Enable Custom Robots.txt option, and enter the following code in the field:
User-agent: *
Disallow: /

Once you save the changes, you will see the code added in the Robots.txt Preview section below. To remove the code, uncheck the Enable Custom Robots.txt or delete the rule.

3.) Password Protect Your Site To Stop Google From Crawling It

From the server, you can password-protect your root directory. Search engines will not be able to access your site, making it impossible for it to be crawled and indexed. But don’t do this if your site is live because visitors will not be able to access it either.

Log in to your server and navigate to Password Protected Directories. Enter “/” in the root directory field to target your whole site, then set the username and password.

4.) Add “Noindex” Meta

Google recommends using the noindex meta tag. Adding noindex to your site means that when search engines crawl your site, they will see that they do not have permission to index it. For this to work, robots.txt must NOT block the site.

Here is what Google says:

Important: For the noindex directive to be effective, the page must not be blocked by a robots.txt file, and it has to be otherwise accessible to the crawler. If the page is blocked by a robots.txt file or the crawler can’t access the page, the crawler will never see the noindex directive, and the page can still appear in search results, for example if other pages link to it.
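
To make that interaction concrete, here is a small Python sketch, assuming a crawler that obeys robots.txt. The `noindex_outcome` function name, its return strings, and the `example.com` URL are all hypothetical; the sketch only models the rule Google describes above and does not query Google in any way.

```python
from urllib.robotparser import RobotFileParser


def noindex_outcome(robots_txt: str, url: str, page_has_noindex: bool,
                    user_agent: str = "Googlebot") -> str:
    """Describe what a crawler that obeys robots.txt can learn about url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        # The page is never fetched, so a noindex tag on it goes unread.
        return "blocked by robots.txt: noindex is never seen"
    if page_has_noindex:
        return "crawlable: noindex is seen"
    return "crawlable: no noindex, page may be indexed"


if __name__ == "__main__":
    page = "https://example.com/about-us/"  # placeholder URL
    block_all = "User-agent: *\nDisallow: /\n"
    allow_all = "User-agent: *\nDisallow:\n"
    print(noindex_outcome(block_all, page, page_has_noindex=True))
    print(noindex_outcome(allow_all, page, page_has_noindex=True))
```

The first call shows the trap: with “Disallow: /” in place, the noindex tag is on the page but the crawler never gets to read it.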

How Do I Stop Google From Indexing a Particular Page From My Website?

The above four methods explain how to stop Google from crawling your site. However, you may need to stop Google from indexing a particular page on your website.

Sometimes, the client may request that a particular page should not show up in search results. If it is a newly created page, there are three ways to stop Google from indexing it:

  1. The robots.txt file
  2. The All-in-one SEO plugin
  3. Password protection

However, if the page has been crawled and indexed by Google, it needs to be removed from Google first. It usually involves a removal request in the Google Search Console.

Let us explore all of these methods one-by-one.

1.) Add a Single Page Disallow Rule in Robots.txt to Stop Google From Crawling

Open the robots.txt file as explained in method 2 above. The Disallow rule in that method blocks the entire website from Google’s crawler.

To stop Google from crawling and indexing a particular page, you need to add the following rule:

Disallow: /your-page-url/

For example, if you want to hide the About page, this will be the rule:

Disallow: /about-us/
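
If you have several pages to hide, a short Python sketch like the one below can generate the rules and confirm which paths they cover. The `build_robots_txt` and `is_blocked` helpers, the `example.com` host, and the `/contact/` path are assumptions for illustration only. The check uses Python’s standard robots.txt parser, which matches rules by path prefix.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_paths: list[str]) -> str:
    """Build a robots.txt body that blocks the given paths for all crawlers."""
    lines = ["User-agent: *"]
    lines.extend(f"Disallow: {path}" for path in blocked_paths)
    return "\n".join(lines) + "\n"


def is_blocked(robots_txt: str, path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots_txt forbids user_agent from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder host; only the path matters for matching.
    return not parser.can_fetch(user_agent, "https://example.com" + path)


if __name__ == "__main__":
    robots = build_robots_txt(["/about-us/"])
    print(robots)
    print(is_blocked(robots, "/about-us/"))       # True
    print(is_blocked(robots, "/about-us/team/"))  # True, the rule is a prefix
    print(is_blocked(robots, "/contact/"))        # False
```

Because the rule is matched as a prefix, anything under /about-us/ is covered too, so double-check that you are not hiding child pages by accident.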

2.) Noindex Tag Using the All-in-one SEO Plugin

Using a plugin saves you from manually editing the robots.txt file. If you are not a WordPress developer, your first instinct would be to go with a plugin.

Luckily, the All-in-one SEO plugin provides an easy way to stop a page from getting indexed.

Here is what you need to do:

  1. Edit the page you want to hide from Google.
  2. Find the AIOSEO settings and click on the Advanced tab.
  3. Uncheck the Robots Setting to reveal more options.
  4. Check the no-index option.
  5. Save your changes.

[Image: Noindex meta tag using the All-in-one SEO plugin]

The plugin will add a line of code to the page, telling Google that this page is off-limits for indexing.

If you want to see the code, view the page on the front end, right-click anywhere on it, and select the View Page Source option. It will open a new tab showing the web page’s HTML code. Look in the <head> section, and you should see the following line of code or something similar:

<meta name="robots" content="noindex"/>
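
If you would rather not hunt through the page source by eye, here is a minimal Python sketch that looks for the tag in a page’s HTML. The `RobotsMetaParser` class and `has_noindex` function are names I made up for this example. The sketch only inspects `<meta name="robots">` tags in HTML you pass in as a string; it does not fetch the page for you.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # The content value is a comma-separated list, e.g. "noindex, nofollow".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="noindex"/></head></html>'
    print(has_noindex(sample))  # True
```

You can paste the HTML from the View Page Source tab into a string and pass it to `has_noindex` to confirm the plugin did its job.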

3.) Password Protect

WordPress has a built-in feature that lets you set a password for a page, so only users who have the password can view it. If the Google search bot does come across the page, it cannot crawl or index the content because it is password-protected.

4.) Google Search Console Removal Request

If the page is appearing in Google search results, the first thing you need to do is request a removal. You must set up a Google Search Console account first and verify the ownership of the website.

After that, go to Indexing → Removals and select the Temporary Removals tab. Click on the New Request button and enter the URL of the page you want to remove. Click on the Next button to submit the request.

The request will block the page from Google search results for about six months. During this time, you can implement any of the above methods to stop Google from indexing that page on your website.

Conclusion

The above steps are the best ways I know to stop search engines from crawling a WordPress site. You can implement more than one of these strategies if required. Personally, I don’t implement the Password Protected method because it is a hassle to always enter the login details to access your own site. If you believe there is a better way or have your own method that you would like to share, please leave a comment in the section below.
