security

Blog Post

The other day, I ran into an old friend and colleague who told me that an unfinished site he was working on had shown up in Google search results. He was able to get the entry removed from Google's search index by submitting a removal request, but I imagine his client was unhappy while waiting for Google to process it. My friend was surprised that the site had been indexed at all, because he had set up a robots.txt file to block search engines from the site.

Learning the hard way

I can't claim any moral high ground on this issue; I made a similar mistake years ago on one of my first professional web projects. I assumed that, if no other sites linked to the development copy of the site I was working on, search engines would have no way to find and crawl it. Wrong! I got a nasty surprise when my client sent me an angry e-mail asking why his half-complete new site was showing up in Google. I had to rush to send a removal request to Google, check several other major search engines to confirm that they had not indexed the site, and apologize profusely to the client. Fortunately, he forgave me, and the development site dropped out of Google's search index.

Resource (external link)

Google help article explaining three methods of blocking Google from indexing a page (.htaccess password protection, robots.txt file, and noindex meta tags). Provides a brief overview of how to implement each method and the relative effectiveness of each.
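
To see what a robots.txt rule does, here is a minimal sketch using Python's standard-library urllib.robotparser. The host dev.example.com and the helper name is_crawl_allowed are assumptions made up for illustration. It shows how a well-behaved crawler would interpret a blanket Disallow rule; it cannot show whether a URL ends up in a search index, because robots.txt only asks crawlers not to fetch pages.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A robots.txt that asks every crawler to stay away from the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Hypothetical development URL, used only for illustration.
DEV_URL = "https://dev.example.com/index.html"

if __name__ == "__main__":
    print(is_crawl_allowed(BLOCK_ALL, "Googlebot", DEV_URL))
```

A crawler that ignores robots.txt is not stopped by this file at all, which is one reason password protection tends to be the more dependable choice for a development site.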

Resource (external link)

This brief tutorial explains both how to password-protect a web site using .htaccess and .htpasswd and why you would want to do so.

Resource (external link)

Easy-to-use tool for generating .htpasswd files and the code to add to .htaccess files.
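
For a sense of what such a generator produces, here is a rough sketch in Python. It is not the linked tool's code. The user name, password, realm, and file path are placeholders, and the function names are made up for this example. It uses the legacy {SHA} entry format that Apache accepts, chosen only because it needs nothing beyond the standard library. That format is unsalted, so a bcrypt entry (for example, from Apache's htpasswd utility with the -B option) is a better choice for anything real.

```python
import base64
import hashlib


def htpasswd_line(username: str, password: str) -> str:
    """Build one .htpasswd entry in Apache's legacy {SHA} format."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return f"{username}:{{SHA}}{encoded}"


def htaccess_snippet(htpasswd_path: str, realm: str = "Development site") -> str:
    """Build the Basic Auth directives that point Apache at the .htpasswd file."""
    return "\n".join([
        "AuthType Basic",
        f'AuthName "{realm}"',
        f"AuthUserFile {htpasswd_path}",
        "Require valid-user",
    ])


if __name__ == "__main__":
    # Placeholder credentials and path; replace them before any real use.
    print(htpasswd_line("dev", "change-me"))
    print(htaccess_snippet("/full/path/to/.htpasswd"))
```

The AuthUserFile path should be an absolute filesystem path, ideally one outside the web root so the password file itself cannot be downloaded.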

Resource (external link)

This article covers file permission settings for a Drupal site on Linux or Windows. Very useful for getting started with basic Linux file security.