Password-protecting development and staging sites with .htaccess and .htpasswd
The other day, I ran into an old friend and colleague who told me that an unfinished site he was working on had shown up in Google search results. Fortunately, he was able to get the entry removed from Google's search index by submitting a removal request, but I imagine his client was unhappy about the situation, especially while waiting for Google to process the request. My friend was surprised that the site appeared in Google's index at all, because he had configured his robots.txt file to block search engines from the site.
Learning the hard way
I can't claim any moral high ground on this issue; I made a similar mistake years ago on one of my first professional web projects. I wrongly assumed that, if no sites linked to the development copy of the site I was working on, search engines would have no way to find and crawl it. Wrong! I got a nasty surprise when my client sent me an angry e-mail asking why his half-finished new site was showing up in Google. I had to rush to send a removal request to Google, check several other major search engines to confirm that they had not indexed the site, and apologize profusely to the client. Fortunately, he forgave me, and the development site dropped out of Google's search index.
Methods of blocking search engines from your site
People use a number of methods to keep search engines away from sites under development and from staging copies of live sites, most of them far more intelligent than my early method, which I'm going to call "wishful naiveté." Common methods include using a robots.txt file to instruct search engines not to crawl certain files and directories; adding a "noindex" meta tag to the source code of each page you want search engines to ignore; and password-protecting a page or site, for example with HTTP authentication.
One of my main objections to relying on robots.txt or noindex meta tags to keep search engines away from a site is that neither method is enforceable on your end. To paraphrase Google's help documentation, you are basically putting up a "no trespassing" sign and hoping that every search engine that comes across your site chooses to respect it. If some of them don't want to play nicely, robots.txt and noindex meta tags will not stop them from crawling and indexing your pages. Moreover, even search engines that largely play by the rules, such as Google, may still index basic information about your pages, such as their URLs, while honoring robots.txt by not crawling the page content:
While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results. (from Google Webmaster Tools help documentation)
Therefore, if you rely on robots.txt or noindex meta tags, the half-finished site you are building for a client could show up in search results pages (as happened to my friend).
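To make the "no trespassing" point concrete, here is a minimal Python sketch using the standard library's `urllib.robotparser`, which performs the kind of check a well-behaved crawler runs before fetching a page. The robots.txt content, the host `staging.example.com`, and the crawler name `ExampleBot` are hypothetical. What matters is where the decision lives: `can_fetch()` runs in the crawler's own code, so a crawler that never calls it is not stopped by anything on your server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def polite_crawler_may_fetch(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return the decision a robots.txt-respecting crawler would make for url.

    The check is voluntary: it runs in the crawler's code, and nothing on
    the web server enforces the answer.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite crawler asks first and skips the page...
    print(polite_crawler_may_fetch("https://staging.example.com/new-homepage.html"))
    # ...but an impolite one can skip this check and request the page anyway.
```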
Advantages of using password protection to keep out search engines
Password-protecting your site has the advantage of actually enforcing your intentions: without the password, search engines cannot see your site. At all. The good news is that implementing password protection is very easy on an Apache server, because it requires only a few lines of configuration, and there are even free online tools that generate those lines for you. Because the protection happens at the Apache server level, this method works whether your site consists of static HTML files or runs on a content management system. In other words, authentication happens BEFORE Apache serves any static files or hands the request to the application layer.
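Apache's password protection uses HTTP Basic authentication: a request without valid credentials gets a `401 Unauthorized` response with a `WWW-Authenticate` challenge, and the protected content is never produced. The following is not Apache configuration but a minimal Python WSGI sketch of that same exchange, meant only to show why the check happens before any page is served. The credentials, the realm name, and the `site` application are hypothetical placeholders.

```python
import base64
import hmac

# Hypothetical credentials for illustration; Apache would instead look them
# up in the password file named by its AuthUserFile directive.
EXPECTED_USER = "reviewer"
EXPECTED_PASSWORD = "s3cret"
REALM = "Staging"


def credentials_ok(auth_header):
    """Return True if an 'Authorization: Basic ...' header carries the expected credentials."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # bad base64 (binascii.Error) or bad UTF-8 (UnicodeDecodeError)
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # Compare both fields in constant time to avoid leaking how much matched.
    user_ok = hmac.compare_digest(user.encode("utf-8"), EXPECTED_USER.encode("utf-8"))
    password_ok = hmac.compare_digest(password.encode("utf-8"), EXPECTED_PASSWORD.encode("utf-8"))
    return user_ok and password_ok


def require_basic_auth(app):
    """Wrap a WSGI app so it only runs after successful Basic authentication."""
    def protected(environ, start_response):
        if credentials_ok(environ.get("HTTP_AUTHORIZATION")):
            return app(environ, start_response)
        # No valid credentials: challenge the client and never call the app.
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", f'Basic realm="{REALM}"'),
            ("Content-Type", "text/plain; charset=utf-8"),
        ])
        return [b"Authentication required.\n"]
    return protected


def site(environ, start_response):
    """Stand-in for the half-finished site."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Half-finished homepage\n"]


application = require_basic_auth(site)

if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    # Serve on localhost only; visit http://127.0.0.1:8000/ to see the browser's password prompt.
    with make_server("127.0.0.1", 8000, application) as server:
        server.serve_forever()
```

Basic authentication only base64-encodes the user name and password; it does not encrypt them, so a protected staging site is best served over HTTPS.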
In essence, this method consists of adding a few directives to the .htaccess file in the root of the site you want to protect, and creating a file named .htpasswd, which holds the user names and hashed passwords and should live in a directory on your server that cannot be accessed from a web browser. The tutorials in the Related Resources links on the right provide step-by-step instructions for setting this up. The Dynamic Drive site even provides a tool that generates the code for you; I highly recommend giving it a try.
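As a rough illustration of what such a generator produces, here is a minimal Python sketch. The four directives (`AuthType`, `AuthName`, `AuthUserFile`, `Require valid-user`) are standard Apache authentication directives; the file path, realm text, and user name are hypothetical placeholders. For the password line it uses Apache's `{SHA}` format only because Python's standard library can compute it. That scheme is unsalted SHA-1 and weak, so for a real site Apache's own `htpasswd` utility with the `-B` (bcrypt) option is the better choice.

```python
import base64
import hashlib


def htaccess_block(auth_user_file: str, realm: str = "Restricted staging site") -> str:
    """Return .htaccess directives that require HTTP Basic authentication.

    auth_user_file should be the full server path to .htpasswd, in a
    directory that the web server does not serve to browsers.
    """
    return (
        "AuthType Basic\n"
        f'AuthName "{realm}"\n'
        f"AuthUserFile {auth_user_file}\n"
        "Require valid-user\n"
    )


def htpasswd_sha_line(user: str, password: str) -> str:
    """Return one .htpasswd line in Apache's {SHA} format: user:{SHA}base64(sha1(password)).

    {SHA} is used here only because the standard library can compute it;
    it is unsalted and weak, so prefer `htpasswd -B` (bcrypt) for real sites.
    """
    if ":" in user:
        raise ValueError("a .htpasswd user name cannot contain ':'")
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return f"{user}:{{SHA}}{base64.b64encode(digest).decode('ascii')}\n"


if __name__ == "__main__":
    # Hypothetical path outside the web root and hypothetical reviewer account.
    print(htaccess_block("/home/example/private/.htpasswd"), end="")
    print(htpasswd_sha_line("reviewer", "change-me"), end="")
```

The first function's output belongs in the site's .htaccess file and the second's in .htpasswd. Apache only honors these directives in .htaccess if the server configuration allows it for that directory, for example with `AllowOverride AuthConfig`.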
Closing thoughts
I use this technique not only for sites that are incomplete, but also for sites on which I do ongoing code maintenance. Specifically, for any site where I perform ongoing upgrades and enhancements, I like to maintain a development copy, a staging copy, and a production copy. The development copy usually lives on my laptop, but the staging copy needs to live online, where quality assurance reviewers and other stakeholders can check it out and approve the changes. Using .htaccess and .htpasswd ensures that only authorized users can view the changes before they have been officially approved, and keeps the search engines from peeking.
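Because the whole point is that anonymous visitors, crawlers included, get nothing, it can be worth checking a staging copy from the outside after setting it up. A minimal Python sketch using only the standard library: it requests the page without credentials and reports whether the server answered with a 401 and a Basic challenge. The URL is a hypothetical placeholder, and network failures (for example, a DNS error) propagate as `urllib.error.URLError` rather than being counted as "protected."

```python
import urllib.error
import urllib.request


def is_basic_challenge(status: int, www_authenticate: str) -> bool:
    """True if a response looks like an HTTP Basic authentication challenge."""
    return status == 401 and www_authenticate.strip().lower().startswith("basic")


def challenges_anonymous_visitors(url: str, timeout: float = 10.0) -> bool:
    """Request url without credentials; True only if the server demands Basic auth.

    No credentials are configured for urlopen() here, so a protected page
    raises HTTPError with code 401 instead of returning content.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return False  # the page was served to an anonymous visitor
    except urllib.error.HTTPError as err:
        return is_basic_challenge(err.code, err.headers.get("WWW-Authenticate", ""))


if __name__ == "__main__":
    staging_url = "https://staging.example.com/"  # hypothetical staging address
    if challenges_anonymous_visitors(staging_url):
        print("OK: anonymous requests are challenged for a password")
    else:
        print("WARNING: no Basic auth challenge; check the .htaccess setup")
```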
You can probably see by now why it's a no-brainer to use Apache-level password protection on any site you are developing. Not only is it easy, but it also helps you avoid the awkward situation that my friend and I both experienced with our respective clients.

