Fri, Nov 7 2003 10:04
bradley
Tip for excluding your box from Google Searches
Some customers may wish to exclude their SBS 2003 installation from the
scope of Web search sites such as Google.com. You may prefer that only the
people who use your installation know about it, or you may want to keep some
portions of your site (e.g. the Business Web site) searchable while keeping
other portions under the radar of Web search sites.
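You can do this with the Robots Exclusion Protocol. By placing a simple text
file named robots.txt at the root of your Web site, you tell Web search
robots which parts of the site they may crawl. Keep in mind that compliance
is voluntary: reputable search engines such as Google honor the file, but it
does not stop anyone from browsing to the excluded pages.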
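To make the mechanism concrete, here is a simplified Python sketch of what a
well-behaved robot does with the file: group the lines into records, pick the
record whose User-agent matches the robot (falling back to the "*" record),
and skip any path that starts with one of that record's Disallow values. The
function names parse_robots and is_allowed and the sample rules are
hypothetical; real crawlers handle more (Allow lines, wildcards, caching), so
treat this only as an illustration of plain prefix matching.

```python
# Simplified sketch of Robots Exclusion Protocol matching (prefix rules only).
# parse_robots/is_allowed and the sample RULES below are hypothetical examples.

def parse_robots(text):
    """Group robots.txt lines into records of (user_agents, disallow_prefixes)."""
    records = []
    agents, disallows = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if disallows:  # a User-agent line after rules starts a new record
                records.append((agents, disallows))
                agents, disallows = [], []
            agents.append(value)
        elif field == "disallow" and agents:
            disallows.append(value)
    if agents:
        records.append((agents, disallows))
    return records


def is_allowed(records, user_agent, path):
    """Apply the record naming this robot, else the '*' record, else allow all."""
    name = user_agent.lower()
    chosen = None
    for agents, disallows in records:
        if any(a != "*" and a.lower() in name for a in agents):
            chosen = disallows
            break
    if chosen is None:
        for agents, disallows in records:
            if "*" in agents:
                chosen = disallows
                break
    if chosen is None:
        return True  # no record applies to this robot
    # An empty "Disallow:" value disallows nothing, so it is skipped.
    return not any(prefix and path.startswith(prefix) for prefix in chosen)


RULES = """\
User-agent: *
Disallow: /exchweb/
Disallow: /remote/
"""

records = parse_robots(RULES)
print(is_allowed(records, "Googlebot", "/default.htm"))   # True
print(is_allowed(records, "Googlebot", "/exchweb/bin/"))  # False
```

The "*" record is consulted only when no record names the robot, which is
why a site can give one crawler its own rules while everyone else gets the
default set.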
I've attached two versions of robots.txt that I've whipped up for my SBS
2003 server:
1. robots.txt - Allows search of your Business Web site but hides the
   SBS-specific sites from search robots.
2. robots2.txt - (Must be renamed to robots.txt) Denies search of your
   entire Web site.
For more information, check out these sources:
http://www.robotstxt.org/wc/robots.html
http://www.searchtools.com/robots/robots-txt.html
http://www.searchengineworld.com/robots/robots_tutorial.htm
Many Web sites publish a robots.txt file. For example, you can check out
http://www.cnn.com/robots.txt.
Please respond to this post if you have any questions or comments - let us
know how this works out for you!
Thanks,
Alan Billharz
Program Manager, SBS 2003
Attached file: robots.txt
# Place this file at the root of the Default Web Site (%SystemDrive%\inetpub\wwwroot)
# to allow search engines to catalog your Business Web site, but not catalog the other
# SBS-specific Web sites.
#
# Note that you must choose to publish the root of your Web site to allow the search
# engine robot to read this file. In the Configure E-mail and Internet Connection Wizard,
# choose to publish Business Web site (wwwroot).
User-agent: *
Disallow: /_vti_bin/
Disallow: /clienthelp/
Disallow: /exchweb/
Disallow: /remote/
Disallow: /tsweb/
Disallow: /aspnet_client/
Disallow: /images/
Disallow: /_private/
Disallow: /_vti_cnf/
Disallow: /_vti_log/
Disallow: /_vti_pvt/
Disallow: /_vti_script/
Disallow: /_vti_txt/
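If you want to check what a crawler will make of this first file before
publishing it, Python's standard urllib.robotparser module can parse the rules
and answer can_fetch() queries. This is a sketch only: the host name
www.example.com and the sample paths /default.htm and /images/logo.gif are
placeholders, not real SBS URLs.

```python
from urllib.robotparser import RobotFileParser

# Rules from the first attachment (robots.txt). The comment lines are omitted
# here because the parser ignores them anyway.
RULES = """\
User-agent: *
Disallow: /_vti_bin/
Disallow: /clienthelp/
Disallow: /exchweb/
Disallow: /remote/
Disallow: /tsweb/
Disallow: /aspnet_client/
Disallow: /images/
Disallow: /_private/
Disallow: /_vti_cnf/
Disallow: /_vti_log/
Disallow: /_vti_pvt/
Disallow: /_vti_script/
Disallow: /_vti_txt/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "www.example.com" is a placeholder for your server's public host name.
for path in ["/", "/default.htm", "/exchweb/", "/remote/", "/images/logo.gif"]:
    allowed = parser.can_fetch("Googlebot", "http://www.example.com" + path)
    print(f"{path:20} {'allowed' if allowed else 'blocked'}")
```

With these rules, / and /default.htm come back allowed, while /exchweb/,
/remote/ and /images/logo.gif come back blocked. Note that the /images/ entry
also keeps any pictures stored in that folder out of search results.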
Attached file: robots2.txt (rename to robots.txt before use)
# Place this file at the root of the Default Web Site (%SystemDrive%\inetpub\wwwroot)
# to prevent all search engines from cataloging your Web site.
#
# Note that you must choose to publish the root of your Web site to allow the search
# engine robot to read this file. In the Configure E-mail and Internet Connection Wizard,
# choose to publish Business Web site (wwwroot).
User-agent: *
Disallow: /
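The same standard-library parser can confirm that this second file shuts out
every compliant robot, since every URL path begins with "/". As before, the
host name, the robot name ExampleBot and the page paths are illustrative
placeholders.

```python
from urllib.robotparser import RobotFileParser

# Rules from the second attachment (robots2.txt, renamed to robots.txt).
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /"])

# "Disallow: /" is a prefix of every path, so every request is refused.
for agent in ["Googlebot", "ExampleBot"]:
    for path in ["/", "/default.htm", "/exchweb/"]:
        print(agent, path, parser.can_fetch(agent, "http://www.example.com" + path))
```

Every line of output ends in False.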