Thread 'robots.txt / snapshot robot'

Ananas

Joined: 27 Jun 06
Posts: 305
Germany
Message 16919 - Posted: 28 Apr 2008, 6:13:45 UTC
Last modified: 28 Apr 2008, 6:27:01 UTC

The snapshot robot from snaps.com does not obey robots.txt

Project admins who want to protect their pages from being accessed by this snapshot bot can try this in .htaccess:

deny from 38.98.19

It worked on my domain, although I'm not sure whether the excluded IP range is sufficient.
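
To judge whether the range is sufficient, it helps to see what the rule covers. As far as I understand Apache's partial-address matching, "deny from 38.98.19" matches every address whose first three octets are 38.98.19, which is the same as the network 38.98.19.0/24. A minimal Python sketch for checking addresses from an access log against that range (the helper name is_blocked and the sample addresses are only illustrations, not taken from any real log):

```python
import ipaddress

# Assumption: the partial address "38.98.19" is equivalent to this /24 network.
BLOCKED_RANGE = ipaddress.ip_network("38.98.19.0/24")


def is_blocked(address: str) -> bool:
    """Return True if the address falls inside the denied range."""
    return ipaddress.ip_address(address) in BLOCKED_RANGE


if __name__ == "__main__":
    # Hypothetical sample addresses, for illustration only.
    for addr in ("38.98.19.5", "38.98.20.1"):
        print(addr, "blocked" if is_blocked(addr) else "not blocked")
```

If the bot also shows up from addresses outside that range, the rule would need another deny line.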


P.S.: I have to correct what I wrote. show_user.php is simply missing from robots.txt, so it's not the fault of snaps.com. Sorry about that.
Nicolas

Joined: 19 Jan 07
Posts: 1179
Argentina
Message 16927 - Posted: 28 Apr 2008, 17:44:46 UTC - in response to Message 16919.  

The snapshot robot from snaps.com does not obey robots.txt

That domain name doesn't exist.
Ananas

Joined: 27 Jun 06
Posts: 305
Germany
Message 16930 - Posted: 28 Apr 2008, 18:19:36 UTC
Last modified: 28 Apr 2008, 18:24:25 UTC

http://www.snap.com

The agent identifiers (HTTP_USER_AGENT) are:
- "Snapbot/1.0 (Snap Shots, +http://www.snap.com)"
- "Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9"

But it was my fault: the bot does obey robots.txt. I assumed that the BOINC sample robots.txt listed all database-driven pages, but it does not.
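
A quick way to see whether a given page is covered is to run the robots.txt rules through Python's standard urllib.robotparser. This is only a sketch: the two rule sets and the example.org URL below are made up for illustration and are not the actual BOINC sample file, and the helper name allowed is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# One of the agent strings listed above.
SNAPBOT = "Snapbot/1.0 (Snap Shots, +http://www.snap.com)"

# Hypothetical rule sets: one without show_user.php, one with it.
RULES_WITHOUT = """\
User-agent: *
Disallow: /hypothetical_page.php
"""

RULES_WITH = RULES_WITHOUT + "Disallow: /show_user.php\n"


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules let this agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder host; substitute the project's own URL.
    url = "http://example.org/show_user.php?userid=1"
    print("without entry:", allowed(RULES_WITHOUT, SNAPBOT, url))
    print("with entry:   ", allowed(RULES_WITH, SNAPBOT, url))
```

A well-behaved bot may fetch any page that has no matching Disallow line, which is what happened with show_user.php here.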


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.