LebGeeks

A community for technology geeks in Lebanon.


#1 May 5 2010

hussam
Member

htaccess and google bot

Hello geeks.
I have Drupal installed with clean URLs enabled, and I use rewrite rules to make all URLs human-readable instead of /node/34 and /?q=node/34.
At the moment I block these old URLs using robots.txt, but Googlebot won't give up. It keeps trying to crawl the pages and fails because of robots.txt.
So I was thinking: can I use .htaccess with a rewrite condition that checks whether the user agent is Googlebot (http://www.useragentstring.com/pages/Googlebot/) and, if the requested URL contains /?q=node/ or /node/, sends a 410 error?
Something like RewriteRule blah [G], applied only to Googlebot and only if the URL contains /?q=node/ or /node/.
This would make Google think the pages no longer exist and give up, instead of reporting a billion 'blocked by robots.txt' errors.


So far I came up with RewriteRule ^/\?q=node/(.+) - [G] but it doesn't work :/

Last edited by hussam (May 5 2010)


#2 May 5 2010

samer
Admin

Re: htaccess and google bot

Use RewriteCond :)

RewriteCond %{HTTP_user_agent} googlebot^$ [NC] 
RewriteRule ^/\?q=node/(.+)


#3 May 5 2010

hussam
Member

Re: htaccess and google bot

Without the [G]?
RewriteRule ^/\?q=node/(.+) - [G] is the part that didn't work.

The user-agent part comes later. First I need to make sure the rule works.

Last edited by hussam (May 5 2010)


#4 May 15 2010

hussam
Member

Re: htaccess and google bot

This fixed it:

RewriteCond %{QUERY_STRING} ^q=node/(.*)$
RewriteRule ^$ - [G]

Basically, /?q=node/ isn't a real folder. Everything after the ? is the query string, and RewriteRule only matches the URL path, so we check QUERY_STRING instead.
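
For the other URL form from the first post, here is a rough sketch covering both cases. It assumes the .htaccess sits in the Drupal root and that these lines come before Drupal's own rewrite rules; neither is confirmed here. The user-agent check is left out, so as written it would affect every visitor, including Drupal pages such as /node/add.

```apache
RewriteEngine On

# /?q=node/34 : the path is empty and "q=node/34" is in the query string
RewriteCond %{QUERY_STRING} ^q=node/
RewriteRule ^$ - [G]

# /node/34 : in .htaccess the pattern is matched without the leading slash
RewriteRule ^node/ - [G]
```

The [G] flag answers with 410 Gone, and the "-" means the URL itself is left unchanged.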


#5 May 16 2010

hussam
Member

Re: htaccess and google bot

Next step:

RewriteCond %{HTTP_user_agent} Googlebot^$ [NC] doesn't work.
Any corrections?


#6 May 17 2010

hussam
Member

Re: htaccess and google bot

RewriteCond %{HTTP_USER_AGENT} .*Googlebot.*$ [NC]
fixed it.
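
For anyone landing here later, a sketch of how the pieces might fit together, assuming the .htaccess is in the Drupal root, mod_rewrite is enabled, and the lines sit above Drupal's own rewrite rules (none of that is stated in the thread). The earlier Googlebot^$ pattern could never match because ^ only matches at the very start of the string, so nothing can come before it. An unanchored Googlebot matches anywhere in the user-agent string, so the surrounding .* is optional.

```apache
RewriteEngine On

# Both conditions must hold (RewriteCond lines are ANDed by default):
# the client identifies as Googlebot, case-insensitively ...
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
# ... and the query string starts with q=node/
RewriteCond %{QUERY_STRING} ^q=node/
# Conditions apply only to the next RewriteRule, so any further
# rule needs its own copy of the user-agent condition.
RewriteRule ^$ - [G]
```

Googlebot only sees the 410 if it is allowed to fetch the URL, so the robots.txt block for these URLs would have to be lifted.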

