Hello geeks.
I have Drupal installed with clean_url on, and I use rewrite rules to make all URLs human readable instead of /node/34 and /?q=node/34.
At the moment I block these old URLs with robots.txt, but Googlebot won't give up: it keeps trying to crawl the pages and failing because of robots.txt.
So I was thinking: can I use .htaccess with a rewrite condition that checks whether the user agent is Googlebot (http://www.useragentstring.com/pages/Googlebot/) and, if the requested URL contains /?q=node/ or /node/, sends a 410 error?
Something like RewriteRule blah [G], but only for Googlebot and only when the URL contains /?q=node/ or /node/.
That way Google would think the pages don't exist and give up, instead of generating a billion 'blocked by robots.txt' entries.
So far I came up with RewriteRule ^/\?q=node/(.+) - [G] but it doesn't work :/
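For what it's worth, the [G] flag itself works as expected on a plain path. A throwaway test like this (made-up gone/ path, just to illustrate; assumes the rule sits in the site root's .htaccess, where the leading slash is stripped from the matched path) answers 410 for anything under it:

RewriteRule ^gone/ - [G]

It's matching the ?q=node/ part that I can't get right.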
Last edited by hussam (May 5 2010)
Use RewriteCond :)
RewriteCond %{HTTP_user_agent} googlebot^$ [NC]
RewriteRule ^/\?q=node/(.+)
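The general shape, per the mod_rewrite docs, is one or more RewriteCond lines stacked directly above the RewriteRule they guard (placeholder patterns here):

RewriteCond %{HTTP_USER_AGENT} some-pattern [NC]
RewriteRule some-url-pattern - [G]

Each condition applies only to the single rule below it.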
Without the [G]?
RewriteRule ^/\?q=node/(.+) - [G] is the part that didn't work.
The user agent part comes later; first I need to make sure the rule itself works.
Last edited by hussam (May 5 2010)
This fixed it:
RewriteCond %{QUERY_STRING} ^q=node/(.*)$
RewriteRule ^$ - [G]
Basically, /?q=node/ isn't a real directory: everything after the ? never reaches the RewriteRule pattern, which only sees the path, so it has to be matched with %{QUERY_STRING} instead.
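The plain /node/34 form should be simpler, since RewriteRule does see the path. Something like this ought to cover it (untested, and it assumes no real node/ directory exists in the docroot):

RewriteRule ^node/ - [G]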
Next step:
RewriteCond %{HTTP_user_agent} Googlebot^$ [NC] doesn't work.
Any corrections?
RewriteCond %{HTTP_USER_AGENT} .*Googlebot.*$ [NC]
Fixed it. (The stray ^ before the $ in the old pattern meant it could never match anything.)
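For anyone who lands on this thread later, the combined rules I ended up with look roughly like this. The user agent condition has to be repeated because each RewriteCond applies only to the single RewriteRule below it (untested as one unit, but each half works on its own):

# 410 for the old /?q=node/... URLs, Googlebot only
RewriteCond %{HTTP_USER_AGENT} .*Googlebot.*$ [NC]
RewriteCond %{QUERY_STRING} ^q=node/(.*)$
RewriteRule ^$ - [G]

# 410 for the old /node/... URLs, Googlebot only
RewriteCond %{HTTP_USER_AGENT} .*Googlebot.*$ [NC]
RewriteRule ^node/ - [G]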