MultiViews and Canonicalising URLs
- http://projectcerbera.com/blog/archive.html
- http://projectcerbera.com/blog/archive
I tried this:
| Code: |
| # Strip ".html" from end of any incoming request:
RedirectMatch permanent ^/(.*)\.html$ http://projectcerbera.com/$1 |
Another way I found was Dave Shea's Rewrite woes...but I can't figure out what actually worked for him.
_________________
My CV type thing and my Life of Ben (Blog). Nigel Peck's Accessify Forum Requirements.
Last edited by Ben Millard on 14 Feb 2007 02:06 am; edited 1 time in total
_________________
Patrick H. Lauke / webmaster / University of Salford
co-lead: WaSP Accessibility Task Force
take it to the streets ... WaSP Street Team
personal: splintered | photographia | redux
co-author: Web Accessibility - Web Standards and Regulatory Compliance
_________________
Jim O'Donnell
work: Royal Observatory Greenwich
play: eatyourgreens
| Code: |
| RewriteEngine On
RewriteRule ^(.*)+\.html$ $1 [R=301,L]
<IfModule mod_headers.c>
  ErrorHeader unset Content-Location
</IfModule> |
_________________
Simon Pieters
Zcorpan: When I use that, all requests end up at http://projectcerbera.com/home/cerbera/public_html/. I tried changing it to this:
| Code: |
| RewriteRule ^(.*)+\.html$ http://projectcerbera.com$1 [R=301,L]
<IfModule mod_headers.c>
  ErrorHeader unset Content-Location
</IfModule> |
(EDIT) A slightly depressing piece of news: Googlebot and some other spiders have visited while my URLs aren't canonicalised.
Turning off MultiViews and using a set of RewriteConds looks like a possibility:
- Extensioned URLs would be redirected to extensionless URLs.
- Extensionless URLs would be silently rewritten back to extensioned URLs.
_________________
My CV type thing and my Life of Ben (Blog). Nigel Peck's Accessify Forum Requirements.
Doing a 301 Moved Permanently from /foo/bar.html to /foo/bar is easy:
| Code: |
| # Redirect ".html" to extensionless URLs:
RewriteRule ^(.*)\.html$ /$1 [R=301,NC,L] |
This new request for /foo/bar needs the extension added back on so the file /foo/bar.html is retrieved. I'm using this:
| Code: |
| # Rewrite extensionless URLs to ".html" page:
RewriteCond %{REQUEST_URI} !^(.*)/$
RewriteCond %{REQUEST_URI} ^(.*)$
RewriteRule !(\.)(.*)$ %1.html [L,NC] |
Each of these samples works on its own. But I need both to work together for the URLs to be both technology-neutral and canonical, and together they send the browser into a redirect loop. Is there a way to check if the request came from a redirect? Would moving the redirect into the PHP allow me to break the loop by detecting some sort of condition?
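To see why the two rules fight each other, here is a rough Python model of the ruleset (it is not Apache, just the same two patterns). It assumes the rules are evaluated again after an internal rewrite, which is how per-directory rules in .htaccess are generally understood to behave; `run_rules` and `trace` are made-up names for illustration.

```python
import re

# Rule 1: anything ending in ".html" (case-insensitive, like the NC flag).
STRIP_HTML = re.compile(r"^(.*)\.html$", re.IGNORECASE)

def run_rules(uri):
    """Apply the two rules once, in order, to a URI."""
    match = STRIP_HTML.match(uri)
    if match:
        # External 301 to the extensionless URL.
        return ("redirect", match.group(1))
    if not uri.endswith("/") and "." not in uri:
        # Internal rewrite back to the ".html" file; the browser never sees it.
        return ("rewrite", uri + ".html")
    return ("serve", uri)

def trace(uri, limit=6):
    """Follow redirects and rewrites, giving up after `limit` steps."""
    steps = []
    for _ in range(limit):
        action, uri = run_rules(uri)
        steps.append((action, uri))
        if action == "serve":
            break
    return steps

print(trace("/foo/bar.html", limit=4))
# [('redirect', '/foo/bar'), ('rewrite', '/foo/bar.html'),
#  ('redirect', '/foo/bar'), ('rewrite', '/foo/bar.html')]
```

The trace never reaches "serve": the rewritten /foo/bar.html matches the redirect rule again, so the redirect needs some way to tell a URL the browser asked for from one the server produced itself.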
_________________
My CV type thing and my Life of Ben (Blog). Nigel Peck's Accessify Forum Requirements.
| Code: |
| RewriteEngine on
RewriteCond %{THE_REQUEST} "^GET /foo/bar.html"
RewriteRule (.*) http://www.example.org/foo/bar [R=301] |
It avoids an infinite loop since "THE_REQUEST" is what the UA asks for, not what the server translates it to on the second pass. I imagine you can do some regexp matching to make it work generally on multiple files.
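As a rough illustration of that general matching, here is a Python sketch that tests the raw request line instead of the rewritten path. The pattern and the name `redirect_target` are assumptions made up for this example, not something taken from a working .htaccess.

```python
import re

# Hypothetical pattern: method, a path ending in ".html" (no "?" allowed in
# the path part), an optional query string, then the protocol.
REQUEST_LINE = re.compile(r"^(?:GET|HEAD|POST) ([^\s?]+)\.html(\?\S*)? HTTP/")

def redirect_target(request_line):
    """Return the extensionless URL to 301 to, or None if no redirect applies."""
    match = REQUEST_LINE.match(request_line)
    if match is None:
        return None
    return match.group(1) + (match.group(2) or "")

print(redirect_target("GET /foo/bar.html HTTP/1.1"))  # /foo/bar
print(redirect_target("GET /foo/bar HTTP/1.1"))       # None
```

The second call is the important one: even if the server internally maps /foo/bar to /foo/bar.html, the request line still reads "GET /foo/bar", so no second redirect is issued.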
| Code: |
| RewriteEngine On
# get rid of www.
RewriteCond %{HTTP_HOST} ^www\.example\.org$ [NC]
RewriteRule ^(.*)$ http://example.org/$1 [R=301,L]
# get rid of .html
RewriteCond %{THE_REQUEST} ^(GET|POST)\ (.+)\.html(&[^\ ]+)*\ HTTP/
RewriteRule ^(.+)\.html$ /$1 [R=301,L]
# get rid of Content-Location
<IfModule mod_headers.c>
  ErrorHeader unset Content-Location
</IfModule> |
If you want it to work with query strings then use this instead:
| Code: |
| RewriteEngine On
# get rid of www.
RewriteCond %{HTTP_HOST} ^www\.example\.org$ [NC]
RewriteRule ^(.*)$ http://example.org/$1 [R=301,L]
# get rid of .html
RewriteCond %{THE_REQUEST} ^(GET|POST)\ (.+)\.html(\?.*)?(&[^\ ]+)*\ HTTP/
RewriteRule ^(.+)\.html(\?.*)?$ /$1$2 [R=301,L]
# get rid of Content-Location
<IfModule mod_headers.c>
  ErrorHeader unset Content-Location
</IfModule> |
_________________
Simon Pieters
However, I get infinite redirect loops when I use it on Project Cerbera (Apache 1.3.37 (seriously, I'm not making a "1337" joke)). Specifically, it happens when requesting a URL with a .html extension that matches a file which exists.
Apparently, the API Phases in 1.3 are different to API Phases in 2.0. So my best guess is we're doing something in a way which is incompatible with Apache 1.3's API phases?
I tried removing everything from my .htaccess apart from the essentials (like enabling MultiViews). No errors. But then adding either of zcorpan's samples, Project Cerbera loops. So it's not something else in there causing the problem.
Thanks to everyone so far. It's all helpful and we're getting closer each time!
_________________
My CV type thing and my Life of Ben (Blog). Nigel Peck's Accessify Forum Requirements.
Last edited by Ben Millard on 16 Feb 2007 09:49 am; edited 1 time in total
| Code: |
| RewriteCond %{REQUEST_URI} ^(.+)\.html(\?.*)?$
RewriteRule ^(.+)\.html(\?.*)?$ http://projectcerbera.com/$1$2 [R=301,L] |
- /sitemap.html
- /gta1/utilities.html - old location, a Redirect operates on it.
- /tutorials/gta1/info/important-editors.html
- /tutorials/sa_handling-definition.html - old URL scheme, a RedirectMatch works on it.
However, there's a slightly undesirable side-effect. When visiting a folder, like /tutorials/gta1/info/, you get redirected to /tutorials/gta1/info/index. I've gotten around this by adding a preceding RewriteCond:
| Code: |
| # get rid of .html
RewriteCond %{REQUEST_URI} !^(.+)index\.html(\?.*)?$
RewriteCond %{REQUEST_URI} ^(.+)\.html(\?.*)?$
RewriteRule ^(.+)\.html(\?.*)?$ http://projectcerbera.com/$1$2 [R=301,L] |
_________________
My CV type thing and my Life of Ben (Blog). Nigel Peck's Accessify Forum Requirements.
...just kidding.
The whole shebang should look like this:
| Code: |
| RewriteEngine On
# get rid of www.
RewriteCond %{HTTP_HOST} ^www\.projectcerbera\.com$ [NC]
RewriteRule ^(.*)$ http://projectcerbera.com/$1 [R=301,L]
# get rid of .html
RewriteCond %{REQUEST_URI} \.html
RewriteRule ^(.+)\.html(\?.*)?$ http://projectcerbera.com/$1$2 [R=301,L]
# redirect index to ./
RewriteCond %{THE_REQUEST} index
RewriteRule ^(.*)index\.html(\?.*)?$ http://projectcerbera.com/$1$2 [R=301,L]
# get rid of Content-Location
<IfModule mod_headers.c>
  ErrorHeader unset Content-Location
</IfModule> |
_________________
Simon Pieters
| Code: |
| # Get rid of 'www.':
RewriteCond %{HTTP_HOST} ^www\.projectcerbera\.com$ [NC]
RewriteRule ^(.*)$ http://projectcerbera.com/$1 [R=301,L]
# Get rid of '.html':
RewriteCond %{THE_REQUEST} \.html
RewriteRule ^(.+)\.html(\?.*)?$ http://projectcerbera.com/$1$2 [R=301,L]
# Get rid of 'index' in directories:
RewriteCond %{THE_REQUEST} index
RewriteRule ^(.*)index\.html(\?.*)?$ http://projectcerbera.com/$1$2 [R=301,L]
# get rid of Content-Location
<IfModule mod_headers.c>
  ErrorHeader unset Content-Location
</IfModule> |
Homepage:
- / - stays as this. Pass.
- /index - redirects to /. Pass.
- /index.html - redirects to /. Pass.
- /blog - redirects to /blog/. Pass.
- /blog/ - stays as this. Pass.
- /blog/index - redirects to /blog/. Pass.
- /blog/index.html - redirects to /blog/. Pass.
- /blog/2007/02 - stays as this. Pass.
- /blog/2007/02.html - redirects to /blog/2007/02. Pass.
- /gta1/utilities.html - redirects to /tutorials/gta1/info/important-editors. Pass.
- /misc/authorbiography.html - redirects to /misc/ben-millard. Pass.
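For anyone wanting to re-run a checklist like this, the expected outcomes for the `.html` and `index` cases can be written down as a small Python function. This is only a sketch of the intended mapping, not of how Apache arrives at it; `canonical_path` is a made-up name. It deliberately leaves out /blog to /blog/ (which depends on /blog being a real directory) and the two old URLs handled by site-specific Redirect and RedirectMatch lines.

```python
def canonical_path(path):
    """Return the canonical form of a request path (unchanged if already canonical)."""
    if path.endswith(".html"):
        path = path[:-len(".html")]   # strip the extension
    if path.endswith("/index"):
        path = path[:-len("index")]   # "/blog/index" becomes "/blog/"
    return path

checks = {
    "/": "/",
    "/index": "/",
    "/index.html": "/",
    "/blog/": "/blog/",
    "/blog/index": "/blog/",
    "/blog/index.html": "/blog/",
    "/blog/2007/02": "/blog/2007/02",
    "/blog/2007/02.html": "/blog/2007/02",
}
for requested, expected in checks.items():
    assert canonical_path(requested) == expected, requested
```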
_________________
My CV type thing and my Life of Ben (Blog). Nigel Peck's Accessify Forum Requirements.
This should fix it:
| Code: |
| # Get rid of 'www.':
RewriteCond %{HTTP_HOST} ^www\.projectcerbera\.com$ [NC]
RewriteRule ^(.*)$ http://projectcerbera.com/$1 [R=301,L]
# Get rid of '.html':
RewriteCond %{THE_REQUEST} \.html
RewriteCond %{THE_REQUEST} !\?.*\.html
RewriteRule ^([^\?]+)\.html(\?.*)?$ http://projectcerbera.com/$1$2 [R=301,L]
# Get rid of 'index' in directories:
RewriteCond %{THE_REQUEST} index
RewriteCond %{THE_REQUEST} !\?.*\index
RewriteRule ^([^\?]*)index\.html(\?.*)?$ http://projectcerbera.com/$1$2 [R=301,L]
# get rid of Content-Location
<IfModule mod_headers.c>
  ErrorHeader unset Content-Location
</IfModule> |
_________________
Simon Pieters


