I recently finished building a website, and while trying to get it indexed by Google I've run into some odd behavior. I was hoping someone could shed some light on it, as my Google-fu has turned up nothing.
The server stack I'm running is made up of:
Debian 7 / Apache 2.2.22 / MySQL 5.5.31 / PHP 5.4.4-14
The problem is that Google is indexing some odd URLs and currently ranks them higher than the legitimate pages. Here are the odd ones:
- `www.mydomain.com/srv/www/mydomain?srv/www/mydomain`
- `www.mydomain.com/srv/www?srv/www`
- `www.mydomain.com/srv/www?srv/www/index`
Webmaster Tools now reports 'this is an important page blocked by robots.txt'. That is because, as soon as I found the issue, I added 301 redirects to the .htaccess file to send these requests to the homepage, and I blocked the addresses in robots.txt.
Also, I have submitted an XML sitemap with all the correct URLs to Webmaster Tools.
All the website files are stored in `/srv/www/mydomain/public_html/`.
I suspect this has something to do with how I've set up my .htaccess mod_rewrite rules, but I can't work out what is causing it. It could also be my Apache virtual host configuration. Both are included below.
.htaccess mod_rewrite rules:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine on

    # Redirect requests for all non-canonical domains
    # to the same page on www.mydomain.com
    RewriteCond %{HTTP_HOST} .
    RewriteCond %{HTTP_HOST} !^www\.mydomain\.com$
    RewriteRule (.*) http://www.mydomain.com/$1 [R=301,L]

    # Remove .php file extension (internally append .php
    # when a matching .php file exists)
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME}\.php -f
    RewriteRule ^(.*)$ $1.php

    # Route all other traffic to index (internal rewrite)
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^ index [L]

    # Remove 'index' from URL
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s(.*)/index [NC]
    RewriteRule ^ / [R=301,L]
</IfModule>
```
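For comparison, a hedged sketch of one variation, untested against this site: the same rule set with an explicit `RewriteBase /` added near the top. In per-directory (.htaccess) context, mod_rewrite's default base is the directory's physical path, so relative substitutions such as `index` and `$1.php` are resolved against `/srv/www/mydomain/public_html/` unless `RewriteBase` says otherwise. That is a common reason filesystem paths show up in rewritten or redirected URLs. The assumption here is that the site is served from the URL root.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine on

    # Assumption: the site lives at the URL root, so relative substitutions
    # ("index", "$1.php") should be based on "/" rather than on the default
    # per-directory base, /srv/www/mydomain/public_html/
    RewriteBase /

    # The remaining rules from the .htaccess above follow here, unchanged
</IfModule>
```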
Apache vhost:

```apache
<VirtualHost *:80>
    ServerAdmin webmaster@mydomain.com
    ServerName mydomain.com
    ServerAlias www.mydomain.com
    DocumentRoot /srv/www/mydomain/public_html/
    ErrorLog /srv/www/mydomain/logs/error.log
    CustomLog /srv/www/mydomain/logs/access.log combined
</VirtualHost>
```
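Because this stack runs Apache 2.2, one way to see exactly what the rules above do to a request such as `/srv/www?srv/www` is mod_rewrite's own log. In 2.2 it is configured in the server or vhost config, not in .htaccess. A sketch of the same vhost with logging enabled; the `rewrite.log` filename is an assumption, placed next to the existing logs:

```apache
<VirtualHost *:80>
    ServerAdmin webmaster@mydomain.com
    ServerName mydomain.com
    ServerAlias www.mydomain.com
    DocumentRoot /srv/www/mydomain/public_html/
    ErrorLog /srv/www/mydomain/logs/error.log
    CustomLog /srv/www/mydomain/logs/access.log combined

    # Apache 2.2 only: these directives were removed in 2.4, where
    # "LogLevel rewrite:trace3" takes their place.
    # Logs every rewrite step, including per-directory prefix handling.
    # Levels above 2 slow the server down, so use this only while debugging
    # and set RewriteLogLevel back to 0 afterwards.
    RewriteLog /srv/www/mydomain/logs/rewrite.log
    RewriteLogLevel 3
</VirtualHost>
```

After a reload (for example `apache2ctl graceful` on Debian), requesting one of the odd URLs and reading `rewrite.log` should show which rule produced each step.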
Also, in case it's relevant, here is my PHP page handling:

```php
# Declare the Page array
$Page = array();

# Get the requested path and trim leading slashes
$Page['Path'] = ltrim($_SERVER['REQUEST_URI'], '/');

# Check for a query string
if (strpos($Page['Path'], '?') !== false) {
    # Separate path and query string
    $Page['Query'] = explode('?', $Page['Path'])[1];
    $Page['Path'] = explode('?', $Page['Path'])[0];
}

# Check a path was supplied
if ($Page['Path'] != '') {
    # Select page data from the directory
    $Page['Data'] = SelectData('Directory', 'Path', '=', $Page['Path']);

    # Check a page was returned
    if ($Page['Data'] != null) {
        # Switch through allowed page types
        switch ($Page['Data']['Type']) {
            # There are a bunch of switch cases here that
            # determine what page to serve based on the
            # page type stored in the directory
        }
    # When no page is returned
    } else {
        # 404
        $Page = Build404ErrorPage($Page);
    }
# When no path supplied
} else {
    # Build the Home page
    $Page = BuildHomePage($Page);
}
```
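The `explode('?', ...)` step in the handler can also be written with PHP's built-in `parse_url()`, which returns the path and query as separate components. A minimal sketch, assuming it is fed the raw `$_SERVER['REQUEST_URI']`; the helper name `splitRequestUri()` is hypothetical, and the code avoids post-5.4 syntax so it runs on the PHP 5.4 stack above:

```php
<?php
// Hypothetical helper (not part of the original code): splits a REQUEST_URI
// into the same 'Path' and 'Query' keys the page handler uses.
function splitRequestUri($requestUri)
{
    $page = array('Path' => '', 'Query' => '');

    // parse_url() returns false for seriously malformed input;
    // treat that as "no path supplied"
    $parts = parse_url($requestUri);
    if ($parts === false) {
        return $page;
    }

    // Trim leading slashes from the path, as the original handler does
    if (isset($parts['path'])) {
        $page['Path'] = ltrim($parts['path'], '/');
    }

    // Everything after the first '?' is the query string
    if (isset($parts['query'])) {
        $page['Query'] = $parts['query'];
    }

    return $page;
}

// Usage with one of the odd URLs from the question:
// prints Path => srv/www and Query => srv/www/index
print_r(splitRequestUri('/srv/www?srv/www/index'));
```

One difference from the `ltrim()`-first approach: on PHP 5.4.7 and later, `parse_url()` reads a REQUEST_URI beginning with `//` as a host name, so such requests would be split differently.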
Can anyone see anything here that would be causing this?