...

Search engines will now look at /htmlmap, which (once generate-sitemaps has been run) serves one or more pre-generated HTML files linking directly to the items, collections and communities in your DSpace instance. Because the files are pre-generated, they are served with minimal impact on your hardware, and crawlers no longer have to work their way through the browse screens, which are intended for human consumption and are more expensive for the server to render.
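To keep the sitemap current, generate-sitemaps should be run regularly. A typical approach is a nightly cron job; the entry below is a sketch, assuming [dspace] stands for your installation directory (adjust the path and schedule to suit your setup):

Code Block

# Regenerate the DSpace sitemaps every night at 1:00 AM
# ([dspace] is a placeholder for your DSpace installation directory)
0 1 * * * [dspace]/bin/generate-sitemaps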

Announce your sitemap in robots.txt

Many search engines will automatically discover your sitemap if you announce it in your robots.txt file:

Code Block

Sitemap: http://my.dspace.repo.ac.uk/htmlmap

This directive is user-agent-independent, so it can be placed anywhere in the robots.txt file.

Create a good robots.txt

DSpace 1.5 and 1.5.1 ship with a faulty robots.txt file. Delete it, or at a minimum remove the line that says Disallow: /browse. If you do not, your site will not be indexed correctly.
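After removing the offending line, a minimal robots.txt might look like the sketch below (the hostname is illustrative; replace it with your own):

Code Block

Sitemap: http://my.dspace.repo.ac.uk/htmlmap

User-agent: *
# Note: the "Disallow: /browse" line shipped with DSpace 1.5/1.5.1
# has been removed so that crawlers can index the site correctly.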

...