
...

  1. Keep your DSpace up to date. We are constantly adding indexing improvements in new releases.
  2. Ensure your DSpace is visible to search engines.
  3. Ensure the sitemaps feature is enabled. (enabled by default)
  4. Ensure server-side rendering is enabled in the UI.  (enabled by default)
  5. Ensure your robots.txt allows access to item "splash" pages and full text.
  6. Ensure item metadata appears in HTML headers correctly.
  7. Avoid redirecting file downloads to Item landing pages.
  8. Turn OFF any generation of PDF cover pages.
  9. As an aside, it's worth noting that OAI-PMH is generally not useful to search engines.  OAI-PMH has its own uses, but do not expect search engines to use it.

...

We are constantly adding new indexing improvements to DSpace.  In order to ensure your site gets all of these improvements, you should strive to keep it up-to-date. For example:

  • As of DSpace 7.0, Sitemaps are enabled by default (see next section)
  • As of DSpace 5.0, the DSpace robots.txt file now includes references to Sitemaps by default (see DS-1936), and also blocks known bad bots (see DS-2335).
  • As of DSpace 4.0, DSpace has provided several enhancements that were requested by the Google Scholar team. These included providing users (and web indexers) a way to browse content by the date it was added to DSpace (see DS-1482), ensuring the "dc.date.issued" field is set more accurately (see DS-1481), and enhancing the logic behind the "citation_pdf_url" HTML <meta> tag (see DS-1483).
  • As of DSpace 1.7, DSpace has improved how its Item-level metadata is made available to Google Scholar. For the 1.7.0 release, the DSpace Developers worked directly with the Google Scholar developers, to ensure DSpace is generating the "citation_*" HTML "<meta>" tags (i.e. Highwire Press tags) that Google Scholar recommends in their Indexing Guidelines.
  • As of DSpace 1.5, DSpace has support for sitemaps (both simple HTML pages of links, as well as the sitemaps.org protocol). It also includes item metadata in the HTML HEAD element of item display pages, ensuring that the metadata can be effectively indexed no matter what changes you might have made to your DSpace's layout or style.
  • As of DSpace 1.4, DSpace has support for the "if-modified-since" HTTP header. This basically means that if an item (or bitstream therein) has not changed since the last time a search engine's crawler indexed it, that item/bitstream does not have to be re-retrieved, sparing your server.
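To illustrate the "if-modified-since" behaviour described in the last bullet, here is a minimal sketch of the decision a server makes for a conditional request. It is not DSpace's own implementation; the function name `conditional_status` and its parameters are hypothetical and exist only for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if the crawler's copy is still current, else 200.

    last_modified: timezone-aware datetime of the item's last change.
    if_modified_since: raw value of the If-Modified-Since header, or None.
    """
    if if_modified_since is None:
        return 200  # unconditional request: send the full response
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparsable header: ignore it and send the full response
    if client_time.tzinfo is None:
        # Treat a date without zone information as UTC (assumption for this sketch)
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds before comparing
    if last_modified.replace(microsecond=0) <= client_time:
        return 304  # not modified: no body needs to be re-sent
    return 200


if __name__ == "__main__":
    changed = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    header = format_datetime(changed, usegmt=True)
    print(conditional_status(changed, header))
```

A crawler that sends back the date of its last visit receives a 304 with no body when the item has not changed since then, which is what spares the server.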

...

...

Ensure the sitemaps feature is enabled

As of DSpace 7, sitemaps are enabled by default and automatically update on a daily basis.  This is the recommended setup to ensure proper indexing. So, there's nothing you need to do unless you wish to either change their schedule, or disable them.

...

You can configure the list of "all search engines" by setting the value of sitemap.engineurls in dspace.cfg.

Ensure server-side rendering is enabled in the UI

Server-side rendering is enabled by default.  So, you don't need to do anything, unless you've accidentally turned it off.

The DSpace UI is built on Angular, which is a JavaScript (TypeScript) based web framework.  As some search engines do not support JavaScript, you MUST ensure the UI's server-side rendering is enabled.  This allows the UI to send plain HTML to search engine spiders (or other clients) which do not support JavaScript.

For information on enabling it, see "Universal (Server-side Rendering) settings" in User Interface Configuration.

You can test whether server-side rendering is enabled by temporarily disabling JavaScript in your browser (usually this is in the settings of the Developer Tools) and attempting to access your DSpace site.   All basic browse/search functionality should work with JavaScript disabled. (However, all dynamic menus or actions obviously will not work, as all pages will be static HTML.)
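The same check can be approximated from a script, since a client that does not execute JavaScript only sees the HTML the server sends. The sketch below is one possible way to do this, not an official DSpace tool: it measures how much readable text the raw HTML contains outside of script and style elements. The helper names and the 200-character threshold are assumptions chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _VisibleText(HTMLParser):
    """Collects text found outside <script>, <style> and <noscript> elements."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def visible_text_length(html):
    """Number of readable characters a non-JavaScript client would receive."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return sum(len(chunk) for chunk in parser.chunks)


def looks_server_rendered(html, min_chars=200):
    """Heuristic: a client-only page is a near-empty shell (threshold is an assumption)."""
    return visible_text_length(html) >= min_chars


def fetch_html(url):
    """Fetch a page the way a simple crawler would, without running any JavaScript."""
    request = Request(url, headers={"User-Agent": "ssr-check/0.1"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `looks_server_rendered(fetch_html(url))` with the URL of one of your item pages should return True when server-side rendering is working. Treat a False result as a prompt to check manually, since the threshold is only a rough heuristic.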

Create a good robots.txt

The trick here is to minimize load on your server, but without actually blocking anything vital for indexing. Search engines need to be able to index item, collection and community pages, and all bitstreams within items – full-text access is critically important for effective indexing, e.g. for citation analysis as well as the usual keyword searching.
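One way to confirm that a robots.txt does not block anything vital is to test a few representative URLs against it. The following sketch uses Python's standard `urllib.robotparser`; the robots.txt content, the host name, and the item and bitstream paths are hypothetical examples, so substitute URLs from your own site.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the subset of urls that robots_txt forbids for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Hypothetical rules: block search pages, leave items and files open
    example_robots = """\
User-agent: *
Disallow: /search
Disallow: /admin
"""
    # Hypothetical URLs standing in for an item page, a file download and a search page
    must_be_crawlable = [
        "https://repo.example.org/items/123",
        "https://repo.example.org/bitstreams/456/download",
    ]
    may_be_blocked = ["https://repo.example.org/search?query=test"]

    print("Wrongly blocked:", blocked_urls(example_robots, must_be_crawlable))
    print("Blocked as intended:", blocked_urls(example_robots, may_be_blocked))
```

Note that `urllib.robotparser` matches rules by simple path prefix and does not interpret wildcard patterns, so rules that rely on `*` or `$` need to be checked by other means.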

...

In addition to Dublin Core <meta> tags in the HTML HEAD, DSpace also includes Google Scholar specific metadata fields in each item's HTML display page.

Code Block
<meta name="citation_authors" content="Tansley, Robert; Donohue, Timothy" />
<meta name="citation_title" content="Ensuring your DSpace is indexed" />

These meta tags are the "Highwire Press tags" which Google Scholar recommends.  If you have heavily customized your metadata fields, or wish to change the default "mappings" to these Highwire Press tags, they are configurable in [dspace]/config/crosswalks/google-metadata.properties
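To verify that these tags actually appear in an item page, you can extract them from the page's HTML. This is a minimal sketch using only Python's standard library; the class and function names are hypothetical. It accepts the tag key in either a `name` or a `property` attribute, so it reports the tags regardless of which attribute your version of the UI emits.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collects <meta> tags whose key starts with "citation_"."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # The key may be carried by either attribute, depending on the page
        key = attributes.get("name") or attributes.get("property") or ""
        if key.startswith("citation_"):
            self.tags.setdefault(key, []).append(attributes.get("content") or "")


def citation_tags(html):
    """Return a dict mapping each citation_* key to its list of content values."""
    parser = CitationMetaParser()
    parser.feed(html)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    sample = (
        '<html><head>'
        '<meta name="citation_authors" content="Tansley, Robert; Donohue, Timothy" />'
        '<meta name="citation_title" content="Ensuring your DSpace is indexed" />'
        '</head><body></body></html>'
    )
    for key, values in citation_tags(sample).items():
        print(key, values)
```

An empty result for a real item page suggests either that the page was not server-side rendered or that the metadata mappings need attention.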

...