Versions Compared

Comment: remove/merge duplicate sections on SSR

...

  1. Keep your DSpace up to date. We are constantly adding new indexing improvements in new releases.
  2. Ensure your DSpace is visible to search engines.
  3. Ensure the user interface is using server-side rendering (enabled by default).
  4. Ensure the sitemaps feature is enabled (enabled by default).
  5. Ensure your robots.txt allows access to item "splash" pages and full text.
  6. Ensure item metadata appears in HTML headers correctly.
  7. Avoid redirecting file downloads to Item landing pages.
  8. Turn OFF any generation of PDF cover pages.
  9. As an aside, it's worth noting that OAI-PMH is generally not useful to search engines.  OAI-PMH has its own uses, but do not expect search engines to use it.

...

Ensure the user interface is using server-side rendering

In DSpace 7, server-side rendering is enabled by default. However, it's important to ensure you do not disable it in production.

Because the DSpace user interface is based on Angular.io (a JavaScript framework), you MUST have server-side rendering enabled (which is the default) for search engines to fully index your site. Server-side rendering allows your site to still function even when JavaScript is turned off in a user's browser. Some web crawlers do not support JavaScript, so they will only interact with this server-side rendered content.

DSpace uses Angular Universal for server-side rendering, and it's enabled by default in Production mode via this configuration in src/environments/environment.production.ts:

Code Block
// Angular Universal Settings
universal: {
  preboot: true,
  ...
},

Per the frontend Installation instructions, you must also be running your production frontend/UI via either yarn run serve:ssr or yarn start.

For more information, see "Universal (Server-side Rendering) settings" in User Interface Configuration.

You can test whether server-side rendering is enabled by temporarily disabling JavaScript in your browser (usually this is in the settings of the Developer Tools) and attempting to access your DSpace site.   All basic browse/search functionality should work with JavaScript disabled. (However, all dynamic menus or actions obviously will not work, as all pages will be static HTML.)
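The manual browser check above can also be approximated with a script. The following is a minimal sketch, not part of DSpace. It assumes the Angular root element is named ds-app and treats a page as server-side rendered when that element already contains visible text in the raw HTML. The function names, the min_chars threshold, and the user agent string are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _RootTextCollector(HTMLParser):
    """Collects visible text found inside the given root element."""

    def __init__(self, root_tag):
        super().__init__()
        self.root_tag = root_tag
        self.depth = 0    # > 0 while inside the root element
        self.skip = 0     # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.root_tag:
            self.depth += 1
        elif self.depth and tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag == self.root_tag and self.depth:
            self.depth -= 1
        elif self.depth and self.skip and tag in ("script", "style"):
            self.skip -= 1

    def handle_data(self, data):
        if self.depth and not self.skip:
            self.chunks.append(data)


def looks_server_rendered(html, root_tag="ds-app", min_chars=20):
    """Return True if the root element already holds text without JavaScript.

    root_tag and min_chars are assumptions; adjust them for your site.
    """
    collector = _RootTextCollector(root_tag)
    collector.feed(html)
    collector.close()
    text = " ".join("".join(collector.chunks).split())
    return len(text) >= min_chars


def check_url(url):
    """Fetch a page the way a simple crawler would (no JavaScript executed)."""
    request = Request(url, headers={"User-Agent": "ssr-check/0.1"})
    with urlopen(request, timeout=30) as response:
        html = response.read().decode("utf-8", errors="replace")
    return looks_server_rendered(html)
```

Calling check_url with your site's home page URL should return True when server-side rendering is active. A False result suggests the server is only sending the empty application shell.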

Ensure the sitemaps feature is enabled

As of DSpace 7, sitemaps are enabled by default and automatically update on a daily basis. This is the recommended setup to ensure proper indexing. So, there's nothing you need to do unless you wish to either change their schedule or disable them.

In the dspace.cfg, the sitemap generation schedule is controlled by this setting:

Code Block
# By default, sitemaps regenerate daily at 1:15am server time
sitemap.cron = 0 15 1 * * ?

You can modify this schedule by using the Cron syntax defined at https://www.quartz-scheduler.org/api/2.3.0/org/quartz/CronTrigger.html .  Any modifications can be placed in your local.cfg.
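To make the field order of the schedule easier to read, here is a small illustrative sketch that labels each part of a Quartz-style expression. It is not a validator and is not part of DSpace. The function name describe_cron is hypothetical, and it only checks that the expression has six fields (or seven, with the optional year), which is the layout Quartz documents.

```python
# Field order used by Quartz cron expressions (year is optional).
QUARTZ_FIELDS = (
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
)


def describe_cron(expression):
    """Label each whitespace-separated field of a Quartz-style cron expression."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(
            f"expected 6 or 7 fields, got {len(parts)}: {expression!r}"
        )
    # zip stops at the shorter sequence, so a missing year is simply omitted.
    return dict(zip(QUARTZ_FIELDS, parts))


print(describe_cron("0 15 1 * * ?"))
```

For the default value, this labels 0 as seconds, 15 as minutes, and 1 as hours, which matches "daily at 1:15am". A five-field Unix-style expression is rejected, because the leading seconds field is required.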

...

Code Block
# The URL to the DSpace sitemaps
# XML sitemap is listed first as it is preferred by most search engines
Sitemap: [dspace.ui.url]/sitemap_index.xml
Sitemap: [dspace.ui.url]/sitemap_index.html
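As a quick sanity check, the Sitemap lines can be read back with Python's standard urllib.robotparser module (the site_maps method requires Python 3.8 or later). This is a sketch under assumptions: advertised_sitemaps is a hypothetical helper, and https://repository.example.edu stands in for your own dspace.ui.url value.

```python
from urllib.robotparser import RobotFileParser


def advertised_sitemaps(robots_txt):
    """Return the sitemap URLs declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


# Example robots.txt content; the host name is a placeholder assumption.
example = """\
# The URL to the DSpace sitemaps
Sitemap: https://repository.example.edu/sitemap_index.xml
Sitemap: https://repository.example.edu/sitemap_index.html
"""

for url in advertised_sitemaps(example):
    print(url)
```

If the returned list is empty, or still contains the literal [dspace.ui.url] placeholder, the robots.txt being served has not been filled in with your site's URL.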

The generate-sitemaps command

...

Option              Meaning
-h, --help          Explain the arguments and options.
-s, --no_sitemaps   Do not generate a sitemap in sitemaps.org format.
-b, --no_htmlmap    Do not generate a sitemap in htmlmap format.
-a, --ping_all      Notify all configured search engines that new sitemaps are available.
-p URL, --ping URL  Notify the given URL that new sitemaps are available. The URL of the new sitemap will be appended to the value of URL.

You can configure the list of "all search engines" by setting the value of sitemap.engineurls in dspace.cfg.

Create a good robots.txt

The trick here is to minimize load on your server, but without actually blocking anything vital for indexing. Search engines need to be able to index item, collection and community pages, and all bitstreams within items – full-text access is critically important for effective indexing, e.g. for citation analysis as well as the usual keyword searching.
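One way to confirm that a robots.txt does not block anything vital is to test representative paths against it with Python's standard urllib.robotparser module. The sketch below is illustrative only: blocked_paths is a hypothetical helper, and the sample rules and paths (including the placeholder identifiers) are assumptions to be replaced with real URLs from your own site.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, user_agent="*"):
    """Return the subset of paths that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


# Hypothetical rules: dynamic search pages are blocked to reduce load.
sample_rules = """\
User-agent: *
Disallow: /search
"""

# Hypothetical paths for an item, a collection, a community and a file download.
must_be_crawlable = [
    "/items/0000",
    "/collections/0000",
    "/communities/0000",
    "/bitstreams/0000/download",
]

print(blocked_paths(sample_rules, must_be_crawlable))
```

An empty result means none of the listed pages are blocked. Any path that is returned is one that crawlers honoring robots.txt will skip, so full-text files and item pages should never appear in it.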

...