...
Anyone who has analyzed traffic to their DSpace site (e.g. using Google Analytics or similar) will notice that a significant portion (and in many cases a majority) of visitors arrive via Google, Yahoo, or other search engines. Hence, to help maximize the impact of content and thus encourage further deposits, it is important to ensure that your DSpace instance is indexed effectively.
DSpace comes with tools that ensure major search engines (Google, Bing, Yahoo, Google Scholar) are able to easily and effectively index all your content. However, many of these tools require some basic setup. Here's how to ensure your site is indexed.
| Info |
|---|
| DSpace now has a basic Search Engine Optimization (SEO) validator which can provide you feedback on how well your site may align with the below SEO policies. At this time, this validation tool can only check three things: (1) your site is using server-side rendering (SSR), (2) your site has sitemaps enabled and they appear to be working, (3) your site has a robots.txt which links to your sitemaps. This validation tool can be found in the Admin User Interface on the "Health" page. Look for the section named "SEO". If everything looks good, you'll see a green checkbox. If there is feedback to address, you'll see a red warning with details on what needs to be addressed. |
For the optimum indexing, you should:
- Check SEO Validator status to detect any obvious issues
- Ensure your proxy is passing X-Forwarded headers to the User Interface (a quick way to verify this is sketched below)
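A quick way to verify the X-Forwarded item above is to temporarily point your proxy (or a test virtual host) at a small throwaway listener and inspect what it receives. The sketch below is not part of DSpace; it assumes Node.js 18+ with ts-node, and that port 4000 is where your proxy normally forwards requests (adjust as needed).
| Code Block |
|---|
// check-forwarded-headers.ts
// Throwaway listener that echoes the X-Forwarded-* headers it receives,
// so you can confirm your proxy is passing them through.
import { createServer } from 'node:http';

const PORT = 4000; // adjust to the port your proxy forwards to (stop the UI first if it uses this port)

createServer((req, res) => {
  // Collect only the X-Forwarded-* headers added by the proxy
  const forwarded = Object.entries(req.headers)
    .filter(([name]) => name.toLowerCase().startsWith('x-forwarded-'));

  console.log(`Request for ${req.url}`);
  if (forwarded.length === 0) {
    console.log('  No X-Forwarded-* headers received; check your proxy configuration.');
  } else {
    forwarded.forEach(([name, value]) => console.log(`  ${name}: ${value}`));
  }

  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('X-Forwarded header check complete; see console output.\n');
}).listen(PORT, () => console.log(`Listening on port ${PORT} ...`)); |
Requesting any page through your proxy should then log X-Forwarded-For, X-Forwarded-Proto, and X-Forwarded-Host values; if they are missing, the proxy is not passing them.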
DSpace now has a basic Search Engine Optimization (SEO) validator which can provide you feedback on how well your site may align with some of these Search Engine Optimization policies.
At this time, this validation tool can only check three things: (1) your site is using server-side rendering (SSR), (2) your site has sitemaps enabled and they appear to be working, and (3) your site has a robots.txt which links to your sitemaps.
This validation tool can be found in the Admin User Interface on the "Health" page. Look for the section named "SEO". If everything looks good, you'll see a green checkbox.
If issues are detected, you'll see a red warning with details on what needs to be addressed; use the documentation on this wiki page to address those issues.
| Note |
|---|
| Even if you see a green checkmark on this page, you should still review all the Search Engine Optimization guidelines on this page. As noted above, this validator cannot detect all possible SEO issues, so manual verification is still required. |
We are constantly adding new indexing improvements to DSpace. In order to ensure your site gets all of these improvements, you should strive to keep it up-to-date. For example:
...
...
| Code Block |
|---|
User-agent: *
# Disable access to Discovery search and filters; admin pages; processes
Disallow: /search
Disallow: /admin/*
Disallow: /processes |
This is not OK, as the blank line ends the "User-agent: *" group and the two lines at the bottom will be completely ignored.
| Code Block |
|---|
User-agent: *
# Disable access to Discovery search and filters; admin pages; processes
Disallow: /search

Disallow: /admin/*
Disallow: /processes |
To identify if a specific user agent has access to a particular URL, you can use this handy robots.txt tester.
For more information on the robots.txt format, please see the Google Robots.txt documentation.
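If you prefer to script this check, below is a minimal sketch (not part of DSpace) that fetches your robots.txt and reports the Sitemap: lines plus the Disallow: rules that actually fall inside the "User-agent: *" group. It assumes Node.js 18+ (for the built-in fetch) and uses a placeholder site URL that you would replace with your own.
| Code Block |
|---|
// check-robots.ts -- quick robots.txt sanity check (Node.js 18+)
const siteUrl = 'https://demo.dspace.org'; // placeholder: replace with your own UI URL

async function main(): Promise<void> {
  const response = await fetch(`${siteUrl}/robots.txt`);
  if (!response.ok) {
    throw new Error(`robots.txt not reachable: HTTP ${response.status}`);
  }
  const lines = (await response.text()).split(/\r?\n/);

  const sitemaps: string[] = [];
  const disallows: string[] = [];
  let inDefaultGroup = false;

  // Simplified parsing: assumes one User-agent line per group, as in the default DSpace robots.txt
  for (const raw of lines) {
    const line = raw.trim();
    if (line === '') {
      inDefaultGroup = false; // a blank line ends the current group
    } else if (/^sitemap:/i.test(line)) {
      sitemaps.push(line);
    } else if (/^user-agent:/i.test(line)) {
      inDefaultGroup = /^user-agent:\s*\*$/i.test(line);
    } else if (inDefaultGroup && /^disallow:/i.test(line)) {
      disallows.push(line);
    }
  }

  console.log('Sitemap lines:', sitemaps.length > 0 ? sitemaps : 'NONE FOUND');
  console.log('Disallow rules applied to all user agents:', disallows);
}

main().catch((err) => console.error(err)); |
If a Disallow rule you added does not appear in the output, it is most likely separated from the "User-agent: *" group by a blank line, as described above.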
It's possible to greatly customize the look and feel of your DSpace, which makes it harder for search engines, and other tools and services such as Zotero, Connotea and SIMILE Piggy Bank, to correctly pick out item metadata fields. To address this, DSpace includes item metadata in the <head> element of each item's HTML display page.
| Code Block |
|---|
<meta name="DC.type" content="Article" />
<meta name="DCTERMS.contributor" content="Tansley, Robert" /> |
...
...
These meta tags are the "Highwire Press tags" which Google Scholar recommends. If you have heavily customized your metadata fields, or wish to change the default "mappings" to these Highwire Press tags, you may do so by modifying https://github.com/DSpace/dspace-angular/blob/main/src/app/core/metadata/head-tag.service.ts (see for example the "setCitationAuthorTags()" method in that service class).
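To illustrate the general idea of such a mapping without reproducing the actual dspace-angular code, here is a small self-contained sketch. The type and function names (MetadataEntry, getValuesForTag, buildCitationAuthorTags) are invented for illustration and are not the real API; the real change would be made inside the service class linked above, for example in its setCitationAuthorTags() method.
| Code Block |
|---|
// Illustrative sketch only: the general "metadata field -> Highwire tag" mapping idea.
interface MetadataEntry { key: string; value: string; }

// Return the values of the first field in the list that has any values at all
function getValuesForTag(metadata: MetadataEntry[], fieldNames: string[]): string[] {
  for (const field of fieldNames) {
    const values = metadata.filter((entry) => entry.key === field).map((entry) => entry.value);
    if (values.length > 0) {
      return values;
    }
  }
  return [];
}

// Map author fields to the citation_author Highwire tag; adjust the field list to your own schema
function buildCitationAuthorTags(metadata: MetadataEntry[]): Array<{ name: string; content: string }> {
  return getValuesForTag(metadata, ['dc.contributor.author', 'dc.creator'])
    .map((author) => ({ name: 'citation_author', content: author }));
}

// Example usage:
const example: MetadataEntry[] = [
  { key: 'dc.title', value: 'An Example Article' },
  { key: 'dc.contributor.author', value: 'Tansley, Robert' },
];
console.log(buildCitationAuthorTags(example));
// -> [ { name: 'citation_author', content: 'Tansley, Robert' } ] |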
...