...

Code Block
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover 
Disallow: /search-filter

# This should be the FULL URL to your HTML Sitemap.  
# Make sure to replace "[dspace.url]" with the value of your 'dspace.url' setting in your dspace.cfg file.
Sitemap: http://[dspace.url]/htmlmap

# If you have configured DSpace (Solr-based) Statistics to be publicly accessible,
# then you likely do not want this content to be indexed
# Disallow: /displaystats

# Uncomment the following line ONLY if sitemaps.org or HTML sitemaps are used
# and you have verified that your site is being indexed correctly.
# Disallow: /browse

# You also may wish to disallow access to the following paths, in order
# to stop web spiders from accessing user-based content:
# Disallow: /advanced-search
# Disallow: /contact
# Disallow: /feedback
# Disallow: /forgot
# Disallow: /login
# Disallow: /register
# Disallow: /search

Note that for your additional Disallow statements to be recognized under the User-agent: * group, they cannot be separated from that group by blank lines. A blank line indicates the start of a new user-agent block, and a block that does not begin with a User-agent declaration is ignored. Comment lines are allowed and will not break the user-agent block.

This is OK:

Code Block
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover 
Disallow: /search-filter
Disallow: /displaystats
Disallow: /advanced-search

This is not OK, as the two lines at the bottom will be completely ignored.

Code Block
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover 
Disallow: /search-filter
 
Disallow: /displaystats
Disallow: /advanced-search
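The "not OK" layout is easy to produce by accident, for example by uncommenting one of the optional Disallow lines in the template above in place, below a blank line. One way to keep every rule inside the User-agent: * group is to generate the file from a list of paths. The following is a minimal sketch, not part of DSpace: `build_robots_txt` is a hypothetical helper, and the sitemap URL is a placeholder that you would replace with your own site's HTML sitemap URL.

```python
from typing import Iterable


def build_robots_txt(sitemap_url: str, disallow_paths: Iterable[str]) -> str:
    """Return robots.txt text with every Disallow rule in one User-agent: * group.

    sitemap_url: full URL of the HTML sitemap (a placeholder in the example below).
    disallow_paths: paths to block, e.g. "/discover".
    """
    lines = ["User-agent: *"]
    # All rules follow the User-agent line directly, with no blank line in
    # between, so none of them can end up in a separate (ignored) block.
    lines.extend(f"Disallow: {path.strip()}" for path in disallow_paths)
    # The blank line is written only after the group is complete; the Sitemap
    # directive is independent of user-agent groups.
    lines.append("")
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(
        "http://repository.example.edu/htmlmap",  # hypothetical site URL
        ["/discover", "/search-filter", "/displaystats", "/advanced-search"],
    ), end="")
```

Adding or removing a path then only means editing the list, so the group cannot be split by a stray blank line.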

To identify if a specific user agent has access to a particular URL, you can use this handy robots.txt tester.
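For a quick local check before publishing, Python's standard-library `urllib.robotparser` can serve as a rough stand-in. Its parser happens to follow the same blank-line convention described above, so it reproduces the difference between the "OK" and "not OK" examples. It is not the parser used by Google or other search engines, which may differ in edge cases, so the online tester remains the better final check. The `allowed` helper and the `ExampleBot` user-agent name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; splitlines() keeps blank lines as "",
    # which the parser treats as the end of the current user-agent group.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


OK = """User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter
Disallow: /displaystats
Disallow: /advanced-search
"""

NOT_OK = """User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter

Disallow: /displaystats
Disallow: /advanced-search
"""

if __name__ == "__main__":
    for name, text in (("OK", OK), ("not OK", NOT_OK)):
        # Prints "OK False" then "not OK True": the rule after the blank line is ignored.
        print(name, allowed(text, "ExampleBot", "/displaystats"))
```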

Ensure Item Metadata appears in the HTML HEAD

...

Avoid redirecting file downloads to Item landing pages

Make sure that you never redirect "direct file downloads" (i.e. users who jump directly to downloading a file, often from a search engine results page) to the associated Item's splash/landing page. In the past, some DSpace sites have added these custom URL redirects in order to capture analytics/statistics via Google Analytics or similar JavaScript-based analytics.

While these URL redirects may seem harmless, they may be flagged as cloaking or spam by Google, Google Scholar and other major search engines. This, in turn, may hurt your site's search engine ranking or even cause your entire site to be flagged for removal from the search engine.

If you have these URL redirects in place, it is highly recommended to remove them immediately. If you created these redirects to facilitate capturing download statistics in Google Analytics, you should instead consider upgrading to DSpace 5.0 or above, which is able to automatically record bitstream downloads in Google Analytics (see DS-2088) without the need for any URL redirects.
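To spot-check whether your site still redirects direct downloads, you can request a file URL once without following redirects and look at the status code. The following is a minimal Python sketch under stated assumptions: `download_status` and `redirects_download` are hypothetical helpers, and the URL you pass in should be a direct download link copied from your own repository. It sends a HEAD request; some servers or proxies answer HEAD differently from GET, so switching the method to GET is a reasonable fallback if the result looks wrong.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def download_status(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request to url without following redirects.

    Returns (HTTP status code, Location header or None).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        # http.client never follows redirects, so a 3xx status is reported as-is.
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def redirects_download(url: str) -> bool:
    """True if the server answers a direct file download URL with a redirect."""
    status, _location = download_status(url)
    return 300 <= status < 400
```

A 3xx status whose Location points at the Item's landing page is the pattern described above; a 200 means the file is served directly.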

...