
...

The DSpace frontend (UI) will often require several CPUs, especially if you wish to use "cluster mode" (see below) to better scale your application.  A smaller application may be able to use 4-6 CPU cores, while highly active sites may require additional CPU power. CPU is most often needed for the frontend's Angular Server Side Rendering (again, see the "cluster mode" notes below) and for any batch processing / command line scripts on the backend.

Blocking Aggressive Bots

DSpace itself does not include built-in tools for blocking aggressive bots.  However, with the growth of Artificial Intelligence, many DSpace sites have experienced aggressive harvesting of their metadata / files by various AI-related bots.  If left unchecked, these bots can cause performance problems for DSpace sites.
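As a rough illustration of the kind of filtering that usually sits in front of DSpace (for example in a reverse proxy or a small Node.js layer), the sketch below combines a User-Agent check with a simple fixed-window request limit. Everything in it is an assumption made for illustration: `isBlockedAgent`, `FixedWindowLimiter`, the pattern list, and the thresholds are hypothetical names and values, not part of DSpace.

```typescript
// Hypothetical helpers for illustration only; none of this ships with DSpace.
// The patterns below are examples of crawler User-Agent tokens, not a vetted list.
const BLOCKED_AGENT_PATTERNS: RegExp[] = [/GPTBot/i, /Bytespider/i, /ClaudeBot/i];

// Returns true when the User-Agent header matches any blocked pattern.
function isBlockedAgent(
  userAgent: string | undefined,
  patterns: RegExp[] = BLOCKED_AGENT_PATTERNS,
): boolean {
  if (!userAgent) {
    return false; // A missing header is not treated as a match in this sketch.
  }
  return patterns.some((pattern) => pattern.test(userAgent));
}

// Fixed-window limiter: allows at most `maxRequests` per client key
// within each window of `windowMs` milliseconds.
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(private maxRequests: number, private windowMs: number) {}

  allow(clientKey: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(clientKey);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First request from this client, or the previous window has expired.
      this.counts.set(clientKey, { windowStart: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.maxRequests;
  }
}
```

In practice the client key might be an IP address or network range, and a rejected request would typically receive an HTTP 403 or 429 response. Many aggressive bots spoof browser User-Agents, so a pattern list on its own is unlikely to be sufficient.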

If you are experiencing this sort of aggressive harvesting, we'd recommend looking at the following resources (none of these are managed by DSpace, but all are community efforts by various library or repository-based developers):

Performance Tuning the Frontend (UI)

...

Limit which pages are processed via Server Side Rendering (SSR)

While enabling Server Side Rendering (SSR) is extremely important for Search Engine Optimization, it can also be very resource intensive for large pages or highly active sites.  Server Side Rendering involves building the entire HTML for the page in Node.js (on your server) before sending the page back to the client/user.  Most humans only encounter SSR briefly, when they initially visit your site.  However, bots may only interact with SSR, especially if they are unable to process JavaScript.  This is true even for Google Scholar, whose bots will only use SSR-generated pages to index your site.

To maximize the performance of SSR, DSpace by default limits the pages and Angular components that are processed during server side rendering.  You may wish to review the default settings to ensure they are appropriate for your site.  See the Server Side Rendering (SSR) Settings
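The idea of limiting SSR to selected pages can be pictured as an allow-list check on the request path. The following is only a minimal sketch under stated assumptions: `shouldServerRender` and the prefix list are hypothetical and are not DSpace's actual implementation or default values. Consult the SSR settings documentation for the real configuration options.

```typescript
// Hypothetical allow-list of path prefixes to render on the server.
// These values are assumptions for illustration, not DSpace defaults.
const SSR_PATH_PREFIXES: string[] = ['/home', '/items', '/collections', '/communities'];

// Returns true when the request path equals an allowed prefix or sits beneath it.
// Any query string is ignored when matching.
function shouldServerRender(
  requestPath: string,
  prefixes: string[] = SSR_PATH_PREFIXES,
): boolean {
  const path = requestPath.split('?')[0];
  return prefixes.some((prefix) => path === prefix || path.startsWith(prefix + '/'));
}
```

Requests that fall outside the allow-list would be answered with the client-side application shell instead, so expensive pages such as search results are only rendered in the browser.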

Turn on (or increase) caching of Server Side Rendered pages

...