<?xml version="1.0" encoding="utf-8"?>
<html>

Overview

Info
Only Applied to DSpace pre-1.4

The conclusions of the following analysis have been applied in the

...

1.4.2 of DSpace. Large performance improvements are obtained by running PostgreSQL VACUUM/ANALYZE after a large batch import.

Over the course of the AIHT project, we noticed that as a DSpace repository grew very large, ingestion time increased dramatically, resulting in poor performance during batch imports of large numbers of objects. I was given the task of analyzing DSpace's SQL usage, in particular finding places where SQL queries were inappropriately slow. Because of the large, distributed scope of the DSpace project, reconstructing and directly analyzing the SQL queries made by the repository is slow and inexact: ingesting a single object alone invokes around three hundred queries. On the other hand, because DSpace makes such heavy use of the database during batch ingestion, profiling and amortized analysis are a useful strategy for learning which kinds of SQL queries need special attention.
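To make the profiling idea concrete, here is a minimal sketch of how logged queries might be grouped by shape and ranked by total time. It is not the tooling used in this analysis. The function names `normalize` and `profile`, the `(sql, elapsed_ms)` log format, and the sample queries and timings are all assumptions made up for illustration.

```python
import re
from collections import defaultdict


def normalize(sql: str) -> str:
    """Collapse literals so queries that differ only in parameters share one shape."""
    sql = re.sub(r"'(?:[^']|'')*'", "?", sql)     # quoted string literals
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)  # standalone numeric literals
    sql = re.sub(r"\s+", " ", sql).strip()        # collapse whitespace
    return sql.lower()


def profile(entries):
    """Return (shape, count, total_ms, mean_ms) rows, largest total time first."""
    totals = defaultdict(lambda: [0, 0.0])  # shape -> [count, total_ms]
    for sql, elapsed_ms in entries:
        stats = totals[normalize(sql)]
        stats[0] += 1
        stats[1] += elapsed_ms
    rows = [(shape, n, total, total / n) for shape, (n, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical log entries (SQL text, elapsed milliseconds); not measured data.
sample_log = [
    ("SELECT * FROM item WHERE item_id = 17", 0.4),
    ("SELECT * FROM item WHERE item_id = 42", 0.6),
    ("SELECT * FROM bitstream WHERE name = 'a.pdf'", 12.0),
    ("INSERT INTO item (item_id) VALUES (43)", 1.0),
]

for shape, count, total_ms, mean_ms in profile(sample_log):
    print(f"{total_ms:8.1f} ms  {count:3d}x  {mean_ms:6.2f} ms avg  {shape}")
```

Ranking by total time rather than by mean is what makes the analysis amortized. A query shape that is cheap per call but issued hundreds of times per ingested object can still dominate the import, and it only shows up once its cost is summed over the whole batch.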

...