
Many of these concerns can be addressed if we imagine 're-homing' the functionality of MediaFilter into a set of curation tasks. For example, the 'Admin Only Operation' concern disappears, since curation tasks can be invoked not only from the administrative command line but also from the admin UI, or indeed in workflow, submission, etc. A proper rewrite as curation code can also break out all of its configuration into modular config files.
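As a rough illustration of what 're-homing' could look like, here is a minimal Java sketch. Everything in it is hypothetical: `CurationTask`, `TextExtractionTask`, the `mimetypes` property and the status codes are illustrative stand-ins, not the DSpace curation API. It models two points from the paragraph above: a single task entry point (`perform`) that any caller (command line, admin UI, workflow) invokes in the same way, and configuration read from the task's own properties rather than from a shared global file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

/**
 * Hypothetical sketch: MediaFilter-style text extraction re-homed as a
 * curation task. None of these types are the DSpace API.
 */
public class MediaFilterTaskSketch {

    // Hypothetical status codes; DSpace's curation framework defines its own.
    static final int SUCCESS = 0;
    static final int SKIP = 2;

    // Minimal stand-ins for repository objects.
    record Bitstream(String name, String mimeType, byte[] content) {}
    record Item(String handle, List<Bitstream> bitstreams, Map<String, String> extractedText) {}

    // Every task exposes the same entry point, whatever invokes it.
    interface CurationTask {
        void init(Properties config);
        int perform(Item item);
    }

    /** Extracts text from bitstreams whose MIME type appears in the task's config. */
    static class TextExtractionTask implements CurationTask {
        private final Set<String> mimeTypes = new HashSet<>();

        @Override
        public void init(Properties config) {
            // Read from the task's own (modular) config, e.g. a hypothetical
            // "text-extraction.cfg", instead of one monolithic file.
            for (String t : config.getProperty("mimetypes", "").split(",")) {
                if (!t.isBlank()) {
                    mimeTypes.add(t.trim());
                }
            }
        }

        @Override
        public int perform(Item item) {
            int extracted = 0;
            for (Bitstream b : item.bitstreams()) {
                if (mimeTypes.contains(b.mimeType())) {
                    // Stand-in extraction: decode as UTF-8. A real filter would
                    // call a format-specific parser here.
                    String text = new String(b.content(), StandardCharsets.UTF_8);
                    item.extractedText().put(b.name() + ".txt", text);
                    extracted++;
                }
            }
            return extracted > 0 ? SUCCESS : SKIP;
        }
    }

    /** The same task runs unchanged whether called from the CLI, admin UI or workflow. */
    static int run(CurationTask task, Item item, String invokedFrom) {
        System.out.println("Running task on " + item.handle() + " from " + invokedFrom);
        return task.perform(item);
    }

    static Properties loadConfig(String text) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(text));
        return p;
    }

    public static void main(String[] args) throws IOException {
        TextExtractionTask task = new TextExtractionTask();
        // Hypothetical contents of the task's own config file.
        task.init(loadConfig("mimetypes = text/plain, text/csv\n"));

        Item item = new Item("123456789/1", List.of(
                new Bitstream("notes.txt", "text/plain", "hello".getBytes(StandardCharsets.UTF_8)),
                new Bitstream("photo.jpg", "image/jpeg", new byte[] {1, 2, 3})),
                new HashMap<>());

        int status = run(task, item, "workflow");
        System.out.println("status=" + status + " extracted=" + item.extractedText().keySet());
    }
}
```

The design choice being illustrated is that the caller never needs to know which context it is in: the task sees only an item and its own configuration, which is what would let the same filter run from the command line, the admin UI or a workflow step.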

'Outsourcing' - Tika Framework

Many core MediaFilter operations are not unique to institutional repositories. Text extraction, for example, is widely practiced by applications that retrieve content from the web and need to index it. DSpace may thus build on existing work where appropriate. We are evaluating the Apache Tika framework in this light. Tika is part of the larger ecosystem that grew up around Lucene, Nutch, Hadoop, Solr, etc., and is concerned with content analysis and data extraction from documents. It has been integrated into, e.g., Apache Jackrabbit (the reference implementation of the Java Content Repository (JCR) JSR) and other digital asset management systems. This could help address the 'High Code Maintenance' issue: the Tika community, rather than DSpace developers, can shoulder the burden of keeping the extraction components current and of high quality.
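To make the 'outsourcing' idea concrete, here is a hedged Java sketch of DSpace code depending only on a narrow extraction interface, so that hand-maintained format handlers could later be swapped for a Tika-backed adapter. `TextExtractor`, `ExtractorSketch` and the regex HTML stripper are hypothetical illustrations, not existing DSpace classes. The only Tika call referenced, `org.apache.tika.Tika#parseToString(InputStream)`, appears in a comment so that the sketch runs without Tika on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch: DSpace-side code sees only a small TextExtractor
 * interface; format-specific work sits behind it.
 */
public class ExtractorSketch {

    /** Hypothetical contract; not an existing DSpace interface. */
    interface TextExtractor {
        String extract(InputStream in) throws IOException;
    }

    // Hand-maintained extractors: the code DSpace would otherwise keep up to date itself.
    static final TextExtractor PLAIN_TEXT =
            in -> new String(in.readAllBytes(), StandardCharsets.UTF_8);

    // Deliberately naive: replaces tags with spaces and collapses whitespace.
    // Entities, scripts and malformed markup are the kind of work a shared
    // library such as Tika would absorb.
    static final TextExtractor NAIVE_HTML =
            in -> new String(in.readAllBytes(), StandardCharsets.UTF_8)
                    .replaceAll("<[^>]*>", " ")
                    .replaceAll("\\s+", " ")
                    .trim();

    /*
     * With Tika on the classpath, one adapter could replace the handlers above:
     *
     *   org.apache.tika.Tika tika = new org.apache.tika.Tika();
     *   TextExtractor tikaExtractor = in -> {
     *       try {
     *           return tika.parseToString(in);
     *       } catch (org.apache.tika.exception.TikaException e) {
     *           throw new IOException(e);
     *       }
     *   };
     */

    private final Map<String, TextExtractor> byMimeType = new HashMap<>();

    void register(String mimeType, TextExtractor extractor) {
        byMimeType.put(mimeType, extractor);
    }

    /** Returns extracted text, or empty if no extractor handles this MIME type. */
    Optional<String> extract(String mimeType, byte[] content) throws IOException {
        TextExtractor ex = byMimeType.get(mimeType);
        if (ex == null) {
            return Optional.empty();
        }
        try (InputStream in = new ByteArrayInputStream(content)) {
            return Optional.of(ex.extract(in));
        }
    }

    public static void main(String[] args) throws IOException {
        ExtractorSketch registry = new ExtractorSketch();
        registry.register("text/plain", PLAIN_TEXT);
        registry.register("text/html", NAIVE_HTML);

        byte[] html = "<html><body><p>Hello</p> <b>world</b></body></html>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.extract("text/html", html).orElse("(no extractor)"));
        System.out.println(registry.extract("application/pdf", new byte[0]).orElse("(no extractor)"));
    }
}
```

Because callers only see `extract(mimeType, content)`, replacing the hand-written handlers with a single Tika-backed registration would not ripple through the rest of the code, which is the maintenance benefit the paragraph describes.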