...

The filter.plugins setting in dspace.cfg lists all enabled media/format filter plugins (see Configuring Media Filters for more information). By modifying the value of filter.plugins you can enable or disable MediaFilter plugins. To enable multiple filters, set filter.plugins multiple times, once per filter. Each filter must be enabled via its name (see the "Name" column in the table above).

Code Block
# Enable the default Text Extractor (for 7.3 or above)
filter.plugins = Text Extractor

# Enable the JPEG thumbnail creator
filter.plugins = JPEG Thumbnail

# Enable the PDF thumbnail creator
filter.plugins = PDFBox JPEG Thumbnail
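
Because filter.plugins can appear several times, it can help to check which filters a given configuration file actually enables. The Bash sketch below is an illustration, not a DSpace tool: the function name list_enabled_filters is hypothetical, and it assumes one filter name per filter.plugins line (as in the example above). It reads only the single file you pass it, so it does not account for values DSpace may pick up from other configuration files.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of DSpace): print each filter name enabled
# via "filter.plugins = <Name>" in ONE configuration file.
# Assumes one filter name per filter.plugins line. Commented-out lines
# ("#filter.plugins = ...") are skipped because the pattern requires
# "filter.plugins" to be the first non-blank text on the line.
# The second sed strips trailing whitespace, including Windows carriage returns.
list_enabled_filters() {
  sed -n 's/^[[:space:]]*filter\.plugins[[:space:]]*=[[:space:]]*//p' "$1" |
    sed 's/[[:space:]]*$//'
}
```

For example: list_enabled_filters [dspace]/config/dspace.cfg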


Executing (via Command Line)

...

  • Help : [dspace]/bin/dspace filter-media -h
    • Display help message describing all command-line options.
  • Force mode : [dspace]/bin/dspace filter-media -f
    • Apply filters to ALL bitstreams, even if they have already been filtered. Any previously filtered content is overwritten.
  • Identifier mode : [dspace]/bin/dspace filter-media -i 123456789/2
    • Restrict processing to the community, collection, or item named by the identifier (by default, all bitstreams of all items in the repository are processed). The identifier must be a Handle, not a DB key. This option may be combined with any other option.
  • Maximum mode : [dspace]/bin/dspace filter-media -m 1000
    • Suspend operation after the specified maximum number of items has been processed (by default, there is no limit). This option may be combined with any other option.
  • Plugin mode : [dspace]/bin/dspace filter-media -p "PDF Text Extractor","Word Text Extractor"
    • Apply ONLY the filter plugin(s) listed (separated by commas). By default, all filters named in the filter.plugins setting of dspace.cfg are applied. This option may be combined with any other option. WARNING: multiple plugin names must be separated by a comma (i.e. ',') and NOT a comma followed by a space (i.e. ', ').
  • Skip mode : [dspace]/bin/dspace filter-media -s 123456789/9,123456789/100
    • SKIP the listed identifiers (separated by commas) during processing. The identifiers must be Handles (not DB Keys). They may refer to items, collections or communities which should be skipped. This option may be combined with any other option. WARNING: multiple identifiers must be separated by a comma (i.e. ',') and NOT a comma followed by a space (i.e. ', ').
    • NOTE: If you have a large number of identifiers to skip, you may maintain this list, one identifier per line, within a separate file (e.g. filter-skiplist.txt). Use the following format to call the program.
      • [dspace]/bin/dspace filter-media -s $(paste -sd, - < filter-skiplist.txt)
  • Verbose mode : [dspace]/bin/dspace filter-media -v
    • Print all extracted text and other filter details to STDOUT.
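
The paste one-liner for the skip list passes every line of filter-skiplist.txt through unchanged, so stray spaces, Windows line endings, or blank lines in the file would end up inside the -s argument, which must not contain spaces. The following minimal Bash sketch cleans the file before joining it. It is not shipped with DSpace; the function name build_skip_arg and the convention of allowing '#' comment lines in the skip-list file are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of DSpace): turn a skip-list file with one
# Handle per line into the comma-separated value expected by "filter-media -s".
# Steps: drop carriage returns, trim each line, discard blank lines and
# '#' comment lines (an assumed convention), then join with ',' and no spaces.
build_skip_arg() {
  local skipfile="$1"
  tr -d '\r' < "$skipfile" |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
    grep -v -e '^$' -e '^#' |
    paste -sd, -
}
```

Once the function is defined in your shell, it can be combined with other options, e.g. [dspace]/bin/dspace filter-media -v -s "$(build_skip_arg filter-skiplist.txt)". Quoting the command substitution keeps the joined list a single argument.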

Creating Custom MediaFilters

To add your own filter, create a class that implements the org.dspace.app.mediafilter.FormatFilter interface. See the Creating a new Media/Format Filter topic and the comments in the source file FormatFilter.java for more information. In theory, filters could be implemented in any programming language (C, Perl, etc.). However, they must be invoked by the Java code in the Media Filter class that you create.
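
As a sketch of the "any programming language" route, the Bash function below stands in for an external program that a custom Java FormatFilter might launch (for example with java.lang.ProcessBuilder), writing the source bitstream to the program's standard input and reading the generated content from its standard output. This stdin/stdout contract, the function name clean_text_filter, and the cleanup it performs are all assumptions for illustration; DSpace itself does not define how an external filter program should behave.

```bash
#!/usr/bin/env bash
# Hypothetical external filter: raw text in on stdin, cleaned text out on stdout.
# The Java FormatFilter would be responsible for starting this program and
# connecting the bitstream and the generated output to its streams.
# Pipeline: delete control characters except tab (\011) and newline (\012),
# turn tabs into spaces, then squeeze runs of spaces into a single space.
clean_text_filter() {
  LC_ALL=C tr -d '\000-\010\013-\037\177' |
    tr '\t' ' ' |
    tr -s ' '
}
```

Saved as a standalone script, the file would end with a call to clean_text_filter so that it filters whatever is piped into it.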

...

Creating a simple Media Filter

...