...

| Name | Java Class | Function | Enabled by Default? |
| --- | --- | --- | --- |
| HTML Text Extractor | org.dspace.app.mediafilter.HTMLFilter | Extracts the full text of HTML documents for full text indexing. (Uses Swing's HTML parser.) | true |
| JPEG Thumbnail | org.dspace.app.mediafilter.JPEGFilter | Creates thumbnail images of GIF, JPEG and PNG files. | true |
| Branded Preview JPEG | org.dspace.app.mediafilter.BrandedPreviewJPEGFilter | Creates a branded preview image for GIF, JPEG and PNG files. | false |
| PDF Text Extractor | org.dspace.app.mediafilter.PDFFilter | Extracts the full text of Adobe PDF documents (only if text-based or OCRed) for full text indexing. (Uses the Apache PDFBox tool.) | true |
| XPDF Text Extractor | org.dspace.app.mediafilter.XPDF2Text | Extracts the full text of Adobe PDF documents (only if text-based or OCRed) for full text indexing. (Uses the XPDF command line tools available for Unix.) See XPDF Filter Configuration for details on installing/enabling. | false |
| Word Text Extractor | org.dspace.app.mediafilter.WordFilter | Extracts the full text of Microsoft Word or Plain Text documents for full text indexing. (Uses the "Microsoft Word Text Mining" tools.) | true |
| PowerPoint Text Extractor | org.dspace.app.mediafilter.PowerPointFilter | Extracts the full text of slides and notes in Microsoft PowerPoint and PowerPoint XML documents for full text indexing. (Uses the Apache POI tools.) | true |
| ImageMagick Image Thumbnail Generator | org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter | Uses ImageMagick to generate thumbnails for image bitstreams. Requires installation of ImageMagick on your server. See ImageMagick Media Filters. | false |
| ImageMagick PDF Thumbnail Generator | org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter | Uses ImageMagick and Ghostscript to generate thumbnails for PDF bitstreams. Requires installation of ImageMagick and Ghostscript on your server. See ImageMagick Media Filters. | false |
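The HTML Text Extractor row notes that HTMLFilter relies on Swing's HTML parser. As a rough illustration of that general technique, the sketch below pulls plain text out of HTML using only the JDK's javax.swing.text.html.parser.ParserDelegator. It is not DSpace's HTMLFilter source; the class name HtmlTextSketch and the space-joining of text runs are hypothetical choices made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical sketch of Swing-based HTML text extraction (not DSpace's HTMLFilter).
public class HtmlTextSketch {

    /** Returns the character data of an HTML string, with text runs joined by spaces. */
    static String extractText(String html) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for each run of text between tags; tags themselves are skipped.
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(data);
            }
        };
        try {
            // true = ignore any charset declared in a <meta> tag (we already have a Reader).
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(extractText("<html><body><p>Hello <b>world</b></p></body></html>"));
    }
}
```

A real indexing filter would read the bitstream's InputStream rather than a String and would need to respect the document's encoding; those details are left out here.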

Please note that the filter-media script automatically updates the DSpace search index by default (see Legacy methods for re-indexing content). This is the recommended way to run it. Should you wish to disable the index update, you can pass the -n flag (see Executing (via Command Line) below).
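If you drive filter-media from Java (for example from a scheduler), the only difference between the default run and a run without index updates is the -n flag. The sketch below just assembles the argument list. It assumes the launcher is [dspace]/bin/dspace taking filter-media as its first argument, and the class name FilterMediaCommand is hypothetical; check both against your installation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the command line for filter-media.
public class FilterMediaCommand {

    /**
     * ASSUMPTION: the launcher lives at <dspaceDir>/bin/dspace and accepts
     * "filter-media" as its first argument.
     */
    static List<String> build(String dspaceDir, boolean updateSearchIndex) {
        List<String> cmd = new ArrayList<>();
        cmd.add(dspaceDir + "/bin/dspace"); // assumed launcher path
        cmd.add("filter-media");
        if (!updateSearchIndex) {
            cmd.add("-n"); // disable the automatic search index update
        }
        return cmd;
    }

    public static void main(String[] args) {
        // Recommended default: filtering also updates the search index.
        System.out.println(build("/dspace", true));  // prints [/dspace/bin/dspace, filter-media]
        // Index update disabled with -n.
        System.out.println(build("/dspace", false)); // prints [/dspace/bin/dspace, filter-media, -n]
        // To actually run it: new ProcessBuilder(build("/dspace", false)).inheritIO().start();
    }
}
```

Keeping the index update on (no -n) matches the recommendation above; -n is mainly useful when you plan to re-index separately afterwards.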

...