...
The table below lists all currently available Media Filters and what each one does:
Name | Java Class | Function | Default input formats | Enabled by Default? |
---|---|---|---|---|
PDF Text Extractor | | extracts the full text of Adobe PDF documents (only if text-based or OCRed) for full text indexing. (Uses the Apache PDFBox tool.) | Adobe PDF | yes |
HTML Text Extractor | | extracts the full text of HTML documents for full text indexing. (Uses Swing's HTML Parser.) | HTML, Text | yes |
Word Text Extractor | | extracts the full text of Microsoft Word or Plain Text documents for full text indexing. (Uses the "Microsoft Word Text Mining" tools.) See also PoiWordFilter, below. | Microsoft Word | yes |
Word Text Extractor | | extracts the full text of Microsoft Word or Microsoft Word XML documents for full text indexing. (Uses the "Apache POI" tools.) Disabled by default. Uncomment PoiWordFilter and comment out WordFilter in dspace.cfg if you wish to use this one. | Microsoft Word, Microsoft Word XML | no |
Excel Text Extractor | org.dspace.app.mediafilter.ExcelFilter | extracts the full text of Microsoft Excel documents for full text indexing. (Uses the "Apache POI" tools.) | Microsoft Excel, Microsoft Excel XML | yes |
PowerPoint Text Extractor | | extracts the full text of slides and notes in Microsoft PowerPoint and PowerPoint XML documents for full text indexing. (Uses the Apache POI tools.) | Microsoft Powerpoint, Microsoft Powerpoint XML | yes |
PDFBox JPEG Thumbnail | org.dspace.app.mediafilter.PDFBoxThumbnail | creates thumbnail images of the first page of PDF files | Adobe PDF | yes |
JPEG Thumbnail | | creates thumbnail images of GIF, JPEG and PNG files | BMP, GIF, JPEG, image/png | yes |
Branded Preview JPEG | | creates a branded preview image for GIF, JPEG and PNG files | BMP, GIF, JPEG, image/png | no |
ImageMagick Image Thumbnail Generator | | Uses ImageMagick to generate thumbnails for image bitstreams. Requires installation of ImageMagick on your server. See ImageMagick Media Filters. | BMP, GIF, image/png, JPG, TIFF, JPEG, JPEG 2000 | no |
ImageMagick PDF Thumbnail Generator | org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter | Uses ImageMagick and Ghostscript to generate thumbnails for PDF bitstreams. Requires installation of ImageMagick and Ghostscript on your server. See ImageMagick Media Filters. | Adobe PDF | no |
Please note that the filter-media script will automatically update the DSpace search index by default.
...