...
Name | Java Class | Function | Default input formats | Enabled by Default? |
---|---|---|---|---|
PDF Text Extractor | | extracts the full text of Adobe PDF documents (only if text-based or OCRed) for full text indexing. (Uses the Apache PDFBox tool.) | Adobe PDF | yes |
HTML Text Extractor | | extracts the full text of HTML documents for full text indexing. (Uses Swing's HTML Parser.) | HTML, Text | yes |
Word Text Extractor | | extracts the full text of Microsoft Word or Plain Text documents for full text indexing. (Uses the "Microsoft Word Text Mining" tools.) See also PoiWordFilter, below. | Microsoft Word | yes |
Word Text Extractor | | extracts the full text of Microsoft Word and Microsoft Word XML documents for full text indexing. (Uses the "Apache POI" tools.) Disabled by default. Uncomment PoiWordFilter and comment out WordFilter in dspace.cfg if you wish to use this one. | Microsoft Word, Microsoft Word XML | no |
Excel Text Extractor | org.dspace.app.mediafilter.ExcelFilter | extracts the full text of Microsoft Excel documents for full text indexing. (Uses the "Apache POI" tools.) | Microsoft Excel, Microsoft Excel XML | yes |
PowerPoint Text Extractor | | extracts the full text of slides and notes in Microsoft PowerPoint and PowerPoint XML documents for full text indexing. (Uses the "Apache POI" tools.) | Microsoft Powerpoint, Microsoft Powerpoint XML | yes |
PDFBox JPEG Thumbnail | org.dspace.app.mediafilter.PDFBoxThumbnail | creates thumbnail images of the first page of PDF files | Adobe PDF | yes |
JPEG Thumbnail | | creates thumbnail images of GIF, JPEG and PNG files | BMP, GIF, JPEG, image/png | yes |
Branded Preview JPEG | | creates a branded preview image for GIF, JPEG and PNG files | BMP, GIF, JPEG, image/png | no |
ImageMagick Image Thumbnail Generator | | Uses ImageMagick to generate thumbnails for image bitstreams. Requires installation of ImageMagick on your server. See ImageMagick Media Filters. | BMP, GIF, image/png, JPG, TIFF, JPEG, JPEG 2000 | no |
ImageMagick PDF Thumbnail Generator | org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter | Uses ImageMagick and Ghostscript to generate thumbnails for PDF bitstreams. Requires installation of ImageMagick and Ghostscript on your server. See ImageMagick Media Filters. | Adobe PDF | no |
...