
...

Name	Java Class	Function	Enabled by Default?
HTML Text Extractor	`org.dspace.app.mediafilter.HTMLFilter`	extracts the full text of HTML documents for full text indexing. (Uses Swing's HTML Parser)	true
JPEG Thumbnail	`org.dspace.app.mediafilter.JPEGFilter`	creates thumbnail images of GIF, JPEG and PNG files	true
Branded Preview JPEG	`org.dspace.app.mediafilter.BrandedPreviewJPEGFilter`	creates a branded preview image for GIF, JPEG and PNG files (disabled by default)	false
PDF Text Extractor	`org.dspace.app.mediafilter.PDFFilter`	extracts the full text of Adobe PDF documents (only if text-based or OCRed) for full text indexing. (Uses the Apache PDFBox tool)	true
XPDF Text Extractor	`org.dspace.app.mediafilter.XPDF2Text`	extracts the full text of Adobe PDF documents (only if text-based or OCRed) for full text indexing. (Uses the XPDF command line tools, http://www.foolabs.com/xpdf/, available for Unix.) See XPDF Filter Configuration for details on installing/enabling.	false
Word Text Extractor	`org.dspace.app.mediafilter.WordFilter`	extracts the full text of Microsoft Word or Plain Text documents for full text indexing. (Uses the "Microsoft Word Text Mining" tools.)	true
PowerPoint Text Extractor	`org.dspace.app.mediafilter.PowerPointFilter`	extracts the full text of slides and notes in Microsoft PowerPoint and PowerPoint XML documents for full text indexing. (Uses the Apache POI tools.)	true

Please note that the filter-media script automatically updates the DSpace search index by default (see ReIndexing Content (for Browse or Search)). This is the recommended way to run these scripts. Should you wish to disable the index update, pass the -n flag to either script (see Executing (via Command Line) below).
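As a rough illustration of the -n flag described above, the following Python sketch builds the argument list for a filter-media run with or without the index update. The helper name `filter_media_command`, the `/dspace` installation directory, and the `bin/dspace` launcher location are assumptions made for this example; check them against your own installation before use.

```python
def filter_media_command(dspace_dir, update_index=True):
    """Build the argument list for a filter-media run.

    dspace_dir is the DSpace installation directory (assumed layout:
    the command launcher lives at <dspace_dir>/bin/dspace).
    """
    launcher = f"{dspace_dir.rstrip('/')}/bin/dspace"
    cmd = [launcher, "filter-media"]
    if not update_index:
        # -n suppresses the automatic search index update
        cmd.append("-n")
    return cmd


if __name__ == "__main__":
    # "/dspace" is a placeholder installation directory
    print(" ".join(filter_media_command("/dspace", update_index=False)))
```

The resulting list can be passed to `subprocess.run` if you want to launch the script from Python. Leaving `update_index` at its default mirrors the recommended behaviour, in which the search index is refreshed after filtering.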

...


DSpace Documentation
