...
Name | Java Class | Function | Enabled by Default? |
---|---|---|---|
HTML Text Extractor | | extracts the full text of HTML documents for full text indexing. (Uses Swing's HTML Parser) | true |
JPEG Thumbnail | | creates thumbnail images of GIF, JPEG and PNG files | true |
Branded Preview JPEG | | creates a branded preview image for GIF, JPEG and PNG files | false |
PDF Text Extractor | | extracts the full text of Adobe PDF documents (only if text-based or OCRed) for full text indexing. (Uses the Apache PDFBox tool) | true |
XPDF Text Extractor | | extracts the full text of Adobe PDF documents (only if text-based or OCRed) for full text indexing. (Uses the XPDF command line tools, http://www.foolabs.com/xpdf/, available for Unix.) See XPDF Filter Configuration for details on installing/enabling. | false |
Word Text Extractor | | extracts the full text of Microsoft Word or Plain Text documents for full text indexing. (Uses the "Microsoft Word Text Mining" tools.) | true |
PowerPoint Text Extractor | | extracts the full text of slides and notes in Microsoft PowerPoint and PowerPoint XML documents for full text indexing. (Uses the Apache POI tools.) | true |
Please note that the filter-media script automatically updates the DSpace search index by default (see ReIndexing Content (for Browse or Search)). This is the recommended way to run the script. Should you wish to disable the index update, pass the -n flag (see Executing (via Command Line) below).
...
Available Command-Line Options:
...
- [dspace]/bin/dspace filter-media -h
- [dspace]/bin/dspace filter-media -f
- [dspace]/bin/dspace filter-media -i 123456789/2
- [dspace]/bin/dspace filter-media -m 1000
- [dspace]/bin/dspace filter-media -n

... index-update elsewhere.

- [dspace]/bin/dspace filter-media -p "PDF Text Extractor","Word Text Extractor"
- [dspace]/bin/dspace filter-media -s 123456789/9,123456789/100
- [dspace]/bin/dspace filter-media -s `less filter-skiplist.txt`
- [dspace]/bin/dspace filter-media -v
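
The -p and -s examples above both take a single comma-separated argument, and plugin names contain spaces, so quoting is easy to get wrong. The following is a minimal sketch of two helpers for composing those arguments. The function names `join_lines` and `join_args`, the `DSPACE_DIR` variable, and the one-handle-per-line file layout are assumptions made for illustration and are not part of DSpace. The demo only prints the resulting commands and does not run them.

```shell
#!/bin/sh
# Hypothetical helpers for composing filter-media arguments (not part of DSpace).
# DSPACE_DIR stands in for the [dspace] installation directory (assumed path).
DSPACE_DIR="${DSPACE_DIR:-/dspace}"

# join_lines FILE: print the non-empty lines of FILE joined with commas.
# Assumes the file holds one handle per line.
join_lines() {
    grep -v '^[[:space:]]*$' "$1" | paste -s -d, -
}

# join_args NAME...: join the arguments with commas, keeping embedded spaces,
# so the result can be passed as ONE quoted argument to -p.
join_args() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Demo (dry run): build the arguments and print the commands.
tmp=$(mktemp)
printf '123456789/9\n\n123456789/100\n' > "$tmp"
skip=$(join_lines "$tmp")
plugins=$(join_args "PDF Text Extractor" "Word Text Extractor")
echo "$DSPACE_DIR/bin/dspace filter-media -s $skip"
echo "$DSPACE_DIR/bin/dspace filter-media -p \"$plugins\""
rm -f "$tmp"
```

When running the real command, keep the plugin list quoted (`-p "$plugins"`) so the shell passes it as a single argument.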
A custom filter is a Java class that implements the org.dspace.app.mediafilter.FormatFilter interface. See the Creating a new Media/Format Filter topic and the comments in the source file FormatFilter.java for more information. In theory, filters could be implemented in any programming language (C, Perl, etc.). However, they need to be invoked by the Java code in the Media Filter class that you create.
...
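
To illustrate the point about other languages, here is a minimal sketch of what an external extraction step might look like as a shell function. It reads HTML on standard input and writes plain text on standard output. The name `strip_tags` is hypothetical, the tag stripping is deliberately naive (it does not handle tags that span lines), and a Java filter class would still have to launch such a program and pass its output back to DSpace. How that Java class does so is not shown here.

```shell
#!/bin/sh
# Hypothetical external text extractor (illustration only, not part of DSpace).
# Reads HTML on stdin, writes plain text on stdout.
strip_tags() {
    # Replace anything that looks like a tag with a space, squeeze repeated
    # spaces, then trim one leading and one trailing space per line.
    sed -e 's/<[^>]*>/ /g' | tr -s ' ' | sed -e 's/^ //' -e 's/ $//'
}

# Demo: prints "Hello world"
printf '<p>Hello <b>world</b></p>\n' | strip_tags
```

Keeping the external step to a plain stdin-to-stdout contract means the Java side only needs to start a process and read its output stream.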