Test Data
Set 1: Digital Corpora govdocs1
Set 2: OpenPlanets
Set 3: Random binary data created from a stable set of file sizes
The govdocs dataset includes (…), (…some characteristics, e.g. N PDF documents, varying in size from X to Y)
The OpenPlanets dataset …
(Description of fixture processing, generation of BagIt bags)
The generated binary data set
...
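As an illustration of how a set such as Set 3 might be produced, the following is a minimal sketch that writes one random binary file per entry in a fixed list of sizes. The size list, the file naming scheme, the chunk size and the function name generate_random_files are all assumptions made for this example. They are not the values or tooling used for the actual test set.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical size list in bytes; the sizes actually used for Set 3
# are not specified in this section.
FILE_SIZES = [0, 1, 1024, 64 * 1024, 1024 * 1024]

# Write in chunks so large files do not need to be held in memory.
CHUNK = 64 * 1024


def generate_random_files(out_dir, sizes=FILE_SIZES):
    """Write one file of random bytes per entry in `sizes`; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, size in enumerate(sizes):
        # Assumed naming scheme: position in the list plus size in bytes.
        path = out_dir / f"random_{index:04d}_{size}.bin"
        remaining = size
        with open(path, "wb") as fh:
            while remaining > 0:
                n = min(CHUNK, remaining)
                fh.write(os.urandom(n))
                remaining -= n
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for p in generate_random_files(tmp):
            print(p.name, p.stat().st_size)
```

Because the content comes from os.urandom, only the file sizes are repeatable between runs, not the bytes themselves. If byte-identical fixtures were needed, a seeded generator could be swapped in.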