This product contains a subset of the Parquet files published in https://github.com/apache/parquet-testing. It includes both correct (data) and bad (bad_data) Parquet files. The data/geospatial subdirectory contains test files for the new GEOMETRY logical type.
This directory contains binary files used to verify shredded variant readers.
cases.json - a JSON list of test cases. Each case is an error case, a single record variant case, or a multi-record variant case.
Each JSON object in the list represents a single case and includes:
case_number - a number to identify the case and its data files
test - name of the test from which the case was generated. Multiple cases can be generated from a single test. For instance, testShreddedVariantPrimitives is used to generate a case for each variant primitive.
Binary files for each case are named using the case number. Variant binary file names also include the row number.
Error cases have the following fields:
error_message - a message describing why the case is an error
Single record cases have the following fields:
parquet_file - path of the Parquet file to be read for the case
variant_file - path of the binary variant file to be read for the case
variant - string representation of the variant for the case
Multi-record cases have the following fields:
parquet_file - path of the Parquet file to be read for the case, containing multiple records
variant_files - paths of the binary variant files, one for each record in the Parquet file (an entry may be null for a null variant)
variants - string representation of the variants for the case
Each *.variant.bin file contains a single variant, serialized as the bytes of the variant metadata followed immediately by the bytes of the variant value.
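Because a *.variant.bin file stores no explicit split point, a reader has to decode the metadata header to find where the value begins. The following is a minimal sketch, assuming the metadata layout from the Parquet Variant encoding specification (a 1-byte header whose low 4 bits are the version and whose top 2 bits are offset_size_minus_one, then a dictionary size, dictionary_size + 1 offsets, and the dictionary string bytes, all little-endian). The function name split_variant_bin is hypothetical and not part of any library.

```python
def split_variant_bin(data: bytes) -> tuple[bytes, bytes]:
    """Split a *.variant.bin payload into (metadata, value).

    Hypothetical helper. Assumes the Variant metadata layout:
    header (1 byte) | dictionary_size | offsets[dictionary_size + 1] | string bytes,
    where every integer uses offset_size little-endian bytes.
    """
    if not data:
        raise ValueError("empty variant file")

    header = data[0]
    version = header & 0x0F  # low 4 bits: metadata version
    if version != 1:
        raise ValueError(f"unsupported metadata version {version}")
    offset_size = ((header >> 6) & 0x03) + 1  # bits 6-7: offset_size_minus_one

    def read_uint(pos: int) -> int:
        # Little-endian unsigned integer of offset_size bytes starting at pos.
        return int.from_bytes(data[pos:pos + offset_size], "little")

    dict_size = read_uint(1)
    offsets_start = 1 + offset_size
    # The last of the dict_size + 1 offsets equals the total length of the string bytes.
    last_offset = read_uint(offsets_start + dict_size * offset_size)
    strings_start = offsets_start + (dict_size + 1) * offset_size
    metadata_len = strings_start + last_offset

    if metadata_len > len(data):
        raise ValueError("truncated variant metadata")
    # Everything after the metadata is the serialized variant value.
    return data[:metadata_len], data[metadata_len:]
```

For example, bytes([0x01, 0x00, 0x00, 0x0C, 0x2A]) splits into the three-byte metadata 01 00 00 (version 1, empty dictionary) and the two value bytes 0C 2A. The split only needs the metadata header and its last offset; decoding the value itself is left to a full variant reader.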
Each Parquet file contains one or more rows. Each row corresponds to a variant file (by ID) for the test case and consists of an id field and a var field.
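A test harness can tell the three kinds of case apart by which fields an entry carries, and then list the variant file it expects for each row. The sketch below is one way to do that, assuming the field names described above; the helper names case_kind, expected_files, and load_cases are hypothetical, and the rule that row i of the Parquet file pairs with entry i of variant_files is an assumption to check against each row's id field.

```python
import json
from typing import Any, Optional


def case_kind(case: dict[str, Any]) -> str:
    """Classify a cases.json entry by the fields it carries (hypothetical helper)."""
    if "error_message" in case:
        return "error"
    if "variant_files" in case:
        return "multi"
    if "variant_file" in case:
        return "single"
    raise ValueError(f"unrecognized case: {case.get('case_number')}")


def expected_files(case: dict[str, Any]) -> list[Optional[str]]:
    """Return the expected variant file for each row, in row order.

    Assumption: row i of parquet_file pairs with entry i of variant_files;
    None marks a row whose variant is null.
    """
    kind = case_kind(case)
    if kind == "error":
        return []  # no expected variants: the case is expected to produce an error
    if kind == "single":
        return [case["variant_file"]]
    return list(case["variant_files"])


def load_cases(path: str) -> list[dict[str, Any]]:
    """Load cases.json, which holds a JSON list of case objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Keying the classification on field presence keeps the harness independent of how cases are numbered or which test generated them.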
For more information, see the original test cases.