
Parquet

Source changes - FishEye

Shows the 20 most recent commits for Parquet.

Parth Chandra <pchandra@maprtech.com> committed 1766ffc4960e8f7c1efc981a9302688a8c6cd427 (1 file)
Reviews: none

DRILL-5349: Fix TestParquetWriter unit tests when the synchronous Parquet reader is used.
close apache/drill#780

Paul Rogers <progers@maprtech.com> committed 79811db5aa8c7f2cdbe6f74c0a40124bea9fb1fd (23 files)
Reviews: none

DRILL-5284: Roll-up of final fixes for managed sort
See subtasks for details.

* Provide detailed, accurate estimate of size consumed by a record batch
* Managed external sort spills too often with Parquet data
* Managed External Sort fails with OOM
* External sort refers to the deprecated HDFS fs.default.name param
* Config param drill.exec.sort.external.batch.size is not used
* NPE in managed external sort while spilling to disk
* External Sort BatchGroup leaks memory if an OOM occurs during read
* DRILL-5294: Under certain low-memory conditions, the sort must be forced to merge
two batches to make progress, even though doing so needs slightly more memory
than comfortably fits.

close #761

drill 1.10.0
Parth Chandra <pchandra@maprtech.com> committed 152c87aa6cad84c8752b4a87967c7826cc90dbaa (2 files)
Reviews: none

DRILL-5351: Minimize bounds checking in var len vectors for Parquet reader
close #781

Arina Ielchiieva <arina.yelchiyeva@gmail.com> committed f80d77e6bb60afc562331192ac5d08545c1f1c02 (1 file)
Reviews: none

DRILL-5040: Parquet writer unable to delete table folder on abort
close apache/drill#744

Parth Chandra <pchandra@maprtech.com> committed ddcf89548bd33c0cd3e062f1f6d5027eed822372 (1 file)
Reviews: none

DRILL-5240: Parquet - fix unnecessary object creation while checking for null values in nullable var length columns
This closes #740

Paul Rogers <progers@maprtech.com> committed 38f816a45924654efd085bf7f1da7d97a4a51e38 (2 files)
Reviews: none

DRILL-5157: Multiple Snappy versions on class path
Multiple Snappy versions on class path; causes unit test failures.

This fix updates the Snappy library and adds dependency management to
exclude older versions brought in by Avro and Parquet.

Parth Chandra <parthc@apache.org> committed 052010108a47856f9b1a3c0c470b6572948dc749 (12 files)
Reviews: none

DRILL-5207: Improve Parquet Scan pipelining.
* Add a configurable AsyncPageReader Queue.
* Enforce total size of Parquet row group.
* Do not initialize BufferedDirectBufInputStream buffer in init. Wait for first read.
* Change default size of BufferedDirectBufInputStream.
* Do not invoke getOptions too many times in Parquet reader.
* Add metrics for processing time, and decoding time for varlen and fixedlen columns.
This closes #723
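
The pipelining idea behind a configurable AsyncPageReader queue can be illustrated with a bounded producer/consumer queue. This is a minimal sketch, not Drill's implementation: the class AsyncPageQueueSketch, the END sentinel, and the use of page length as a stand-in for decoding are all hypothetical. A reader thread fills an ArrayBlockingQueue whose capacity plays the role of the configurable queue size, while the calling thread takes and "decodes" pages as they arrive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch of a bounded read-ahead queue between a page reader and a decoder. */
public class AsyncPageQueueSketch {
    // Sentinel marking the end of a column chunk; compared by identity, not content.
    static final byte[] END = new byte[0];

    static List<Integer> run(List<byte[]> pages, int queueSize) {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(queueSize);

        // Producer: "reads" pages ahead of decoding; put() blocks while the queue is full.
        Thread reader = new Thread(() -> {
            try {
                for (byte[] page : pages) {
                    queue.put(page);
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        // Consumer: take() blocks until the reader has produced the next page.
        List<Integer> decodedSizes = new ArrayList<>();
        try {
            while (true) {
                byte[] page = queue.take();
                if (page == END) {
                    break;
                }
                decodedSizes.add(page.length); // stand-in for real page decoding
            }
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while decoding pages", e);
        }
        return decodedSizes;
    }
}
```

A larger queue lets the reader run further ahead of decoding at the cost of buffering more pages in memory; even a capacity of 1 overlaps one read with one decode.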

Vitalii Diravka <vitalii.diravka@gmail.com> committed eef3b3fb6f4e76e95510253d155d0659e387fc99 (3 files)
Reviews: none

DRILL-4996: Parquet Date auto-correction is not working in auto-partitioned Parquet files generated by Drill 1.6
- Changed the approach for detecting corrupted date values when Parquet files are generated by Drill:
  the corruption status is determined by looking at the min/max values in the metadata;
- Appropriate refactoring of TestCorruptParquetDateCorrection.

This closes #687
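
The detection change described above can be sketched as a check on a column's min/max statistics. This is a minimal illustration under stated assumptions, not Drill's code: it assumes DATE values are stored as days since 1970-01-01 and that corrupted values land beyond some plausibility threshold. The class name, the three-valued Status enum, and the thresholdDays parameter are hypothetical.

```java
/**
 * Hypothetical sketch of min/max-based corrupt-date detection.
 * Assumes DATE values are days since 1970-01-01; the threshold is a
 * caller-supplied assumption, not Drill's actual constant.
 */
public class CorruptDateCheckSketch {
    enum Status { CORRUPTED, CORRECT, UNKNOWN }

    static Status detect(Integer minDays, Integer maxDays, int thresholdDays) {
        if (minDays == null || maxDays == null) {
            return Status.UNKNOWN;   // no statistics in the metadata
        }
        if (minDays > thresholdDays) {
            return Status.CORRUPTED; // every value lies beyond the plausible range
        }
        if (maxDays <= thresholdDays) {
            return Status.CORRECT;   // every value lies within the plausible range
        }
        return Status.UNKNOWN;       // the min/max range straddles the threshold
    }
}
```

Returning UNKNOWN for a range that straddles the threshold, rather than guessing, leaves room for some other fallback check; which fallback Drill actually uses is not stated here.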

Vitalii Diravka <vitalii.diravka@gmail.com> committed 4a0fd56c106550eee26ca68eaed6108f0dbad798 (7 files)
Sudheesh Katkam <sudheesh@apache.org> committed ae0608b5c4b5ef9f897d6bc3a51f00f0b985bd60 (15 files)
Reviews: none

Revert "DRILL-4373: Drill and Hive have incompatible timestamp representations in parquet - added sys/sess option "store.parquet.int96_as_timestamp"; - added int96 to timestamp converter for both readers; - added unit tests;"
This reverts commit 7e7214b40784668d1599f265067f789aedb6cf86.

drill 1.9.0
Parth Chandra <parthc@apache.org> committed 4b1902c042d3e8f426f54ec04b78813ac64aa120 (2 files)
Reviews: none

DRILL-5009: Skip reading of empty row groups while reading Parquet metadata
+ We will no longer attempt to scan such row groups.

closes #651
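
Skipping empty row groups amounts to filtering the row-group list by row count before any scan work is assigned. A minimal sketch, assuming a hypothetical RowGroupInfo record that carries the row count read from the file's footer metadata (neither name is from Drill):

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: drop row groups with no rows before assigning scan work. */
public class SkipEmptyRowGroupsSketch {
    // Illustrative stand-in for per-row-group metadata read from the Parquet footer.
    record RowGroupInfo(String path, int index, long rowCount) {}

    static List<RowGroupInfo> scannable(List<RowGroupInfo> rowGroups) {
        return rowGroups.stream()
            .filter(rg -> rg.rowCount() > 0) // empty row groups are never scanned
            .collect(Collectors.toList());
    }
}
```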

Jinfeng Ni <jni@apache.org> committed 9411b26ece34ed8b2f498deea5e41f1901eb1013 (44 files)
Reviews: none

DRILL-1950: Parquet rowgroup level filter pushdown at query planning time.
Implement Parquet rowgroup level filter pushdown. The filter pushdown is performed in
the Drill physical planning phase.

Only a local filter, which refers to columns in a single table, is qualified for filter pushdown.

A filter may qualify if it is a simple comparison filter, or a compound "and/or" filter consisting of
simple comparison filters. Data types allowed in a comparison filter are int, bigint, float, double, date,
timestamp and time. Comparison operators are =, !=, <, <=, >, >=. Operands must be a column of one of the above
data types, an explicit or implicit cast function, or a constant expression.

This closes #637
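
The qualification rules above can be sketched as a recursive check over a small expression tree. This is an illustration under assumptions, not Drill's planner code: the Expr, Column, Constant, Cast, Comparison and Logical types are hypothetical stand-ins, and treating a cast as valid only when its target type is allowed and its input is a column or constant is this sketch's own reading of the rule.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the row-group filter qualification rules; not Drill's classes. */
public class FilterPushdownSketch {
    interface Expr {}
    record Column(String table, String name, String type) implements Expr {}
    record Constant(Object value) implements Expr {}
    record Cast(Expr input, String targetType) implements Expr {}
    record Comparison(String op, Expr left, Expr right) implements Expr {}
    record Logical(String op, List<Expr> children) implements Expr {} // "and" or "or"

    static final Set<String> TYPES =
        Set.of("int", "bigint", "float", "double", "date", "timestamp", "time");
    static final Set<String> OPS = Set.of("=", "!=", "<", "<=", ">", ">=");

    // Operand: a column of an allowed type, a cast (to an allowed type) of a column
    // or constant, or a constant.
    static boolean operandOk(Expr e) {
        if (e instanceof Column col) return TYPES.contains(col.type());
        if (e instanceof Cast cast) {
            return TYPES.contains(cast.targetType())
                && (cast.input() instanceof Column || cast.input() instanceof Constant);
        }
        return e instanceof Constant;
    }

    // Simple comparisons, possibly combined with and/or.
    static boolean shapeOk(Expr e) {
        if (e instanceof Logical log) {
            return (log.op().equals("and") || log.op().equals("or"))
                && !log.children().isEmpty()
                && log.children().stream().allMatch(FilterPushdownSketch::shapeOk);
        }
        return e instanceof Comparison cmp
            && OPS.contains(cmp.op())
            && operandOk(cmp.left())
            && operandOk(cmp.right());
    }

    static void collectTables(Expr e, Set<String> tables) {
        if (e instanceof Column col) {
            tables.add(col.table());
        } else if (e instanceof Cast cast) {
            collectTables(cast.input(), tables);
        } else if (e instanceof Comparison cmp) {
            collectTables(cmp.left(), tables);
            collectTables(cmp.right(), tables);
        } else if (e instanceof Logical log) {
            for (Expr child : log.children()) collectTables(child, tables);
        }
    }

    // Local filter: every referenced column belongs to exactly one table.
    static boolean qualifies(Expr filter) {
        Set<String> tables = new HashSet<>();
        collectTables(filter, tables);
        return tables.size() == 1 && shapeOk(filter);
    }
}
```

With these rules, `a > 10 and cast(b as date) = d` on a single table qualifies, while a predicate on a varchar column, a LIKE comparison, or a join condition spanning two tables does not.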

drill 1.9.0
Jinfeng Ni <jni@apache.org> committed 0d4319b25274ba4661a40766302bab318d20709b (1 file)
Reviews: none

DRILL-1950: Update parquet metadata cache format to include both min/max and additional column type information.
    Parquet meta cache format change:
    1. include both min/max in ColumnMetaData if column statistics are available,
    2. include precision/scale/repetitionLevel/definitionLevel in ColumnTypeMetaData (precision/scale/definitionLevel are for future use).

Serhii-Harnyk <serhii.harnyk@gmail.com> committed 03928af0b5cafd52e5b153aa852e5642b505f2c6 (23 files)
Reviews: none

DRILL-5032: Drill query on hive parquet table failed with OutOfMemoryError: Java heap space
close apache/drill#654

drill 1.10.0
Vitalii Diravka <vitalii.diravka@gmail.com> committed cf29b917af4002a1ba0ef750ff3f5a553e910f8e (13 files)
Reviews: none

DRILL-4980: Upgrade the approach to Parquet date correctness status detection
- Added the Parquet writer version;
- Updated the detection method for Parquet date correctness.
This closes #644

Parth Chandra <parthc@apache.org> committed f9a443d8a3d8e81b7e76f161b611003d16a53a4d (19 files)
Reviews: none

DRILL-4800: Add AsyncPageReader to pipeline page reads.
* Use a non-tracking input stream for Parquet scans.
* Make the choice between the async and sync reader configurable.
* Make various options user configurable: choose between the sync and async page reader, enable/disable fadvise.
* Add Parquet scan metrics to track time spent in various operations.

drill 1.9.0