Description
Parquet nested types use an extra wrapper object (ArrayWritable) around map and list elements. This extra object is not needed and causes unnecessary memory allocations.
An example is in HiveCollectionConverter.java:
public void end() {
  parent.set(index, wrapList(new ArrayWritable(
      Writable.class, list.toArray(new Writable[list.size()]))));
}
This object is later unwrapped in AbstractParquetMapInspector, for example:
final Writable[] mapContainer = ((ArrayWritable) data).get();
final Writable[] mapArray = ((ArrayWritable) mapContainer[0]).get();
for (final Writable obj : mapArray) {
  ...
}
We should get rid of this wrapper object to save time and memory.
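To make the extra allocation concrete, here is a minimal, self-contained sketch of the two shapes. It does not use the Hadoop or Hive classes: Writable, ArrayWritable and IntValue below are hypothetical stand-ins so the example runs on its own, and it assumes wrapList puts the inner array into a one-element outer container, as the mapContainer[0] access above suggests. The method names (wrapped, direct, sumWrapped, sumDirect, sample) are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class WrapperSketch {
    // Hypothetical stand-in for org.apache.hadoop.io.Writable.
    interface Writable {}

    // Hypothetical element type used only for this illustration.
    static final class IntValue implements Writable {
        final int value;
        IntValue(int value) { this.value = value; }
    }

    // Hypothetical stand-in for org.apache.hadoop.io.ArrayWritable.
    static final class ArrayWritable implements Writable {
        private final Writable[] values;
        ArrayWritable(Writable[] values) { this.values = values; }
        Writable[] get() { return values; }
    }

    // Current shape (assumed): elements -> inner ArrayWritable -> one-element outer ArrayWritable.
    static ArrayWritable wrapped(List<Writable> list) {
        ArrayWritable inner = new ArrayWritable(list.toArray(new Writable[0]));
        return new ArrayWritable(new Writable[] { inner }); // the extra wrapper object
    }

    // Proposed shape: elements -> a single ArrayWritable, no outer wrapper.
    static ArrayWritable direct(List<Writable> list) {
        return new ArrayWritable(list.toArray(new Writable[0]));
    }

    // Reader for the current shape: two unwrapping steps before iterating.
    static int sumWrapped(Writable data) {
        Writable[] container = ((ArrayWritable) data).get();
        Writable[] elements = ((ArrayWritable) container[0]).get();
        return sum(elements);
    }

    // Reader for the proposed shape: one unwrapping step.
    static int sumDirect(Writable data) {
        return sum(((ArrayWritable) data).get());
    }

    private static int sum(Writable[] elements) {
        int total = 0;
        for (Writable w : elements) {
            total += ((IntValue) w).value;
        }
        return total;
    }

    // Builds a small sample list of elements.
    static List<Writable> sample(int... values) {
        List<Writable> list = new ArrayList<>();
        for (int v : values) {
            list.add(new IntValue(v));
        }
        return list;
    }

    public static void main(String[] args) {
        List<Writable> items = sample(1, 2, 3);
        System.out.println(sumWrapped(wrapped(items)));
        System.out.println(sumDirect(direct(items)));
    }
}
```

Both readers see the same elements. The direct shape saves one container object and one single-element array per list or map value, and one cast plus array access on every read. In the real code, the converter and the object inspectors would have to change together, since the inspectors currently expect the wrapped layout.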