  1. Hive
  2. HIVE-8120 Umbrella JIRA tracking Parquet improvements
  3. HIVE-9605

Remove parquet nested objects from wrapper writable objects


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: None
    • Labels: None

    Description

      Parquet nested types wrap map and list elements in an extra object (ArrayWritable). This extra object is not needed and causes unnecessary memory allocations.

      An example is in HiveCollectionConverter.java:

      public void end() {
          parent.set(index, wrapList(new ArrayWritable(
              Writable.class, list.toArray(new Writable[list.size()]))));
      }
      

      This object is later unwrapped in AbstractParquetMapInspector, for example:

      final Writable[] mapContainer = ((ArrayWritable) data).get();
      final Writable[] mapArray = ((ArrayWritable) mapContainer[0]).get();
      for (final Writable obj : mapArray) {
        ...
      }
      

      We should get rid of this wrapper object to save time and memory.
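
The difference between the two layouts can be sketched without Hadoop on the classpath. This is only an illustration of the idea, not the attached patch: `Value`, `IntValue` and `ArrayValue` are hypothetical stand-ins for `Writable`, `IntWritable` and `ArrayWritable`, and the method names are made up for this example.

```java
import java.util.Arrays;
import java.util.List;

public class WrapperSketch {
    /** Hypothetical stand-in for Hadoop's Writable; not the real interface. */
    interface Value {}

    /** Hypothetical stand-in for a primitive writable such as IntWritable. */
    static final class IntValue implements Value {
        final int v;
        IntValue(int v) { this.v = v; }
    }

    /** Hypothetical stand-in for ArrayWritable: a container around an array. */
    static final class ArrayValue implements Value {
        private final Value[] values;
        ArrayValue(Value[] values) { this.values = values; }
        Value[] get() { return values; }
    }

    /** Layout described in this issue: elements sit in an inner array held by an outer one. */
    static ArrayValue wrapped(List<Value> elements) {
        ArrayValue inner = new ArrayValue(elements.toArray(new Value[0]));
        return new ArrayValue(new Value[] { inner }); // extra allocation per list or map
    }

    /** Layout without the wrapper: one container holds the elements directly. */
    static ArrayValue flat(List<Value> elements) {
        return new ArrayValue(elements.toArray(new Value[0]));
    }

    /** Reading the wrapped layout needs two get() calls and two casts. */
    static Value[] readWrapped(Value data) {
        Value[] container = ((ArrayValue) data).get();
        return ((ArrayValue) container[0]).get();
    }

    /** Reading the flat layout needs a single get(). */
    static Value[] readFlat(Value data) {
        return ((ArrayValue) data).get();
    }

    public static void main(String[] args) {
        List<Value> elements =
            Arrays.asList(new IntValue(1), new IntValue(2), new IntValue(3));
        System.out.println(readWrapped(wrapped(elements)).length);
        System.out.println(readFlat(flat(elements)).length);
    }
}
```

Both readers return the same elements; the flat layout saves one container object and one array per nested value, plus one level of indirection on every read.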

      Attachments

        1. HIVE-9605.3.patch
          35 kB
          Sergio Peña
        2. HIVE-9605.4.patch
          28 kB
          Sergio Peña
        3. HIVE-9605.5.patch
          28 kB
          Sergio Peña
        4. HIVE-9605.6.patch
          30 kB
          Sergio Peña

            People

              Assignee: Sergio Peña (spena)
              Reporter: Sergio Peña (spena)
              Votes: 0
              Watchers: 3
