  1. Hive
  2. HIVE-25553

Support Map data-type natively in Arrow format

Details

    Description

      Currently, ArrowColumnarBatchSerDe converts the map data type to a list of structs (where each struct contains one key-value pair of the map). This causes issues when reading a map column using llap-ext-client, because the client reads a list of structs instead.

      HiveWarehouseConnector, which uses llap-ext-client, throws an exception when the schema (which declares a map data type) differs from the actual data (a list of structs).

       

      Fixing this issue requires upgrading Arrow to a version that supports the map data type, and modifying ArrowColumnarBatchSerDe and the corresponding Serializer/Deserializer to use the Arrow map data type instead of the list workaround.
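
      To illustrate the mismatch described above, here is a minimal plain-Java sketch. It deliberately does not use Arrow or Hive classes; the names MapEncodingSketch, Entry, toEntryList, fromEntryList and matchesDeclaredMapType are hypothetical and only model the two layouts. It shows a map flattened into a list of key-value structs, and a strict reader that expects a map rejecting that shape.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the two layouts; not the Arrow or Hive API.
public class MapEncodingSketch {

    // Stands in for a struct holding one key-value pair of the map.
    static final class Entry {
        final String key;
        final int value;

        Entry(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Workaround layout: flatten the map into a list of key-value structs.
    static List<Entry> toEntryList(Map<String, Integer> map) {
        List<Entry> entries = new ArrayList<>();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            entries.add(new Entry(e.getKey(), e.getValue()));
        }
        return entries;
    }

    // Extra step a reader needs to recover the map from the workaround layout.
    static Map<String, Integer> fromEntryList(List<Entry> entries) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (Entry e : entries) {
            map.put(e.key, e.value);
        }
        return map;
    }

    // Assumed strict reader check: the schema declares a map, so only a map is accepted.
    static boolean matchesDeclaredMapType(Object column) {
        return column instanceof Map;
    }

    public static void main(String[] args) {
        Map<String, Integer> original = new LinkedHashMap<>();
        original.put("a", 1);
        original.put("b", 2);

        List<Entry> workaround = toEntryList(original);

        // The declared type is a map, but the data arrives as a list of structs.
        System.out.println(matchesDeclaredMapType(workaround)); // false
        // With a native map representation the data matches the schema.
        System.out.println(matchesDeclaredMapType(original));   // true
        // The content itself survives the workaround; only the shape differs.
        System.out.println(fromEntryList(workaround).equals(original)); // true
    }
}
```

      The content is not lost in the list-of-structs layout; the problem is only that a consumer validating the data against a map schema sees a different shape, which is why a native map representation removes the mismatch.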

          Activity

            adeshrao Adesh Kumar Rao added a comment

            pvary kgyrtkirk ShubhamChaurasia These changes are backward incompatible (a list is no longer used to store a map).

            But since this is used internally by LLAP (and creating Hive tables with Arrow format is not supported?), it should not cause any issues.

            Let me know if you have any concerns.

            sankarh Sankar Hariappan added a comment

            Thanks warriersruthi for the contribution! Patch merged to master!

            kgyrtkirk Zoltan Haindrich added a comment

            Reverted from master: it was committed without a clean test run, and 5 tests were broken by these changes. One of the tests is clearly Arrow related (org.apache.hadoop.hive.ql.io.arrow.TestSerializer):
            http://ci.hive.apache.org/job/hive-precommit/job/master/lastCompletedBuild/testReport/junit/org.apache.hadoop.hive.ql.io.arrow/TestSerializer/Testing___split_06___PostProcess___testEmptyComplexStruct/

            sankarh why did you merge the changes even though the PR was marked as tests-failed? It didn't even have a green test run!
            http://ci.hive.apache.org/job/hive-precommit/job/PR-2689/
            sankarh Sankar Hariappan added a comment (edited)

            kgyrtkirk My bad, I noticed the green tick in the title and assumed the tests had passed, but missed the "tests-failed" tag.
            Thanks for reverting the patch!

            warriersruthi, could you please resubmit the patch and fix those test failures?

            sankarh Sankar Hariappan added a comment

            Merged PR #2751 to master. Thanks warriersruthi!

            People

              warriersruthi Sruthi Mooriyathvariam
              adeshrao Adesh Kumar Rao
                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h