Hive
  1. Hive
  2. HIVE-5663

Refactor ORC RecordReader to operate on direct & wrapped ByteBuffers

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.13.0
    • Component/s: File Formats
    • Labels:
    • Environment:

      Ubuntu LXC

    • Release Note:
      Refactor ORC RecordReader to use ByteBuffer APIs instead of relying on underlying array()

      Description

      The current ORC RecordReader implementation assumes array structures backing the ByteBuffers it passes around between RecordReaderImpl and Compressed/Uncompressed InStream objects.

      This patch attempts to refactor those assumptions out of both classes, allowing the future use of direct byte buffers within ORC (as might come from HDFS zero-copy readers).

      1. HIVE-5663.03.patch
        21 kB
        Gopal V
      2. HIVE-5663.02.patch
        21 kB
        Gopal V
      3. HIVE-5663.01.patch
        21 kB
        Gopal V

        Activity

        Hide
        Brock Noland added a comment -

        Minor nit:

        Lines like this:

        private ByteBuffer compressed = null;
        

        should be:

        private ByteBuffer compressed;
        

        since uninitialized variables are automatically set to null.

        Show
        Brock Noland added a comment - Minor nit: Lines like this: private ByteBuffer compressed = null; should be: private ByteBuffer compressed; since uninitialized variables are automatically set to null.
        Hide
        Gopal V added a comment -

        Ah, okay. I was just following the original code pattern, which had code like

        private byte[] compressed = null;
        

        which was being replaced with the ByteBuffer.

        I can amend that, but it made my diffs look "same-ish" when I was viewing it side-by-side with the current patch.

        Show
        Gopal V added a comment - Ah, okay. I was just following the original code pattern, which had code like private byte [] compressed = null ; which was being replaced with the ByteBuffer. I can amend that, but it made my diffs look "same-ish" when I was viewing it side-by-side with the current patch.
        Hide
        Brock Noland added a comment -

        Yep I saw that and I think keeping in line with the existing code base is a great goal. However, I don't think that makes sense when there is redundant or non-standard code like this case.

        Thanks for the contribution!

        Show
        Brock Noland added a comment - Yep I saw that and I think keeping in line with the existing code base is a great goal. However, I don't think that makes sense when there is redundant or non-standard code like this case. Thanks for the contribution!
        Hide
        Gopal V added a comment -

        Updated patch as per comments.

        Show
        Gopal V added a comment - Updated patch as per comments.
        Hide
        Gopal V added a comment -

        Opened review request - https://reviews.apache.org/r/15097/

        Show
        Gopal V added a comment - Opened review request - https://reviews.apache.org/r/15097/
        Hide
        Owen O'Malley added a comment -

        This looks good, but a few comments:

        • Please remove the code for OutStream.HEADER_SIZE > 3, since it is always false.
        • Please remove the offsets array from the interface since we are switching to ByteBuffer.

        Other than that, it looks great.

        Show
        Owen O'Malley added a comment - This looks good, but a few comments: Please remove the code for OutStream.HEADER_SIZE > 3, since it is always false. Please remove the offsets array from the interface since we are switching to ByteBuffer. Other than that, it looks great.
        Hide
        Owen O'Malley added a comment -

        Oops, obviously on second thought the second one is wrong. Please do change the HEADER_SIZE > 3 into an assert.

        Show
        Owen O'Malley added a comment - Oops, obviously on second thought the second one is wrong. Please do change the HEADER_SIZE > 3 into an assert.
        Hide
        Gopal V added a comment -

        Rebasing patch to trunk and converting the if() into an assert block with a message.

        Show
        Gopal V added a comment - Rebasing patch to trunk and converting the if() into an assert block with a message.
        Hide
        Owen O'Malley added a comment -

        +1

        Show
        Owen O'Malley added a comment - +1
        Hide
        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12614718/HIVE-5663.03.patch

        ERROR: -1 due to 1 failed/errored test(s), 4665 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers
        

        Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/370/testReport
        Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/370/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests failed with: TestsFailedException: 1 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12614718

        Show
        Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12614718/HIVE-5663.03.patch ERROR: -1 due to 1 failed/errored test(s), 4665 tests executed Failed tests: org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/370/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/370/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed This message is automatically generated. ATTACHMENT ID: 12614718
        Hide
        Gopal V added a comment -

        bucket_num_reducers.q is

        CREATE TABLE bucket_nr(key int, value string) CLUSTERED BY (key) INTO 50 BUCKETS;
        
        insert overwrite table bucket_nr
        select * from src;
        

        And src table is a text table from kv1.txt - trying to confirm if this test fails on trunk without the patch (which should only kick in for ORC tables & that too in the read path only).

        Show
        Gopal V added a comment - bucket_num_reducers.q is CREATE TABLE bucket_nr(key int , value string) CLUSTERED BY (key) INTO 50 BUCKETS; insert overwrite table bucket_nr select * from src; And src table is a text table from kv1.txt - trying to confirm if this test fails on trunk without the patch (which should only kick in for ORC tables & that too in the read path only).
        Hide
        Ashutosh Chauhan added a comment -

        Comitted to trunk. Thanks, Gopal!

        Show
        Ashutosh Chauhan added a comment - Comitted to trunk. Thanks, Gopal!

          People

          • Assignee:
            Gopal V
            Reporter:
            Gopal V
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development