Hive / HIVE-5325

Implement statistics providing ORC writer and reader interfaces

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.13.0
    • Component/s: None

      Description

      HIVE-5324 adds new interfaces that the ORC reader and writer can implement to provide statistics. Writer-provided statistics are used to update table- and partition-level statistics in the metastore. Reader-provided statistics can be used for reducer estimation, cost-based optimization (CBO), etc., when metastore statistics are absent.
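The description splits statistics into a writer side (fed to the metastore) and a reader side (a fallback for planning). The following is a minimal Java sketch of that split, assuming a statistics object that carries only a row count and a raw data size. `FileStats`, `StatsProvidingWriter`, `StatsProvidingReader`, `CountingWriter` and `FooterReader` are hypothetical names chosen for illustration, not the interfaces HIVE-5324 actually added, and the 2-bytes-per-character size estimate is likewise an assumption.

```java
// Hypothetical statistics holder: only a row count and a raw data size.
final class FileStats {
    private final long rowCount;
    private final long rawDataSize;

    FileStats(long rowCount, long rawDataSize) {
        this.rowCount = rowCount;
        this.rawDataSize = rawDataSize;
    }

    long getRowCount() { return rowCount; }
    long getRawDataSize() { return rawDataSize; }
}

// Hypothetical writer-side interface: after writing, the caller asks the
// writer for statistics to push into the metastore.
interface StatsProvidingWriter {
    void write(String row);
    FileStats getStats();
}

// Hypothetical reader-side interface: planners (reducer estimation, CBO)
// can ask the reader for statistics when the metastore has none.
interface StatsProvidingReader {
    FileStats getStats();
}

// Toy writer that accumulates statistics while rows are written.
final class CountingWriter implements StatsProvidingWriter {
    private long rowCount = 0;
    private long rawDataSize = 0;

    @Override
    public void write(String row) {
        rowCount++;
        // Assumption for this sketch: raw size is 2 bytes per UTF-16 char.
        rawDataSize += 2L * row.length();
    }

    @Override
    public FileStats getStats() {
        return new FileStats(rowCount, rawDataSize);
    }
}

// Toy reader that returns statistics recorded at write time
// instead of rescanning the rows.
final class FooterReader implements StatsProvidingReader {
    private final FileStats footerStats;

    FooterReader(FileStats footerStats) {
        this.footerStats = footerStats;
    }

    @Override
    public FileStats getStats() {
        return footerStats;
    }
}
```

The point of the design is that the reader answers from statistics captured at write time (ORC keeps row counts and column statistics in its file footer), so a planner can consult them without reading the data again.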

      Attachments

      1. HIVE-5325-java-only.1.patch.txt (75 kB, Prasanth J)
      2. HIVE-5325.1.patch.txt (143 kB, Prasanth J)
      3. HIVE-5325-java-only.2.patch.txt (81 kB, Prasanth J)
      4. HIVE-5325.2.patch.txt (150 kB, Prasanth J)
      5. HIVE-5325-java-only.3.patch.txt (77 kB, Prasanth J)
      6. HIVE-5325.3.patch.txt (145 kB, Prasanth J)

          Activity

          Hudson added a comment:

          ABORTED: Integrated in Hive-trunk-hadoop2 #468 (See https://builds.apache.org/job/Hive-trunk-hadoop2/468/)
          HIVE-5325 : Implement statistics providing ORC writer and reader interfaces (Prasanth J via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528108)

          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/BinaryColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/util/JavaDataModel.java
          • /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSerDeStats.java
          • /hive/trunk/ql/src/test/resources/orc-file-dump-dictionary-threshold.out
          • /hive/trunk/ql/src/test/resources/orc-file-dump.out

          Hudson added a comment:

          SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #188 (See https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/188/)
          HIVE-5325 : Implement statistics providing ORC writer and reader interfaces (Prasanth J via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528108)

          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/BinaryColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/util/JavaDataModel.java
          • /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSerDeStats.java
          • /hive/trunk/ql/src/test/resources/orc-file-dump-dictionary-threshold.out
          • /hive/trunk/ql/src/test/resources/orc-file-dump.out

          Hudson added a comment:

          FAILURE: Integrated in Hive-trunk-hadoop2-ptest #122 (See https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/122/)
          HIVE-5325 : Implement statistics providing ORC writer and reader interfaces (Prasanth J via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528108)

          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/BinaryColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/util/JavaDataModel.java
          • /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSerDeStats.java
          • /hive/trunk/ql/src/test/resources/orc-file-dump-dictionary-threshold.out
          • /hive/trunk/ql/src/test/resources/orc-file-dump.out

          Hudson added a comment:

          FAILURE: Integrated in Hive-trunk-h0.21 #2370 (See https://builds.apache.org/job/Hive-trunk-h0.21/2370/)
          HIVE-5325 : Implement statistics providing ORC writer and reader interfaces (Prasanth J via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528108)

          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/BinaryColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/util/JavaDataModel.java
          • /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSerDeStats.java
          • /hive/trunk/ql/src/test/resources/orc-file-dump-dictionary-threshold.out
          • /hive/trunk/ql/src/test/resources/orc-file-dump.out

          Ashutosh Chauhan added a comment:

          Committed to trunk. Thanks, Prasanth!

          Hive QA added a comment:

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12606028/HIVE-5325.3.patch.txt

          SUCCESS: +1 4077 tests passed

          Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/978/testReport
          Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/978/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          Hive QA added a comment:

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12606028/HIVE-5325.3.patch.txt

          SUCCESS: +1 4077 tests passed

          Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/977/testReport
          Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/977/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          Ashutosh Chauhan added a comment:

          +1

          It will pick the latest patch that conforms to the naming convention.

          Prasanth J added a comment:

          Ashutosh Chauhan, which of the two patches will Hive QA pick? I have a java-only patch and a patch that also includes the protobuf-generated code.

          Prasanth J added a comment:

          Fixed a failing test case.

          Prasanth J added a comment:

          Ashutosh Chauhan, I have addressed your code review comments and updated RB with the new patch as well.

          Ashutosh Chauhan added a comment:

          Left some comments on RB.

          Prasanth J added a comment:

          Ashutosh Chauhan, I refreshed the patch and updated RB as well. It's now ready for review.

          Prasanth J added a comment:

          Refreshed the patch after the HIVE-5324 changes. Added an implementation of the new interface method getRawDataSizeOfColumns(List<String> colNames).
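As its name suggests, getRawDataSizeOfColumns(List<String> colNames) reports a raw data size restricted to the named columns. The following is a minimal sketch of one way such a method could be computed from per-column statistics. `ColumnStat`, `RawSizeEstimator` and the fixed per-value byte sizes are hypothetical and chosen for illustration; they do not reproduce ReaderImpl or the size estimates in JavaDataModel.

```java
import java.util.List;
import java.util.Map;

// Hypothetical per-column statistics: number of values and, for strings,
// the total number of characters seen.
final class ColumnStat {
    enum Kind { LONG, DOUBLE, STRING }

    final Kind kind;
    final long numValues;
    final long totalStringLength; // only meaningful for STRING columns

    ColumnStat(Kind kind, long numValues, long totalStringLength) {
        this.kind = kind;
        this.numValues = numValues;
        this.totalStringLength = totalStringLength;
    }
}

final class RawSizeEstimator {
    // Assumed per-value sizes for this sketch only.
    private static final long LONG_BYTES = 8;
    private static final long DOUBLE_BYTES = 8;
    private static final long CHAR_BYTES = 2;

    private final Map<String, ColumnStat> statsByColumn;

    RawSizeEstimator(Map<String, ColumnStat> statsByColumn) {
        this.statsByColumn = statsByColumn;
    }

    // Sums the estimated raw size of the requested columns only.
    long getRawDataSizeOfColumns(List<String> colNames) {
        long total = 0;
        for (String name : colNames) {
            ColumnStat s = statsByColumn.get(name);
            if (s == null) {
                throw new IllegalArgumentException("Unknown column: " + name);
            }
            total += rawSizeOf(s);
        }
        return total;
    }

    private static long rawSizeOf(ColumnStat s) {
        switch (s.kind) {
            case LONG:   return s.numValues * LONG_BYTES;
            case DOUBLE: return s.numValues * DOUBLE_BYTES;
            case STRING: return s.totalStringLength * CHAR_BYTES;
            default:     throw new AssertionError(s.kind);
        }
    }
}
```

Restricting the sum to the requested columns lets a planner size only the columns a query projects. This sketch rejects unknown column names; the real method may handle them differently.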

          Prasanth J added a comment:

          The attached patch was generated on top of HIVE-5324.

          Here is the RB link: https://reviews.apache.org/r/14243/

            People

            • Assignee: Prasanth J
            • Reporter: Prasanth J
            • Votes: 0
            • Watchers: 4
