Hive
  1. Hive
  2. HIVE-4724

ORC readers should have a better error detection for non-ORC files

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.1, 0.12.0
    • Component/s: File Formats
    • Labels:
      None

      Description

      A customer loaded a text file into a table that is stored as ORC. The error message was very unfriendly.

      1. HIVE-4724.D11529.3.patch
        29 kB
        Phabricator
      2. HIVE-4724.D11529.2.patch
        28 kB
        Phabricator

        Activity

        Hide
        Phabricator added a comment -

        omalley requested code review of "HIVE-4724 [jira] ORC readers should have a better error detection for non-ORC files".

        Reviewers: JIRA

        Add better checks for non-ORC files in the ORC reader to fail with a better error message. Also add a version
        check to warn users if they are reading files from a more advanaced version of Hadoop. Added a check so that unknown
        encodings for a column will fail quickly with a good error message.

        TEST PLAN
        EMPTY

        REVISION DETAIL
        https://reviews.facebook.net/D11529

        AFFECTED FILES
        .gitignore
        .idea/.name
        .idea/ant.xml
        .idea/codeStyleSettings.xml
        .idea/compiler.xml
        .idea/copyright/Apache.xml
        .idea/copyright/profiles_settings.xml
        .idea/encodings.xml
        .idea/libraries/default.xml
        .idea/libraries/hadoop0_20S_shim.xml
        .idea/libraries/hadoop0_20_shim.xml
        .idea/libraries/hadoop0_23.xml
        .idea/misc.xml
        .idea/modules.xml
        .idea/scopes/scope_settings.xml
        .idea/uiDesigner.xml
        .idea/vcs.xml
        .idea/workspace.xml
        ant/Ant.iml
        builtins/Builtins.iml
        cli/src/Cli.iml
        common/src/Common.iml
        contrib/src/Contrib.iml
        hbase-handler/src/Hbase-handler.iml
        hwi/src/Hwi.iml
        jdbc/src/Jdbc.iml
        metastore/src/Metastore.iml
        metastore/src/gen/thrift/Thrift.iml
        metastore/src/test/Metastore-test.iml
        pdk/src/Pdk.iml
        pdk/test-plugin/Test-plugin.iml
        ql/src/Ql.iml
        ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
        ql/src/gen/thrift/Thrift1.iml
        ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
        ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
        ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
        ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
        ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
        ql/src/test/Ql-test.iml
        serde/src/Serde.iml
        serde/src/gen/protobuf/Protobuf.iml
        serde/src/gen/thrift/Thrift2.iml
        serde/src/test/Serde-test.iml
        service/src/Service.iml
        service/src/gen/thrift/Thrift3.iml
        shims/src/0.20/Shims-0.20.iml
        shims/src/0.20S/Shims-0.20S.iml
        shims/src/0.23/Shims-0.23.iml
        shims/src/Shims.iml
        shims/src/common-secure/Shims-secure.iml
        shims/src/test/Shims-test.iml

        MANAGE HERALD RULES
        https://reviews.facebook.net/herald/view/differential/

        WHY DID I GET THIS EMAIL?
        https://reviews.facebook.net/herald/transcript/27255/

        To: JIRA, omalley

        Show
        Phabricator added a comment - omalley requested code review of " HIVE-4724 [jira] ORC readers should have a better error detection for non-ORC files". Reviewers: JIRA Add better checks for non-ORC files in the ORC reader to fail with a better error message. Also add a version check to warn users if they are reading files from a more advanaced version of Hadoop. Added a check so that unknown encodings for a column will fail quickly with a good error message. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D11529 AFFECTED FILES .gitignore .idea/.name .idea/ant.xml .idea/codeStyleSettings.xml .idea/compiler.xml .idea/copyright/Apache.xml .idea/copyright/profiles_settings.xml .idea/encodings.xml .idea/libraries/default.xml .idea/libraries/hadoop0_20S_shim.xml .idea/libraries/hadoop0_20_shim.xml .idea/libraries/hadoop0_23.xml .idea/misc.xml .idea/modules.xml .idea/scopes/scope_settings.xml .idea/uiDesigner.xml .idea/vcs.xml .idea/workspace.xml ant/Ant.iml builtins/Builtins.iml cli/src/Cli.iml common/src/Common.iml contrib/src/Contrib.iml hbase-handler/src/Hbase-handler.iml hwi/src/Hwi.iml jdbc/src/Jdbc.iml metastore/src/Metastore.iml metastore/src/gen/thrift/Thrift.iml metastore/src/test/Metastore-test.iml pdk/src/Pdk.iml pdk/test-plugin/Test-plugin.iml ql/src/Ql.iml ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java ql/src/gen/thrift/Thrift1.iml ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto ql/src/test/Ql-test.iml serde/src/Serde.iml serde/src/gen/protobuf/Protobuf.iml serde/src/gen/thrift/Thrift2.iml serde/src/test/Serde-test.iml service/src/Service.iml service/src/gen/thrift/Thrift3.iml shims/src/0.20/Shims-0.20.iml shims/src/0.20S/Shims-0.20S.iml shims/src/0.23/Shims-0.23.iml shims/src/Shims.iml shims/src/common-secure/Shims-secure.iml shims/src/test/Shims-test.iml MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/27255/ To: JIRA, omalley
        Hide
        Phabricator added a comment -

        omalley updated the revision "HIVE-4724 [jira] ORC readers should have a better error detection for non-ORC files".

        Remove extra files from my workspace that slipped in.

        Reviewers: JIRA

        REVISION DETAIL
        https://reviews.facebook.net/D11529

        CHANGE SINCE LAST DIFF
        https://reviews.facebook.net/D11529?vs=35175&id=35199#toc

        AFFECTED FILES
        ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
        ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
        ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
        ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
        ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
        ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto

        To: JIRA, omalley

        Show
        Phabricator added a comment - omalley updated the revision " HIVE-4724 [jira] ORC readers should have a better error detection for non-ORC files". Remove extra files from my workspace that slipped in. Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D11529 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D11529?vs=35175&id=35199#toc AFFECTED FILES ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto To: JIRA, omalley
        Hide
        Phabricator added a comment -

        ashutoshc has commented on the revision "HIVE-4724 [jira] ORC readers should have a better error detection for non-ORC files".

        Patch looks alright. But wondering tying orc version with hive version is a good idea ?

        INLINE COMMENTS
        ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java:35 Is this tied to hive version? Also, since 0.11 is already released, this should be 12.

        More importantly though, does this mean this version need to be bumped when hive moves to next version even though there is no change in ORC itself?

        REVISION DETAIL
        https://reviews.facebook.net/D11529

        To: JIRA, omalley
        Cc: ashutoshc

        Show
        Phabricator added a comment - ashutoshc has commented on the revision " HIVE-4724 [jira] ORC readers should have a better error detection for non-ORC files". Patch looks alright. But wondering tying orc version with hive version is a good idea ? INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java:35 Is this tied to hive version? Also, since 0.11 is already released, this should be 12. More importantly though, does this mean this version need to be bumped when hive moves to next version even though there is no change in ORC itself? REVISION DETAIL https://reviews.facebook.net/D11529 To: JIRA, omalley Cc: ashutoshc
        Hide
        Owen O'Malley added a comment -

        Added comment explaining the version string.

        Show
        Owen O'Malley added a comment - Added comment explaining the version string.
        Hide
        Phabricator added a comment -

        omalley updated the revision "HIVE-4724 [jira] ORC readers should have a better error detection for non-ORC files".

        Updated comment to explain the use of the version.

        Reviewers: JIRA

        REVISION DETAIL
        https://reviews.facebook.net/D11529

        CHANGE SINCE LAST DIFF
        https://reviews.facebook.net/D11529?vs=35199&id=35829#toc

        AFFECTED FILES
        ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
        ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
        ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
        ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
        ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
        ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto

        To: JIRA, omalley
        Cc: ashutoshc

        Show
        Phabricator added a comment - omalley updated the revision " HIVE-4724 [jira] ORC readers should have a better error detection for non-ORC files". Updated comment to explain the use of the version. Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D11529 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D11529?vs=35199&id=35829#toc AFFECTED FILES ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto To: JIRA, omalley Cc: ashutoshc
        Hide
        Phabricator added a comment -

        ashutoshc has accepted the revision "HIVE-4724 [jira] ORC readers should have a better error detection for non-ORC files".

        +1 Make sense. Thanks, Owen for adding comments. That clarifies my confusion.

        REVISION DETAIL
        https://reviews.facebook.net/D11529

        BRANCH
        h-4724

        ARCANIST PROJECT
        hive

        To: JIRA, ashutoshc, omalley
        Cc: ashutoshc

        Show
        Phabricator added a comment - ashutoshc has accepted the revision " HIVE-4724 [jira] ORC readers should have a better error detection for non-ORC files". +1 Make sense. Thanks, Owen for adding comments. That clarifies my confusion. REVISION DETAIL https://reviews.facebook.net/D11529 BRANCH h-4724 ARCANIST PROJECT hive To: JIRA, ashutoshc, omalley Cc: ashutoshc
        Hide
        Edward Capriolo added a comment -

        Am I mistaken or are we checking in the protobuf generated files?

        Would it be better to use an ant target

        <exec executable="protoc">
        <arg value="--java_out=outdir" />
        <arg value="--proto_path=srcdir" />
        <arg value="srcdir/path/to/input.proto" />
        </exec>

        And not check in the generated files? I belive this is what we do with thrift.

        Show
        Edward Capriolo added a comment - Am I mistaken or are we checking in the protobuf generated files? Would it be better to use an ant target <exec executable="protoc"> <arg value="--java_out=outdir" /> <arg value="--proto_path=srcdir" /> <arg value="srcdir/path/to/input.proto" /> </exec> And not check in the generated files? I belive this is what we do with thrift.
        Hide
        Owen O'Malley added a comment -

        We had the discussion about whether to commit the generated files when ORC was committed. Thrift and protobuf generated files are currently checked in so that they are not required to build Hive. The ant target to regenerate the protobuf generated code is already there:

        % ant protobuf
        

        will regenerate the code.

        Show
        Owen O'Malley added a comment - We had the discussion about whether to commit the generated files when ORC was committed. Thrift and protobuf generated files are currently checked in so that they are not required to build Hive. The ant target to regenerate the protobuf generated code is already there: % ant protobuf will regenerate the code.
        Hide
        Owen O'Malley added a comment -

        I just committed this.

        Show
        Owen O'Malley added a comment - I just committed this.
        Hide
        Ashutosh Chauhan added a comment -

        This issue has been fixed and released as part of 0.12 release. If you find further issues, please create a new jira and link it to this one.

        Show
        Ashutosh Chauhan added a comment - This issue has been fixed and released as part of 0.12 release. If you find further issues, please create a new jira and link it to this one.

          People

          • Assignee:
            Owen O'Malley
            Reporter:
            Owen O'Malley
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development