Details

    • Release Note:
      Added support for 'STORED AS PARQUET' and for setting parquet as the default storage engine.

      Description

      Problem Statement:

      Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive.

      About Parquet:

      Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration.

      Change Details:

      Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency.
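
      As a usage sketch of the feature named in the release note (a minimal illustration, assuming the Parquet integration jar is on Hive's classpath and that hive.default.fileformat accepts Parquet as a value; table and column names are made up):

        -- Create a Parquet-backed table with the new shorthand syntax.
        CREATE TABLE page_views (user_id BIGINT, url STRING, ts TIMESTAMP)
        STORED AS PARQUET;

        -- Or make Parquet the default storage format for tables created in this session.
        SET hive.default.fileformat=Parquet;
        CREATE TABLE clicks (user_id BIGINT, clicked_at TIMESTAMP);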

      1. HIVE-5783.noprefix.patch
        196 kB
        Brock Noland
      2. HIVE-5783.noprefix.patch
        196 kB
        Brock Noland
      3. HIVE-5783.patch
        196 kB
        Brock Noland
      4. HIVE-5783.patch
        196 kB
        Brock Noland
      5. HIVE-5783.patch
        196 kB
        Brock Noland
      6. HIVE-5783.patch
        196 kB
        Brock Noland
      7. HIVE-5783.patch
        197 kB
        Brock Noland
      8. HIVE-5783.patch
        195 kB
        Brock Noland
      9. HIVE-5783.patch
        194 kB
        Brock Noland
      10. HIVE-5783.patch
        192 kB
        Brock Noland
      11. HIVE-5783.patch
        172 kB
        Brock Noland
      12. HIVE-5783.patch
        173 kB
        Brock Noland
      13. HIVE-5783.patch
        173 kB
        Brock Noland
      14. HIVE-5783.patch
        180 kB
        Justin Coffey
      15. HIVE-5783.patch
        199 kB
        Brock Noland
      16. HIVE-5783.patch
        199 kB
        Brock Noland
      17. HIVE-5783.patch
        171 kB
        Justin Coffey
      18. HIVE-5783.patch
        196 kB
        Brock Noland
      19. HIVE-5783.patch
        9 kB
        Xuefu Zhang

        Issue Links

          Activity

          Carl Steinbach added a comment -

          Justin Coffey I added you to the list of Hive contributors on JIRA. Feel free to assign this ticket to yourself. Thanks.

          Eric Hanson added a comment -

          One thing you may want to consider is adding a vectorized InputFormat for Parquet that works with the Hive vectorized query execution capability. This should allow you to get faster query execution over Parquet on Hive. Vectorization dovetails well with columnar storage formats. The vectorization code currently supports ORC. But the design of vectorized execution is independent of the physical data storage format. The rules for a vectorized iterator are described in the section "Vectorized Iterator" in the latest design document attached to https://issues.apache.org/jira/browse/HIVE-4160. By looking at that section of the design document, and the vectorized iterator source code for ORC, you should be able to determine how to add a vectorized iterator for Parquet.
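
          For context, a minimal sketch of how vectorized execution is switched on today (assuming the hive.vectorized.execution.enabled property from the HIVE-4160 work; the table name is illustrative):

            -- Enable batch-at-a-time (vectorized) query execution for this session.
            SET hive.vectorized.execution.enabled=true;

            -- Queries over formats with a vectorized reader (ORC at the time of this
            -- discussion) then run vectorized; a Parquet vectorized InputFormat would
            -- plug into the same execution path.
            SELECT url, COUNT(*) FROM page_views_orc GROUP BY url;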

          Justin Coffey added a comment -

          Thanks Carl Steinbach and Eric Hanson. Regarding vectorization support, the Parquet team will review ASAP!

          Justin Coffey added a comment -

          Built and tested against Hive 0.11--a rebase will be necessary to work against trunk.

          Edward Capriolo added a comment -

          Why does support need to be built directly into the semantic analyzer? I think input formats/SerDes should be decoupled from the Hive code as much as possible. Hard-coded references like this make it hard to evolve support. I think you should only be adding the libs as a dependency to the pom files and building some tests.

          Xuefu Zhang added a comment -

          Justin Coffey Thanks for your contribution. I can help rebase with the latest trunk. However, are you sure your patch is complete? I don't see any new files as expected.

          Brock Noland added a comment -

          Why does support need to be built directly into the semantic analyzer?

          At present this is required to get STORED AS.

          I think input formats/SerDes should be decoupled from the Hive code as much as possible. Hard-coded references like this make it hard to evolve support.

          Yes, I agree. We should have some kind of registration system. I have created a JIRA for that, HIVE-5976, but I don't see that as a blocker.

          Brock Noland added a comment -

          I don't see any new files as expected.

          It looks complete to me.

          Justin Coffey added a comment -

          Edward Capriolo, regarding the support being built into the semantic analyzer, I mimicked what was done for ORC support. I agree that a hard-coded switch statement is not the best approach, but thought a larger refactoring was out of scope for this request--and definitely not something to be done against the 0.11 branch. Now with trunk support for parquet-hive, I suppose we could tackle this in a more generic/robust way.

          Xuefu Zhang, do you mean the actual parquet input/output formats and serde? If so, these are in the parquet-hive project (https://github.com/Parquet/parquet-mr/tree/master/parquet-hive).

          Edward Capriolo added a comment -


          regarding the support being built into the semantic analyzer, I mimicked what was done for ORC support

          I think that was done before Maven. I am sure there is a reason why RCFILE, ORCFILE, and this add their own syntax, but this is something we might not want to repeat by copy-and-paste just because the last person did it that way.

          Justin Coffey added a comment -

          I think that was done before Maven. I am sure there is a reason why RCFILE, ORCFILE, and this add their own syntax, but this is something we might not want to repeat by copy-and-paste just because the last person did it that way.

          I would normally agree with this, but I suppose I was trying to make as minor a change as possible.

          Xuefu Zhang added a comment -

          Justin Coffey To rebase, we need to specify the external dependency in the Hive 0.13 pom file. What external lib does your patch need, i.e. repo, groupId, artifactId, and version?

          Edward Capriolo added a comment - edited

          I would normally agree with this, but I suppose I was trying to make as minor a change as possible.

          Right, I am not demanding that we do it one way or the other, just pointing out that we should not build up tech debt. Hive does not have a dedicated cleanup crew to handle all the non-sexy features.

          Xuefu Zhang added a comment -

          Patch HIVE-5783.patch is the same as the original but rebased with trunk. The patch doesn't build, pending pom file changes.

          Hive QA added a comment -

          Overall: -1 no tests executed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12617485/HIVE-5783.patch

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/557/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/557/console

          Messages:

          **** This message was trimmed, see log for full details ****
          Decision can match input such as "KW_ORDER KW_BY LPAREN" using multiple alternatives: 1, 2
          
          As a result, alternative(s) 2 were disabled for that input
          warning(200): IdentifiersParser.g:121:5: 
          Decision can match input such as "KW_CLUSTER KW_BY LPAREN" using multiple alternatives: 1, 2
          
          As a result, alternative(s) 2 were disabled for that input
          warning(200): IdentifiersParser.g:133:5: 
          Decision can match input such as "KW_PARTITION KW_BY LPAREN" using multiple alternatives: 1, 2
          
          As a result, alternative(s) 2 were disabled for that input
          warning(200): IdentifiersParser.g:144:5: 
          Decision can match input such as "KW_DISTRIBUTE KW_BY LPAREN" using multiple alternatives: 1, 2
          
          As a result, alternative(s) 2 were disabled for that input
          warning(200): IdentifiersParser.g:155:5: 
          Decision can match input such as "KW_SORT KW_BY LPAREN" using multiple alternatives: 1, 2
          
          As a result, alternative(s) 2 were disabled for that input
          warning(200): IdentifiersParser.g:172:7: 
          Decision can match input such as "STAR" using multiple alternatives: 1, 2
          
          As a result, alternative(s) 2 were disabled for that input
          warning(200): IdentifiersParser.g:185:5: 
          Decision can match input such as "KW_UNIONTYPE" using multiple alternatives: 5, 6
          
          As a result, alternative(s) 6 were disabled for that input
          warning(200): IdentifiersParser.g:185:5: 
          Decision can match input such as "KW_STRUCT" using multiple alternatives: 4, 6
          
          As a result, alternative(s) 6 were disabled for that input
          warning(200): IdentifiersParser.g:185:5: 
          Decision can match input such as "KW_ARRAY" using multiple alternatives: 2, 6
          
          As a result, alternative(s) 6 were disabled for that input
          warning(200): IdentifiersParser.g:267:5: 
          Decision can match input such as "KW_DATE StringLiteral" using multiple alternatives: 2, 3
          
          As a result, alternative(s) 3 were disabled for that input
          warning(200): IdentifiersParser.g:267:5: 
          Decision can match input such as "KW_NULL" using multiple alternatives: 1, 8
          
          As a result, alternative(s) 8 were disabled for that input
          warning(200): IdentifiersParser.g:267:5: 
          Decision can match input such as "KW_FALSE" using multiple alternatives: 3, 8
          
          As a result, alternative(s) 8 were disabled for that input
          warning(200): IdentifiersParser.g:267:5: 
          Decision can match input such as "KW_TRUE" using multiple alternatives: 3, 8
          
          As a result, alternative(s) 8 were disabled for that input
          warning(200): IdentifiersParser.g:399:5: 
          Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_ORDER KW_BY" using multiple alternatives: 2, 9
          
          As a result, alternative(s) 9 were disabled for that input
          warning(200): IdentifiersParser.g:399:5: 
          Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_LATERAL KW_VIEW" using multiple alternatives: 2, 9
          
          As a result, alternative(s) 9 were disabled for that input
          warning(200): IdentifiersParser.g:399:5: 
          Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_INTO" using multiple alternatives: 2, 9
          
          As a result, alternative(s) 9 were disabled for that input
          warning(200): IdentifiersParser.g:399:5: 
          Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_DISTRIBUTE KW_BY" using multiple alternatives: 2, 9
          
          As a result, alternative(s) 9 were disabled for that input
          warning(200): IdentifiersParser.g:399:5: 
          Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_OVERWRITE" using multiple alternatives: 2, 9
          
          As a result, alternative(s) 9 were disabled for that input
          warning(200): IdentifiersParser.g:399:5: 
          Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_SORT KW_BY" using multiple alternatives: 2, 9
          
          As a result, alternative(s) 9 were disabled for that input
          warning(200): IdentifiersParser.g:399:5: 
          Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_MAP LPAREN" using multiple alternatives: 2, 9
          
          As a result, alternative(s) 9 were disabled for that input
          warning(200): IdentifiersParser.g:399:5: 
          Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_CLUSTER KW_BY" using multiple alternatives: 2, 9
          
          As a result, alternative(s) 9 were disabled for that input
          warning(200): IdentifiersParser.g:399:5: 
          Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_GROUP KW_BY" using multiple alternatives: 2, 9
          
          As a result, alternative(s) 9 were disabled for that input
          warning(200): IdentifiersParser.g:399:5: 
          Decision can match input such as "KW_BETWEEN KW_MAP LPAREN" using multiple alternatives: 8, 9
          
          As a result, alternative(s) 9 were disabled for that input
          warning(200): IdentifiersParser.g:524:5: 
          Decision can match input such as "{AMPERSAND..BITWISEXOR, DIV..DIVIDE, EQUAL..EQUAL_NS, GREATERTHAN..GREATERTHANOREQUALTO, KW_AND, KW_ARRAY, KW_BETWEEN..KW_BOOLEAN, KW_CASE, KW_DOUBLE, KW_FLOAT, KW_IF, KW_IN, KW_INT, KW_LIKE, KW_MAP, KW_NOT, KW_OR, KW_REGEXP, KW_RLIKE, KW_SMALLINT, KW_STRING..KW_STRUCT, KW_TINYINT, KW_UNIONTYPE, KW_WHEN, LESSTHAN..LESSTHANOREQUALTO, MINUS..NOTEQUAL, PLUS, STAR, TILDE}" using multiple alternatives: 1, 3
          
          As a result, alternative(s) 3 were disabled for that input
          [INFO] 
          [INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ hive-exec ---
          [debug] execute contextualize
          [INFO] Using 'UTF-8' encoding to copy filtered resources.
          [INFO] Copying 1 resource
          [INFO] 
          [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-exec ---
          [INFO] Executing tasks
          
          main:
          [INFO] Executed tasks
          [INFO] 
          [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-exec ---
          [INFO] Compiling 1412 source files to /data/hive-ptest/working/apache-svn-trunk-source/ql/target/classes
          [INFO] -------------------------------------------------------------
          [WARNING] COMPILATION WARNING : 
          [INFO] -------------------------------------------------------------
          [WARNING] Note: Some input files use or override a deprecated API.
          [WARNING] Note: Recompile with -Xlint:deprecation for details.
          [WARNING] Note: Some input files use unchecked or unsafe operations.
          [WARNING] Note: Recompile with -Xlint:unchecked for details.
          [INFO] 4 warnings 
          [INFO] -------------------------------------------------------------
          [INFO] -------------------------------------------------------------
          [ERROR] COMPILATION ERROR : 
          [INFO] -------------------------------------------------------------
          [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java:[84,20] package parquet.hive does not exist
          [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java:[85,20] package parquet.hive does not exist
          [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java:[86,26] package parquet.hive.serde does not exist
          [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java:[143,53] cannot find symbol
          symbol  : class MapredParquetInputFormat
          location: class org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer
          [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java:[144,54] cannot find symbol
          symbol  : class MapredParquetOutputFormat
          location: class org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer
          [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java:[145,53] cannot find symbol
          symbol  : class ParquetHiveSerDe
          location: class org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer
          [INFO] 6 errors 
          [INFO] -------------------------------------------------------------
          [INFO] ------------------------------------------------------------------------
          [INFO] Reactor Summary:
          [INFO] 
          [INFO] Hive .............................................. SUCCESS [4.823s]
          [INFO] Hive Ant Utilities ................................ SUCCESS [7.428s]
          [INFO] Hive Shims Common ................................. SUCCESS [3.338s]
          [INFO] Hive Shims 0.20 ................................... SUCCESS [2.407s]
          [INFO] Hive Shims Secure Common .......................... SUCCESS [2.713s]
          [INFO] Hive Shims 0.20S .................................. SUCCESS [1.347s]
          [INFO] Hive Shims 0.23 ................................... SUCCESS [2.960s]
          [INFO] Hive Shims ........................................ SUCCESS [3.451s]
          [INFO] Hive Common ....................................... SUCCESS [9.701s]
          [INFO] Hive Serde ........................................ SUCCESS [12.315s]
          [INFO] Hive Metastore .................................... SUCCESS [26.493s]
          [INFO] Hive Query Language ............................... FAILURE [27.614s]
          [INFO] Hive Service ...................................... SKIPPED
          [INFO] Hive JDBC ......................................... SKIPPED
          [INFO] Hive Beeline ...................................... SKIPPED
          [INFO] Hive CLI .......................................... SKIPPED
          [INFO] Hive Contrib ...................................... SKIPPED
          [INFO] Hive HBase Handler ................................ SKIPPED
          [INFO] Hive HCatalog ..................................... SKIPPED
          [INFO] Hive HCatalog Core ................................ SKIPPED
          [INFO] Hive HCatalog Pig Adapter ......................... SKIPPED
          [INFO] Hive HCatalog Server Extensions ................... SKIPPED
          [INFO] Hive HCatalog Webhcat Java Client ................. SKIPPED
          [INFO] Hive HCatalog Webhcat ............................. SKIPPED
          [INFO] Hive HCatalog HBase Storage Handler ............... SKIPPED
          [INFO] Hive HWI .......................................... SKIPPED
          [INFO] Hive ODBC ......................................... SKIPPED
          [INFO] Hive Shims Aggregator ............................. SKIPPED
          [INFO] Hive TestUtils .................................... SKIPPED
          [INFO] Hive Packaging .................................... SKIPPED
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD FAILURE
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 1:47.416s
          [INFO] Finished at: Fri Dec 06 17:41:14 EST 2013
          [INFO] Final Memory: 59M/506M
          [INFO] ------------------------------------------------------------------------
          [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-exec: Compilation failure: Compilation failure:
          [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java:[84,20] package parquet.hive does not exist
          [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java:[85,20] package parquet.hive does not exist
          [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java:[86,26] package parquet.hive.serde does not exist
          [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java:[143,53] cannot find symbol
          [ERROR] symbol  : class MapredParquetInputFormat
          [ERROR] location: class org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer
          [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java:[144,54] cannot find symbol
          [ERROR] symbol  : class MapredParquetOutputFormat
          [ERROR] location: class org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer
          [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java:[145,53] cannot find symbol
          [ERROR] symbol  : class ParquetHiveSerDe
          [ERROR] location: class org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer
          [ERROR] -> [Help 1]
          [ERROR] 
          [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
          [ERROR] Re-run Maven using the -X switch to enable full debug logging.
          [ERROR] 
          [ERROR] For more information about the errors and possible solutions, please read the following articles:
          [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
          [ERROR] 
          [ERROR] After correcting the problems, you can resume the build with the command
          [ERROR]   mvn <goals> -rf :hive-exec
          + exit 1
          '
          

          This message is automatically generated.

          ATTACHMENT ID: 12617485

          Carl Steinbach added a comment -

          Justin Coffey Would you and your coworkers be willing to consider the option of committing the SerDe code directly to Hive instead of having Hive depend on a third-party JAR? I appreciate that this will make it a little less convenient for you to push in changes. However, I think there are two big drawbacks to the third-party JAR approach: 1) existing Hive contributors will be much less likely to contribute improvements to this code since it lives in a different repository, and 2) Hive won't be able to benefit from parquet-serde improvements until they appear in a new parquet-serde release.

          Brock Noland added a comment -

          Hi Carl,

          FWIW I discussed this with the Parquet community at one of the bi-weekly Parquet conference calls, and the feeling was that, at least initially, the Parquet contributors will be the most likely to contribute to the Parquet SerDe. Thus, at the current time they'd like to keep the SerDe in Parquet. AFAIK this same approach was taken in Pig.

          I don't pretend to speak for the Parquet community, but I do think this could change down the road. For example, it might be hard to implement the vectorization improvements as an external Serde. However, I think the scope of this JIRA is fairly narrow and thus can be implemented as a dependency.

          Brock

          Carl Steinbach added a comment -

          Brock Noland Up to this point we have reserved first-class support for data formats in Hive (i.e. changing the grammar) for formats that are implemented natively in the Hive source repository. I think we should maintain this convention. There are a couple of options available if we feel that it's important for users to be able to create Parquet formatted tables using the abbreviated syntax:

          1. Add a format registry feature to Hive that allows admins to register third-party SerDe implementations and associate them with a format keyword that users can reference in a DDL statement.
          2. Maintain two copies of the Parquet SerDe implementation – one in Hive and one in the parquet-mr repository – and backport patches between these repositories as necessary. If users want to use the parquet-mr version of the SerDe with Hive they may do so by referencing the third-party package name in their DDL.

          On a side note I think the ticket summary "Native Parquet Support in Hive" is misleading. Users who see this description in the release notes will conclude that the Parquet SerDe code lives in Hive when the exact opposite is true.
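
          To illustrate the second option above (referencing the third-party package names directly in DDL), here is a minimal sketch; the class names are taken from the patch's build output and may differ between parquet-hive releases, and the table itself is hypothetical:

            -- Explicit SerDe and input/output format classes from the parquet-hive jar.
            CREATE TABLE parquet_via_serde (id INT, name STRING)
              ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
              STORED AS
                INPUTFORMAT 'parquet.hive.MapredParquetInputFormat'
                OUTPUTFORMAT 'parquet.hive.MapredParquetOutputFormat';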

          Justin Coffey added a comment -

          Hi Carl Steinbach, so on the parquet-hive side, we're good to submit a new patch with direct serde integration. We'll work on that presently.

          Justin Coffey added a comment -

          (sorry, errant trackpad submit on the last comment)

          I wanted to add that I think the registry/format factory refactoring of the BaseSemanticAnalyzer still seems out of scope for this request. There is willingness to work on that on a different ticket, but I humbly submit that the two are not linked and one should not impede the other.

          Good?

          Brock Noland added a comment -

          we're good to submit a new patch with direct serde integration

          Cool!

          I wanted to add that I think the registry/format factory refactoring of the BaseSemanticAnalyzer still seems out of scope for this request.

          I agree, I think it's out of scope for this change. I would actually like to that up that change to clean up this code and will do so in HIVE-5976.

          Brock Noland added a comment -

          I would actually like to that up that change to clean up this code and will do so in HIVE-5976.

          Err I meant: I would like to take this up and will do so in HIVE-5976.

          Carl Steinbach added a comment -

          on the parquet-hive side, we're good to submit a new patch with direct serde integration

          I humbly submit that the two are not linked and one should not impede the other.

          I agree. It wasn't my intention to imply that these issues were linked. Sorry if that wasn't clear.

          In addition to the SerDe, can you please also include some test cases? I think it would be good to aim for coverage on par with what was provided with OrcFile. Also, the data/files directory contains two files (alltypes.txt and alltypesorc) which will make testing type support a lot easier.
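
          A sketch of the kind of qtest coverage being asked for, assuming a hypothetical schema and delimiter for alltypes.txt (both are illustrative, not the file's actual layout):

            -- Staging table over the delimited text file shipped in data/files.
            CREATE TABLE alltypes_staging (
              b BOOLEAN, ti TINYINT, si SMALLINT, i INT, bi BIGINT,
              f FLOAT, d DOUBLE, s STRING)
            ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';

            LOAD DATA LOCAL INPATH '../data/files/alltypes.txt' INTO TABLE alltypes_staging;

            -- Copy into a Parquet-backed table and read it back to exercise type support.
            CREATE TABLE alltypes_parquet (
              b BOOLEAN, ti TINYINT, si SMALLINT, i INT, bi BIGINT,
              f FLOAT, d DOUBLE, s STRING)
            STORED AS PARQUET;

            INSERT OVERWRITE TABLE alltypes_parquet SELECT * FROM alltypes_staging;

            SELECT COUNT(*), MAX(i), MIN(d) FROM alltypes_parquet;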

          Justin Coffey added a comment -

          Carl Steinbach all sounds good. Regarding test cases, I had some QTests prepared, but they were excluded from the initial patch to keep it as minimal as possible. We'll be sure to have full test coverage with the follow up patch.

          Remus Rusanu added a comment -

          If native Parquet support goes in, it is a perfect candidate for a vectorized reader. I created HIVE-5998 to track that.

          Eric Hanson added a comment -

          Could somebody put the patch on ReviewBoard? That'd make it easier to look at.

          Brock Noland added a comment -

          Thanks Remus for creating HIVE-5998.

          Eric, I think the current patch is stale since it's been decided the Parquet SerDe will be contributed to Hive.

          Justin Coffey added a comment -

          Yes this is true. We are refactoring to merge the whole parquet-hive project into hive. There are a couple of folks involved at this point and so it's taking a smidgen extra time what with holidays and all.

          Justin Coffey added a comment -

          After much delay, here is the patch. This integrates the former "parquet-hive" project directly into ql.io.parquet.

          There is a qtest file (modeled on that of ORC) and unit tests for much of the code.

          This applies cleanly to the commit 3a7cea58ababfbbbdb6eac97fefa4298337b7c06 on the branch-0.11.

          Comments welcome.

          Remus Rusanu added a comment -

          Justin Coffey: can you add a reviewboard link? ty

          Justin Coffey added a comment -

          Remus Rusanu: like so? https://reviews.facebook.net/differential/diff/47487/

          Brock Noland added a comment -

          Awesome, thank you very much guys!

          Brock Noland added a comment -

          Hey guys,

          I rebased your patch on top of trunk. The big items I changed are:

          • Moved the DeprecatedParquet*Format classes back to their original package since that is what users have stored in their metastore. We should be able to remove those classes after 2 releases
          • Removed @author tags since they aren't used in Apache
          • Fixed some license headers which were missing

          Brock Noland added a comment -

          RB link: https://reviews.apache.org/r/17061/

          Brock Noland added a comment -

          When we commit this change we need to give credit to: Justin Coffey, Mickaël Lacour, Remy Pecqueur

          Carl Steinbach added a comment -

          I noticed that many of the source files contain Criteo copyright notices. The ASF has a policy on this which is documented here:

          https://www.apache.org/legal/src-headers.html

          Since this patch was submitted directly to the ASF by the copyright owner or owner's agent it sounds like we have three options for handling this:

          1. Remove the notices
          2. Move them to the NOTICE file associated with each applicable project release, or
          3. Provide written permission for the ASF to make such removal or relocation of the notices

          Justin Coffey Remus Rusanu Do you guys have a preference?

          Remus Rusanu added a comment -

          Carl Steinbach I have no say in this, I'm not involved with the original effort. I'm only watching this because I want to add vectorized support for it.

          Justin Coffey added a comment -

          Hi Carl Steinbach. Actually, that looks like just a boilerplate auto-insertion in the affected class files. The ASF license is on our short list of approved OSS licenses, so I don't think it will be an issue for me to strip that out and resubmit. I'll just double-check that all is well and resubmit Monday.

          Justin Coffey added a comment -

          without license or author tags.

          Justin Coffey added a comment -

          This is the good one. I had a final dependency to clean up.

          Justin Coffey added a comment -

          Sorry for the spam in posts. Latest patch is good:

          • no author tags
          • no criteo copyright
          • builds against latest version of parquet (1.3.2)

          I attempted to create a review.apache.org review, but am unable to publish it because I can't assign any reviewers.

          Brock Noland added a comment -

          Thank you very much Justin!! I have rebased the patch for trunk.

          Brock Noland added a comment -

          Marking "Patch Available" for precommit testing.

          Brock Noland added a comment -

          RB item has been updated: https://reviews.apache.org/r/17061/

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12624023/HIVE-5783.patch

          ERROR: -1 due to 1 failed/errored test(s), 4977 tests executed
          Failed tests:

          org.apache.hadoop.hive.ql.history.TestHiveHistory.testSimpleQuery
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/969/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/969/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 1 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12624023

          Brock Noland added a comment -

          Failure was unrelated to the current patch:

           java.lang.RuntimeException: commitTransaction was called but openTransactionCalls = 0. This probably indicates that there are unbalanced calls to openTransaction/commitTransaction
          	at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:378)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          	at java.lang.reflect.Method.invoke(Method.java:597)
          	at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:122)
          	at $Proxy6.commitTransaction(Unknown Source)
          	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1085)
          	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1117)
          
          Brock Noland added a comment -

          Uploading the exact same patch to get a second test run.

          Lefty Leverenz added a comment -

          What documentation will this need? Is anything already written up that can be added to the wiki?

          Here's where the wiki documents file formats and SerDes:

          • "Row Format, Storage Format, and SerDe" section in the DDL doc, with links to other SerDe docs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormat,StorageFormat,andSerDe
          • "File Formats" section in the Language Manual (includes the ORC doc): https://cwiki.apache.org/confluence/display/Hive/LanguageManual
          • Avro SerDe doc: https://cwiki.apache.org/confluence/display/Hive/AvroSerDe
          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12624093/HIVE-5783.patch

          SUCCESS: +1 4977 tests passed

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/973/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/973/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          ATTACHMENT ID: 12624093

          Justin Coffey added a comment -

          Lefty Leverenz, if you'd like I can give this a review and propose changes.

          Brock Noland added a comment -

          Lefty Leverenz, good call.

          I think we should create a document under "File Formats". I will volunteer for that effort.
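
          (To make the scope of that page concrete, here is a minimal sketch of the DDL the feature enables; the table and column names are made up for illustration, and the exact value accepted by hive.default.fileformat is an assumption to be confirmed in the docs.)

          -- Illustrative sketch only; table and column names are hypothetical.
          CREATE TABLE parquet_demo (id INT, msg STRING)
          STORED AS PARQUET;

          -- Optionally make Parquet the default storage format for new tables
          -- (assuming the configuration value is spelled "Parquet"):
          SET hive.default.fileformat=Parquet;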

          Justin Coffey added a comment -

          We have unfortunately found a bug in MapredParquetInputFormat. We are working on a fix and will resubmit a patch once tested.

          Sorry

          Brock Noland added a comment -

          Thank you for the report Justin!

          Justin Coffey added a comment -

          The updated patch. This fixes incorrect behavior when using HiveInputSplits. Regression tests have been added as a qtest (parquet_partitioned.q).
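
          (For context, a partitioned-Parquet qtest generally reduces to DDL/DML along the lines of the sketch below; the table and column names are made up for illustration and are not the actual contents of parquet_partitioned.q.)

          -- Sketch only: hypothetical names, not the real parquet_partitioned.q.
          -- src is the standard two-column (key, value) qtest source table.
          CREATE TABLE parquet_part_demo (id INT, msg STRING)
          PARTITIONED BY (part STRING)
          STORED AS PARQUET;

          INSERT OVERWRITE TABLE parquet_part_demo PARTITION (part='p1')
          SELECT CAST(key AS INT), value FROM src LIMIT 10;

          -- Reading back across partitions exercises the split path the fix touches.
          SELECT part, COUNT(*) FROM parquet_part_demo GROUP BY part;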

          Brock Noland added a comment -

          Uploaded the latest patch rebased on trunk.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12625077/HIVE-5783.patch

          ERROR: -1 due to 7 failed/errored test(s), 4981 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1003/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1003/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 7 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12625077

          Brock Noland added a comment -

          Those tests failed due to HIVE-5728 (which was committed without testing) and will be fixed via HIVE-6302.

          Carl Steinbach added a comment -

          I noticed that this SerDe doesn't support several of Hive's types: binary, timestamp, date, and probably a couple of others as well. If there are other known limitations, it would be helpful to list them.
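
          (For illustration, a table like the following, with a hypothetical name, uses the types mentioned above and would presumably hit those limitations with the initial SerDe.)

          -- Hypothetical table touching the types called out above.
          CREATE TABLE parquet_types_demo (
            payload    BINARY,
            created_at TIMESTAMP,
            event_date DATE
          )
          STORED AS PARQUET;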

          Brock Noland added a comment -

          I believe the test issues have been resolved. Uploading same patch for another round of testing.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12625200/HIVE-5783.patch

          ERROR: -1 due to 4 failed/errored test(s), 4990 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_import_exported_table
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_hdfs_file_with_space_in_the_name
          org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_file_with_header_footer_negative
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1032/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1032/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 4 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12625200

          Brock Noland added a comment -

          Those failures are covered by HIVE-6293.

          Brock Noland added a comment -

          Latest rebase.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12625632/HIVE-5783.patch

          ERROR: -1 due to 1 failed/errored test(s), 5004 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority2
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1103/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1103/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 1 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12625632

          Brock Noland added a comment -

          Found two issues with the patch:

          1) Some Apache license headers were missing
          2) A deprecated SerDe under the old class name did not exist (needed so that existing metastores aren't broken by the rename); see the sketch below
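
          (For illustration only: the concern is that existing tables may reference the pre-rename class names in the metastore. The legacy class names below are assumptions based on the original parquet-hive project, not taken from this patch.)

          -- Illustrative only; the legacy class names here are assumptions.
          CREATE TABLE legacy_parquet_demo (id INT, msg STRING)
          ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
          STORED AS
            INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
            OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
          -- Keeping a deprecated SerDe class under the old name lets tables like this
          -- keep resolving after the classes move under org.apache.hadoop.hive.ql.io.parquet.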

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12626107/HIVE-5783.patch

          ERROR: -1 due to 1 failed/errored test(s), 5012 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1122/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1122/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 1 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12626107

          Brock Noland added a comment -

          That test failure is unrelated to the patch.

          Xuefu Zhang added a comment -

          Some comments are posted on RB.

          Brock Noland added a comment -

          Thanks Xuefu. Justin, I can address these items tomorrow and have an updated patch.

          Brock Noland added a comment -

          Latest patch based on review.

          Brock Noland added a comment -

          Noticed a couple of instances of trailing whitespace. Latest patch attached.

          Brock Noland added a comment -

          FYI I created HIVE-6368 to document Parquet in Hive.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12626891/HIVE-5783.patch

          ERROR: -1 due to 2 failed/errored test(s), 5029 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_invalid_priv_v1
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1183/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1183/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 2 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12626891

          Brock Noland added a comment -

          Those failures shouldn't be related, but regardless, that was an old version of the patch. The latest version is running now.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12626901/HIVE-5783.patch

          ERROR: -1 due to 1 failed/errored test(s), 5029 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_invalid_priv_v1
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1184/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1184/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 1 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12626901

          Brock Noland added a comment -

          I am able to reproduce that failure myself, so I am looking at it.

          Brock Noland added a comment -

          TL;DR: the new Hive authz work has a test which stores the auto-generated token id for "DELETE".

          I fixed the error message and updated the test in this patch because:

          1. The error message is completely and utterly useless.
          2. Any time a token is added, the token ids change (again, they are auto-generated).
          Xuefu Zhang added a comment -

          I left two minor comments on RB.

          Brock Noland added a comment -

          Thank you Xuefu! I had a question about one of your comments.

          Brock Noland added a comment -

          Latest patch addresses the review concern.

          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12626998/HIVE-5783.patch

          SUCCESS: +1 5042 tests passed

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1192/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1192/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          ATTACHMENT ID: 12626998

          Xuefu Zhang added a comment -

          +1 to the latest patch.

          Xuefu Zhang added a comment -

          It seems two things are needed in order to commit the patch:

          1. Patch to be rebased (conflicts in pom.xml)
          2. Patch to be generated using git diff --no-prefix

          Show
          Xuefu Zhang added a comment - It seems two things are needed in order to commit the patch: 1. Patch to be rebased (conflicts in pom.xml) 2. patch to be generated using git diff --no-prefix
          Hide
          Brock Noland added a comment -

          Thanks Xuefu, I am updating the patch.

          I will generate the patch with --no-prefix, but I am not sure why that is a requirement.

          FWIW, I use the following script, which applies a patch regardless of prefix:

          https://github.com/apache/hive/blob/trunk/testutils/ptest2/src/main/resources/smart-apply-patch.sh

          Brock Noland added a comment -

          I attached the updated patch with and without prefix.

          Thank you Xuefu!

          Xuefu Zhang added a comment -

          Brock Noland Thanks for sharing the link. It's good to know.

          As an FYI, I found the following on the Hive "How to Contribute" page, https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CreatingaPatch:

          If you are using Git instead of Subversion, it's important that you generate your patch using the following command:
          git diff --no-prefix <commit> > HIVE-1234.1.patch.txt

          Hive QA added a comment -

          Overall: -1 no tests executed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12627508/HIVE-5783.patch

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1230/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1230/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Tests exited with: NonZeroExitCodeException
          Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]]
          + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
          + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
          + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + cd /data/hive-ptest/working/
          + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-1230/source-prep.txt
          + [[ false == \t\r\u\e ]]
          + mkdir -p maven ivy
          + [[ svn = \s\v\n ]]
          + [[ -n '' ]]
          + [[ -d apache-svn-trunk-source ]]
          + [[ ! -d apache-svn-trunk-source/.svn ]]
          + [[ ! -d apache-svn-trunk-source ]]
          + cd apache-svn-trunk-source
          + svn revert -R .
          ++ egrep -v '^X|^Performing status on external'
          ++ awk '{print $2}'
          ++ svn status --no-ignore
          + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/storage-handlers/hbase/target hcatalog/server-extensions/target hcatalog/core/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
          + svn update
          
          Fetching external item into 'hcatalog/src/test/e2e/harness'
          External at revision 1565604.
          
          At revision 1565604.
          + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
          + patchFilePath=/data/hive-ptest/working/scratch/build.patch
          + [[ -f /data/hive-ptest/working/scratch/build.patch ]]
          + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
          + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
          The patch does not appear to apply with p0, p1, or p2
          + exit 1
          '
          

          This message is automatically generated.

          ATTACHMENT ID: 12627508

          Brock Noland added a comment -

          Yet another minor rebase.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12627637/HIVE-5783.patch

          ERROR: -1 due to 11 failed/errored test(s), 5072 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge3
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf7
          org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener
          org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithValidPartVal
          org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithValidCharacters
          org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testListPartitions
          org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testNameMethods
          org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testPartition
          org.apache.hadoop.hive.ql.security.TestClientSideAuthorizationProvider.testSimplePrivileges
          org.apache.hadoop.hive.ql.security.TestMetastoreAuthorizationProvider.testSimplePrivileges
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1243/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1243/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 11 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12627637

          Brock Noland added a comment -

          Nothing in this patch should have caused those. I am attaching the exact same patch for a re-run.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12627821/HIVE-5783.patch

          ERROR: -1 due to 1 failed/errored test(s), 5073 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1258/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1258/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 1 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12627821

          Xuefu Zhang added a comment -

          I believe that the above failed test is flaky and not related to the patch. Patch committed to trunk. Thanks to Justin for the contribution and to Brock for his help on this.

          Justin Coffey added a comment -

          Thanks to all, and especially Brock Noland for all his help!

          Brock Noland added a comment -

          Note: Although I don't believe JIRA will allow us to do this, the attribution for this JIRA should be: Justin Coffey, Mickaël Lacour, and Remy Pecqueur.


            People

            • Assignee:
              Justin Coffey
              Reporter:
              Justin Coffey
            • Votes:
              1
              Watchers:
              28