Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-8247

Pig cursor written to Hive via HCat doesn't NULL-fill missing columns

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.13.1
    • None
    • HCatalog
    • None

    Description

      Suppose a Hive table has columns (a,b,c,d)
      If a Pig script writing to this table produces schema (a,b,c) it works: 'd' will be NULL.

      If a Pig script writing to this table produces schema (a,b,d) it fails with error below.
      This is an old issue. There is nothing in HCatalog documentation that indicates whether this should work.

      Also, (a,c,b) would not work either.

      Running org.apache.hive.hcatalog.pig.TestOrcHCatStorer
      Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 30.113 sec <<< FAILURE! - in org.apache.hive.hcatalog.pig.TestOrcHCatStorer
      partialSchemaSepcification(org.apache.hive.hcatalog.pig.TestOrcHCatStorer)  Time elapsed: 29.886 sec  <<< ERROR!
      org.apache.pig.impl.logicalLayer.FrontendException: Unable to store alias ABD
      	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1635)
      	at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
      	at org.apache.hive.hcatalog.mapreduce.HCatBaseTest.logAndRegister(HCatBaseTest.java:92)
      	at org.apache.hive.hcatalog.pig.TestHCatStorer.partialSchemaSepcification(TestHCatStorer.java:1035)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
      	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
      	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
      	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
      	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
      	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
      Caused by: org.apache.pig.impl.plan.VisitorException: 
      <line 7, column 0> Output Location Validation Failed for: 'T More info to follow:
      org.apache.hive.hcatalog.common.HCatException : 2007 : Invalid column position in partition schema : Expected column <c> at position 3, found column <d>
      	at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:75)
      	at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)
      	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
      	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
      	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
      	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
      	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
      	at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
      	at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
      	at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
      	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:303)
      	at org.apache.pig.PigServer.compilePp(PigServer.java:1380)
      	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1305)
      	at org.apache.pig.PigServer.execute(PigServer.java:1297)
      	at org.apache.pig.PigServer.access$400(PigServer.java:122)
      	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1630)
      	at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
      	at org.apache.hive.hcatalog.mapreduce.HCatBaseTest.logAndRegister(HCatBaseTest.java:92)
      	at org.apache.hive.hcatalog.pig.TestHCatStorer.partialSchemaSepcification(TestHCatStorer.java:1035)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
      	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
      	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
      	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
      	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
      	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
      Caused by: org.apache.hive.hcatalog.common.HCatException: org.apache.hive.hcatalog.common.HCatException : 2007 : Invalid column position in partition schema : Expected column <c> at position 3, found column <d>
      	at org.apache.hive.hcatalog.common.HCatUtil.validatePartitionSchema(HCatUtil.java:258)
      	at org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.setPartDetails(HCatBaseOutputFormat.java:231)
      	at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setSchema(HCatOutputFormat.java:244)
      	at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setSchema(HCatOutputFormat.java:231)
      	at org.apache.hive.hcatalog.pig.HCatStorer.setStoreLocation(HCatStorer.java:206)
      	at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:68)
      	at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)
      	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
      	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
      	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
      	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
      	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
      	at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
      	at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
      	at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
      	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:303)
      	at org.apache.pig.PigServer.compilePp(PigServer.java:1380)
      	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1305)
      	at org.apache.pig.PigServer.execute(PigServer.java:1297)
      	at org.apache.pig.PigServer.access$400(PigServer.java:122)
      	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1630)
      	at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
      	at org.apache.hive.hcatalog.mapreduce.HCatBaseTest.logAndRegister(HCatBaseTest.java:92)
      	at org.apache.hive.hcatalog.pig.TestHCatStorer.partialSchemaSepcification(TestHCatStorer.java:1035)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
      	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
      	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
      	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
      	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
      	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
      
      
      Results :
      
      Tests in error: 
        TestOrcHCatStorer>TestHCatStorer.partialSchemaSepcification:1035->HCatBaseTest.logAndRegister:92 ? Frontend
      

      Reproducer (which can be added to org.apache.hive.hcatalog.pig.TestHCatStorer)

        @Test
        public void partialSchemaSepcification() throws Exception {
          driver.run("drop table if exists T");
          String createTable = "create table T(a int, b int, c string, d string) stored as " + getStorageFormat();
          int retCode = driver.run(createTable).getResponseCode();
          if (retCode != 0) {
            throw new IOException("Failed to create table.");
          }
          String[] inputData = {"1\t20\tstr1\tstr20", "2\t30\tstr2\tstr30", "3\t40\tstr3\tstr40", "4\t50\tstr4\tstr40"};
          HcatTestUtils.createTestDataFile(INPUT_FILE_NAME, inputData);
          int lineNumber = 1;
          PigServer ps = createPigServer(true);
          logAndRegister(ps, "A1 = LOAD '" + INPUT_FILE_NAME + "' USING PigStorage() AS (a:int,b:int,c:chararray,d:chararray);", lineNumber++);
          logAndRegister(ps, "ROW1 = FILTER A1 BY a == 1;", lineNumber++);
          logAndRegister(ps, "ABC = FOREACH ROW1 GENERATE a,b,c;", lineNumber++);
          logAndRegister(ps, "STORE ABC INTO 'T' USING " + HCatStorer.class.getName() + "();", lineNumber++);
      
          logAndRegister(ps, "ROW2 = FILTER A1 BY a == 2;", lineNumber++);
          logAndRegister(ps, "ABD = FOREACH ROW2 GENERATE a,b,d;", lineNumber++);
          logAndRegister(ps, "STORE ABD INTO 'T' USING " + HCatStorer.class.getName() + "();", lineNumber);
          driver.run("select * from T");
          ArrayList<String> results = new ArrayList<String>();
          driver.getResults(results);
          Assert.assertEquals(2, results.size());
          driver.run("drop table T");
        }
      

      Attachments

        1. reproducer.patch
          2 kB
          Eugene Koifman

        Activity

          People

            Unassigned Unassigned
            ekoifman Eugene Koifman
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated: