Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-6450

Metastore Parquet table conversion fails when a single metastore Parquet table appears multiple times in the query

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • 1.3.0
    • 1.3.1, 1.4.0
    • SQL
    • None

    Description

      The below query was working fine till 1.3 commit 9a151ce58b3e756f205c9f3ebbbf3ab0ba5b33fd.(Yes it definitely works at this commit although this commit is completely unrelated)

      It got broken in 1.3.0 release with an AnalysisException: resolved attributes ... missing from .... (although this list contains the fields which it reports missing)

      at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:189)
      	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
      	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
      	at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
      	at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
      	at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:493)
      	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
      	at com.sun.proxy.$Proxy17.executeStatementAsync(Unknown Source)
      	at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
      	at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
      	at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
      	at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
      	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
      	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
      	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
      	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      
      select Orders.Country, Orders.ProductCategory,count(1) from Orders join (select Orders.Country, count(1) CountryOrderCount from Orders where to_date(Orders.PlacedDate) > '2015-01-01' group by Orders.Country order by CountryOrderCount DESC LIMIT 5) Top5Countries on Top5Countries.Country = Orders.Country where to_date(Orders.PlacedDate) > '2015-01-01' group by Orders.Country,Orders.ProductCategory;
      

      The temporary workaround is to add explicit alias for the table Orders

      select o.Country, o.ProductCategory,count(1) from Orders o join (select r.Country, count(1) CountryOrderCount from Orders r where to_date(r.PlacedDate) > '2015-01-01' group by r.Country order by CountryOrderCount DESC LIMIT 5) Top5Countries on Top5Countries.Country = o.Country where to_date(o.PlacedDate) > '2015-01-01' group by o.Country,o.ProductCategory;
      

      However this change not only affects self joins, it also seems to affect union queries as well, like the below query which was again working before(commit 9a151ce) got broken

      select Orders.Country,null,count(1) OrderCount from Orders group by Orders.Country,null
      union all
      select null,Orders.ProductCategory,count(1) OrderCount from Orders group by null, Orders.ProductCategory
      

      also fails with a Analysis exception.
      The workaround is to add different aliases for the tables.

      Attachments

        Activity

          People

            lian cheng Cheng Lian
            chinnitv Anand Mohan Tumuluri
            Michael Armbrust Michael Armbrust
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: