SPARK-17335: Creating Hive table from Spark data


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: SQL
    • Labels: None

    Description

      Recently my team started using Spark to analyze huge JSON objects. Spark itself handles them well. The problem starts when we try to create a Hive table from the data, following the steps from this part of the docs: http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables

      After running the command `spark.sql("CREATE TABLE x AS (SELECT * FROM y)")` we get the following exception (obfuscated, because the data is confidential):

      Exception in thread "main" org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: Error: : expected at the position 993 of 'string:struct<a:boolean,b:array<string>,c:boolean,d:struct<e:boolean,f:boolean,[...(few others)],z:boolean,... 4 more fields>,[...(rest of valid struct string)]>' but ' ' is found.;
      

      It turned out that the exception was raised because of the `... 4 more fields` part, which is not a valid representation of a data structure.

      An easy workaround is to set `spark.debug.maxToStringFields` to some large value. Nevertheless, this shouldn't be required: the stringifying process should use methods that are guaranteed to produce a valid data structure string for Hive.
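
For reference, the workaround could look roughly like the PySpark sketch below. The helper name `build_session`, the application name, and the value `1000` are illustrative assumptions, not part of the report. The sketch also assumes the setting is read from the application's configuration at startup, so it is applied before the session is created.

```python
# Sketch of the workaround: raise the field limit used when Spark renders
# schemas as strings, so wide structs are not cut off with "... N more fields".
# Requires pyspark to be installed when build_session() is actually called.

CONF_KEY = "spark.debug.maxToStringFields"


def build_session(max_fields=1000):
    """Create a Hive-enabled SparkSession with a raised maxToStringFields.

    `max_fields` is an arbitrary illustrative value; it only needs to exceed
    the number of fields in the widest struct of the data.
    """
    # Imported lazily so this module can be loaded without Spark available.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("hive-ctas-workaround")       # hypothetical application name
        .config(CONF_KEY, str(max_fields))     # set before the session starts
        .enableHiveSupport()
        .getOrCreate()
    )
```

With a session built this way, the `CREATE TABLE x AS (SELECT * FROM y)` statement from above would be issued through `spark.sql(...)` as before.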

      In my opinion the root problem is here:
      https://github.com/apache/spark/blob/9d7a47406ed538f0005cdc7a62bc6e6f20634815/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala#L318 where the `simpleString` method is called instead of `catalogString`. However, this class is used in many places and I don't feel experienced enough with Spark to submit a PR myself.
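
To make the difference concrete, here is a simplified Python model of the two renderings. It is not Spark's code: the function names only mirror `simpleString` and `catalogString`, the default limit of 25 fields is an assumption, and only flat structs are handled.

```python
def _render(fields):
    """Render (name, type) pairs as 'name:type' entries."""
    return [f"{name}:{dtype}" for name, dtype in fields]


def simple_string(fields, max_fields=25):
    """Display-oriented rendering: truncates structs wider than max_fields."""
    parts = _render(fields)
    if len(parts) > max_fields:
        shown = max(0, max_fields - 1)
        # The placeholder is readable for humans but is not a 'name:type' entry.
        parts = parts[:shown] + [f"... {len(parts) - shown} more fields"]
    return "struct<" + ",".join(parts) + ">"


def catalog_string(fields):
    """Catalog-oriented rendering: always lists every field."""
    return "struct<" + ",".join(_render(fields)) + ">"


# A struct with 28 boolean fields, wider than the assumed limit of 25.
fields = [(f"f{i}", "boolean") for i in range(28)]

print(simple_string(fields))   # ends with a '... N more fields' placeholder
print(catalog_string(fields))  # complete type string, safe to hand to a parser
```

A type-string parser such as Hive's expects a `:` after each field name, which is why the truncated form fails with "expected at the position ... but ' ' is found".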

      We believe this issue is indirectly caused by this PR: https://github.com/apache/spark/pull/13537
      There has been almost the same issue in the past. You can find it here: https://issues.apache.org/jira/browse/SPARK-16415

          People

            Assignee: hvanhovell (Herman van Hövell)
            Reporter: jupblb (Michal Kielbowicz)