Spark / SPARK-9932 Data source API improvement (Spark 1.6) / SPARK-8887

Explicitly define which data types can be used as dynamic partition columns


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0, 1.5.0
    • Fix Version/s: 1.6.0
    • Component/s: SQL
    • Labels: None

    Description

      InsertIntoHadoopFsRelation implements Hive-compatible dynamic partitioning insertion, which uses String.valueOf to encode partition column values into dynamic partition directory names. This effectively limits the data types that can be used as partition columns: for example, the string representation of StructType values is not well defined. However, this limitation is not explicitly enforced.

      There are several things we can improve:

      1. Enforce dynamic partition column data type requirements by adding analysis rules that throw AnalysisException when a violation occurs.
      2. Abstract away the string representation of the various data types, so that we don't need to convert internal representation types (e.g. UTF8String) to external types (e.g. String). A set of Hive-compatible implementations should be provided to ensure compatibility with Hive.
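      The String.valueOf-based encoding described above can be illustrated with a small Java sketch. All class and method names here are hypothetical, and the escaped character set is only a representative subset of what Hive actually escapes; this is a conceptual illustration, not Spark's implementation (which is Scala).

      ```java
      public class PartitionPathSketch {

          // A representative subset of characters that are unsafe in
          // partition directory names (Hive escapes these and more).
          private static final String UNSAFE = "\"#%'*/:=?\\";

          // Render one partition value as a path-safe string.
          static String escapePartitionValue(Object value) {
              if (value == null) {
                  // Hive's placeholder for NULL partition values.
                  return "__HIVE_DEFAULT_PARTITION__";
              }
              // The encoding the issue describes: String.valueOf, then escape.
              String s = String.valueOf(value);
              StringBuilder sb = new StringBuilder();
              for (char c : s.toCharArray()) {
                  if (c < 32 || UNSAFE.indexOf(c) >= 0) {
                      sb.append(String.format("%%%02X", (int) c));
                  } else {
                      sb.append(c);
                  }
              }
              return sb.toString();
          }

          // Build a dynamic partition directory path ("k1=v1/k2=v2/...")
          // from parallel arrays of column names and values.
          static String partitionPath(String[] names, Object[] values) {
              StringBuilder sb = new StringBuilder();
              for (int i = 0; i < names.length; i++) {
                  if (i > 0) sb.append('/');
                  sb.append(names[i]).append('=')
                    .append(escapePartitionValue(values[i]));
              }
              return sb.toString();
          }

          public static void main(String[] args) {
              System.out.println(partitionPath(
                  new String[]{"year", "event"},
                  new Object[]{2015, "a/b"}));
              // prints year=2015/event=a%2Fb
          }
      }
      ```

      Note how the sketch makes the problem concrete: String.valueOf produces a sensible path segment for an Integer, but for a complex value such as a struct it would fall back to whatever toString happens to return, which is exactly the ill-defined representation the issue wants to rule out at analysis time.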

          People

            Assignee: Yijie Shen (yijieshen)
            Reporter: Cheng Lian (liancheng)
            Votes: 0
            Watchers: 3
