Spark / SPARK-7939

Make URL partition recognition return String by default for all partition column types and values

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.5.0
    • Component/s: SQL
    • Labels:

      Description

      Imagine the following HDFS paths:

      /data/split=00
      /data/split=01
      ...
      /data/split=FF

      If I have 10 or fewer partitions (00, 01, ..., 09), partition recognition currently treats column 'split' as an integer column.

      If I have more than 10 partitions, column 'split' is recognized as String instead.
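The switch described above can be reproduced with a small model. This is only a sketch under assumptions, not Spark's actual implementation: it assumes each partition value is typed on its own (all digits means integer) and that a column stays an integer column only while every value looks like an integer. The function names `infer_value_type` and `infer_column_type` are hypothetical.

```python
import re

def infer_value_type(raw: str) -> str:
    """Simplified per-value inference: all-digit values look like integers."""
    return "int" if re.fullmatch(r"-?\d+", raw) else "string"

def infer_column_type(values):
    """A column is 'int' only if every value looks like an int; otherwise 'string'."""
    types = {infer_value_type(v) for v in values}
    return "int" if types == {"int"} else "string"

# Partition values as they would appear in /data/split=00 ... /data/split=FF
first_ten = [f"{i:02X}" for i in range(10)]     # 00 .. 09
all_splits = [f"{i:02X}" for i in range(256)]   # 00 .. FF

print(infer_column_type(first_ten))   # int
print(infer_column_type(all_splits))  # string
```

Under this model the column type flips as soon as a value such as 0A appears, so the same directory layout yields different schemas depending on how many partitions exist.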

      This is very confusing, so I suggest treating partition columns as String by default and allowing users to specify types if needed.

      Another example is a date column:
      /data/date=2015-04-01 => 'date' is String
      /data/date=20150401 => 'date' is Int
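One way to picture the suggested behavior is a parser that keeps every partition value as a string unless the user supplies a type for that column. This is a minimal sketch, not Spark's API: the function `parse_partition_path` and its `column_types` parameter are hypothetical names introduced for illustration.

```python
def parse_partition_path(path, column_types=None):
    """Parse 'name=value' path segments; values stay strings unless a type is given."""
    column_types = column_types or {}
    result = {}
    for segment in path.strip("/").split("/"):
        if "=" not in segment:
            continue  # not a partition directory, e.g. 'data'
        name, raw = segment.split("=", 1)
        cast = column_types.get(name, str)  # default: keep the raw string
        result[name] = cast(raw)
    return result

# Default: both date layouts and the split column come back as strings.
print(parse_partition_path("/data/split=00"))
print(parse_partition_path("/data/date=2015-04-01"))
print(parse_partition_path("/data/date=20150401"))

# Opt-in typing: the user states that 'split' is a hexadecimal integer.
print(parse_partition_path("/data/split=0A", {"split": lambda s: int(s, 16)}))
```

With string as the default, the column type no longer depends on which values happen to be present, and leading zeros such as in 00 are preserved.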

      Jianshi

            People

            • Assignee: L. C. Hsieh (viirya)
            • Reporter: Jianshi Huang (huangjs)
            • Votes: 0
            • Watchers: 4
