SPARK-5302: Add support for SQLContext "partition" columns

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: SQL
    • Labels: None

    Description

      For SQLContext (not HiveContext) it would be very convenient to support a virtual column that maps to part of the file path, similar to what is done in Hive for partitions (e.g. /data/clicks/dt=2015-01-01/ where dt is a column of type TEXT).
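
      To make the path-to-column mapping concrete, here is a minimal sketch in plain Python. It is not Spark code; the function name partition_columns is hypothetical, and it assumes the Hive-style key=value directory naming from the example above.

```python
from pathlib import PurePosixPath


def partition_columns(path: str) -> dict[str, str]:
    """Collect key=value directory segments of a path as string columns.

    Hypothetical helper for illustration only; assumes Hive-style naming.
    """
    columns = {}
    for segment in PurePosixPath(path).parts:
        key, sep, value = segment.partition("=")
        if sep and key:  # keep only segments shaped like key=value
            columns[key] = value
    return columns


print(partition_columns("/data/clicks/dt=2015-01-01/"))
```

      Every record read from under that directory would carry the returned values as extra columns, without those values being stored in the data files themselves.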

      The API could allow the user to type the column using an appropriate DataType instance. This new field could be addressed in SQL statements much the same as is done in Hive.
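
      One way to picture the typing step is sketched below in plain Python. The names typed_partition_values and schema, and the extra hour column, are assumptions made for illustration; the converters stand in for whatever DataType instances the eventual API would accept.

```python
from datetime import date


def typed_partition_values(raw, schema):
    """Cast raw string partition values with caller-supplied converters.

    Hypothetical helper: columns missing from `schema` stay as strings.
    """
    return {name: schema.get(name, str)(value) for name, value in raw.items()}


# The caller declares a type per partition column ("hour" is a made-up column).
schema = {"dt": date.fromisoformat, "hour": int}
row = typed_partition_values({"dt": "2015-01-01", "hour": "07"}, schema)
print(row)
```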

      As a consequence, partitions could be pruned when executing a query, and there would be no need to materialize a column in each logical partition that is already encoded in the path name. Furthermore, this would provide a nice interop and migration strategy for Hive users who may one day use SQLContext directly.
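
      The pruning idea can be sketched in a few lines of plain Python. This is an illustration of the concept only, not how Spark plans queries; the prune function and the directory listing are hypothetical.

```python
def prune(directories, column, keep):
    """Return only directories whose `column=value` segment satisfies `keep`.

    Hypothetical helper: directories that lack the column are skipped.
    """
    selected = []
    for directory in directories:
        values = dict(
            seg.split("=", 1)
            for seg in directory.strip("/").split("/")
            if "=" in seg
        )
        if column in values and keep(values[column]):
            selected.append(directory)
    return selected


dirs = [
    "/data/clicks/dt=2014-12-31",
    "/data/clicks/dt=2015-01-01",
    "/data/clicks/dt=2015-01-02",
]
# Similar in spirit to: WHERE dt >= '2015-01-01' (ISO dates compare as strings)
to_scan = prune(dirs, "dt", lambda dt: dt >= "2015-01-01")
print(to_scan)
```

      The filter is decided from directory names alone, so files under excluded directories never need to be opened.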

            People

              Assignee: Cheng Lian
              Reporter: Bob Tiernay
              Votes: 0
              Watchers: 4