Description
For SQLContext (not HiveContext) it would be very convenient to support a virtual column that maps to part of the file path, similar to what is done in Hive for partitions (e.g. /data/clicks/dt=2015-01-01/ where dt is a column of type TEXT).
The API could allow the user to type the column using an appropriate DataType instance. This new field could be addressed in SQL statements much the same as is done in Hive.
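To illustrate the idea, here is a minimal sketch in plain Python (not Spark code, and not a proposed API) of how key=value path segments could be turned into typed column values. The function name parse_partition_columns, the column_types mapping, and the file name part-00000 are assumptions made for this example only; a Python callable stands in for a DataType instance.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Any, Callable, Dict


def parse_partition_columns(
    path: str, column_types: Dict[str, Callable[[str], Any]]
) -> Dict[str, Any]:
    """Extract key=value directory segments from `path` and cast each value.

    `column_types` maps a column name to a callable that converts the raw
    text; it is a hypothetical stand-in for a user-supplied DataType.
    """
    columns: Dict[str, Any] = {}
    for segment in PurePosixPath(path).parts:
        key, sep, raw = segment.partition("=")
        if not sep or not key:
            continue  # not a key=value segment, e.g. "data" or "clicks"
        cast = column_types.get(key, str)  # columns without a type stay text
        columns[key] = cast(raw)
    return columns


# "part-00000" is a made-up file name used only for illustration.
row_columns = parse_partition_columns(
    "/data/clicks/dt=2015-01-01/part-00000",
    {"dt": date.fromisoformat},
)
print(row_columns)
```

Every row read from a file under that directory would then carry dt as an extra column without the value being stored in the file itself.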
As a consequence, partitions could be pruned when executing a query, and there would be no need to materialize in each logical partition a column whose value is already encoded in the path name. Furthermore, this would provide a nice interop and migration strategy for Hive users who may one day use SQLContext directly.
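The pruning benefit can be sketched the same way: a predicate on the path-encoded column is evaluated against directory names alone, so directories that cannot match are never opened. This is a plain-Python illustration under assumed names (partition_values, prune) and an assumed directory listing; only /data/clicks/dt=2015-01-01/ comes from the description above.

```python
from typing import Callable, Dict, List


def partition_values(directory: str) -> Dict[str, str]:
    """Return the key=value pairs encoded in a directory path, as text."""
    pairs = (seg.partition("=") for seg in directory.strip("/").split("/"))
    return {key: value for key, sep, value in pairs if sep and key}


def prune(
    directories: List[str], predicate: Callable[[Dict[str, str]], bool]
) -> List[str]:
    """Keep only the directories whose path-encoded columns satisfy `predicate`."""
    return [d for d in directories if predicate(partition_values(d))]


# Hypothetical listing; no file contents are read at any point.
directories = [
    "/data/clicks/dt=2014-12-31/",
    "/data/clicks/dt=2015-01-01/",
    "/data/clicks/dt=2015-01-02/",
]

# Roughly equivalent to: WHERE dt >= '2015-01-01'
# (ISO-8601 dates compare correctly as plain strings).
selected = prune(directories, lambda cols: cols.get("dt", "") >= "2015-01-01")
print(selected)
```

Only the selected directories would need to be scanned, which is the saving the description refers to.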
Issue Links
- is related to SPARK-5182 Partitioning support for tables created by the data source API (Resolved)