  1. Hive
  2. HIVE-50

Tag columns as partitioning columns


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

    Description

      CREATE TABLE tname (cname1 INT, pcol INT PARTITIONING)
      COMMENT 'This is a table'
      PARTITIONED BY(dt STRING)
      STORED AS SEQUENCEFILE;

      The goal here is to annotate a column as being a "partitioning" column. Consider pcol in the above example. It is annotated with 'PARTITIONING', which implies that the CREATE TABLE statement effectively has

      PARTITIONED BY (dt, pcol)

      and every write to this table has implicitly

      INSERT OVERWRITE tname PARTITION (pcol='X')
      WHERE output.pcol = 'X'

      for every distinct value X that pcol takes.

      This is ideally an addition on top of the explicit partitioning that is already in the syntax, so that if I said

      INSERT OVERWRITE tname PARTITION (dt='D')

      it would still go into the partition (dt='D', pcol='Y') when the value of pcol is Y.
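
The proposed routing can be illustrated with a small simulation. This is only a sketch of the semantics described above, not Hive code: the function name route_rows and its parameters are hypothetical, and the rows are made-up sample data. It merges the explicit partition spec from the INSERT with the value each row holds in the tagged column.

```python
from collections import defaultdict


def route_rows(rows, explicit_spec, tagged_cols):
    """Group rows by their full partition spec (hypothetical helper).

    explicit_spec: partition values named in the INSERT, e.g. {"dt": "D"}
    tagged_cols:   columns tagged PARTITIONING, e.g. ["pcol"]
    """
    partitions = defaultdict(list)
    for row in rows:
        # Start from the explicit partition values given by the user.
        spec = dict(explicit_spec)
        # Add one implicit partition value per tagged column, taken from the row.
        for col in tagged_cols:
            spec[col] = row[col]
        # Use a sorted tuple so the spec can serve as a dictionary key.
        partitions[tuple(sorted(spec.items()))].append(row)
    return dict(partitions)


# Made-up sample rows for the table tname.
rows = [
    {"cname1": 1, "pcol": "X"},
    {"cname1": 2, "pcol": "Y"},
    {"cname1": 3, "pcol": "X"},
]

# Equivalent of: INSERT OVERWRITE tname PARTITION (dt='D')
result = route_rows(rows, {"dt": "D"}, ["pcol"])
for spec, members in sorted(result.items()):
    print(dict(spec), len(members))
```

Each distinct value of pcol yields its own partition under dt='D', so the number of partitions written grows with the cardinality of the tagged column.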

      It would be up to the user to make sure the cardinality of these columns is reasonable, and that enough data goes into each partition that there is some net benefit (just as it is in the explicit case).

          People

            Assignee: Unassigned
            Reporter: Venky Iyer (indigoviolet)