Hive
  1. Hive
  2. HIVE-957

Partition Metadata and Table Metadata

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Metastore
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Right now, we choose to use partition lever metadata. All metadata (column names, column types, fileformat, serde class, serde properties) right now are from partition level metadata. But hive does not support a method now to alter all existing partitions' metadata, so users mostly choose to alter table metadata, and think hive will use the new table level metadata.
      One approach is that we may need to provide a way to let user alter all partitions' metadata with one simple command. Right now a short term solution is to only get fileformat, serde class metadata from paritition level metadata, and use all other metadata from table.

      any comments?

      1. hive-957-2009-11-30.patch
        359 kB
        He Yongqiang
      2. hive-957-2009-11-30-2.patch
        379 kB
        He Yongqiang
      3. hive-957-2009-11-30-3.patch
        379 kB
        He Yongqiang

        Issue Links

          Activity

          Hide
          Zheng Shao added a comment -

          Right now, we choose to use partition lever metadata. All metadata (column names, column types, fileformat, serde class, serde properties) right now are from partition level metadata. But hive does not support a method now to alter all existing partitions' metadata, so users mostly choose to alter table metadata, and think hive will use the new table level metadata.

          In our environment, we have been seeing users calling "alter table" on table-level meta data to:

          • Replace the column separator for all partitions (because they created the table with wrong separator)
          • Append additional columns
          • Rename existing columns
          Show
          Zheng Shao added a comment - Right now, we choose to use partition lever metadata. All metadata (column names, column types, fileformat, serde class, serde properties) right now are from partition level metadata. But hive does not support a method now to alter all existing partitions' metadata, so users mostly choose to alter table metadata, and think hive will use the new table level metadata. In our environment, we have been seeing users calling "alter table" on table-level meta data to: Replace the column separator for all partitions (because they created the table with wrong separator) Append additional columns Rename existing columns
          Hide
          Prasad Chakka added a comment -

          it is not that much code to write a new command to alter all the partitions. the metadata calls already exist, only grammar needs to be enhanced.

          Show
          Prasad Chakka added a comment - it is not that much code to write a new command to alter all the partitions. the metadata calls already exist, only grammar needs to be enhanced.
          Hide
          He Yongqiang added a comment -

          upload a quick fix to let hive always get column names, column types, serde parameters from table metadata. And get others from partition level metadata.

          Agree with Prasad, we should support a command to alter all partitions' metadata. But is there a need to maintain partition metadata for column names, column types?

          Show
          He Yongqiang added a comment - upload a quick fix to let hive always get column names, column types, serde parameters from table metadata. And get others from partition level metadata. Agree with Prasad, we should support a command to alter all partitions' metadata. But is there a need to maintain partition metadata for column names, column types?
          Hide
          Namit Jain added a comment -

          Unless we change the query plan to work with different partitions - why should we allow the user to change the partition metadata ?
          Agreed, it will work for some cases (file format, separator etc,), but it will not work for others (columns).

          Anyway, the patch looks good to me. But, I don't see the problem it is solving.

          Show
          Namit Jain added a comment - Unless we change the query plan to work with different partitions - why should we allow the user to change the partition metadata ? Agreed, it will work for some cases (file format, separator etc,), but it will not work for others (columns). Anyway, the patch looks good to me. But, I don't see the problem it is solving.
          Hide
          Zheng Shao added a comment -

          A. Replace the column separator for all partitions (because they created the table with wrong separator)
          Users need to change partition metadata.

          B. Append additional columns
          In the long term, users don't want to change partition metadata for this - We should fix Hive execution time to support missing columns (returning NULL) ( S1: Currently Hive execution time will output an error if the column in the expression is missing from the row object)

          C. Rename existing columns
          Users need to change partition metadata.

          This patch will solve the problem caused by the following 3 items:
          1. "S1" above
          2. Hive execution time using partition metadata
          3. User append additional columns by changing table metadata only

          Show
          Zheng Shao added a comment - A. Replace the column separator for all partitions (because they created the table with wrong separator) Users need to change partition metadata. B. Append additional columns In the long term, users don't want to change partition metadata for this - We should fix Hive execution time to support missing columns (returning NULL) ( S1: Currently Hive execution time will output an error if the column in the expression is missing from the row object) C. Rename existing columns Users need to change partition metadata. This patch will solve the problem caused by the following 3 items: 1. "S1" above 2. Hive execution time using partition metadata 3. User append additional columns by changing table metadata only
          Hide
          Namit Jain added a comment -

          OK - but can you add a new test :

          create table T – 3 cols
          create partition P1 --> will have 3 columns
          alter table T add column 4
          create partition P4 --> will have 4 columns

          select c1,c2,c3,c4 from T (include both P1 and P2).

          Show
          Namit Jain added a comment - OK - but can you add a new test : create table T – 3 cols create partition P1 --> will have 3 columns alter table T add column 4 create partition P4 --> will have 4 columns select c1,c2,c3,c4 from T (include both P1 and P2).
          Hide
          He Yongqiang added a comment -

          Added a testcase for it.

          Show
          He Yongqiang added a comment - Added a testcase for it.
          Hide
          Namit Jain added a comment -

          In the new test, can you drop the table that you created at the end ?

          Show
          Namit Jain added a comment - In the new test, can you drop the table that you created at the end ?
          Hide
          He Yongqiang added a comment -

          Added the drop statement in the new patch. Thanks, Namit!

          Show
          He Yongqiang added a comment - Added the drop statement in the new patch. Thanks, Namit!
          Hide
          Namit Jain added a comment -

          Committed. Thanks Yongqiang

          Show
          Namit Jain added a comment - Committed. Thanks Yongqiang

            People

            • Assignee:
              He Yongqiang
              Reporter:
              He Yongqiang
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development