Hive / HIVE-957

Partition Metadata and Table Metadata

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Metastore
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Right now, Hive uses partition-level metadata: all metadata (column names, column types, fileformat, serde class, serde properties) comes from the partition. But Hive does not currently support a way to alter the metadata of all existing partitions, so users mostly alter the table metadata and assume Hive will use the new table-level metadata.
      One approach is to provide a way for users to alter all partitions' metadata with one simple command. For now, a short-term solution is to take only the fileformat and serde class from partition-level metadata, and use all other metadata from the table.

      Any comments?

      1. hive-957-2009-11-30-3.patch
        379 kB
        He Yongqiang
      2. hive-957-2009-11-30-2.patch
        379 kB
        He Yongqiang
      3. hive-957-2009-11-30.patch
        359 kB
        He Yongqiang

        Issue Links

          Activity

          He Yongqiang created issue -
          He Yongqiang made changes -
          Field Original Value New Value
          Link This issue is related to HIVE-922 [ HIVE-922 ]
          He Yongqiang made changes -
          Link This issue blocks HIVE-922 [ HIVE-922 ]
          Zheng Shao added a comment -

          Right now, we choose to use partition-level metadata. All metadata (column names, column types, fileformat, serde class, serde properties) right now comes from partition-level metadata. But Hive does not currently support a method to alter all existing partitions' metadata, so users mostly choose to alter table metadata, and think Hive will use the new table-level metadata.

          In our environment, we have been seeing users calling "alter table" on table-level metadata to:

          • Replace the column separator for all partitions (because they created the table with the wrong separator)
          • Append additional columns
          • Rename existing columns
          Prasad Chakka added a comment -

          It is not that much code to write a new command to alter all the partitions. The metadata calls already exist; only the grammar needs to be enhanced.

          He Yongqiang added a comment -

          Uploaded a quick fix that makes Hive always get column names, column types, and serde parameters from table metadata, and everything else from partition-level metadata.

          Agree with Prasad, we should support a command to alter all partitions' metadata. But is there a need to maintain partition metadata for column names and column types?
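
          The rule in the quick fix above (column names, column types, and serde parameters from the table; the rest from the partition) can be modeled roughly as follows. This is an illustrative Python sketch only: `StorageMetadata` and `effective_metadata` are hypothetical names, not Hive classes or metastore calls, and the actual change is in the attached patch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StorageMetadata:
    """Hypothetical stand-in for the metadata kept per table and per partition."""
    column_names: List[str]
    column_types: List[str]
    file_format: str
    serde_class: str
    serde_properties: Dict[str, str] = field(default_factory=dict)


def effective_metadata(table: StorageMetadata,
                       partition: StorageMetadata) -> StorageMetadata:
    """Merge the two levels the way the quick fix is described in this thread."""
    return StorageMetadata(
        # Taken from the table, so "alter table" is visible to every partition.
        column_names=list(table.column_names),
        column_types=list(table.column_types),
        serde_properties=dict(table.serde_properties),
        # Taken from the partition, since they describe how its files were written.
        file_format=partition.file_format,
        serde_class=partition.serde_class,
    )
```

          With this model, a partition created before an "alter table" still reports its own file format and serde class, while its columns and serde properties follow the table.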

          He Yongqiang made changes -
          Attachment hive-957-2009-11-30.patch [ 12426475 ]
          Namit Jain added a comment -

          Unless we change the query plan to work with different partitions, why should we allow the user to change the partition metadata?
          Agreed, it will work for some cases (file format, separator, etc.), but it will not work for others (columns).

          Anyway, the patch looks good to me. But, I don't see the problem it is solving.

          Zheng Shao added a comment -

          A. Replace the column separator for all partitions (because they created the table with the wrong separator)
          Users need to change partition metadata.

          B. Append additional columns
          In the long term, users don't want to change partition metadata for this. We should fix Hive execution time to support missing columns by returning NULL. (S1: Currently, Hive execution time outputs an error if a column in the expression is missing from the row object.)

          C. Rename existing columns
          Users need to change partition metadata.

          This patch will solve the problem caused by the following 3 items:
          1. "S1" above
          2. Hive execution time using partition metadata
          3. Users appending additional columns by changing table metadata only

          Zheng Shao made changes -
          Summary Partiition Metadata and Table Metadata Partition Metadata and Table Metadata
          Namit Jain added a comment -

          OK, but can you add a new test:

          create table T -- 3 columns
          create partition P1 --> will have 3 columns
          alter table T add column 4
          create partition P2 --> will have 4 columns

          select c1,c2,c3,c4 from T (include both P1 and P2).
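
          The expected outcome of this test can be sketched with a small Python model. It is not the actual Hive test case: the names `TABLE_COLUMNS`, `PARTITIONS`, `read_row`, and `select_all` are hypothetical, the row values are made up, and the sketch assumes that a row with fewer fields than the table schema is read with NULL (here `None`) in the missing trailing columns.

```python
from typing import Dict, List, Optional

# Table schema after "alter table T add column 4".
TABLE_COLUMNS = ["c1", "c2", "c3", "c4"]

# Stored rows per partition. P1 was written before the alter, so its rows
# carry three fields; P2 was written afterwards and carries four.
PARTITIONS: Dict[str, List[List[str]]] = {
    "P1": [["a", "b", "c"]],
    "P2": [["d", "e", "f", "g"]],
}


def read_row(raw: List[str], columns: List[str]) -> Dict[str, Optional[str]]:
    """Map a stored row onto the table columns, padding missing fields with None."""
    padded: List[Optional[str]] = list(raw) + [None] * (len(columns) - len(raw))
    return dict(zip(columns, padded))


def select_all(partition_names: List[str]) -> List[Dict[str, Optional[str]]]:
    """Model of: select c1,c2,c3,c4 from T over the given partitions."""
    return [
        read_row(raw, TABLE_COLUMNS)
        for name in partition_names
        for raw in PARTITIONS[name]
    ]
```

          Under that assumption, `select_all(["P1", "P2"])` yields one row per partition, with `c4` empty for the P1 row and populated for the P2 row, instead of failing on the older partition.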

          Namit Jain made changes -
          Assignee He Yongqiang [ he yongqiang ]
          He Yongqiang added a comment -

          Added a testcase for it.

          He Yongqiang made changes -
          Attachment hive-957-2009-11-30-2.patch [ 12426488 ]
          Namit Jain added a comment -

          In the new test, can you drop the table that you created at the end?

          He Yongqiang added a comment -

          Added the drop statement in the new patch. Thanks, Namit!

          He Yongqiang made changes -
          Attachment hive-957-2009-11-30-3.patch [ 12426490 ]
          Namit Jain added a comment -

          Committed. Thanks Yongqiang

          Namit Jain made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Fix Version/s 0.5.0 [ 12314156 ]
          Resolution Fixed [ 1 ]
          Carl Steinbach made changes -
          Component/s Metastore [ 12312584 ]
          Carl Steinbach made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition           Time In Source Status   Execution Times   Last Executer    Last Execution Date
          Open -> Resolved     7h 47m                  1                 Namit Jain       01/Dec/09 06:25
          Resolved -> Closed   745d 17h 40m            1                 Carl Steinbach   17/Dec/11 00:06

            People

            • Assignee: He Yongqiang
            • Reporter: He Yongqiang
            • Votes: 0
            • Watchers: 2
