Hive / HIVE-431

Auto-add table property "select" to be the select statement that created the table

    Details

    • Type: Wish
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      A syntactic copy of the query that was used to fill a table would often be AMAZINGLY useful for figuring out where the data in the table came from.

      I think the best way to implement this would be to automatically add a table property that contains the SELECT statement. For partitioned tables, the property would need to exist for each partition, or we could perhaps use a canonical name like selectquery for unpartitioned tables, plus selectquery_ds=<DATEID> for each partition of a partitioned table.
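The naming scheme suggested above can be sketched roughly as follows. This is only an illustration, not Hive code: the helper names `select_property_key` and `record_select`, and the use of a plain dictionary to stand in for table properties, are assumptions made for the example.

```python
def select_property_key(partition_ds=None):
    """Return the property key under which the generating SELECT would be stored.

    Hypothetical helper: "selectquery" for an unpartitioned table,
    "selectquery_ds=<DATEID>" for one partition of a partitioned table.
    """
    if partition_ds is None:
        return "selectquery"
    return f"selectquery_ds={partition_ds}"


def record_select(properties, query, partition_ds=None):
    """Return a copy of the table properties with the SELECT text recorded."""
    updated = dict(properties)  # plain dict standing in for table properties
    updated[select_property_key(partition_ds)] = query
    return updated


# One entry for an unpartitioned load, one for a single partition.
props = record_select({}, "SELECT a, b FROM src")
props = record_select(
    props,
    "SELECT a, b FROM src WHERE ds = '2009-01-01'",
    partition_ds="2009-01-01",
)
```

Keeping one key per partition means a later load into a different partition does not overwrite the query recorded for an earlier one.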

      This problem is growing as more and more tables in our database are generated by either "root" or by people who are no longer easy to contact.

        Activity

        Zheng Shao added a comment -

        I guess the information is already in lineage.

        I think it's a good idea to keep lineage information away from the core metadata, especially given that we are going to have column lineage etc.
        But we should provide an easy way for users to retrieve the lineage information.

        Adam Kramer added a comment -

        Also note that for partitioned tables, the table property name could be generated from the partition spec, for example: select?ds=2009-01-01&foo=bar

        Namit Jain added a comment -

        This can be shown as part of DESCRIBE EXTENDED (by default, only the last partition would be shown).

        We can store it in the metastore.

        If we are supporting this, it would be useful to have a way to recover the exact statement used to create the table also.

        Adam Kramer added a comment -

        Another note: This isn't done or represented in normal SQL at all because normal SQL allows for updates--so the SELECT query that generated the table's data could quickly become obsolete. Not so with Hive!

        This is also very useful metadata to search, as it lets us cross-index tables to know which tables feed which other tables. This will help us detect aggregate tables (to avoid re-aggregation) and identify dependencies among tables (so if we change a given table's contents, we will have a good guess at who will be affected).
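As a rough illustration of that cross-indexing idea, the sketch below builds a "which tables feed which" map from stored SELECT text. It assumes the queries are available as a mapping from table name to query string, and it finds source tables with a naive regular expression over FROM and JOIN clauses rather than a real HiveQL parser; the table names and function names are made up for the example.

```python
import re
from collections import defaultdict

# Naive pattern: an identifier directly after FROM or JOIN.
# Subqueries such as "FROM (SELECT ...)" are deliberately not matched.
_SOURCE_TABLE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def source_tables(select_query):
    """Return the set of table names a SELECT statement appears to read from."""
    return set(_SOURCE_TABLE.findall(select_query))


def build_dependents(select_by_table):
    """Map each source table to the set of tables generated from it."""
    dependents = defaultdict(set)
    for target, query in select_by_table.items():
        for source in source_tables(query):
            dependents[source].add(target)
    return dict(dependents)


# Hypothetical stored queries, keyed by the table each one filled.
stored = {
    "daily_clicks": "SELECT user_id, COUNT(1) FROM raw_clicks GROUP BY user_id",
    "click_report": "SELECT c.user_id, u.name FROM daily_clicks c "
                    "JOIN users u ON c.user_id = u.id",
}
dependents = build_dependents(stored)
```

Looking up `dependents["daily_clicks"]` then gives a first guess at which tables are affected if the contents of `daily_clicks` change.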


          People

          • Assignee:
            Unassigned
            Reporter:
            Adam Kramer
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated: