Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-26389

ALTER TABLE CASCADE is slow for tables with many partitions

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 4.0.0-alpha-2
    • None
    • Metastore, Query Planning
    • None

    Description

      Consider the following simplified scenario with a table having two partitions.

      CREATE TABLE student (fname string, lname string) PARTITIONED BY (department string);
      INSERT INTO student VALUES ('Alex','Dumas', 'Computer Science');
      INSERT INTO student VALUES ('Victor','Hugo', 'Physics');
      

      Altering a column of this table and propagating the changes to the partitions (using the CASCADE) syntax is slow.

      ALTER TABLE student CHANGE lname lastname STRING CASCADE;
      

      The seemingly simple ALTER statement outlined above triggers roughly 136 SQL queries in the underlying DBMS of the metastore (see native_sql_queries.txt).

      We can observe that some of these queries are recurring and appear as many times as there are partitions in the table (see per_partition_sql_queries.txt).

      As the number of partitions grows so does the number of queries so if we manage to reduce the number of queries send per partition or make them more efficient this will have a positive impact on performance.

      Attachments

        1. native_sql_queries.txt
          27 kB
          Stamatis Zampetakis
        2. per_partition_sql_queries.txt
          5 kB
          Stamatis Zampetakis

        Activity

          People

            zabetak Stamatis Zampetakis
            zabetak Stamatis Zampetakis
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: