IMPALA-6536

CREATE TABLE on S3 takes a very long time

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
    • Fix Version/s: None
    • Component/s: Catalog, Frontend

      Description

      Summary
      Creating a table that points to existing data in S3 can take an excessive amount of time.

      Reason
      If the Hive Metastore is configured with "hive.stats.autogather=true", then on table creation the Metastore lists every file under the table's location to populate basic statistics such as the file count (numFiles) and the total size in bytes (totalSize). This listing can take an excessive amount of time, particularly on S3.

      Workaround

      • Reconfigure the Hive Metastore with "hive.stats.autogather=false".
      • Note that TBLPROPERTIES("DO_NOT_UPDATE_STATS"="true") does not avoid the slow listing, due to a bug in Hive (HIVE-18743).
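      For reference, the table property from the second bullet is attached through a TBLPROPERTIES clause, which in Impala follows LOCATION. The sketch below uses a hypothetical table name and columns. Per the note above, in the affected versions this property does not prevent the Metastore from listing the files.

```sql
-- Hypothetical table; the property asks the Metastore to skip its
-- automatic statistics update, but see HIVE-18743 for why it is ineffective.
CREATE EXTERNAL TABLE my_s3_table (
  id BIGINT,
  name STRING
)
STORED AS PARQUET
LOCATION "s3a://some_location/my_existing_data"
TBLPROPERTIES ("DO_NOT_UPDATE_STATS"="true");
```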

      Related:
      https://issues.apache.org/jira/browse/HIVE-18743

      Example

      CREATE EXTERNAL TABLE tpch_lineitem_s3 (
        l_orderkey BIGINT,
        l_partkey BIGINT,
        l_suppkey BIGINT,
        l_linenumber BIGINT,
        l_quantity DECIMAL(12,2),
        l_extendedprice DECIMAL(12,2),
        l_discount DECIMAL(12,2),
        l_tax DECIMAL(12,2),
        l_returnflag STRING,
        l_linestatus STRING,
        l_shipdate STRING,
        l_commitdate STRING,
        l_receiptdate STRING,
        l_shipinstruct STRING,
        l_shipmode STRING,
        l_comment STRING
      )
      STORED AS PARQUET
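      -- Points at data that already exists in S3; with hive.stats.autogather=true
      -- the Metastore lists every file under this path during CREATE TABLE.
      LOCATION "s3a://some_location/my_existing_data";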
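
      Once a statement like this completes, the basic statistics that the Metastore gathered can be inspected from impala-shell or Beeline. A minimal check, assuming the table was created while "hive.stats.autogather=true", is to look for the numFiles and totalSize entries under "Table Parameters" in the output of:

```sql
-- Shows table metadata; with autogather enabled, the Table Parameters
-- section is expected to include numFiles and totalSize.
DESCRIBE FORMATTED tpch_lineitem_s3;
```

      A large numFiles value gives a rough sense of how much listing work the CREATE TABLE triggered.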
      

            People

            • Assignee: Unassigned
              Reporter: Alexander Behm (alex.behm)
            • Votes: 0
              Watchers: 8
