Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3440

S3 : Orphaned HDFS files after table location is altered

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Not A Bug
    • Impala 2.6.0
    • Impala 2.6.0
    • Backend

    Description

      Repro

      • Create partitioned table (Default FS is HDFS)
      • Insert 100 rows data into partitioned table
      • Alter table set location -> S3
      • Insert overwrite table limit 0
      • Select count on table returns 100 rows where 0 is expected

      Full repro

      create table orders_part_2 (O_ORDERKEY BIGINT,
       O_CUSTKEY BIGINT,
       O_ORDERSTATUS STRING,
       O_TOTALPRICE decimal(12,2),
       O_ORDERPRIORITY STRING,
       O_CLERK STRING,
       O_SHIPPRIORITY BIGINT,
       O_COMMENT STRING)
       partitioned by (O_ORDERDATE string)
       stored as parquet;
      
       insert into  etl_s3_parquet.orders_part_2 partition(o_orderdate) select O_ORDERKEY,O_CUSTKEY ,  O_ORDERSTATUS ,  O_TOTALPRICE ,  O_ORDERPRIORITY,  O_CLERK ,  O_SHIPPRIORITY,  O_COMMENT, O_ORDERDATE from  tpch_300_parquet_s3.orders limit 10000;
      Query: insert into  etl_s3_parquet.orders_part_2 partition(o_orderdate) select O_ORDERKEY,O_CUSTKEY ,  O_ORDERSTATUS ,  O_TOTALPRICE ,  O_ORDERPRIORITY,  O_CLERK ,  O_SHIPPRIORITY,  O_COMMENT, O_ORDERDATE from  tpch_300_parquet_s3.orders limit 10000
      WARNINGS: Cancelled (1 of 12 similar)
      
      Inserted 10000 row(s) in 66.39s
      
      
      alter table orders_part_2 set location 's3a://cloudera-impala-perf-w2/etl_s3_parquet.db/orders_part_2';
      
      [impala-s3-scale-2.vpc.cloudera.com:21000] > select count(*) from orders_part_2;
      Query: select count(*) from orders_part_2
      +----------+
      | count(*) |
      +----------+
      | 10000    |
      +----------+
      Fetched 1 row(s) in 1.20s
      
      insert overwrite table  etl_s3_parquet.orders_part_2 partition(o_orderdate) select O_ORDERKEY,O_CUSTKEY ,  O_ORDERSTATUS ,  O_TOTALPRICE ,  O_ORDERPRIORITY,  O_CLERK ,  O_SHIPPRIORITY,  O_COMMENT, O_ORDERDATE from  tpch_300_parquet_s3.orders limit 0;
      Query: insert overwrite table  etl_s3_parquet.orders_part_2 partition(o_orderdate) select O_ORDERKEY,O_CUSTKEY ,  O_ORDERSTATUS ,  O_TOTALPRICE ,  O_ORDERPRIORITY,  O_CLERK ,  O_SHIPPRIORITY,  O_COMMENT, O_ORDERDATE from  tpch_300_parquet_s3.orders limit 0
      Inserted 0 row(s) in 0.28s
      
      [impala-s3-scale-2.vpc.cloudera.com:21000] > select count(*) from etl_s3_parquet.orders_part_2;
      Query: select count(*) from etl_s3_parquet.orders_part_2
      +----------+
      | count(*) |
      +----------+
      | 10000    |
      +----------+
      Fetched 1 row(s) in 1.29s
      

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            sailesh Sailesh Mukil
            mmokhtar Mostafa Mokhtar
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment