Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3440

S3 : Orphaned HDFS files after table location is altered

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not A Bug
    • Affects Version/s: Impala 2.6.0
    • Fix Version/s: Impala 2.6.0
    • Component/s: Backend
    • Labels:

      Description

      Repro

      • Create partitioned table (Default FS is HDFS)
      • Insert 100 rows data into partitioned table
      • Alter table set location -> S3
      • Insert overwrite table limit 0
      • Select count on table returns 100 rows where 0 is expected

      Full repro

      create table orders_part_2 (O_ORDERKEY BIGINT,
       O_CUSTKEY BIGINT,
       O_ORDERSTATUS STRING,
       O_TOTALPRICE decimal(12,2),
       O_ORDERPRIORITY STRING,
       O_CLERK STRING,
       O_SHIPPRIORITY BIGINT,
       O_COMMENT STRING)
       partitioned by (O_ORDERDATE string)
       stored as parquet;
      
       insert into  etl_s3_parquet.orders_part_2 partition(o_orderdate) select O_ORDERKEY,O_CUSTKEY ,  O_ORDERSTATUS ,  O_TOTALPRICE ,  O_ORDERPRIORITY,  O_CLERK ,  O_SHIPPRIORITY,  O_COMMENT, O_ORDERDATE from  tpch_300_parquet_s3.orders limit 10000;
      Query: insert into  etl_s3_parquet.orders_part_2 partition(o_orderdate) select O_ORDERKEY,O_CUSTKEY ,  O_ORDERSTATUS ,  O_TOTALPRICE ,  O_ORDERPRIORITY,  O_CLERK ,  O_SHIPPRIORITY,  O_COMMENT, O_ORDERDATE from  tpch_300_parquet_s3.orders limit 10000
      WARNINGS: Cancelled (1 of 12 similar)
      
      Inserted 10000 row(s) in 66.39s
      
      
      alter table orders_part_2 set location 's3a://cloudera-impala-perf-w2/etl_s3_parquet.db/orders_part_2';
      
      [impala-s3-scale-2.vpc.cloudera.com:21000] > select count(*) from orders_part_2;
      Query: select count(*) from orders_part_2
      +----------+
      | count(*) |
      +----------+
      | 10000    |
      +----------+
      Fetched 1 row(s) in 1.20s
      
      insert overwrite table  etl_s3_parquet.orders_part_2 partition(o_orderdate) select O_ORDERKEY,O_CUSTKEY ,  O_ORDERSTATUS ,  O_TOTALPRICE ,  O_ORDERPRIORITY,  O_CLERK ,  O_SHIPPRIORITY,  O_COMMENT, O_ORDERDATE from  tpch_300_parquet_s3.orders limit 0;
      Query: insert overwrite table  etl_s3_parquet.orders_part_2 partition(o_orderdate) select O_ORDERKEY,O_CUSTKEY ,  O_ORDERSTATUS ,  O_TOTALPRICE ,  O_ORDERPRIORITY,  O_CLERK ,  O_SHIPPRIORITY,  O_COMMENT, O_ORDERDATE from  tpch_300_parquet_s3.orders limit 0
      Inserted 0 row(s) in 0.28s
      
      [impala-s3-scale-2.vpc.cloudera.com:21000] > select count(*) from etl_s3_parquet.orders_part_2;
      Query: select count(*) from etl_s3_parquet.orders_part_2
      +----------+
      | count(*) |
      +----------+
      | 10000    |
      +----------+
      Fetched 1 row(s) in 1.29s
      

        Attachments

          Activity

            People

            • Assignee:
              sailesh Sailesh Mukil
              Reporter:
              mmokhtar Mostafa Mokhtar
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: