IMPALA / IMPALA-9293

Impala Doc: Revise explanation of HDFS trashcan usage on S3

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 3.4.0
    • Component/s: Docs
    • Labels: None

    Description

      The Impala docs state:

      By default, when you drop an internal (managed) table, the data files are moved to the HDFS trashcan. This operation is expensive for tables that reside on the Amazon S3 filesystem. Therefore, for S3 tables, prefer to use DROP TABLE table_name PURGE rather than the default DROP TABLE statement. The PURGE clause makes Impala delete the data files immediately, skipping the HDFS trashcan.

      and

      The default DROP TABLE/PARTITION is slow because Impala copies the files to the HDFS trash folder, and Impala waits until all the data is moved. DROP TABLE/PARTITION .. PURGE is a fast delete operation, and the Impala statement finishes quickly even though the change might not have propagated fully throughout S3.

      The confusing part is "Impala copies the files to the HDFS trash folder". Users might think that when a managed Impala table on S3 is dropped, Impala actually copies the data from S3 to a trashcan folder stored on HDFS. That is not what happens. The term "HDFS trashcan" refers to a feature of HDFS where deleted data is moved to a trash folder rather than being deleted immediately. See https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#File+Deletes+and+Undeletes for details.

      What actually happens is that the trashcan folder lives on S3 itself: when an S3 managed table is dropped, the data is copied from the managed table folder to the trashcan folder stored on S3.
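
      To make the point concrete, here is a minimal Python sketch of where a dropped table's files would end up. It assumes Hadoop's default trash layout (<home directory>/.Trash/Current/<original path>) and a home directory of /user/<user> on the filesystem that holds the data. The function name, bucket, host, and user are hypothetical, and this is not Impala's actual code.

```python
from urllib.parse import urlsplit


def trash_location(table_uri: str, user: str) -> str:
    """Return the assumed trash destination for a dropped table directory.

    Assumption: the default Hadoop layout
    <scheme>://<authority>/user/<user>/.Trash/Current/<original path>.
    """
    parts = urlsplit(table_uri)
    # The trash root keeps the scheme and authority of the table itself,
    # so data on S3 stays on S3 and data on HDFS stays on HDFS.
    trash_root = f"{parts.scheme}://{parts.netloc}/user/{user}/.Trash/Current"
    return trash_root + parts.path


if __name__ == "__main__":
    # Hypothetical table locations, one on S3 and one on HDFS.
    print(trash_location("s3a://example-bucket/warehouse/sales", "impala"))
    print(trash_location("hdfs://namenode:8020/warehouse/sales", "impala"))
```

      Under these assumptions the S3 table's destination is still an s3a:// path in the same bucket, which is why the default DROP TABLE is a copy within S3 and not a transfer to HDFS.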

          People

            Assignee: Sahil Takiar (stakiar)
            Reporter: Sahil Takiar (stakiar)