Hive / HIVE-23816

Concurrent access of metastore dynamic partition registration API resulting in data loss due to HDFS dir deletion

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: HiveServer2
    • Labels: None

    Description

      During partition registration through the Thrift API, we are seeing the associated HDFS path deleted even though the path was not created by the process that deletes it.

      This results in loss of the data under that directory. In the example below, three threads try to create the same directory and only one of them succeeds in registering the partition, so the other two threads delete the directory that the successful thread created and registered.

      hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,307 INFO org.apache.hadoop.hive.common.FileUtils: [pool-5-thread-379217]: Creating directory if it doesn't exist: hdfs://test_path/dt=2020-07-02/hhmm-0850
      hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,308 INFO org.apache.hadoop.hive.common.FileUtils: [pool-5-thread-386717]: Creating directory if it doesn't exist: hdfs://test_path/dt=2020-07-02/hhmm-0850
      hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,308 INFO org.apache.hadoop.hive.common.FileUtils: [pool-5-thread-379074]: Creating directory if it doesn't exist: hdfs://test_path/dt=2020-07-02/hhmm-0850
      hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,314 INFO hive.metastore.hivemetastoressimpl: [pool-5-thread-386717]: deleting hdfs://test_path/dt=2020-07-02/hhmm-0850
      hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,315 INFO hive.metastore.hivemetastoressimpl: [pool-5-thread-379217]: deleting hdfs://test_path/dt=2020-07-02/hhmm-0850
      hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,321 INFO org.apache.hadoop.fs.TrashPolicyDefault: [pool-5-thread-386717]: Moved: 'hdfs://test_path/dt=2020-07-02/hhmm-0850' to trash at: hdfs://user/test/.Trash/Current/test/dt=2020-07-02/hhmm=0850
      hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,321 INFO hive.metastore.hivemetastoressimpl: [pool-5-thread-386717]: Moved to trash: hdfs://test_path/dt=2020-07-02/hhmm-0850
      hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,323 ERROR hive.log: [pool-5-thread-379217]: Got exception: java.io.IOException Failed to move to trash: hdfs://test_path/dt=2020-07-02/hhmm-0850
      hadoop-cmf-hive-HIVEMETASTORE-******.41:java.io.IOException: Failed to move to trash: hdfs://test_path/dt=2020-07-02/hhmm-0850
      hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,328 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-379217]: MetaException(message:Got exception: java.io.IOException Failed to move to trash: hdfs://test_path/dt=2020-07-02/hhmm-0850)
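
      The interleaving in the log above is consistent with a check-then-create race: several callers see the directory as missing, each one's "create if it doesn't exist" call succeeds, and each caller that later fails to register the partition cleans up a directory it believes it owns. The sketch below is a minimal local-filesystem model of that pattern using java.nio.file, not the metastore's actual code and not the HDFS API; the class and method names (PartitionDirOwnership, checkThenCreate, createAtomically, rollback) are hypothetical. It shows one possible way to make ownership unambiguous: only the caller whose atomic create succeeded is allowed to delete on rollback.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class PartitionDirOwnership {

    // Racy pattern: the existence check and the create are two separate steps.
    // Another caller can create the directory in between, and createDirectories
    // still succeeds, so this caller wrongly concludes that it made the directory.
    static boolean checkThenCreate(Path dir) throws IOException {
        if (Files.exists(dir)) {
            return false;
        }
        Files.createDirectories(dir); // does not fail if dir already exists
        return true;                  // may be wrong under concurrency
    }

    // Atomic pattern: createDirectory checks and creates in one step and
    // throws FileAlreadyExistsException when someone else got there first.
    static boolean createAtomically(Path dir) throws IOException {
        try {
            Files.createDirectory(dir);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    // Cleanup after a failed registration: delete only what this caller created.
    static void rollback(Path dir, boolean createdByMe) throws IOException {
        if (createdByMe) {
            Files.deleteIfExists(dir);
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("partition-demo");
        Path part = base.resolve("hhmm=0850");

        boolean first = createAtomically(part);  // creator, registration succeeds
        boolean second = createAtomically(part); // late caller, registration fails

        rollback(part, second);                  // no-op: the late caller is not the owner

        System.out.println(first + " " + second + " " + Files.exists(part));
        // prints "true false true"
    }
}
```

      Whether an equivalent atomic create is available and suitable on the HDFS side is an assumption that would need to be checked against the Hadoop FileSystem API before applying this idea to the metastore.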

       

          People

            Assignee: ramkrish1489 rameshkrishnan muthusamy
            Reporter: ramkrish1489 rameshkrishnan muthusamy