  Hive / HIVE-14886

File deduplication in FSOP is not used correctly for list bucketing


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • None
    • None

    Description

      While making things work for MM tables, I noticed this after adding logging to the removeTempOrDuplicateFiles/2 method, which is called from FSOP (FileSinkOperator):

            } else /* sershe: means "if !isTempPath(one)" */ {
              String taskId = getPrefixedTaskIdFromFilename(one.getPath().getName());
              Utilities.LOG14535.info("removeTempOrDuplicateFiles pondering " + one.getPath() + ", taskId " + taskId);
      

      This is called from FSOP jobCloseOp via Utilities.mvFileToFinalPath, and then via the non-dynamic-partitioning path in removeTempOrDuplicateFiles/4.
      The taskId line is from the original code; the extracted taskId is later used to decide the fate of the file.
      The files passed in are taken from the root of the table, disregarding list bucketing, so this is what happens:

      2016-10-03T19:01:38,615  INFO [912dde0f-91af-4a27-b358-5d782897ed1d main] Log14535: removeTempOrDuplicateFiles pondering hdfs://localhost:63026/build/ql/test/data/warehouse/skew_mm/.hive-staging_hive_2016-10-03_19-01-38_324_9113577068018508885-1/_tmp.-ext-10000/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME, taskId HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME
      2016-10-03T19:01:38,616  INFO [912dde0f-91af-4a27-b358-5d782897ed1d main] Log14535: removeTempOrDuplicateFiles pondering hdfs://localhost:63026/build/ql/test/data/warehouse/skew_mm/.hive-staging_hive_2016-10-03_19-01-38_324_9113577068018508885-1/_tmp.-ext-10000/k1=0, taskId 0 [sershe: this is only correct by coincidence; the task ID comes from the k1 value]
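      To illustrate how directory names end up as task IDs in the log above, here is a minimal sketch that assumes a simplified filename-to-task-ID rule: take the last run of digits before an optional "_<attempt>" suffix and optional extension, and fall back to the whole name when nothing matches. The class TaskIdFromDirName and its regex are illustrative assumptions, not Hive's getPrefixedTaskIdFromFilename, which also handles prefixed task IDs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TaskIdFromDirName {
  // Assumption: simplified stand-in for Hive's task-ID extraction.
  // Group 1 is the digit run; "_<attempt>" and ".<ext>" suffixes are optional.
  private static final Pattern TASK_ID =
      Pattern.compile("^.*?([0-9]+)(_[0-9]{1,6})?(\\..*)?$");

  // Returns the extracted task ID, or the name itself if no digits match.
  static String taskIdFromName(String name) {
    Matcher m = TASK_ID.matcher(name);
    return m.matches() ? m.group(1) : name;
  }

  public static void main(String[] args) {
    // List-bucketing directory names as they appear under _tmp.-ext-10000.
    String[] names = {"HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME", "k1=0", "k1=1"};
    for (String n : names) {
      // e.g. prints "k1=0 -> taskId 0", mirroring the log line above
      System.out.println(n + " -> taskId " + taskIdFromName(n));
    }
  }
}
```

      Under this rule, "k1=0" yields "0" only because the skew value happens to be numeric, which is the coincidence noted in the log annotation.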
      

      When I started calling the method correctly on the MM path, it began deleting files from different list bucketing (LB) directories, treating them as duplicates of one another. Some special handling for list bucketing, similar to dpCtx, may be needed.
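      One possible direction, similar in spirit to the dpCtx handling mentioned above, is to scope deduplication per LB directory. The minimal sketch that follows is hypothetical (the class LbAwareDedup, the method findDuplicates, and the model of files as plain relative path strings are not Hive code). It compares keying duplicates by task ID alone, a simplified model of the collision described above, with keying by (LB directory, task ID). Which copy is kept is simplified to "first seen"; the "_1" suffix is assumed to be a task attempt number.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LbAwareDedup {
  // Assumption: simplified task-ID pattern, not Hive's exact regex.
  private static final Pattern TASK_ID =
      Pattern.compile("^.*?([0-9]+)(_[0-9]{1,6})?(\\..*)?$");

  static String taskId(String fileName) {
    Matcher m = TASK_ID.matcher(fileName);
    return m.matches() ? m.group(1) : fileName;
  }

  // Returns the paths that would be removed as duplicates.
  // Paths are relative to the output root, e.g. "k1=0/000000_0".
  // keyByLbDir=false compares task IDs only; true also compares the LB directory.
  static List<String> findDuplicates(List<String> relPaths, boolean keyByLbDir) {
    Set<String> seen = new HashSet<>();
    List<String> duplicates = new ArrayList<>();
    for (String p : relPaths) {
      int slash = p.lastIndexOf('/');
      String lbDir = slash < 0 ? "" : p.substring(0, slash);
      String id = taskId(p.substring(slash + 1));
      String key = keyByLbDir ? lbDir + "/" + id : id;
      if (!seen.add(key)) { // first path seen for a key is kept
        duplicates.add(p);
      }
    }
    return duplicates;
  }

  public static void main(String[] args) {
    List<String> files = List.of(
        "k1=0/000000_0",
        "k1=1/000000_0",
        "HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/000000_0",
        "k1=0/000000_1"); // assumed second attempt of the same task, same LB dir
    System.out.println("task ID only: " + findDuplicates(files, false));
    System.out.println("LB dir + task ID: " + findDuplicates(files, true));
  }
}
```

      Keyed by task ID alone, every file after the first is flagged, including outputs from unrelated LB directories; with the LB directory in the key, only "k1=0/000000_1" is flagged.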


          People

            Assignee: Unassigned
            Reporter: Sergey Shelukhin (sershe)
            Votes: 0
            Watchers: 1
