CARBONDATA-1690

Query failed after swap table by renaming


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0
    • Component/s: spark-integration
    • Labels: None

    Description

      1. Scenario

      I encountered a query error after swapping two tables by renaming them. The steps to reproduce the bug are listed below.

      These steps work fine:

      1. CREATE TABLE `t1`;
      2. LOAD DATA TO `t1`;
      3. CREATE TABLE `t2`;
      4. LOAD DATA TO `t2`;
      5. RENAME `t1` TO `t3`;
      6. RENAME `t2` TO `t1`;
      7. QUERY `t1`;

      These steps fail:

      1. CREATE TABLE `t1`;
      2. LOAD DATA TO `t1`;
      3. CREATE TABLE `t2`;
      4. LOAD DATA TO `t2`;
      5. QUERY `t1`; (added step)
      6. RENAME `t1` TO `t3`;
      7. RENAME `t2` TO `t1`;
      8. QUERY `t1`; (this step fails)

      The only difference between the two scenarios is that the second one adds Step 5; with that step present, Step 8 throws an error. The error message in the Spark SQL shell looks like this:
      ```
      Error: java.io.FileNotFoundException: File hdfs://slave1:9000/carbonstore/default/test_table/Fact/Part0/Segment_0/part-0-0_batchno0-0-1510144676427.carbondata does not exist. (state=,code=0)
      ```

      2. Analysis

      Renaming a table in CarbonData is actually done by renaming the corresponding data folder. In addition, CarbonData refreshes the metadata and its cache.

      The error message above shows that the file path is exactly the one from before the rename operation. We suspect the problem lies in the data map.

      In the second scenario, when we query `t1` before renaming (Step 5), the corresponding data map is loaded and cached. Since the data map is keyed by table name, when we query `t1` again after renaming (Step 8), the previously cached data map is used. That data map is outdated and points to files that no longer exist under that path, which causes the `FileNotFoundException`.

      In the first scenario, the query on `t1` (Step 7) is the first time the data map is loaded, so the correct data is read. That is why it works.
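
The analysis above can be illustrated with a small toy model. This is only a sketch and not CarbonData code: `ToyStore`, `NameKeyedCache`, `run`, and the `/store/...` paths are hypothetical names invented for illustration. It assumes, as described above, that a rename changes the table folder but keeps the data file names, and that the cache is keyed by table name only.

```python
import itertools


class ToyStore:
    """Hypothetical stand-in for the table folders on the file system."""

    def __init__(self):
        self.files = {}                 # table name -> list of data file names
        self._ids = itertools.count(1)  # gives every data file a unique name

    def create_and_load(self, table):
        self.files[table] = [f"part-{next(self._ids)}.carbondata"]

    def rename(self, old, new):
        # Renaming a table renames its folder; the file names are kept.
        self.files[new] = self.files.pop(old)

    def paths(self, table):
        return [f"/store/{table}/{name}" for name in self.files[table]]

    def exists(self, path):
        return any(path in self.paths(table) for table in self.files)


class NameKeyedCache:
    """Hypothetical data map cache keyed by table name only."""

    def __init__(self, store):
        self.store = store
        self.entries = {}  # table name -> cached file paths

    def query(self, table):
        if table not in self.entries:
            # First query for this name: load and cache the file paths.
            self.entries[table] = self.store.paths(table)
        for path in self.entries[table]:
            if not self.store.exists(path):
                raise FileNotFoundError(f"File {path} does not exist.")
        return self.entries[table]


def run(query_before_rename):
    """Replay the steps above; the flag toggles the added Step 5."""
    store = ToyStore()
    cache = NameKeyedCache(store)
    store.create_and_load("t1")
    store.create_and_load("t2")
    if query_before_rename:
        cache.query("t1")        # caches paths of the original t1
    store.rename("t1", "t3")
    store.rename("t2", "t1")
    return cache.query("t1")     # stale entry is reused if Step 5 ran
```

In this model `run(False)` succeeds, while `run(True)` raises `FileNotFoundError` for a path that belonged to the original `t1`, which mirrors the behaviour reported in the two scenarios.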

      3. Resolution

      There are two ways to fix this bug:

      1. Change the index key of the data map: use `table_name + table_schema_last_update_time` instead of `table_name`.

      2. Clear the corresponding data map when a table is renamed.

      I prefer the second one since it is easy to implement: just one line of code.
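
Both options can be sketched on a toy model. Again, this is not CarbonData code: `ToyStore`, `VersionedKeyCache`, `ClearOnRenameCache`, `swap_and_query`, and the integer logical clock standing in for `table_schema_last_update_time` are assumptions made purely for illustration.

```python
import itertools


class ToyStore:
    """Hypothetical table store; a logical clock stands in for the update time."""

    def __init__(self):
        self.tables = {}  # table name -> {"file": str, "updated": int}
        self._clock = itertools.count(1)

    def create_and_load(self, table):
        tick = next(self._clock)
        self.tables[table] = {"file": f"part-{tick}.carbondata", "updated": tick}

    def rename(self, old, new):
        meta = self.tables.pop(old)
        meta["updated"] = next(self._clock)  # a rename touches the schema
        self.tables[new] = meta

    def path(self, table):
        return f"/store/{table}/{self.tables[table]['file']}"

    def last_update_time(self, table):
        return self.tables[table]["updated"]


class VersionedKeyCache:
    """Option 1: key entries by (table name, schema last update time)."""

    def __init__(self, store):
        self.store = store
        self.entries = {}

    def query(self, table):
        key = (table, self.store.last_update_time(table))
        if key not in self.entries:
            self.entries[key] = self.store.path(table)
        return self.entries[key]


class ClearOnRenameCache:
    """Option 2: keep the name-only key, but drop the entry on rename."""

    def __init__(self, store):
        self.store = store
        self.entries = {}

    def query(self, table):
        if table not in self.entries:
            self.entries[table] = self.store.path(table)
        return self.entries[table]

    def rename(self, old, new):
        self.store.rename(old, new)
        self.entries.pop(old, None)  # the single extra line of option 2


def swap_and_query(store, cache, rename):
    """Replay the failing scenario, including the query before the renames."""
    store.create_and_load("t1")
    store.create_and_load("t2")
    cache.query("t1")
    rename("t1", "t3")
    rename("t2", "t1")
    return cache.query("t1")
```

In this sketch both variants return the path of the table that now carries the name `t1`. One difference visible in the toy model is that option 1 leaves the outdated entry in the cache under its old key, whereas option 2 removes it at rename time.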

            People

              xuchuanyin Chuanyin Xu