Uploaded image for project: 'Apache Hudi'
  1. Apache Hudi
  2. HUDI-1292 [Umbrella] RFC-15 : File Listing and Query Planning Optimizations
  3. HUDI-1502

Restore on MOR table leaves metadata table out-of-sync from data table

Log workAgile BoardRank to TopRank to BottomAttach filesAttach ScreenshotVotersWatch issueWatchersConvert to IssueLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

    Details

      Description

      Below is the stack trace from running `TestHoodieBackedMetadata#testSync` on MOR tables. This seems like a more fundamental issue with deleting instant files, during restore.

      So what happens is that we restore which rolls back a delta commit that has not been synced yet. (20210103224054 in the e.g) And that delta commit has introduced a new log file, which has not been added to the metadata table. But the restore effectively deletes the 20210103224054.deltacommit.

      Commit 20210103224042 added HoodieKey { recordKey=2016/03/15 partitionPath=files} HoodieMetadataPayload {key=2016/03/15, type=2, creations=[6b8f2187-5505-40ae-845e-a71a2163d064-0_4-2-6_20210103224041.parquet], deletions=[], }
      	HoodieKey { recordKey=2015/03/16 partitionPath=files} HoodieMetadataPayload {key=2015/03/16, type=2, creations=[028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_1-9-19_20210103224042.parquet, 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_0-9-18_20210103224042.parquet], deletions=[], }
      	HoodieKey { recordKey=2015/03/17 partitionPath=files} HoodieMetadataPayload {key=2015/03/17, type=2, creations=[2ab899de-4745-43c5-9fa4-d09721d3aa91-0_1-2-3_20210103224041.parquet, 4733dbda-7824-4411-a708-4b2d978f887b-0_4-9-22_20210103224042.parquet, 532a6f9b-ca89-4b96-84b7-0e3b13068b4b-0_3-9-21_20210103224042.parquet, 6842e596-46b3-4546-9faa-8a7f8c674a17-0_0-2-2_20210103224041.parquet, 7f0635d7-126e-40b6-9677-7fd8a123d5b9-0_3-2-5_20210103224041.parquet, d1906fdc-66ca-48a4-86b6-687c865d939d-0_2-9-20_20210103224042.parquet, fd446460-a662-434a-a6ab-1cd498af94ca-0_2-2-4_20210103224041.parquet], deletions=[], }
      	HoodieKey { recordKey=__all_partitions__ partitionPath=files} HoodieMetadataPayload {key=__all_partitions__, type=1, creations=[2015/03/16, 2015/03/17, 2016/03/15], deletions=[], } 
       Syncing [20210103224045__deltacommit__COMPLETED] to metadata table.
      Commit 20210103224045 added HoodieKey { recordKey=2016/03/15 partitionPath=files} HoodieMetadataPayload {key=2016/03/15, type=2, creations=[6b8f2187-5505-40ae-845e-a71a2163d064-0_0-31-52_20210103224045.parquet], deletions=[], }
      	HoodieKey { recordKey=2015/03/16 partitionPath=files} HoodieMetadataPayload {key=2015/03/16, type=2, creations=[25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet], deletions=[], }
      	HoodieKey { recordKey=2015/03/17 partitionPath=files} HoodieMetadataPayload {key=2015/03/17, type=2, creations=[2ab899de-4745-43c5-9fa4-d09721d3aa91-0_2-31-54_20210103224045.parquet], deletions=[], }
      	HoodieKey { recordKey=__all_partitions__ partitionPath=files} HoodieMetadataPayload {key=__all_partitions__, type=1, creations=[2015/03/16, 2015/03/17, 2016/03/15], deletions=[], } >>> (after compaction) State at 20210103224051 files 
       	 .028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224042.log.1_0-100-148 
      	 028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_1-9-19_20210103224042.parquet 
      	 028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_3-110-170_20210103224051.parquet 
      	 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_0-9-18_20210103224042.parquet 
      	 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet 
       
      >>> (after delete) State at 20210103224052 files 
       	 .028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224042.log.1_0-100-148 
      	 028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_1-9-19_20210103224042.parquet 
      	 028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_3-110-170_20210103224051.parquet 
      	 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_0-9-18_20210103224042.parquet 
      	 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet 
       
      >>> (after clean) State at 20210103224053 files 
       	 028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_3-110-170_20210103224051.parquet 
      	 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet 
       
      >>> (after update) State at 20210103224054 files 
       	 .028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224051.log.1_1-160-262 
      	 .25c9a174-4c07-43a1-a1a2-40454a3f0310-0_20210103224045.log.1_2-160-263 
      	 028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_3-110-170_20210103224051.parquet 
      	 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet 
       
      >>> (after restore) State after restore files 
       	 .028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224051.log.1_1-160-262 
      	 .028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224051.log.2_1-0-1 
      	 .25c9a174-4c07-43a1-a1a2-40454a3f0310-0_20210103224045.log.1_2-160-263 
      	 .25c9a174-4c07-43a1-a1a2-40454a3f0310-0_20210103224045.log.2_1-0-1 
      	 028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_3-110-170_20210103224051.parquet 
      	 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet 
      
      Syncing [20210103224047__deltacommit__COMPLETED] to metadata table.
      Commit 20210103224047 added HoodieKey { recordKey=2015/03/17 partitionPath=files} HoodieMetadataPayload {key=2015/03/17, type=2, creations=[.2ab899de-4745-43c5-9fa4-d09721d3aa91-0_20210103224045.log.1_2-68-104, .532a6f9b-ca89-4b96-84b7-0e3b13068b4b-0_20210103224042.log.1_3-68-105, .6842e596-46b3-4546-9faa-8a7f8c674a17-0_20210103224041.log.1_1-68-103, .d1906fdc-66ca-48a4-86b6-687c865d939d-0_20210103224042.log.1_0-68-102], deletions=[], }
      	HoodieKey { recordKey=__all_partitions__ partitionPath=files} HoodieMetadataPayload {key=__all_partitions__, type=1, creations=[2015/03/17], deletions=[], } 
       Syncing [20210103224049__deltacommit__COMPLETED] to metadata table.
      Commit 20210103224049 added HoodieKey { recordKey=2015/03/16 partitionPath=files} HoodieMetadataPayload {key=2015/03/16, type=2, creations=[.028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224042.log.1_0-100-148], deletions=[], }
      	HoodieKey { recordKey=2015/03/17 partitionPath=files} HoodieMetadataPayload {key=2015/03/17, type=2, creations=[.2ab899de-4745-43c5-9fa4-d09721d3aa91-0_20210103224045.log.2_5-100-153, .4733dbda-7824-4411-a708-4b2d978f887b-0_20210103224042.log.1_1-100-149, .6842e596-46b3-4546-9faa-8a7f8c674a17-0_20210103224041.log.2_3-100-151, .7f0635d7-126e-40b6-9677-7fd8a123d5b9-0_20210103224041.log.1_6-100-154, .d1906fdc-66ca-48a4-86b6-687c865d939d-0_20210103224042.log.2_2-100-150, .fd446460-a662-434a-a6ab-1cd498af94ca-0_20210103224041.log.1_4-100-152], deletions=[], }
      	HoodieKey { recordKey=__all_partitions__ partitionPath=files} HoodieMetadataPayload {key=__all_partitions__, type=1, creations=[2015/03/16, 2015/03/17], deletions=[], } 
       Syncing [20210103224051__commit__COMPLETED] to metadata table.
      Commit 20210103224051 added HoodieKey { recordKey=2015/03/16 partitionPath=files} HoodieMetadataPayload {key=2015/03/16, type=2, creations=[028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_3-110-170_20210103224051.parquet], deletions=[], }
      	HoodieKey { recordKey=2015/03/17 partitionPath=files} HoodieMetadataPayload {key=2015/03/17, type=2, creations=[2ab899de-4745-43c5-9fa4-d09721d3aa91-0_0-110-167_20210103224051.parquet, 4733dbda-7824-4411-a708-4b2d978f887b-0_6-110-173_20210103224051.parquet, 532a6f9b-ca89-4b96-84b7-0e3b13068b4b-0_5-110-172_20210103224051.parquet, 6842e596-46b3-4546-9faa-8a7f8c674a17-0_1-110-168_20210103224051.parquet, 7f0635d7-126e-40b6-9677-7fd8a123d5b9-0_4-110-171_20210103224051.parquet, d1906fdc-66ca-48a4-86b6-687c865d939d-0_2-110-169_20210103224051.parquet, fd446460-a662-434a-a6ab-1cd498af94ca-0_7-110-174_20210103224051.parquet], deletions=[], }
      	HoodieKey { recordKey=__all_partitions__ partitionPath=files} HoodieMetadataPayload {key=__all_partitions__, type=1, creations=[2015/03/16, 2015/03/17], deletions=[], } 
       Syncing [20210103224053__clean__COMPLETED] to metadata table.
      
       Syncing [20210103224056__restore__COMPLETED] to metadata table.
      Restore Instant 20210103224056 metadata: 
      	[20210103224052] {2015/03/16={"partitionPath": "2015/03/16", "successDeleteFiles": [], "failedDeleteFiles": [], "appendFiles": {}}, 2015/03/17={"partitionPath": "2015/03/17", "successDeleteFiles": [], "failedDeleteFiles": [], "appendFiles": {}}, 2016/03/15={"partitionPath": "2016/03/15", "successDeleteFiles": [], "failedDeleteFiles": [], "appendFiles": {}}}
      Restore Instant 20210103224056 metadata: 
      	[20210103224054] {2015/03/16={"partitionPath": "2015/03/16", "successDeleteFiles": [], "failedDeleteFiles": [], "appendFiles": {"file:/var/folders/nm/pf0yxwtn5n5dftzh09zbx36h0000gn/T/junit3448725349106849265/dataset/2015/03/16/.028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224051.log.2_1-0-1": 99, "file:/var/folders/nm/pf0yxwtn5n5dftzh09zbx36h0000gn/T/junit3448725349106849265/dataset/2015/03/16/.25c9a174-4c07-43a1-a1a2-40454a3f0310-0_20210103224045.log.2_1-0-1": 99}}, 2015/03/17={"partitionPath": "2015/03/17", "successDeleteFiles": ["file:/var/folders/nm/pf0yxwtn5n5dftzh09zbx36h0000gn/T/junit3448725349106849265/dataset/2015/03/17/4733dbda-7824-4411-a708-4b2d978f887b-0_3-160-264_20210103224054.parquet"], "failedDeleteFiles": [], "appendFiles": {"file:/var/folders/nm/pf0yxwtn5n5dftzh09zbx36h0000gn/T/junit3448725349106849265/dataset/2015/03/17/.532a6f9b-ca89-4b96-84b7-0e3b13068b4b-0_20210103224051.log.2_1-0-1": 99, "file:/var/folders/nm/pf0yxwtn5n5dftzh09zbx36h0000gn/T/junit3448725349106849265/dataset/2015/03/17/.4733dbda-7824-4411-a708-4b2d978f887b-0_20210103224051.log.1_1-0-1": 99, "file:/var/folders/nm/pf0yxwtn5n5dftzh09zbx36h0000gn/T/junit3448725349106849265/dataset/2015/03/17/.d1906fdc-66ca-48a4-86b6-687c865d939d-0_20210103224051.log.2_1-0-1": 99, "file:/var/folders/nm/pf0yxwtn5n5dftzh09zbx36h0000gn/T/junit3448725349106849265/dataset/2015/03/17/.7f0635d7-126e-40b6-9677-7fd8a123d5b9-0_20210103224051.log.2_1-0-1": 99, "file:/var/folders/nm/pf0yxwtn5n5dftzh09zbx36h0000gn/T/junit3448725349106849265/dataset/2015/03/17/.6842e596-46b3-4546-9faa-8a7f8c674a17-0_20210103224051.log.2_1-0-1": 99}}, 2016/03/15={"partitionPath": "2016/03/15", "successDeleteFiles": [], "failedDeleteFiles": [], "appendFiles": {"file:/var/folders/nm/pf0yxwtn5n5dftzh09zbx36h0000gn/T/junit3448725349106849265/dataset/2016/03/15/.6b8f2187-5505-40ae-845e-a71a2163d064-0_20210103224045.log.2_1-0-1": 99}}}
      Rollback/Restore 20210103224056 
       deleted : 2015/03/17=[4733dbda-7824-4411-a708-4b2d978f887b-0_3-160-264_20210103224054.parquet], 
       appended : 2016/03/15={.6b8f2187-5505-40ae-845e-a71a2163d064-0_20210103224045.log.2_1-0-1=99}
      	2015/03/16={.25c9a174-4c07-43a1-a1a2-40454a3f0310-0_20210103224045.log.2_1-0-1=99, .028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224051.log.2_1-0-1=99}
      	2015/03/17={.7f0635d7-126e-40b6-9677-7fd8a123d5b9-0_20210103224051.log.2_1-0-1=99, .d1906fdc-66ca-48a4-86b6-687c865d939d-0_20210103224051.log.2_1-0-1=99, .6842e596-46b3-4546-9faa-8a7f8c674a17-0_20210103224051.log.2_1-0-1=99, .532a6f9b-ca89-4b96-84b7-0e3b13068b4b-0_20210103224051.log.2_1-0-1=99, .4733dbda-7824-4411-a708-4b2d978f887b-0_20210103224051.log.1_1-0-1=99} 
      >>> (after pending rollback) State at 20210103224920 files 
       	 .028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224051.log.1_1-160-262 
      	 .028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_20210103224051.log.2_1-0-1 
      	 .25c9a174-4c07-43a1-a1a2-40454a3f0310-0_20210103224045.log.1_2-160-263 
      	 .25c9a174-4c07-43a1-a1a2-40454a3f0310-0_20210103224045.log.2_1-0-1 
      	 028cc15e-85ef-4b6f-b6f1-a1aa01131dbc-0_3-110-170_20210103224051.parquet 
      	 25c9a174-4c07-43a1-a1a2-40454a3f0310-0_1-31-53_20210103224045.parquet 
       
      

      This means, when the sync later runs. It will not any chance of adding the log file to metadata table. i.e we end up with a missed log file. Below you can see that at the time the sync actually runs, it does not have the instants to sync

      vmacs:dataset vs$ ls -lrt /var/folders/nm/pf0yxwtn5n5dftzh09zbx36h0000gn/T/junit3448725349106849265/dataset/.hoodie/metadata/.hoodie/
      total 40
      drwxr-xr-x  2 vs  staff    64 Jan  3 22:40 archived
      -rw-r--r--  1 vs  staff   360 Jan  3 22:40 hoodie.properties
      -rw-r--r--  1 vs  staff     0 Jan  3 22:40 20210103224042.deltacommit.requested
      -rw-r--r--  1 vs  staff   982 Jan  3 22:40 20210103224042.deltacommit.inflight
      -rw-r--r--  1 vs  staff  1981 Jan  3 22:40 20210103224042.deltacommit
      -rw-r--r--  1 vs  staff     0 Jan  3 22:40 20210103224045.deltacommit.requested
      -rw-r--r--  1 vs  staff  1642 Jan  3 22:40 20210103224045.deltacommit.inflight
      -rw-r--r--  1 vs  staff  2225 Jan  3 22:40 20210103224045.deltacommit
      vmacs:dataset vs$ ls -lrt /var/folders/nm/pf0yxwtn5n5dftzh09zbx36h0000gn/T/junit3448725349106849265/dataset/.hoodie/
      total 224
      -rw-r--r--  1 vs  staff     0 Jan  3 22:40 20210103224041.deltacommit.requested
      -rw-r--r--  1 vs  staff   301 Jan  3 22:40 hoodie.properties
      -rw-r--r--  1 vs  staff     0 Jan  3 22:40 20210103224041.deltacommit.inflight
      -rw-r--r--  1 vs  staff  6070 Jan  3 22:40 20210103224041.deltacommit
      -rw-r--r--  1 vs  staff     0 Jan  3 22:40 20210103224042.deltacommit.requested
      -rw-r--r--  1 vs  staff     0 Jan  3 22:40 20210103224042.deltacommit.inflight
      -rw-r--r--  1 vs  staff  6080 Jan  3 22:40 20210103224042.deltacommit
      drwxr-xr-x  4 vs  staff   128 Jan  3 22:40 metadata
      -rw-r--r--  1 vs  staff     0 Jan  3 22:40 20210103224045.deltacommit.requested
      -rw-r--r--  1 vs  staff  2143 Jan  3 22:40 20210103224045.deltacommit.inflight
      -rw-r--r--  1 vs  staff  4490 Jan  3 22:40 20210103224045.deltacommit
      -rw-r--r--  1 vs  staff     0 Jan  3 22:40 20210103224047.deltacommit.requested
      -rw-r--r--  1 vs  staff  3619 Jan  3 22:40 20210103224047.deltacommit.inflight
      -rw-r--r--  1 vs  staff  6198 Jan  3 22:40 20210103224047.deltacommit
      -rw-r--r--  1 vs  staff     0 Jan  3 22:40 20210103224049.deltacommit.requested
      -rw-r--r--  1 vs  staff  6181 Jan  3 22:40 20210103224049.deltacommit.inflight
      -rw-r--r--  1 vs  staff  9651 Jan  3 22:40 20210103224049.deltacommit
      -rw-r--r--  1 vs  staff  4172 Jan  3 22:40 20210103224051.compaction.requested
      -rw-r--r--  1 vs  staff     0 Jan  3 22:40 20210103224051.compaction.inflight
      -rw-r--r--  1 vs  staff  8700 Jan  3 22:40 20210103224051.commit
      -rw-r--r--  1 vs  staff  5134 Jan  3 22:40 20210103224053.clean.requested
      -rw-r--r--  1 vs  staff  5134 Jan  3 22:40 20210103224053.clean.inflight
      -rw-r--r--  1 vs  staff  4539 Jan  3 22:40 20210103224053.clean
      -rw-r--r--  1 vs  staff     0 Jan  3 22:40 20210103224056.restore.inflight
      -rw-r--r--  1 vs  staff  3861 Jan  3 22:40 20210103224056.restore
      vmacs:dataset vs$  

        Attachments

          Activity

          $i18n.getText('security.level.explanation', $currentSelection) Viewable by All Users
          Cancel

            People

            • Assignee:
              vinoth Vinoth Chandar Assign to me
              Reporter:
              vinoth Vinoth Chandar

              Dates

              • Created:
                Updated:
                Resolved:

                Issue deployment