  1. Apache Ozone
  2. HDDS-10275

Double buffer not flushing DB transactions

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • None
    • 1.4.0, 1.4.1
    • None
    • None

    Description

      While investigating a snapshot diff failure, I found that the snapshot could not be loaded because its checkpoint directory does not exist. Snapshot creation succeeded, but the checkpoint directory was never created, because checkpoint creation happens inside the double buffer flush.

      I looked at the logs and there were no double buffer flush entries during that time.
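
To illustrate why a successful snapshot creation can still leave no checkpoint directory, here is a minimal toy model of a double buffer. It is a sketch under stated assumptions, not Ozone's implementation: the class `ToyDoubleBuffer`, its methods, and the `checkpoints` set are all hypothetical names invented for this example. The only point it makes is that the request is acknowledged once its entry is buffered, while the deferred action (standing in for checkpoint creation) runs only when the flush thread gets to it.

```python
import threading
from typing import Callable, List


class ToyDoubleBuffer:
    """Toy model only; hypothetical, not Ozone's double buffer class."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._current: List[Callable[[], None]] = []  # accepts new entries
        self._ready: List[Callable[[], None]] = []    # drained by the flusher

    def add(self, flush_action: Callable[[], None]) -> None:
        # Apply path: the request is considered successful after this call,
        # even though flush_action has not run yet.
        with self._lock:
            self._current.append(flush_action)

    def flush_once(self) -> int:
        # Flush path: swap the two buffers, then run the deferred actions.
        with self._lock:
            self._current, self._ready = self._ready, self._current
        flushed = len(self._ready)
        for action in self._ready:
            action()
        self._ready.clear()
        return flushed


checkpoints = set()  # stands in for checkpoint directories on disk
buf = ToyDoubleBuffer()

# "Snapshot creation" returns as soon as the entry is buffered.
buf.add(lambda: checkpoints.add("snap-ay36z"))
created_before_flush = "snap-ay36z" in checkpoints  # flush has not run yet

buf.flush_once()
created_after_flush = "snap-ay36z" in checkpoints   # the flush created it
```

In this model, if `flush_once` is never called, the snapshot entry exists but its checkpoint never appears, which matches the symptom described here.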

      Snapshot creation request:

      2023-11-27 00:40:23,345 INFO [OM StateMachine ApplyTransaction Thread - 0]-org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotCreateRequest: Created snapshot: 'snap-ay36z' with snapshotId: 'bf0c6141-4185-4361-b15f-c4aa71c5c6d8' under path 'vol-2xd36/buck-id806'
      

      Double Buffer flush logs:

      ...
      2023-11-27 00:10:23,826 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointManager: Created checkpoint in rocksDB at /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-b2e9acb3-fee2-4190-8272-0649edca8d93 in 30 milliseconds
      2023-11-27 00:10:23,827 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointUtils: Waited for 1 milliseconds for checkpoint directory /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-b2e9acb3-fee2-4190-8272-0649edca8d93 availability.
      2023-11-27 00:10:23,828 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Created checkpoint : /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-b2e9acb3-fee2-4190-8272-0649edca8d93 for snapshot snap-mswq9
      2023-11-27 00:10:39,586 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointManager: Created checkpoint in rocksDB at /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3369ac3a-61e1-4eca-b3cf-eb2de0b2d688 in 30 milliseconds
      2023-11-27 00:10:39,586 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointUtils: Waited for 0 milliseconds for checkpoint directory /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3369ac3a-61e1-4eca-b3cf-eb2de0b2d688 availability.
      2023-11-27 00:10:39,587 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Created checkpoint : /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3369ac3a-61e1-4eca-b3cf-eb2de0b2d688 for snapshot snap-f5u3t
      2023-11-27 00:10:55,949 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointManager: Created checkpoint in rocksDB at /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3a690c8f-f3ef-415d-b25c-3aaf763c9507 in 22 milliseconds
      2023-11-27 00:10:55,950 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointUtils: Waited for 1 milliseconds for checkpoint directory /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3a690c8f-f3ef-415d-b25c-3aaf763c9507 availability.
      2023-11-27 00:10:55,950 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Created checkpoint : /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3a690c8f-f3ef-415d-b25c-3aaf763c9507 for snapshot snap-jfktn
      2023-11-29 08:52:24,698 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointManager: Created checkpoint in rocksDB at /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-c3ba17ef-d947-454e-9c4f-b9063ae65650 in 15 milliseconds
      2023-11-29 08:52:24,715 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointUtils: Waited for 16 milliseconds for checkpoint directory /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-c3ba17ef-d947-454e-9c4f-b9063ae65650 availability.
      2023-11-29 08:52:24,717 WARN [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Took 614733 ns to find endKey. Caller is deleteKeysFromDelKeyTableInSnapshotScope
      2023-11-29 08:52:24,718 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Created checkpoint : /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-c3ba17ef-d947-454e-9c4f-b9063ae65650 for snapshot snap-ay36z
      2023-11-29 08:52:24,745 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointManager: Created checkpoint in rocksDB at /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-bf0c6141-4185-4361-b15f-c4aa71c5c6d8 in 12 milliseconds
      2023-11-29 08:52:24,746 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointUtils: Waited for 0 milliseconds for checkpoint directory /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-bf0c6141-4185-4361-b15f-c4aa71c5c6d8 availability.
      2023-11-29 08:52:24,747 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Created checkpoint : /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-bf0c6141-4185-4361-b15f-c4aa71c5c6d8 for snapshot snap-ay36z
      ...
      

      I also checked whether the double buffer thread had been terminated or paused, but there is no log entry indicating either. I went through the logs for the whole hour spanning the last double buffer flush and the snapshot creation for which no checkpoint was created, and could not find any issue there either.

      On follower nodes, the double buffer was working properly.
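
The manual log check described above can be sketched as a small script that looks for long silences from one thread. This is only an illustration: the function name `thread_gaps`, the five-minute threshold, and the regular expression are assumptions based on the line layout of the excerpts in this report, and the sample lines reuse timestamps from those excerpts with the message text elided.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Timestamp, level, and thread name as they appear in the excerpts above,
# e.g. "2023-11-27 00:10:23,826 INFO [OMDoubleBufferFlushThread]-...".
LINE_RE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) \w+ \[([^\]]+)\]"
)


def thread_gaps(
    lines: Iterable[str],
    thread: str = "OMDoubleBufferFlushThread",
    min_gap: timedelta = timedelta(minutes=5),  # assumed threshold
) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) pairs where `thread` logged nothing for min_gap or longer."""
    stamps = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(3) == thread:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            stamps.append(ts + timedelta(milliseconds=int(m.group(2))))
    stamps.sort()
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a >= min_gap]


# Timestamps taken from the excerpts in this report; messages elided.
sample = [
    "2023-11-27 00:10:55,950 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: (message elided)",
    "2023-11-27 00:40:23,345 INFO [OM StateMachine ApplyTransaction Thread - 0]-org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotCreateRequest: (message elided)",
    "2023-11-29 08:52:24,698 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointManager: (message elided)",
]
gaps = thread_gaps(sample)
for start, end in gaps:
    print(f"no flush-thread logs from {start} to {end}")
```

Running the same function over the leader's and a follower's OM logs would make it easy to compare where the flush thread went quiet on each node.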

            People

              Assignee: Unassigned
              Reporter: Hemant Kumar (hemantk)