Kudu / KUDU-1793

When out of disk space, LBM can corrupt data files

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels: None

    Description

      The log block manager can corrupt a container data file when the following conditions are met:

      1. A data directory runs out of disk space.
      2. The operation that hits the out-of-space condition is a merge compaction (that is, its failure does not crash the server).
      3. The data directory later regains some free space, allowing the server to recover.

      When all of these conditions are met, the changes introduced by commit abea8c6 (released in 1.1.0) may leave the container's bookkeeping inconsistent. Specifically, if the data directory has enough free space for the container to append some data belonging to a new block but not enough to finalize that block, an unexpected "hole" may be added to the container.

      When the server is restarted, the container's bookkeeping doesn't account for this hole, so appending a new block to the container overwrites existing data. Moreover, commit 4aacaf6 (not yet released) exacerbates the issue by causing the LBM to explicitly truncate the container at the wrong place during startup, yielding immediate data loss.
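
The failure mode described above can be illustrated with a toy model. This is only a sketch, not Kudu's actual code: the class `ToyContainer`, its methods, and the `space_left` parameter are all hypothetical. In particular, it assumes that on restart the next write offset is rebuilt by summing the lengths of finalized blocks, which this report does not state; that assumption is just one simple way for bookkeeping to "not account for the hole".

```python
class ToyContainer:
    """Hypothetical model of an append-only container and its block records."""

    def __init__(self):
        self.data = bytearray()  # stands in for the on-disk data file
        self.blocks = []         # finalized blocks as (offset, length)
        self.next_offset = 0     # in-memory bookkeeping: where the next block goes

    def _write_at(self, offset, chunk):
        end = offset + len(chunk)
        if len(self.data) < end:
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = chunk

    def append_block(self, payload, space_left=None):
        """Append a block; space_left simulates a nearly full data directory."""
        offset = self.next_offset
        writable = len(payload) if space_left is None else min(len(payload), space_left)
        self._write_at(offset, payload[:writable])
        # The offset advances past whatever reached the file, finalized or not.
        self.next_offset = offset + writable
        if writable < len(payload):
            # The block is never finalized, so its bytes become a "hole".
            raise OSError("No space left on device")
        self.blocks.append((offset, len(payload)))

    def restart(self):
        # ASSUMPTION (not from the report): the offset is rebuilt from
        # finalized block lengths only, so the hole is not counted.
        self.next_offset = sum(length for _, length in self.blocks)

    def read_block(self, index):
        offset, length = self.blocks[index]
        return bytes(self.data[offset:offset + length])


def demo():
    c = ToyContainer()
    c.append_block(b"AAAA")                    # block 0 occupies [0, 4)
    try:
        c.append_block(b"BBBB", space_left=2)  # 2 bytes land at [4, 6), then failure
    except OSError:
        pass                                   # the compaction fails; the server lives on
    c.append_block(b"CCCC")                    # block 1 occupies [6, 10)
    before = c.read_block(1)
    c.restart()                                # rebuilt offset is 8; the true end is 10
    c.append_block(b"DDDD")                    # written at [8, 12), overlapping block 1
    after = c.read_block(1)
    return before, after


if __name__ == "__main__":
    print(demo())
```

In this model, block 1 reads back as `b"CCCC"` before the restart and as `b"CCDD"` afterwards, because the rebuilt offset falls two bytes short of the real end of the file. Truncating the file at that rebuilt offset during startup would discard the tail of block 1 immediately, without waiting for a new append.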

      This case was observed in an internal Cloudera cluster.

    People

      Assignee: Adar Dembo
      Reporter: Adar Dembo