  Kudu / KUDU-1586

If a single op is larger than consensus_max_batch_size_bytes, consensus gets stuck


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 1.0.0
    • Component/s: consensus
    • Labels: None

    Description

      I noticed on a cluster test that a leader was spinning with log messages like:

      I0829 14:17:31.870786 22184 log_cache.cc:307] T e7cacfdb22744496a6d5d66227a69823 P 5d15962d2f2445b1ba15b93ead4fb31b: Successfully read 1 ops from disk (866604..866604)
      I0829 14:17:31.873234 6186 log_cache.cc:307] T e7cacfdb22744496a6d5d66227a69823 P 5d15962d2f2445b1ba15b93ead4fb31b: Successfully read 1 ops from disk (866604..866604)
      I0829 14:17:31.875713 22184 log_cache.cc:307] T e7cacfdb22744496a6d5d66227a69823 P 5d15962d2f2445b1ba15b93ead4fb31b: Successfully read 1 ops from disk (866604..866604)
      I0829 14:17:31.878078 6186 log_cache.cc:307] T e7cacfdb22744496a6d5d66227a69823 P 5d15962d2f2445b1ba15b93ead4fb31b: Successfully read 1 ops from disk (866604..866604)

      After investigation, it seems this op was larger than 1MB (the default value of consensus_max_batch_size_bytes), which caused the leader to re-read the same op in a tight loop without making any progress.
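
For illustration, here is a minimal sketch of how a size-capped batching loop can stall on a single oversized op. It is not Kudu's actual code or the actual fix: `Op`, `BuildBatchNaive`, and `BuildBatchWithProgress` are hypothetical names. The report does not say at which stage the cap is applied, so this only shows the general failure mode, together with one common remedy (always admit the first op, then apply the cap).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a replicated operation; only its size matters here.
struct Op {
  int64_t index;
  size_t size_bytes;
};

// Naive batching: stop as soon as adding the next op would exceed the cap.
// If the first op alone is larger than max_batch_bytes, the batch comes back
// empty, and a caller that retries from the same index never advances.
std::vector<Op> BuildBatchNaive(const std::vector<Op>& log, size_t start,
                                size_t max_batch_bytes) {
  std::vector<Op> batch;
  size_t total = 0;
  for (size_t i = start; i < log.size(); ++i) {
    if (total + log[i].size_bytes > max_batch_bytes) break;
    total += log[i].size_bytes;
    batch.push_back(log[i]);
  }
  return batch;
}

// Progress-guaranteeing variant (an assumption, not the confirmed fix):
// the cap is only enforced once the batch already holds at least one op,
// so an oversized op is sent alone instead of blocking the queue.
std::vector<Op> BuildBatchWithProgress(const std::vector<Op>& log, size_t start,
                                       size_t max_batch_bytes) {
  std::vector<Op> batch;
  size_t total = 0;
  for (size_t i = start; i < log.size(); ++i) {
    if (!batch.empty() && total + log[i].size_bytes > max_batch_bytes) break;
    total += log[i].size_bytes;
    batch.push_back(log[i]);
  }
  return batch;
}
```

With a 1MB cap and a 2MB op at the head of the queue, the naive version returns an empty batch on every call, while the second version returns that op alone and lets the caller move on to the next index.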

          People

            Assignee: Todd Lipcon (tlipcon)
            Reporter: Todd Lipcon (tlipcon)
