Ratis / RATIS-815

Log entry corrupted with 0 checksum


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 1.0.0
    • Component/s: server
    • Labels: None

    Description

      After writing a few large keys (128MB) with a very small chunk size (64KB) in Ozone, Ratis reports log entry corruption due to a checksum error:

      2020-02-13 12:01:41 INFO  SegmentedRaftLogWorker:396 - e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236-SegmentedRaftLogWorker: Rolling segment log-62379_62465 to index:62465
      2020-02-13 12:01:41 INFO  SegmentedRaftLogWorker:541 - e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236-SegmentedRaftLogWorker: Rolled log segment from /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_inprogress_62379 to /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_62379-62465
      2020-02-13 12:01:41 INFO  SegmentedRaftLogWorker:583 - e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236-SegmentedRaftLogWorker: created new log segment /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_inprogress_62466
      2020-02-13 12:01:41 ERROR LogAppender:81 - e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236->ac5b3434-874b-4375-8a03-989e8c7fb692-GrpcLogAppender-AppenderDaemon failed RaftLog
      org.apache.ratis.server.raftlog.RaftLogIOException: org.apache.ratis.protocol.ChecksumException: Log entry corrupted: Calculated checksum is CDFED097 but read checksum is 00000000.
      	at org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:311)
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.get(SegmentedRaftLog.java:292)
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:297)
      	at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:213)
      	at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:179)
      	at org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:122)
      	at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:77)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.ratis.protocol.ChecksumException: Log entry corrupted: Calculated checksum is CDFED097 but read checksum is 00000000.
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:312)
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:194)
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:129)
      	at org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:98)
      	at org.apache.ratis.server.raftlog.segmented.LogSegment$LogEntryLoader.load(LogSegment.java:202)
      	at org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:309)
      	... 7 more
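The message in the trace compares a checksum recomputed from the entry bytes with the checksum stored next to the entry. A read checksum of 00000000 means the four bytes at the expected checksum position were all zero. The following is a minimal sketch of that kind of check, not the Ratis implementation: the framing (`[int length][payload][int checksum]`), the class `ChecksumSketch`, and the use of `java.util.zip.CRC32` are assumptions made for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

public class ChecksumSketch {
  // Hypothetical record layout: [int length][payload][int checksum].
  // This is NOT the Ratis on-disk format; it only illustrates the check.
  static byte[] encode(byte[] payload) {
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    ByteBuffer buf = ByteBuffer.allocate(4 + payload.length + 4);
    buf.putInt(payload.length);
    buf.put(payload);
    buf.putInt((int) crc.getValue());
    return buf.array();
  }

  // Recomputes the checksum over the payload and compares it with the stored one.
  static byte[] decode(byte[] record) {
    ByteBuffer buf = ByteBuffer.wrap(record);
    int length = buf.getInt();
    byte[] payload = new byte[length];
    buf.get(payload);
    int stored = buf.getInt();
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    int calculated = (int) crc.getValue();
    if (calculated != stored) {
      throw new IllegalStateException(String.format(
          "Log entry corrupted: Calculated checksum is %08X but read checksum is %08X.",
          calculated, stored));
    }
    return payload;
  }

  public static void main(String[] args) {
    byte[] record = encode("example entry".getBytes(StandardCharsets.UTF_8));
    decode(record); // an intact record passes the check

    // Simulate a record whose checksum bytes were never written (left as zeros).
    Arrays.fill(record, record.length - 4, record.length, (byte) 0);
    try {
      decode(record);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The sketch only shows how a zeroed trailer produces this kind of message; it says nothing about why the bytes were zero in the reported case.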
      

      Steps to reproduce:

      1. Configure Ozone with 64KB chunk size and slightly higher buffer sizes:

      ozone.scm.chunk.size: 64KB
      ozone.client.stream.buffer.flush.size: 256KB
      ozone.client.stream.buffer.max.size: 1MB
      

      2. Run Freon:

      ozone freon ockg -n 1 -t 1 -p warmup
      ozone freon ockg -p test -t 8 -s 134217728 -n 32
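For orientation, the sizes in the reproduction steps can be worked out as follows. This is a small arithmetic sketch; it assumes that `-s` is the key size in bytes (consistent with the 128MB keys in the description) and that `-n` is the number of keys, and the class name `FreonWorkloadMath` is made up for the example.

```java
public class FreonWorkloadMath {
  static final long KB = 1024L;
  static final long MB = 1024L * KB;

  // Number of chunks needed to hold one key, rounding up.
  static long chunksPerKey(long keySize, long chunkSize) {
    return (keySize + chunkSize - 1) / chunkSize;
  }

  public static void main(String[] args) {
    long keySize = 134_217_728L;   // -s 134217728
    long chunkSize = 64 * KB;      // ozone.scm.chunk.size
    long flushSize = 256 * KB;     // ozone.client.stream.buffer.flush.size
    long maxBuffer = 1 * MB;       // ozone.client.stream.buffer.max.size
    int keys = 32;                 // -n 32 (assumed to be the number of keys)

    System.out.println("key size (MB):         " + keySize / MB);                     // 128
    System.out.println("chunks per key:        " + chunksPerKey(keySize, chunkSize)); // 2048
    System.out.println("chunks per flush:      " + flushSize / chunkSize);            // 4
    System.out.println("chunks per max buffer: " + maxBuffer / chunkSize);            // 16
    System.out.println("chunks in total:       " + keys * chunksPerKey(keySize, chunkSize)); // 65536
  }
}
```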
      

      Interestingly, even the earlier segment log_5106-5509 contains an invalid entry (according to the log dump utility):

      Processing Raft Log file: /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_5106-5509 size:1030796
      ...
      (t:1, i:5161), STATEMACHINELOGENTRY, client-296B6A48E40D, cid=3307
      Exception in thread "main" org.apache.ratis.protocol.ChecksumException: Log entry corrupted: Calculated checksum is 926127AE but read checksum is 00000000.
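The segment file names in the logs above follow two shapes: `log_<start>-<end>` for a closed segment and `log_inprogress_<start>` for the open one. The rolled segment ends at index 62465 and the next one starts at 62466, so the end index appears to be inclusive. A minimal sketch for mapping an entry index to a segment name, with the hypothetical class `SegmentNameSketch` and patterns inferred only from the names shown in this report:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SegmentNameSketch {
  // Patterns inferred from the file names in this report, not taken from Ratis code.
  private static final Pattern CLOSED = Pattern.compile("log_(\\d+)-(\\d+)");
  private static final Pattern OPEN = Pattern.compile("log_inprogress_(\\d+)");

  // Returns {startIndex, endIndex}; endIndex is -1 for an in-progress segment.
  static long[] parse(String fileName) {
    Matcher m = CLOSED.matcher(fileName);
    if (m.matches()) {
      return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))};
    }
    m = OPEN.matcher(fileName);
    if (m.matches()) {
      return new long[] {Long.parseLong(m.group(1)), -1L};
    }
    throw new IllegalArgumentException("Not a segment file name: " + fileName);
  }

  // True if the entry index falls inside the segment (end index treated as inclusive).
  static boolean contains(String fileName, long index) {
    long[] range = parse(fileName);
    return index >= range[0] && (range[1] < 0 || index <= range[1]);
  }

  public static void main(String[] args) {
    // The last entry printed by the dump, (t:1, i:5161), lies inside log_5106-5509.
    System.out.println(contains("log_5106-5509", 5161));        // true
    System.out.println(contains("log_62379-62465", 62466));     // false
    System.out.println(contains("log_inprogress_62466", 62466)); // true
  }
}
```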
      

      Attachments

        1. dumps.tar.gz
          85 kB
          Lokesh Jain
        2. logs.tar.gz
          13.41 MB
          Lokesh Jain
        3. RATIS-815.temp.patch
          2 kB
          Lokesh Jain
        4. r815_20200220.patch
          15 kB
          Tsz-wo Sze
        5. r815_20200228.patch
          15 kB
          Tsz-wo Sze
        6. r815_20200302.patch
          15 kB
          Tsz-wo Sze

            People

              Assignee: Tsz-wo Sze (szetszwo)
              Reporter: Attila Doroszlai (adoroszlai)