Kafka
  1. Kafka
  2. KAFKA-577

extend DumpLogSegments to verify consistency btw data and index

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: core
    • Labels:

      Description

      It would be good to extend DumpLogSegments to do the following verification:
      1. The offsets stored in the index match those in the log data.
      2. The offsets in the data log is consecutive.

      1. kafka_577_v6.diff
        7 kB
        Yang Ye
      2. kafka_577_v5.diff
        7 kB
        Yang Ye
      3. kafka_577_v4.diff
        6 kB
        Yang Ye
      4. kafka_577_v3.diff
        4 kB
        Yang Ye
      5. kafka_577_v2.diff
        4 kB
        Yang Ye
      6. kafka_577_v1.diff
        2 kB
        Yang Ye

        Activity

        Hide
        Jun Rao added a comment -

        Thanks for the patch. Some comments:

        1. Could you add a --verifyOnly option so that we only do the verification but not print out the content?
        2. The text in the following statement is too verbose. It doesn't need to print the index file since it's provided in the command line. It can just say "Index position %d doesn't match log position at offset %d".
        System.err.println(("The offset in index file [%s] does not match the offset stored in " +
        "log file [%s], they're %d and %d separately").format(entry.offset + index.baseOffset, messageAndOffset.offset))
        3. Shouldn't we use %d instead of %l in the following line?
        System.err.println("The offset in the data log file [%s] is not consecutive, [%l] follows [%l]".format(file.getName, messageAndOffset.offset, lastOffset))
        4. Could you add a shell script for DumpLogSegments in bin/ ?

        Show
        Jun Rao added a comment - Thanks for the patch. Some comments: 1. Could you add a --verifyOnly option so that we only do the verification but not print out the content? 2. The text in the following statement is too verbose. It doesn't need to print the index file since it's provided in the command line. It can just say "Index position %d doesn't match log position at offset %d". System.err.println(("The offset in index file [%s] does not match the offset stored in " + "log file [%s] , they're %d and %d separately").format(entry.offset + index.baseOffset, messageAndOffset.offset)) 3. Shouldn't we use %d instead of %l in the following line? System.err.println("The offset in the data log file [%s] is not consecutive, [%l] follows [%l] ".format(file.getName, messageAndOffset.offset, lastOffset)) 4. Could you add a shell script for DumpLogSegments in bin/ ?
        Hide
        Yang Ye added a comment -


        changed according to the comments

        Show
        Yang Ye added a comment - changed according to the comments
        Hide
        Jun Rao added a comment -

        Thanks for patch v2. Instead of exiting on first verification failure, it would be better if we dump the whole log and report all failed verification at the very end.

        Show
        Jun Rao added a comment - Thanks for patch v2. Instead of exiting on first verification failure, it would be better if we dump the whole log and report all failed verification at the very end.
        Hide
        Yang Ye added a comment -


        cache all verification failure and print out at the end, not verified by real data yet because cannot find proper data for that...., just check the logic of the code..

        Show
        Yang Ye added a comment - cache all verification failure and print out at the end, not verified by real data yet because cannot find proper data for that...., just check the logic of the code..
        Hide
        Yang Ye added a comment -

        1. also "do it in batch" for dumpLog() function

        2. use scala HashMap instead of Pool

        3. Use LinkedList in the right way

        Show
        Yang Ye added a comment - 1. also "do it in batch" for dumpLog() function 2. use scala HashMap instead of Pool 3. Use LinkedList in the right way
        Hide
        Jun Rao added a comment -

        A couple of more comments for patch v4.

        40. LinkedList is supposed to be used for manipulating links directly. We should just use List (immutable) instead. Also, instead of the following,

        misMatchesSeq = misMatchesSeq.:+((entry.offset + index.baseOffset, messageAndOffset.offset).asInstanceOf[(Int, Int)])

        we should write

        misMatchesSeq :+= ((entry.offset + index.baseOffset, messageAndOffset.offset).asInstanceOf[(Int, Int)])

        Show
        Jun Rao added a comment - A couple of more comments for patch v4. 40. LinkedList is supposed to be used for manipulating links directly. We should just use List (immutable) instead. Also, instead of the following, misMatchesSeq = misMatchesSeq.:+((entry.offset + index.baseOffset, messageAndOffset.offset).asInstanceOf [(Int, Int)] ) we should write misMatchesSeq :+= ((entry.offset + index.baseOffset, messageAndOffset.offset).asInstanceOf [(Int, Int)] )
        Hide
        Yang Ye added a comment -


        Thanks for the comments, and the LinkedList is replaced with List now

        Show
        Yang Ye added a comment - Thanks for the comments, and the LinkedList is replaced with List now
        Hide
        Neha Narkhede added a comment -

        Sorry for coming to this a little late. Patch v5 looks good, just one suggestion - Looks like DumpLogSegments is the only tool that doesn't quite follow the convention for argument parsing. It will be good to standardize on jopt parser since it also helps with a description for each option, a help command and convenient required argument checks. Just my 2c.

        Show
        Neha Narkhede added a comment - Sorry for coming to this a little late. Patch v5 looks good, just one suggestion - Looks like DumpLogSegments is the only tool that doesn't quite follow the convention for argument parsing. It will be good to standardize on jopt parser since it also helps with a description for each option, a help command and convenient required argument checks. Just my 2c.
        Hide
        Yang Ye added a comment -

        @Neha sure, we can do that

        Show
        Yang Ye added a comment - @Neha sure, we can do that
        Hide
        Yang Ye added a comment -


        Using jOptSimple for option parsing

        Show
        Yang Ye added a comment - Using jOptSimple for option parsing
        Hide
        Jun Rao added a comment -

        Thanks for patch v6. 1. Committed to 0.8 with a minor change: using ::= on a list instead of .:=

        Show
        Jun Rao added a comment - Thanks for patch v6. 1. Committed to 0.8 with a minor change: using ::= on a list instead of .: =

          People

          • Assignee:
            Unassigned
            Reporter:
            Jun Rao
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 24h
              24h
              Remaining:
              Remaining Estimate - 24h
              24h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development