Kafka
  1. Kafka
  2. KAFKA-107

Bug in serialize and collate logic in the DefaultEventHandler

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.7
    • Fix Version/s: 0.7
    • Component/s: None
    • Labels:
      None

      Description

      There is a bug in the serialize and collate in the DefaultEventHandler, that uses the map() API on a hashmap to convert a sequence of messages to a ByteBufferMessageSet, based on the compression configs. The usage of the zip() API after the map() API on a hashmap, has the side effect of reordering the mapping between the keys and the values.

      1. KAFKA-107.patch
        6 kB
        Neha Narkhede
      2. KAFKA-107-unit-test.patch
        2 kB
        Neha Narkhede

        Activity

        Jun Rao made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 0.7 [ 12317243 ]
        Resolution Fixed [ 1 ]
        Hide
        Jun Rao added a comment -

        +1

        Show
        Jun Rao added a comment - +1
        Neha Narkhede made changes -
        Attachment KAFKA-107-unit-test.patch [ 12490842 ]
        Hide
        Neha Narkhede added a comment -

        Adding a unit test to the test suite for AsyncProducer that catches the reordering bug in serialize() API of DefaultEventHandler

        Show
        Neha Narkhede added a comment - Adding a unit test to the test suite for AsyncProducer that catches the reordering bug in serialize() API of DefaultEventHandler
        Hide
        Neha Narkhede added a comment -

        Jun, the code is not zipping a map with a list. That won't even compile. The code is just zipping a sequence of tuples with a sequence of ByteBufferMessageSet. The problem is that one sequence is generated using the map API on the original HashMap and another sequence is generated using the map API on another HashMap. So ordering of data across those sequences is not guaranteed. Hence, a zip on those sequences will not associate the keys with the correct values.

        I will add a unit test to cover this, and remove the unreferenced package imports.

        Show
        Neha Narkhede added a comment - Jun, the code is not zipping a map with a list. That won't even compile. The code is just zipping a sequence of tuples with a sequence of ByteBufferMessageSet. The problem is that one sequence is generated using the map API on the original HashMap and another sequence is generated using the map API on another HashMap. So ordering of data across those sequences is not guaranteed. Hence, a zip on those sequences will not associate the keys with the correct values. I will add a unit test to cover this, and remove the unreferenced package imports.
        Hide
        Jun Rao added a comment -

        Another thing, please remove unreferenced package imports.

        Show
        Jun Rao added a comment - Another thing, please remove unreferenced package imports.
        Hide
        Jun Rao added a comment -

        Ok, so the problem was that the code was trying to zip a map (topicsAndPartitions) and a list (messages). Even though they are derived from the same map, there is no guarantee that the map is going to be iterated in the same order as the list. So, in the future, we need to be careful about using zip. We need to make sure that the two collections to be zipped have well defined ordering (e.g., list) for iteration and the ordering is what we want. Could you double check other usage of zip introduced in the compression patch?

        The patch looks fine. However, could you also add a unit test (probably just at the serialize() level) that exposes this problem?

        Show
        Jun Rao added a comment - Ok, so the problem was that the code was trying to zip a map (topicsAndPartitions) and a list (messages). Even though they are derived from the same map, there is no guarantee that the map is going to be iterated in the same order as the list. So, in the future, we need to be careful about using zip. We need to make sure that the two collections to be zipped have well defined ordering (e.g., list) for iteration and the ordering is what we want. Could you double check other usage of zip introduced in the compression patch? The patch looks fine. However, could you also add a unit test (probably just at the serialize() level) that exposes this problem?
        Neha Narkhede made changes -
        Attachment KAFKA-107.patch [ 12490597 ]
        Neha Narkhede made changes -
        Field Original Value New Value
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Neha Narkhede added a comment -

        This patch refactors the code in the DefaultEventHandler serialize API to fix the reordering bug

        Show
        Neha Narkhede added a comment - This patch refactors the code in the DefaultEventHandler serialize API to fix the reordering bug
        Neha Narkhede created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Neha Narkhede
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development