Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.1
    • Component/s: None
    • Labels: None

      Description

      To reproduce:

      Create a job that reads from two topics that have the same number of partitions, with more than one partition each. Configure the job to bootstrap from one of the topics and set yarn.container.count to a value greater than 1.

      Observed outcome:

      Each container completes bootstrapping for the partitions assigned to its own tasks, but never finishes bootstrapping for the remaining partitions, which are assigned to other containers.

      Expected outcome:

      When each task has finished bootstrapping the partitions it's responsible for, bootstrapping should be considered complete.

      Debug logs available here: https://mail-archives.apache.org/mod_mbox/samza-dev/201506.mbox/%3CCAPOm%3DTN88YSbn_pzpb-%2BwGxc619LsE%3D-XAfaydBKmLrTLOf4QA%40mail.gmail.com%3E

      1. SAMZA-720.1.patch (3 kB, Yan Fang)
      2. SAMZA-720.patch (2 kB, Yan Fang)

        Activity

        theduderog Roger Hoover added a comment -

        I was looking through the code a little, and it looks like the BootstrappingChooser could use the list of SSPs passed into its register() method to figure out which partitions it needs to monitor.

        closeuris Yan Fang added a comment -

        Roger Hoover, you are right. The bootstrapping list contains all the partitions of the bootstrapping stream, while each task only updates the partitions assigned to it. The other partitions remain "unbootstrapped", which is why the bootstrap hangs. You can try the patch. It should fix this problem.

        RB: https://reviews.apache.org/r/35723/

        Thanks.
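
The bookkeeping problem described in this comment can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Samza's code and not the contents of the attached patch: `SSP`, `BootstrapTracker`, `run_container`, and the topic names are hypothetical stand-ins. It contrasts tracking every partition of the bootstrap stream with tracking only the partitions passed to register().

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSP:
    """Hypothetical stand-in for a system/stream/partition triple."""
    system: str
    stream: str
    partition: int


class BootstrapTracker:
    """Toy model of the bookkeeping only; not Samza's BootstrappingChooser."""

    def __init__(self, bootstrap_stream, partition_count, track_only_registered):
        self.bootstrap_stream = bootstrap_stream
        self.track_only_registered = track_only_registered
        if track_only_registered:
            # Proposed behavior: start empty and fill in from register().
            self.lagging = set()
        else:
            # Reported behavior: every partition of the stream is tracked,
            # including partitions owned by other containers.
            self.lagging = {
                SSP("kafka", bootstrap_stream, p) for p in range(partition_count)
            }

    def register(self, ssp):
        # Only partitions of the bootstrap stream need to be monitored.
        if self.track_only_registered and ssp.stream == self.bootstrap_stream:
            self.lagging.add(ssp)

    def caught_up(self, ssp):
        # Called when this container has read a partition to its head.
        self.lagging.discard(ssp)

    def is_bootstrapped(self):
        return not self.lagging


def run_container(track_only_registered):
    """Simulate one of two containers; it owns partition 0 of each topic."""
    tracker = BootstrapTracker("bootstrap-topic", 2, track_only_registered)
    mine = SSP("kafka", "bootstrap-topic", 0)
    tracker.register(mine)
    tracker.register(SSP("kafka", "input-topic", 0))
    tracker.caught_up(mine)  # partition 1 is never seen by this container
    return tracker.is_bootstrapped()
```

In this model, `run_container(False)` never reports completion because partition 1 is tracked but only ever read by the other container, while `run_container(True)` completes as soon as the container's own partition has caught up.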

        jghoman Jakob Homan added a comment -

        'Twould be nice to have a test for the fix. We should check that the bootstrap ends up with all the partitions it needs.

        closeuris Yan Fang added a comment -

        Added a unit test.

        closeuris Yan Fang added a comment -

        Yi Pan (Data Infrastructure), if you are working on the release, feel free to commit the patch when you think it is ready, in case I do not have time to commit it myself. Thanks.

        nickpan47 Yi Pan (Data Infrastructure) added a comment -

        Merged and backported to both master and 0.9.1. Thanks, Yan Fang and Roger Hoover!


          People

          • Assignee:
            closeuris Yan Fang
            Reporter:
            theduderog Roger Hoover
          • Votes:
            0
            Watchers:
            4
