Uploaded image for project: 'Samza'
  1. Samza
  2. SAMZA-685

Wrong partition count in auto-created changelog streams for KV Store

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • 0.9.0
    • None
    • container
    • None

    Description

      SAMZA-226 impletented an auto-create feature for the changelog streams of used KV stores. Unfortunally there is a bug, when the job is deployed on a yarn cluster with more then one task containers:

      The auto-created changelog Kafka topics for the KV stores have wrong partition counts. I deployed the job serveral times and only on low container counts the right amount of changelog partitions were created correctly. The job deployment fails also in case of wrong partition counts - see error log below.

      Below is the Kafka topic list. Task input is the topic "portal-events" with 16 partitions. Auto-created changelog topics "buffer-store-changelog" and "user-store-changelog" have wrong partition counts (14 and 12 - should be also 16):

      Topic:__samza_checkpoint_ver_1_for_portal-parser_1 PartitionCount:1 ReplicationFactor:3 Configs:segment.bytes=26214400,cleanup.policy=compact
      Topic: __samza_checkpoint_ver_1_for_portal-parser_1 Partition: 0 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
      Topic:portal-events PartitionCount:16 ReplicationFactor:3 Configs:
      Topic: portal-events Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
      Topic: portal-events Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
      Topic: portal-events Partition: 2 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
      Topic: portal-events Partition: 3 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
      Topic: portal-events Partition: 4 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
      Topic: portal-events Partition: 5 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
      Topic: portal-events Partition: 6 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
      Topic: portal-events Partition: 7 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
      Topic: portal-events Partition: 8 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
      Topic: portal-events Partition: 9 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
      Topic: portal-events Partition: 10 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
      Topic: portal-events Partition: 11 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
      Topic: portal-events Partition: 12 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
      Topic: portal-events Partition: 13 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
      Topic: portal-events Partition: 14 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
      Topic: portal-events Partition: 15 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
      Topic:buffer-store-changelog PartitionCount:14 ReplicationFactor:3 Configs:segment.bytes=536870912,cleanup.policy=compact
      Topic: buffer-store-changelog Partition: 0 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
      Topic: buffer-store-changelog Partition: 1 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
      Topic: buffer-store-changelog Partition: 2 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
      Topic: buffer-store-changelog Partition: 3 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
      Topic: buffer-store-changelog Partition: 4 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
      Topic: buffer-store-changelog Partition: 5 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
      Topic: buffer-store-changelog Partition: 6 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
      Topic: buffer-store-changelog Partition: 7 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
      Topic: buffer-store-changelog Partition: 8 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
      Topic: buffer-store-changelog Partition: 9 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
      Topic: buffer-store-changelog Partition: 10 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
      Topic: buffer-store-changelog Partition: 11 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
      Topic: buffer-store-changelog Partition: 12 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
      Topic: buffer-store-changelog Partition: 13 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
      Topic:user-store-changelog PartitionCount:12 ReplicationFactor:3 Configs:segment.bytes=536870912,cleanup.policy=compact
      Topic: user-store-changelog Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
      Topic: user-store-changelog Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
      Topic: user-store-changelog Partition: 2 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
      Topic: user-store-changelog Partition: 3 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
      Topic: user-store-changelog Partition: 4 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
      Topic: user-store-changelog Partition: 5 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
      Topic: user-store-changelog Partition: 6 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
      Topic: user-store-changelog Partition: 7 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
      Topic: user-store-changelog Partition: 8 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
      Topic: user-store-changelog Partition: 9 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
      Topic: user-store-changelog Partition: 10 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
      Topic: user-store-changelog Partition: 11 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3

      Lines of the Samza application master error log:

      2015-05-20 16:08:48 AbstractHttpConnection [DEBUG] closed AsyncHttpConnection@3e134a3f,g=HttpGenerator

      {s=0,h=-1,b=-1,c=-1}

      ,p=HttpParser

      {s=0,l=0,c=-3}

      ,r=1
      2015-05-20 16:08:49 JvmMetrics [DEBUG] updating jvm metrics
      2015-05-20 16:08:49 JvmMetrics [DEBUG] updated metrics to: [64.090485, 64.875, 247.20428, 723.5, 0, 21, 9, 5, 23, 0, 6, 176]
      2015-05-20 16:08:49 Client [DEBUG] IPC Client (828072249) connection to yrmsl1/10.100.86.204:8030 from yarn sending #24
      2015-05-20 16:08:49 Client [DEBUG] IPC Client (828072249) connection to yrmsl1/10.100.86.204:8030 from yarn got value #24
      2015-05-20 16:08:49 ProtobufRpcEngine [DEBUG] Call: allocate took 2ms
      2015-05-20 16:08:49 SamzaAppMasterTaskManager [INFO] Container container_1431707519455_0024_01_000002 failed with exit code 1 - Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
      org.apache.hadoop.util.Shell$ExitCodeException:
      at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
      at org.apache.hadoop.util.Shell.run(Shell.java:418)
      at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
      at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)

      Container exited with a non-zero exit code 1
      .
      2015-05-20 16:08:49 SamzaAppMasterTaskManager [INFO] Failed container container_1431707519455_0024_01_000002 owned task id 0.
      2015-05-20 16:08:49 SamzaAppMasterTaskManager [INFO] Current fail count for task id 0 is 1.
      2015-05-20 16:08:49 SamzaAppMasterTaskManager [INFO] Requesting 1 container(s) with 1024mb of memory

      Lines of the task container error log:

      java version "1.7.0_79"
      OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
      OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
      org.apache.samza.system.kafka.KafkaSystemAdmin$KafkaChangelogException: Changelog topic validation failed for topic buffer-store-changelog because partition count 14 did not match expected partition count of 16
      at org.apache.samza.system.kafka.KafkaSystemAdmin$$anonfun$validateTopicInKafka$2.apply(KafkaSystemAdmin.scala:327)
      at org.apache.samza.system.kafka.KafkaSystemAdmin$$anonfun$validateTopicInKafka$2.apply(KafkaSystemAdmin.scala:319)
      at org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:82)
      at org.apache.samza.system.kafka.KafkaSystemAdmin.validateTopicInKafka(KafkaSystemAdmin.scala:318)
      at org.apache.samza.system.kafka.KafkaSystemAdmin.createChangelogStream(KafkaSystemAdmin.scala:354)
      at org.apache.samza.storage.TaskStorageManager$$anonfun$createStreams$3.apply(TaskStorageManager.scala:86)
      at org.apache.samza.storage.TaskStorageManager$$anonfun$createStreams$3.apply(TaskStorageManager.scala:84)
      at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
      at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
      at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
      at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
      at scala.collection.immutable.Map$Map2.foreach(Map.scala:130)
      at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
      at scala.collection.MapLike$MappedValues.foreach(MapLike.scala:245)
      at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
      at org.apache.samza.storage.TaskStorageManager.createStreams(TaskStorageManager.scala:84)
      at org.apache.samza.storage.TaskStorageManager.init(TaskStorageManager.scala:63)
      at org.apache.samza.container.TaskInstance.startStores(TaskInstance.scala:90)
      at org.apache.samza.container.SamzaContainer$$anonfun$startStores$2.apply(SamzaContainer.scala:600)
      at org.apache.samza.container.SamzaContainer$$anonfun$startStores$2.apply(SamzaContainer.scala:600)
      at scala.collection.Iterator$class.foreach(Iterator.scala:727)
      at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
      at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
      at org.apache.samza.container.SamzaContainer.startStores(SamzaContainer.scala:600)
      at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:543)
      at org.apache.samza.container.SamzaContainer$.safeMain(SamzaContainer.scala:93)
      at org.apache.samza.container.SamzaContainer$.main(SamzaContainer.scala:67)
      at org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              mikk Michael Strobl
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: