Details
- Bug
- Status: Resolved
- Major
- Resolution: Duplicate
- 0.9.0
- None
- None
Description
SAMZA-226 implemented an auto-create feature for the changelog streams of KV stores. Unfortunately, there is a bug when the job is deployed on a YARN cluster with more than one task container:
The auto-created changelog Kafka topics for the KV stores have wrong partition counts. I deployed the job several times, and the correct number of changelog partitions was created only with low container counts. When the partition counts are wrong, the job deployment also fails; see the error logs below.
Below is the Kafka topic list. The task input is the topic "portal-events" with 16 partitions. The auto-created changelog topics "buffer-store-changelog" and "user-store-changelog" have wrong partition counts (14 and 12; both should be 16):
Topic:__samza_checkpoint_ver_1_for_portal-parser_1 PartitionCount:1 ReplicationFactor:3 Configs:segment.bytes=26214400,cleanup.policy=compact
Topic: __samza_checkpoint_ver_1_for_portal-parser_1 Partition: 0 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic:portal-events PartitionCount:16 ReplicationFactor:3 Configs:
Topic: portal-events Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: portal-events Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: portal-events Partition: 2 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: portal-events Partition: 3 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
Topic: portal-events Partition: 4 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
Topic: portal-events Partition: 5 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
Topic: portal-events Partition: 6 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: portal-events Partition: 7 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: portal-events Partition: 8 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: portal-events Partition: 9 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
Topic: portal-events Partition: 10 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
Topic: portal-events Partition: 11 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
Topic: portal-events Partition: 12 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: portal-events Partition: 13 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: portal-events Partition: 14 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: portal-events Partition: 15 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
Topic:buffer-store-changelog PartitionCount:14 ReplicationFactor:3 Configs:segment.bytes=536870912,cleanup.policy=compact
Topic: buffer-store-changelog Partition: 0 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
Topic: buffer-store-changelog Partition: 1 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
Topic: buffer-store-changelog Partition: 2 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
Topic: buffer-store-changelog Partition: 3 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: buffer-store-changelog Partition: 4 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: buffer-store-changelog Partition: 5 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: buffer-store-changelog Partition: 6 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
Topic: buffer-store-changelog Partition: 7 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
Topic: buffer-store-changelog Partition: 8 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
Topic: buffer-store-changelog Partition: 9 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: buffer-store-changelog Partition: 10 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: buffer-store-changelog Partition: 11 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: buffer-store-changelog Partition: 12 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
Topic: buffer-store-changelog Partition: 13 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
Topic:user-store-changelog PartitionCount:12 ReplicationFactor:3 Configs:segment.bytes=536870912,cleanup.policy=compact
Topic: user-store-changelog Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: user-store-changelog Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: user-store-changelog Partition: 2 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: user-store-changelog Partition: 3 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
Topic: user-store-changelog Partition: 4 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
Topic: user-store-changelog Partition: 5 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
Topic: user-store-changelog Partition: 6 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: user-store-changelog Partition: 7 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: user-store-changelog Partition: 8 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: user-store-changelog Partition: 9 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
Topic: user-store-changelog Partition: 10 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
Topic: user-store-changelog Partition: 11 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
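The mismatch in the listing above can be spotted mechanically. The following is a minimal sketch, assuming the topic description is available as text in the format shown; the helper names `partition_counts` and `mismatched_changelogs`, the `-changelog` suffix check, and the abbreviated `SAMPLE` input are illustrative assumptions, not part of Samza or Kafka.

```python
import re

# Matches only the per-topic summary lines, e.g.
# "Topic:portal-events PartitionCount:16 ReplicationFactor:3 Configs:"
# Per-partition lines ("Topic: x Partition: 0 ...") do not match.
SUMMARY_RE = re.compile(r"^Topic:\s*(\S+)\s+PartitionCount:\s*(\d+)")


def partition_counts(describe_output):
    """Return {topic: partition count} parsed from the summary lines."""
    counts = {}
    for line in describe_output.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def mismatched_changelogs(counts, input_topic, suffix="-changelog"):
    """Return changelog topics whose count differs from the input topic's."""
    expected = counts[input_topic]
    return {
        topic: count
        for topic, count in counts.items()
        if topic.endswith(suffix) and count != expected
    }


# Abbreviated copy of the listing above (assumption: summary lines suffice).
SAMPLE = """\
Topic:portal-events PartitionCount:16 ReplicationFactor:3 Configs:
Topic: portal-events Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic:buffer-store-changelog PartitionCount:14 ReplicationFactor:3 Configs:segment.bytes=536870912,cleanup.policy=compact
Topic:user-store-changelog PartitionCount:12 ReplicationFactor:3 Configs:segment.bytes=536870912,cleanup.policy=compact
"""

if __name__ == "__main__":
    print(mismatched_changelogs(partition_counts(SAMPLE), "portal-events"))
```

Run against the listing from this report, the sketch flags both changelog topics (14 and 12 partitions) against the 16 partitions of "portal-events".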
Lines of the Samza application master error log:
2015-05-20 16:08:48 AbstractHttpConnection [DEBUG] closed AsyncHttpConnection@3e134a3f,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=0,l=0,c=-3},r=1
2015-05-20 16:08:49 JvmMetrics [DEBUG] updating jvm metrics
2015-05-20 16:08:49 JvmMetrics [DEBUG] updated metrics to: [64.090485, 64.875, 247.20428, 723.5, 0, 21, 9, 5, 23, 0, 6, 176]
2015-05-20 16:08:49 Client [DEBUG] IPC Client (828072249) connection to yrmsl1/10.100.86.204:8030 from yarn sending #24
2015-05-20 16:08:49 Client [DEBUG] IPC Client (828072249) connection to yrmsl1/10.100.86.204:8030 from yarn got value #24
2015-05-20 16:08:49 ProtobufRpcEngine [DEBUG] Call: allocate took 2ms
2015-05-20 16:08:49 SamzaAppMasterTaskManager [INFO] Container container_1431707519455_0024_01_000002 failed with exit code 1 - Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
.
2015-05-20 16:08:49 SamzaAppMasterTaskManager [INFO] Failed container container_1431707519455_0024_01_000002 owned task id 0.
2015-05-20 16:08:49 SamzaAppMasterTaskManager [INFO] Current fail count for task id 0 is 1.
2015-05-20 16:08:49 SamzaAppMasterTaskManager [INFO] Requesting 1 container(s) with 1024mb of memory
Lines of the task container error log:
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
org.apache.samza.system.kafka.KafkaSystemAdmin$KafkaChangelogException: Changelog topic validation failed for topic buffer-store-changelog because partition count 14 did not match expected partition count of 16
at org.apache.samza.system.kafka.KafkaSystemAdmin$$anonfun$validateTopicInKafka$2.apply(KafkaSystemAdmin.scala:327)
at org.apache.samza.system.kafka.KafkaSystemAdmin$$anonfun$validateTopicInKafka$2.apply(KafkaSystemAdmin.scala:319)
at org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:82)
at org.apache.samza.system.kafka.KafkaSystemAdmin.validateTopicInKafka(KafkaSystemAdmin.scala:318)
at org.apache.samza.system.kafka.KafkaSystemAdmin.createChangelogStream(KafkaSystemAdmin.scala:354)
at org.apache.samza.storage.TaskStorageManager$$anonfun$createStreams$3.apply(TaskStorageManager.scala:86)
at org.apache.samza.storage.TaskStorageManager$$anonfun$createStreams$3.apply(TaskStorageManager.scala:84)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.immutable.Map$Map2.foreach(Map.scala:130)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at scala.collection.MapLike$MappedValues.foreach(MapLike.scala:245)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.samza.storage.TaskStorageManager.createStreams(TaskStorageManager.scala:84)
at org.apache.samza.storage.TaskStorageManager.init(TaskStorageManager.scala:63)
at org.apache.samza.container.TaskInstance.startStores(TaskInstance.scala:90)
at org.apache.samza.container.SamzaContainer$$anonfun$startStores$2.apply(SamzaContainer.scala:600)
at org.apache.samza.container.SamzaContainer$$anonfun$startStores$2.apply(SamzaContainer.scala:600)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
at org.apache.samza.container.SamzaContainer.startStores(SamzaContainer.scala:600)
at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:543)
at org.apache.samza.container.SamzaContainer$.safeMain(SamzaContainer.scala:93)
at org.apache.samza.container.SamzaContainer$.main(SamzaContainer.scala:67)
at org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)
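The container fails fast because the changelog topic is validated against the expected partition count before the stores start. A rough sketch of such a check follows; it is not Samza's actual implementation, and `ChangelogValidationError` and `validate_changelog` are hypothetical names that only mirror the message seen in the log above.

```python
class ChangelogValidationError(Exception):
    """Hypothetical stand-in for the KafkaChangelogException in the log."""


def validate_changelog(topic, actual_partitions, expected_partitions):
    """Raise if the changelog topic's partition count is not the expected one."""
    if actual_partitions != expected_partitions:
        raise ChangelogValidationError(
            f"Changelog topic validation failed for topic {topic} because "
            f"partition count {actual_partitions} did not match expected "
            f"partition count of {expected_partitions}"
        )


if __name__ == "__main__":
    try:
        # Values taken from the task container error log above.
        validate_changelog("buffer-store-changelog", 14, 16)
    except ChangelogValidationError as err:
        print(err)
```

With the values from this report (14 actual, 16 expected), the check raises, which corresponds to the container exiting with a non-zero exit code.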
Issue Links
- is related to SAMZA-662 Samza auto-creates changelog stream without sufficient partitions when container number > 1 (Resolved)