Details
-
Bug
-
Status: To Do
-
Major
-
Resolution: Unresolved
-
0.4.1
-
None
-
None
-
0.4.1.1.3.0.0
Description
There is a scenario were the Indexing topology can continue writing to HDFS while nothing is committing on the source Kafka topic. This leads to the same data being written twice (or even multiple times) to HDFS.
To reproduce:
-Start indexing topology (in my case the ES output is off by disabling all ES writer bolts
-deactivate it long enough for the Indexing offset on Kafka to become obsolete ( Indexing topo offset < Indexing topic start offset )
-Re Activate indexing topology
Under these circumstances writing to HDFS will continue while the work is never committed on Kafka:
HDFS CHECKS (still increasing):
~$ hdfs dfs -du /apps/metron/indexing/indexed/2017-10-03
86007422868 /apps/metron/indexing/indexed/2017-10-03/checkpoint_lea
27604282769 /apps/metron/indexing/indexed/2017-10-03/suricata
~$ hdfs dfs -du /apps/metron/indexing/indexed/2017-10-03
86093180848 /apps/metron/indexing/indexed/2017-10-03/checkpoint_lea
27604282769 /apps/metron/indexing/indexed/2017-10-03/suricata
~$ hdfs dfs -du /apps/metron/indexing/indexed/2017-10-03
86192718462 /apps/metron/indexing/indexed/2017-10-03/checkpoint_lea
27604282769 /apps/metron/indexing/indexed/2017-10-03/suricata
KAFKA TOPICS OFFSET CHECKS AT THE SAME TIME:
~$ sudo /usr/hdp/2.5.3.0-37/kafka/bin/kafka-consumer-offset-checker.sh --zookeeper $ZOOK --security-protocol SASL_PLAINTEXT --topic indexing --group indexing
2017-10-03 11:19:00
Group Topic Pid Offset logSize Lag Owner
indexing indexing 0 109267273 162048745 52781472 none
indexing indexing 1 109267278 162048799 52781521 none
indexing indexing 2 109267270 162048823 52781553 none
indexing indexing 3 109267276 162048844 52781568 none
indexing indexing 4 109267271 162048852 52781581 none
indexing indexing 5 109267267 162048860 52781593 none
indexing indexing 6 109267277 162048881 52781604 none
indexing indexing 7 109267275 162048890 52781615 none
indexing indexing 8 109267272 162048903 52781631 none
indexing indexing 9 109267277 162048914 52781637 none
indexing indexing 10 109267277 162048921 52781644 none
indexing indexing 11 109267271 162048924 52781653 none
2017-10-03 11:20:06
Group Topic Pid Offset logSize Lag Owner
indexing indexing 0 109267273 162063031 52795758 none
indexing indexing 1 109267278 162063060 52795782 none
indexing indexing 2 109267270 162063106 52795836 none
indexing indexing 3 109267276 162063115 52795839 none
indexing indexing 4 109267271 162063136 52795865 none
indexing indexing 5 109267267 162063132 52795865 none
indexing indexing 6 109267277 162063143 52795866 none
indexing indexing 7 109267275 162063138 52795863 none
indexing indexing 8 109267272 162063139 52795867 none
indexing indexing 9 109267277 162063143 52795866 none
indexing indexing 10 109267277 162063142 52795865 none
indexing indexing 11 109267271 162063139 52795868 none
2017-10-03 11:21:39
Group Topic Pid Offset logSize Lag Owner
indexing indexing 0 109267273 162079436 52812163 none
indexing indexing 1 109267278 162079442 52812164 none
indexing indexing 2 109267270 162079434 52812164 none
indexing indexing 3 109267276 162079437 52812161 none
indexing indexing 4 109267271 162079434 52812163 none
indexing indexing 5 109267267 162079431 52812164 none
indexing indexing 6 109267277 162079437 52812160 none
indexing indexing 7 109267275 162079437 52812162 none
indexing indexing 8 109267272 162079435 52812163 none
indexing indexing 9 109267277 162079441 52812164 none
indexing indexing 10 109267277 162079441 52812164 none
indexing indexing 11 109267271 162079434 52812163 none
WORKER LOG AT THE SAME TIME (no sign of any problems here):
2017-10-03 10:48:15.647 o.a.m.w.h.SourceHandler [INFO] File rotation took 11 ms
2017-10-03 10:48:16.991 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
2017-10-03 10:48:17.000 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
2017-10-03 10:48:17.000 o.a.m.w.h.SourceHandler [INFO] File rotation took 9 ms
2017-10-03 10:49:00.150 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
2017-10-03 10:49:00.160 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
2017-10-03 10:49:00.160 o.a.m.w.h.SourceHandler [INFO] File rotation took 10 ms
2017-10-03 10:49:03.037 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
2017-10-03 10:49:03.045 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
2017-10-03 10:49:03.046 o.a.m.w.h.SourceHandler [INFO] File rotation took 9 ms
2017-10-03 10:58:14.541 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
2017-10-03 10:58:14.551 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
2017-10-03 10:58:14.551 o.a.m.w.h.SourceHandler [INFO] File rotation took 9 ms
2017-10-03 10:58:18.736 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
2017-10-03 10:58:18.746 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
2017-10-03 10:58:18.746 o.a.m.w.h.SourceHandler [INFO] File rotation took 10 ms
2017-10-03 10:58:53.615 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
2017-10-03 10:58:53.626 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
2017-10-03 10:58:53.626 o.a.m.w.h.SourceHandler [INFO] File rotation took 11 ms
2017-10-03 10:59:03.560 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
2017-10-03 10:59:03.569 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
2017-10-03 10:59:03.569 o.a.m.w.h.SourceHandler [INFO] File rotation took 9 ms
2017-10-03 11:09:10.687 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
2017-10-03 11:09:10.697 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
2017-10-03 11:09:10.697 o.a.m.w.h.SourceHandler [INFO] File rotation took 10 ms
2017-10-03 11:09:15.455 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
2017-10-03 11:09:15.464 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
2017-10-03 11:09:15.464 o.a.m.w.h.SourceHandler [INFO] File rotation took 9 ms
2017-10-03 11:10:20.685 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
2017-10-03 11:10:20.696 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
2017-10-03 11:10:20.696 o.a.m.w.h.SourceHandler [INFO] File rotation took 11 ms
2017-10-03 11:10:32.027 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
2017-10-03 11:10:32.037 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
2017-10-03 11:10:32.037 o.a.m.w.h.SourceHandler [INFO] File rotation took 10 ms
I was quite suprised to NOT encounter any log messages like these (taken from other case, other scenario): that usually indicate that topology work cannot be committed:
2017-08-25 14:08:05.578 o.a.s.k.s.i.OffsetManager [WARN] topic-partition [indexing-1] has unexpected offset [28352483]. Current committed Offset [348312803]
2017-08-25 14:08:05.578 o.a.s.k.s.i.OffsetManager [WARN] topic-partition [indexing-11] has unexpected offset [28271502]. Current committed Offset [348493662]
2017-08-25 14:08:05.578 o.a.s.k.s.i.OffsetManager [WARN] topic-partition [indexing-6] has unexpected offset [28301556]. Current committed Offset [312766169]
Also, on Storms Web UI there are no error whatsoever, throughput is just increasing.