Uploaded image for project: 'Metron (Retired)'
  1. Metron (Retired)
  2. METRON-1225

Indexing still writing to HDFS but not committing on Kafka

    XMLWordPrintableJSON

Details

    • Bug
    • Status: To Do
    • Major
    • Resolution: Unresolved
    • 0.4.1
    • None
    • None
    • 0.4.1.1.3.0.0

    Description

      There is a scenario were the Indexing topology can continue writing to HDFS while nothing is committing on the source Kafka topic. This leads to the same data being written twice (or even multiple times) to HDFS.

      To reproduce:
      -Start indexing topology (in my case the ES output is off by disabling all ES writer bolts
      -deactivate it long enough for the Indexing offset on Kafka to become obsolete ( Indexing topo offset < Indexing topic start offset )
      -Re Activate indexing topology

      Under these circumstances writing to HDFS will continue while the work is never committed on Kafka:

      HDFS CHECKS (still increasing):

      ~$ hdfs dfs -du /apps/metron/indexing/indexed/2017-10-03
      86007422868 /apps/metron/indexing/indexed/2017-10-03/checkpoint_lea
      27604282769 /apps/metron/indexing/indexed/2017-10-03/suricata

      ~$ hdfs dfs -du /apps/metron/indexing/indexed/2017-10-03
      86093180848 /apps/metron/indexing/indexed/2017-10-03/checkpoint_lea
      27604282769 /apps/metron/indexing/indexed/2017-10-03/suricata

      ~$ hdfs dfs -du /apps/metron/indexing/indexed/2017-10-03
      86192718462 /apps/metron/indexing/indexed/2017-10-03/checkpoint_lea
      27604282769 /apps/metron/indexing/indexed/2017-10-03/suricata

      KAFKA TOPICS OFFSET CHECKS AT THE SAME TIME:

      ~$ sudo /usr/hdp/2.5.3.0-37/kafka/bin/kafka-consumer-offset-checker.sh --zookeeper $ZOOK --security-protocol SASL_PLAINTEXT --topic indexing --group indexing

      2017-10-03 11:19:00
      Group Topic Pid Offset logSize Lag Owner
      indexing indexing 0 109267273 162048745 52781472 none
      indexing indexing 1 109267278 162048799 52781521 none
      indexing indexing 2 109267270 162048823 52781553 none
      indexing indexing 3 109267276 162048844 52781568 none
      indexing indexing 4 109267271 162048852 52781581 none
      indexing indexing 5 109267267 162048860 52781593 none
      indexing indexing 6 109267277 162048881 52781604 none
      indexing indexing 7 109267275 162048890 52781615 none
      indexing indexing 8 109267272 162048903 52781631 none
      indexing indexing 9 109267277 162048914 52781637 none
      indexing indexing 10 109267277 162048921 52781644 none
      indexing indexing 11 109267271 162048924 52781653 none

      2017-10-03 11:20:06
      Group Topic Pid Offset logSize Lag Owner
      indexing indexing 0 109267273 162063031 52795758 none
      indexing indexing 1 109267278 162063060 52795782 none
      indexing indexing 2 109267270 162063106 52795836 none
      indexing indexing 3 109267276 162063115 52795839 none
      indexing indexing 4 109267271 162063136 52795865 none
      indexing indexing 5 109267267 162063132 52795865 none
      indexing indexing 6 109267277 162063143 52795866 none
      indexing indexing 7 109267275 162063138 52795863 none
      indexing indexing 8 109267272 162063139 52795867 none
      indexing indexing 9 109267277 162063143 52795866 none
      indexing indexing 10 109267277 162063142 52795865 none
      indexing indexing 11 109267271 162063139 52795868 none

      2017-10-03 11:21:39
      Group Topic Pid Offset logSize Lag Owner
      indexing indexing 0 109267273 162079436 52812163 none
      indexing indexing 1 109267278 162079442 52812164 none
      indexing indexing 2 109267270 162079434 52812164 none
      indexing indexing 3 109267276 162079437 52812161 none
      indexing indexing 4 109267271 162079434 52812163 none
      indexing indexing 5 109267267 162079431 52812164 none
      indexing indexing 6 109267277 162079437 52812160 none
      indexing indexing 7 109267275 162079437 52812162 none
      indexing indexing 8 109267272 162079435 52812163 none
      indexing indexing 9 109267277 162079441 52812164 none
      indexing indexing 10 109267277 162079441 52812164 none
      indexing indexing 11 109267271 162079434 52812163 none

      WORKER LOG AT THE SAME TIME (no sign of any problems here):

      2017-10-03 10:48:15.647 o.a.m.w.h.SourceHandler [INFO] File rotation took 11 ms
      2017-10-03 10:48:16.991 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
      2017-10-03 10:48:17.000 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
      2017-10-03 10:48:17.000 o.a.m.w.h.SourceHandler [INFO] File rotation took 9 ms
      2017-10-03 10:49:00.150 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
      2017-10-03 10:49:00.160 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
      2017-10-03 10:49:00.160 o.a.m.w.h.SourceHandler [INFO] File rotation took 10 ms
      2017-10-03 10:49:03.037 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
      2017-10-03 10:49:03.045 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
      2017-10-03 10:49:03.046 o.a.m.w.h.SourceHandler [INFO] File rotation took 9 ms
      2017-10-03 10:58:14.541 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
      2017-10-03 10:58:14.551 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
      2017-10-03 10:58:14.551 o.a.m.w.h.SourceHandler [INFO] File rotation took 9 ms
      2017-10-03 10:58:18.736 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
      2017-10-03 10:58:18.746 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
      2017-10-03 10:58:18.746 o.a.m.w.h.SourceHandler [INFO] File rotation took 10 ms
      2017-10-03 10:58:53.615 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
      2017-10-03 10:58:53.626 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
      2017-10-03 10:58:53.626 o.a.m.w.h.SourceHandler [INFO] File rotation took 11 ms
      2017-10-03 10:59:03.560 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
      2017-10-03 10:59:03.569 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
      2017-10-03 10:59:03.569 o.a.m.w.h.SourceHandler [INFO] File rotation took 9 ms
      2017-10-03 11:09:10.687 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
      2017-10-03 11:09:10.697 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
      2017-10-03 11:09:10.697 o.a.m.w.h.SourceHandler [INFO] File rotation took 10 ms
      2017-10-03 11:09:15.455 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
      2017-10-03 11:09:15.464 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
      2017-10-03 11:09:15.464 o.a.m.w.h.SourceHandler [INFO] File rotation took 9 ms
      2017-10-03 11:10:20.685 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
      2017-10-03 11:10:20.696 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
      2017-10-03 11:10:20.696 o.a.m.w.h.SourceHandler [INFO] File rotation took 11 ms
      2017-10-03 11:10:32.027 o.a.m.w.h.SourceHandler [INFO] Rotating output file...
      2017-10-03 11:10:32.037 o.a.m.w.h.SourceHandler [INFO] Performing 0 file rotation actions.
      2017-10-03 11:10:32.037 o.a.m.w.h.SourceHandler [INFO] File rotation took 10 ms

      I was quite suprised to NOT encounter any log messages like these (taken from other case, other scenario): that usually indicate that topology work cannot be committed:

      2017-08-25 14:08:05.578 o.a.s.k.s.i.OffsetManager [WARN] topic-partition [indexing-1] has unexpected offset [28352483]. Current committed Offset [348312803]
      2017-08-25 14:08:05.578 o.a.s.k.s.i.OffsetManager [WARN] topic-partition [indexing-11] has unexpected offset [28271502]. Current committed Offset [348493662]
      2017-08-25 14:08:05.578 o.a.s.k.s.i.OffsetManager [WARN] topic-partition [indexing-6] has unexpected offset [28301556]. Current committed Offset [312766169]

      Also, on Storms Web UI there are no error whatsoever, throughput is just increasing.

      Attachments

        Activity

          People

            Unassigned Unassigned
            jasperknulst Jasper Knulst
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: