  1. Spot (Retired)
  2. SPOT-149

[ML] obtaining parquet file for different modes

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented

    Description

      Hi guys, I'm trying to run spot-ml on its own, without running spot-ingest first. I have taken a few steps to create a suitable parquet file for the "dns" mode, but I get a "next on empty iterator" error during processing, in the LDA wrapper at line 116: val ldaModel = lda.run(ldaCorpus). I don't know what's wrong with my data. This is its structure, and it seems to contain the required fields.
      +---------------------+-----------+--------------+------------+----------------------+---------------+--------------+---------------+-------------+-------------+
      | frame_time          | frame_len | ip_dst       | ip_src     | dns_qry_name         | dns_qry_class | dns_qry_type | dns_qry_rcode | dns_a       | unix_tstamp |
      +---------------------+-----------+--------------+------------+----------------------+---------------+--------------+---------------+-------------+-------------+
      | Jul 8 2016 07:27... | 318       | 172.16.0.197 | 10.0.3.190 | 4lam9a1ki27mb9p1h... | Internet (IN) | null         | null          | 53.46.53.51 | 1467955661  |
      | Jul 8 2016 07:12... | 314       | 172.16.0.184 | 10.0.4.43  | 4lam9a1ki27mb9p1h... | Internet (IN) | null         | null          | 53.46.50.51 | 1467954727  |
      | Jul 8 2016 07:57... | 314       | 172.16.0.172 | 10.0.4.73  | 4lam9a1ki27mb9p1h... | Internet (IN) | null         | null          | 50.55.46.50 | 1467957468  |
      | Jul 8 2016 07:43... | 322       | 172.16.0.196 | 10.0.3.149 | 4lam9a1ki27mb9p1h... | Internet (IN) | null         | null          | 49.52.46.49 | 1467956586  |
      | Jul 8 2016 07:19... | 318       | 172.16.0.183 | 10.0.3.243 | 4lam9a1ki27mb9p1h... | Internet (IN) | null         | null          | 49.52.46.49 | 1467955147  |
      | Jul 8 2016 07:29... | 318       | 172.16.0.183 | 10.0.3.203 | 4lam9a1ki27mb9p1h... | Internet (IN) | null         | null          | 53.46.49.57 | 1467955740  |
      | Jul 8 2016 07:42... | 318       | 172.16.0.168 | 10.0.4.23  | 4lam9a1ki27mb9p1h... | Internet (IN) | null         | null          | 49.52.46.49 | 1467956540  |
      +---------------------+-----------+--------------+------------+----------------------+---------------+--------------+---------------+-------------+-------------+
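
In the rows shown above, dns_qry_type and dns_qry_rcode are null in every row. Whether that is what leaves the LDA corpus empty is not confirmed in this ticket, but counting null values per column is a cheap first check on the data. The sketch below is only an illustration: count_nulls is a hypothetical helper (not part of spot-ml), and it assumes the data has been dumped to a pipe-delimited text file whose first line is the header, which may not match the attached .txt file.

```shell
#!/bin/sh
# Hypothetical helper, not part of spot-ml.
# Assumes a pipe-delimited text file with a header on the first line.
# Prints one line per column: "<column name> <number of null or empty values>".
count_nulls() {
    awk -F'|' '
        NR == 1 {
            # Remember the column names, trimmed of surrounding whitespace.
            n = NF
            for (i = 1; i <= n; i++) {
                h = $i
                gsub(/^[ \t]+|[ \t]+$/, "", h)
                name[i] = h
            }
            next
        }
        /^[ \t]*$/ { next }   # skip blank lines
        {
            # Count fields that are empty or the literal string "null".
            for (i = 1; i <= n; i++) {
                v = $i
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                if (v == "" || v == "null") nulls[i]++
            }
        }
        END {
            for (i = 1; i <= n; i++) printf "%s %d\n", name[i], nulls[i] + 0
        }
    ' "$1"
}
```

Calling count_nulls on such a dump lists each column with its null count; a column whose count equals the number of rows carries no usable values.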
      I have modified ml_ops.sh like this:
      ~/spark/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --class "org.apache.spot.SuspiciousConnects" \
      --master spark://127.0.1.1:7077 \
      target/scala-2.10/spot-ml-assembly-1.1.jar \
      --analysis "dns" \
      --input "file:///home/<user>/Desktop/spot-ml/sources/parquet/quatriemeTest" \
      --ldatopiccount 10 \
      --scored hdfs/result \
      --threshold 0.2 \
      --maxresults 50 \
      --ldamaxiterations 20 \

      I hope someone has a tip for me, because I'm a bit stuck...

      Attached you can find the data in the .txt file, along with my ml_ops.sh file.
      The last four files are my parquet files. They are in a directory called quatriemeTest, but I couldn't upload the directory itself.

      Thank you for your help.
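
The --input argument in the command above points at a local directory of parquet part files. As a minimal sketch, and not something taken from spot-ml, a pre-flight check could confirm that the directory exists and holds at least one non-empty .parquet file before spark-submit is launched. The function name check_parquet_input is hypothetical, and the check only makes sense for file:// URIs or plain local paths, not for HDFS locations.

```shell
#!/bin/sh
# Hypothetical pre-flight helper, not part of spot-ml.
# $1: a local directory, given as a plain path or as a file:// URI.
# Returns 0 if the directory contains at least one non-empty *.parquet file.
check_parquet_input() {
    uri=$1
    dir=${uri#file://}          # strip the file:// scheme if present
    if [ ! -d "$dir" ]; then
        echo "input directory not found: $dir" >&2
        return 1
    fi
    for f in "$dir"/*.parquet; do
        # If the glob matches nothing it stays literal and -s is false.
        if [ -s "$f" ]; then
            return 0
        fi
    done
    echo "no non-empty .parquet part files in: $dir" >&2
    return 1
}
```

In a script like ml_ops.sh this could be called with the same value that is passed to --input, exiting early when it returns non-zero. It only verifies that files are present; it says nothing about their schema or contents.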

      Attachments

        1. part-00001-f6b5d565-2948-44ae-a1c7-e65b06f87a7e.snappy.parquet
          3 kB
          Ricardo Barona
        2. part-00000-f6b5d565-2948-44ae-a1c7-e65b06f87a7e.snappy.parquet
          3 kB
          Ricardo Barona
        3. part-r-00001-2961294b-fd9a-49f4-ad7b-9759dd158ec2.gz.parquet
          3 kB
          quentin
        4. part-r-00000-2961294b-fd9a-49f4-ad7b-9759dd158ec2.gz.parquet
          3 kB
          quentin
        5. _metadata
          4 kB
          quentin
        6. _common_metadata
          1.0 kB
          quentin
        7. quatriemeTest.txt
          1 kB
          quentin
        8. ml_opstmp.sh
          0.4 kB
          quentin

            People

              rabarona Ricardo Barona
              kizzp quentin