Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Implemented
Description
Hi guys, I'm trying to run spot-ml on its own, without running spot-ingest first. I went through a few steps to create a suitable parquet file for the "dns" analysis, but I get a "next on empty iterator" error during processing, in the LDA wrapper at val ldaModel = lda.run(ldaCorpus), line 116. I don't know what's wrong with my data. This is its structure, and it seems to include the required fields.
+---------------------+-----------+--------------+------------+----------------------+---------------+--------------+---------------+-------------+-------------+
| frame_time          | frame_len | ip_dst       | ip_src     | dns_qry_name         | dns_qry_class | dns_qry_type | dns_qry_rcode | dns_a       | unix_tstamp |
+---------------------+-----------+--------------+------------+----------------------+---------------+--------------+---------------+-------------+-------------+
| Jul 8 2016 07:27... | 318       | 172.16.0.197 | 10.0.3.190 | 4lam9a1ki27mb9p1h... | Internet (IN) | null         | null          | 53.46.53.51 | 1467955661  |
| Jul 8 2016 07:12... | 314       | 172.16.0.184 | 10.0.4.43  | 4lam9a1ki27mb9p1h... | Internet (IN) | null         | null          | 53.46.50.51 | 1467954727  |
| Jul 8 2016 07:57... | 314       | 172.16.0.172 | 10.0.4.73  | 4lam9a1ki27mb9p1h... | Internet (IN) | null         | null          | 50.55.46.50 | 1467957468  |
| Jul 8 2016 07:43... | 322       | 172.16.0.196 | 10.0.3.149 | 4lam9a1ki27mb9p1h... | Internet (IN) | null         | null          | 49.52.46.49 | 1467956586  |
| Jul 8 2016 07:19... | 318       | 172.16.0.183 | 10.0.3.243 | 4lam9a1ki27mb9p1h... | Internet (IN) | null         | null          | 49.52.46.49 | 1467955147  |
| Jul 8 2016 07:29... | 318       | 172.16.0.183 | 10.0.3.203 | 4lam9a1ki27mb9p1h... | Internet (IN) | null         | null          | 53.46.49.57 | 1467955740  |
| Jul 8 2016 07:42... | 318       | 172.16.0.168 | 10.0.4.23  | 4lam9a1ki27mb9p1h... | Internet (IN) | null         | null          | 49.52.46.49 | 1467956540  |
+---------------------+-----------+--------------+------------+----------------------+---------------+--------------+---------------+-------------+-------------+
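In the rows shown above, dns_qry_type and dns_qry_rcode are null everywhere. A quick way to see how widespread that is across the whole data set is to count null values per column. Below is a minimal shell sketch, assuming the rows have been dumped to a pipe-delimited text file laid out like the table above (header row first, border lines made of dashes and plus signs). The function name count_nulls and the file name used in the usage note are hypothetical and not part of spot-ml; whether the nulls are related to the error is not established here.

```shell
#!/bin/sh
# count_nulls FILE (hypothetical helper, not part of spot-ml):
# reads a pipe-delimited dump whose first non-border line is a header row,
# prints "<column> <count>" for each column containing the literal string
# "null", then prints "rows <n>" with the number of data rows seen.
count_nulls() {
  awk -F'|' '
    # Trim leading and trailing blanks from a field.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

    # Skip empty lines and border lines such as +-----+ or ------+.
    /^[-+ \t]*$/ { next }

    # The first remaining line is the header: remember the column names.
    !seen_header {
      for (i = 1; i <= NF; i++) name[i] = trim($i)
      ncols = NF
      seen_header = 1
      next
    }

    # Every other line is a data row: count fields that are exactly "null".
    {
      rows++
      for (i = 1; i <= NF; i++) if (trim($i) == "null") nulls[i]++
    }

    END {
      for (i = 1; i <= ncols; i++) if (nulls[i] > 0) print name[i], nulls[i]
      print "rows", rows + 0
    }
  ' "$1"
}
```

It could be called as count_nulls sample_rows.txt, where sample_rows.txt is a placeholder for the exported dump. A column whose null count equals the row count is empty for every record.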
I have modified ml_ops.sh like this:
~/spark/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --class "org.apache.spot.SuspiciousConnects" \
--master spark://127.0.1.1:7077 \
target/scala-2.10/spot-ml-assembly-1.1.jar \
--analysis "dns" \
--input "file:///home/<user>/Desktop/spot-ml/sources/parquet/quatriemeTest" \
--ldatopiccount 10 \
--scored hdfs/result \
--threshold 0.2 \
--maxresults 50 \
--ldamaxiterations 20 \
I hope someone has a tip for me, because I'm a bit stuck...
You can find the data attached in the .txt file, along with my ml_ops.sh file.
The last four files are my parquet files; they are in a directory called quatriemeTest, but I couldn't upload it.
Thank you for your help.