Details
Type: Test
Status: Closed
Priority: Major
Resolution: Information Provided
Description
Hello,
I'm trying out spot-ml with a large pcap capture, taken from
https://mcfp.felk.cvut.cz/publicDatasets/CTU-Mixed-Capture-1/
After ingesting the flow and DNS data into Apache Spot, the spot-ml results are always empty. This is how I ran spot-ml:
./ml_ops.sh 20170601 flow 1e-20 400
17/06/01 15:15:00 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/spot/flow/scored_results/20170601/scores
17/06/01 15:15:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/01 15:15:03 INFO Remoting: Starting remoting
17/06/01 15:15:03 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@172.18.0.4:41981]
17/06/01 15:15:13 WARN spark.SparkContext: Dynamic Allocation and num executors both set, thus dynamic allocation disabled.
17/06/01 15:15:19 INFO SuspiciousConnectsAnalysis: Loading data from: /user/spot/flow/hive/y=2017/m=06/d=01/
17/06/01 15:15:29 INFO SuspiciousConnectsAnalysis: Starting flow suspicious connects analysis.
17/06/01 15:15:30 INFO SuspiciousConnectsAnalysis: Fitting probabilistic model to data
17/06/01 15:15:30 INFO SuspiciousConnectsAnalysis: Training netflow suspicious connects model from /user/spot/flow/hive/y=2017/m=06/d=01/
17/06/01 15:15:32 INFO SuspiciousConnectsAnalysis: 0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
17/06/01 15:15:32 INFO SuspiciousConnectsAnalysis: calculating byte cuts ...
17/06/01 15:15:35 INFO SuspiciousConnectsAnalysis: 93.0,146.0,204.0,288.0,313.0,630.0,905.0,1581.0,2840.0,9.5489883E7
17/06/01 15:15:35 INFO SuspiciousConnectsAnalysis: calculating pkt cuts
17/06/01 15:15:36 INFO SuspiciousConnectsAnalysis: 2.0,4.0,6.0,9.0,164961.0
17/06/01 15:15:43 INFO SuspiciousConnectsAnalysis: Running Spark LDA with params alpha = 1.02 beta = 1.001 Max iterations = 20 Optimizer = em
[Stage 61:==================================================> (187 + 3) / 200]
17/06/01 15:16:20 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
17/06/01 15:16:20 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
17/06/01 15:23:15 INFO SuspiciousConnectsAnalysis: Identifying outliers
17/06/01 15:23:15 INFO SuspiciousConnectsAnalysis: Netflow suspicious connects analysis completed.
17/06/01 15:23:15 INFO SuspiciousConnectsAnalysis: Saving results to : /user/spot/flow/scored_results/20170601/scores
17/06/01 15:23:55 WARN SuspiciousConnectsAnalysis: Saving invalid records to /user/spot/flow/scored_results/20170601/scores/invalid_records
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
17/06/01 15:23:57 WARN SuspiciousConnectsAnalysis: Total records discarded due to NULL values in key fields: 12 . Please go to /user/spot/flow/scored_results/20170601/scores/invalid_records for more details.
real 8m56.911s
user 1m41.616s
sys 0m11.856s
root@hadoop-master:~/incubator-spot/spot-ml# hadoop fs -ls /user/spot/flow/scored_results/20170601/scores
Found 3 items
-rw-r--r-- 2 root supergroup 0 2017-06-01 15:23 /user/spot/flow/scored_results/20170601/scores/_SUCCESS
-rw-r--r-- 2 root supergroup 0 2017-06-01 15:23 /user/spot/flow/scored_results/20170601/scores/flow_results.csv
drwxr-xr-x - root supergroup 0 2017-06-01 15:23 /user/spot/flow/scored_results/20170601/scores/invalid_records
root@hadoop-master:~/incubator-spot/spot-ml#
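For anyone reproducing this, here is a minimal Python sketch of how the listing above can be checked for empty result files. It only parses pasted `hadoop fs -ls` text and does not call Hadoop. The function names `parse_hdfs_ls` and `empty_result_files` are hypothetical helpers made up for illustration, and the column layout is assumed to be the usual eight-column one (permissions, replication, owner, group, size, date, time, path).

```python
def parse_hdfs_ls(listing):
    """Parse pasted `hadoop fs -ls` text into (is_dir, size_bytes, path) tuples."""
    entries = []
    for line in listing.splitlines():
        # Split into at most 8 fields so a path containing spaces stays whole.
        fields = line.split(None, 7)
        # Skip the "Found N items" header, shell prompts and blank lines:
        # a real entry has 8 columns and a numeric size in the fifth one.
        if len(fields) != 8 or not fields[4].isdigit():
            continue
        is_dir = fields[0].startswith("d")
        entries.append((is_dir, int(fields[4]), fields[7]))
    return entries


def empty_result_files(listing):
    """Return paths of zero-byte files, ignoring the _SUCCESS marker and directories."""
    return [
        path
        for is_dir, size, path in parse_hdfs_ls(listing)
        if not is_dir and size == 0 and not path.endswith("/_SUCCESS")
    ]


# Sample input: the listing from this ticket.
SAMPLE = """\
Found 3 items
-rw-r--r-- 2 root supergroup 0 2017-06-01 15:23 /user/spot/flow/scored_results/20170601/scores/_SUCCESS
-rw-r--r-- 2 root supergroup 0 2017-06-01 15:23 /user/spot/flow/scored_results/20170601/scores/flow_results.csv
drwxr-xr-x - root supergroup 0 2017-06-01 15:23 /user/spot/flow/scored_results/20170601/scores/invalid_records
"""

if __name__ == "__main__":
    for path in empty_result_files(SAMPLE):
        print("empty result file:", path)
```

The `_SUCCESS` marker is skipped because it is a zero-byte file by design, so on this listing the only file reported is `flow_results.csv`.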
With the DNS capture I don't get any suspicious connections either. Is this normal behaviour? The ML run itself seems to work fine, since it goes through more than 1000 stages. Also, for DNS ML, what does USER_DOMAIN_CMD mean?
Thanks,