Spot / SPOT-116

[Ingest] HDFS nfcapd files are not converted into parquet

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0
    • Labels:
    • Environment: CDH 5.8.0-1.cdh5.8.0.p0.42, Spark 1.6.0

      Description

      The nfcapd files loaded into HDFS are converted into empty parquet files. I checked the source nfcapd files and they are not empty.
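
One way to reproduce the check described above is to compare the size of the binary file in HDFS with the size of the parquet output. The following is a minimal sketch, assuming the `hadoop` CLI is on the PATH; the helper names `parse_du_size` and `hdfs_size` are hypothetical and not part of Spot. The location of the parquet output is not stated in this report, so it has to be supplied by the reader.

```python
import subprocess


def parse_du_size(du_output):
    """Return the byte count from the first column of `hadoop fs -du -s` output.

    Depending on the Hadoop version, the summary line has two or three
    columns; in both layouts the first column is the size in bytes.
    """
    first_line = du_output.strip().splitlines()[0]
    return int(first_line.split()[0])


def hdfs_size(path):
    """Total size in bytes of an HDFS path (needs the `hadoop` CLI).

    Raises subprocess.CalledProcessError if the path does not exist.
    """
    out = subprocess.check_output(["hadoop", "fs", "-du", "-s", path])
    return parse_du_size(out.decode("utf-8"))
```

For example, `hdfs_size("/user/spot/pipelines/flow/binary/20170224/22/nfcapd.20170224222000")` should return a non-zero value if the binary file was loaded correctly. A parquet file with zero rows can still occupy a few bytes of metadata, so a small non-zero size for the output does not by itself show that rows were written.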

      Ingest output captured from the screen session (stdout):

      2017-02-24 22:25:10,732 - SPOT.INGEST.WATCHER - INFO - -------------------------------------- New File detected --------------------------------------
      2017-02-24 22:25:10,732 - SPOT.INGEST.WATCHER - INFO - File: /home/spot/netflow-files/nfcapd.20170224222000
      2017-02-24 22:25:10,732 - SPOT.INGEST.WATCHER - INFO - File /home/spot/netflow-files/nfcapd.20170224222000 added to the queue
      2017-02-24 22:25:10,732 - SPOT.INGEST.WATCHER - INFO - ------------------------------------------------------------------------------------------------
      2017-02-24 22:25:10,733 - SPOT.INGEST.WATCHER - INFO - -------------------------------------- New File detected --------------------------------------
      2017-02-24 22:25:10,733 - SPOT.INGEST.WATCHER - INFO - File: /home/spot/netflow-files/nfcapd.current.6266
      2017-02-24 22:25:10,733 - SPOT.INGEST.WATCHER - WARNING - File extension not supported: /home/spot/netflow-files/nfcapd.current.6266
      2017-02-24 22:25:10,733 - SPOT.INGEST.WATCHER - WARNING - File won't be ingested
      2017-02-24 22:25:10,733 - SPOT.INGEST.WATCHER - INFO - ------------------------------------------------------------------------------------------------
      2017-02-24 22:25:11,272 - SPOT.INGEST.FLOW.31183 - INFO - SPOT.Utils: Creating hdfs folder: hadoop fs -mkdir -p /user/spot/pipelines/flow/binary/20170224/22
      2017-02-24 22:25:13,715 - SPOT.INGEST.FLOW.31183 - INFO - SPOT.Utils: Loading file to hdfs: hadoop fs -moveFromLocal /home/spot/netflow-files/nfcapd.20170224222000 /user/spot/pipelines/flow/binary/20170224/22/nfcapd.20170224222000
      2017-02-24 22:25:16,422 - SPOT.INGEST.FLOW.31183 - INFO - Sending file to worker number: 1
      2017-02-24 22:25:16,552 - SPOT.INGEST.FLOW.31183 - INFO - File /home/spot/netflow-files/nfcapd.20170224222000 has been successfully sent to Kafka Topic to: SPOT-INGEST-flow_internals-17_28_51
      2017-02-24 22:30:10,729 - SPOT.INGEST.WATCHER - INFO - -------------------------------------- New File detected --------------------------------------
      2017-02-24 22:30:10,729 - SPOT.INGEST.WATCHER - INFO - File: /home/spot/netflow-files/nfcapd.20170224222500
      2017-02-24 22:30:10,729 - SPOT.INGEST.WATCHER - INFO - File /home/spot/netflow-files/nfcapd.20170224222500 added to the queue
      2017-02-24 22:30:10,729 - SPOT.INGEST.WATCHER - INFO - ------------------------------------------------------------------------------------------------
      2017-02-24 22:30:10,729 - SPOT.INGEST.WATCHER - INFO - -------------------------------------- New File detected --------------------------------------
      2017-02-24 22:30:10,730 - SPOT.INGEST.WATCHER - INFO - File: /home/spot/netflow-files/nfcapd.current.6266
      2017-02-24 22:30:10,730 - SPOT.INGEST.WATCHER - WARNING - File extension not supported: /home/spot/netflow-files/nfcapd.current.6266
      2017-02-24 22:30:10,730 - SPOT.INGEST.WATCHER - WARNING - File won't be ingested
      2017-02-24 22:30:10,730 - SPOT.INGEST.WATCHER - INFO - ------------------------------------------------------------------------------------------------
      2017-02-24 22:30:11,565 - SPOT.INGEST.FLOW.31176 - INFO - SPOT.Utils: Creating hdfs folder: hadoop fs -mkdir -p /user/spot/pipelines/flow/binary/20170224/22
      2017-02-24 22:30:14,076 - SPOT.INGEST.FLOW.31176 - INFO - SPOT.Utils: Loading file to hdfs: hadoop fs -moveFromLocal /home/spot/netflow-files/nfcapd.20170224222500 /user/spot/pipelines/flow/binary/20170224/22/nfcapd.20170224222500
      2017-02-24 22:30:16,825 - SPOT.INGEST.FLOW.31176 - INFO - Sending file to worker number: 0
      2017-02-24 22:30:16,944 - SPOT.INGEST.FLOW.31176 - INFO - File /home/spot/netflow-files/nfcapd.20170224222500 has been successfully sent to Kafka Topic to: SPOT-INGEST-flow_internals-17_28_51
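
The log above shows each file being queued, moved to HDFS, and announced on a Kafka topic, while `nfcapd.current.6266` is skipped. To list which files reached each stage in a longer log such as the attached `screen.stdout`, a small parser can help. This is a sketch only: the regular expressions are derived from the message texts visible in this report, and `summarize_ingest_log` is a hypothetical helper, not part of Spot.

```python
import re

# "<timestamp> - <logger> - <level> - <message>", as seen in the log above.
LINE_RE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
    r" - (?P<logger>\S+) - (?P<level>\w+) - (?P<msg>.*)$"
)
QUEUED_RE = re.compile(r"^File (?P<path>\S+) added to the queue$")
SKIPPED_RE = re.compile(r"^File extension not supported: (?P<path>\S+)$")
LOADED_RE = re.compile(r"hadoop fs -moveFromLocal (?P<src>\S+) (?P<dst>\S+)$")
SENT_RE = re.compile(
    r"^File (?P<path>\S+) has been successfully sent to Kafka Topic to: (?P<topic>\S+)$"
)


def summarize_ingest_log(log_text):
    """Group local file paths by the ingest stage the log reports for them."""
    summary = {"queued": [], "skipped": [], "loaded": {}, "sent": {}}
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # not a log line in the expected layout
        msg = match.group("msg").strip()
        queued = QUEUED_RE.match(msg)
        skipped = SKIPPED_RE.match(msg)
        loaded = LOADED_RE.search(msg)
        sent = SENT_RE.match(msg)
        if queued:
            summary["queued"].append(queued.group("path"))
        elif skipped:
            summary["skipped"].append(skipped.group("path"))
        elif loaded:
            # local source path -> HDFS destination path
            summary["loaded"][loaded.group("src")] = loaded.group("dst")
        elif sent:
            # local source path -> Kafka topic name
            summary["sent"][sent.group("path")] = sent.group("topic")
    return summary
```

A file that appears under `queued`, `loaded`, and `sent` got as far as the Kafka topic on the ingest side, so for such files the conversion step that runs after the message is sent is the next place to look.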
      

      The YARN job:

      2017-02-24 19:35:47,812 INFO [Thread-68] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying hdfs://tanuki.akainix.local:8020/user/spot/.staging/job_1480610160914_0358/job_1480610160914_0358_1_conf.xml to hdfs://tanuki.akainix.local:8020/user/history/done_intermediate/spot/job_1480610160914_0358_conf.xml_tmp
      2017-02-24 19:35:47,844 INFO [Thread-68] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: hdfs://tanuki.akainix.local:8020/user/history/done_intermediate/spot/job_1480610160914_0358_conf.xml_tmp
      2017-02-24 19:35:47,853 INFO [Thread-68] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://tanuki.akainix.local:8020/user/history/done_intermediate/spot/job_1480610160914_0358.summary_tmp to hdfs://tanuki.akainix.local:8020/user/history/done_intermediate/spot/job_1480610160914_0358.summary
      2017-02-24 19:35:47,854 INFO [Thread-68] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://tanuki.akainix.local:8020/user/history/done_intermediate/spot/job_1480610160914_0358_conf.xml_tmp to hdfs://tanuki.akainix.local:8020/user/history/done_intermediate/spot/job_1480610160914_0358_conf.xml
      2017-02-24 19:35:47,857 INFO [Thread-68] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://tanuki.akainix.local:8020/user/history/done_intermediate/spot/job_1480610160914_0358-1487975734022-spot-INSERT+INTO+TABLE+spotdb.f...spotdb.flow_tmp%28Stage-1487975747716-1-0-SUCCEEDED-root.users.spot-1487975739310.jhist_tmp to hdfs://tanuki.akainix.local:8020/user/history/done_intermediate/spot/job_1480610160914_0358-1487975734022-spot-INSERT+INTO+TABLE+spotdb.f...spotdb.flow_tmp%28Stage-1487975747716-1-0-SUCCEEDED-root.users.spot-1487975739310.jhist
      2017-02-24 19:35:47,857 INFO [Thread-68] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
      2017-02-24 19:35:47,858 INFO [Thread-68] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1480610160914_0358_m_000000_0
      2017-02-24 19:35:47,858 INFO [Thread-68] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : levante.akainix.local:8041
      2017-02-24 19:35:47,877 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1480610160914_0358_m_000000_0 TaskAttempt Transitioned from SUCCESS_FINISHING_CONTAINER to SUCCEEDED
      2017-02-24 19:35:47,878 INFO [Thread-68] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Setting job diagnostics to 
      2017-02-24 19:35:47,878 INFO [Thread-68] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: History url is http://tanuki.akainix.local:19888/jobhistory/job/job_1480610160914_0358
      2017-02-24 19:35:47,886 INFO [Thread-68] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Waiting for application to be successfully unregistered.
      2017-02-24 19:35:48,888 INFO [Thread-68] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Final Stats: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
      2017-02-24 19:35:48,889 INFO [Thread-68] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://tanuki.akainix.local:8020 /user/spot/.staging/job_1480610160914_0358
      2017-02-24 19:35:48,900 INFO [Thread-68] org.apache.hadoop.ipc.Server: Stopping server on 39525
      2017-02-24 19:35:48,901 INFO [IPC Server listener on 39525] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 39525
      2017-02-24 19:35:48,901 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
      2017-02-24 19:35:48,901 INFO [TaskHeartbeatHandler PingChecker] org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler thread interrupted
      2017-02-24 19:35:48,902 INFO [Ping Checker] org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: TaskAttemptFinishingMonitor thread interrupted
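
The YARN log lines are stamped 19:35, while the ingest log is stamped 22:25 to 22:30. The `.jhist` file name in the YARN log contains 13-digit numbers that look like epoch milliseconds, and converting them to UTC makes it easier to line the two logs up. The sketch below assumes that every 13-digit field delimited by `-` or `.` in that name is an epoch-millisecond timestamp; `history_name_times` is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: 13-digit fields preceded by "-" and followed by "-" or "."
# are epoch-millisecond timestamps. The job id uses "_" and is not matched.
EPOCH_MS_RE = re.compile(r"(?<=-)\d{13}(?=[-.])")


def epoch_ms_to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


def history_name_times(name):
    """Return the UTC datetimes found in a job history file name."""
    return [epoch_ms_to_utc(int(field)) for field in EPOCH_MS_RE.findall(name)]
```

Applied to the `.jhist` name in the log above, the three values convert to 22:35:34, 22:35:47, and 22:35:39 UTC on 2017-02-24. That suggests, but does not prove, that the YARN log is written in a local time zone three hours behind the ingest log rather than belonging to an earlier run.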
      

      Regards,
      Joaquín Silva

        Attachments

        1. screen.stdout (19 kB, Joaquín Silva)

            People

            • Assignee: Everardo Lopez Sandoval (EverLoSa)
            • Reporter: Joaquín Silva (JoaquinS)