  Apache Drill / DRILL-7443

Enable PCAP Plugin to Reassemble TCP Streams




      Reassembling TCP streams from captured network data is a common task in network forensics. This PR adds that capability to Drill's PCAP reader.


      To enable TCP sessionization, set the sessionizeTCPStreams option to true in the PCAP format configuration.

      The option can also be set at query time with the table() function:

      SELECT * FROM table(dfs.test.`attack-trace.pcap` (type => 'pcap', sessionizeTCPStreams => true))


      When this option is enabled, Drill ignores all non-TCP packets, and the results change: instead of one row per packet, Drill returns one row per TCP session.
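      The filtering and grouping step can be pictured with the Python sketch below. It is not Drill's implementation (Drill is written in Java); the Packet record and its field names are hypothetical. It assumes a session is identified by the unordered pair of (IP, port) endpoints, so packets flowing in both directions land in the same group. A real sessionizer also has to split on SYN/FIN and port reuse, which this sketch ignores.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Packet:
    # Hypothetical, simplified packet record; not Drill's packet class.
    protocol: str      # e.g. "TCP", "UDP"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    timestamp: float   # seconds since the epoch
    payload: bytes = b""


def session_key(pkt: Packet) -> tuple:
    """Direction-independent key: order the two (ip, port) endpoints."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    # Any consistent ordering works; it only has to map A->B and B->A
    # to the same key.
    return (a, b) if a <= b else (b, a)


def group_tcp_sessions(packets):
    """Drop non-TCP packets and bucket the rest by session, in capture order."""
    sessions = defaultdict(list)
    for pkt in packets:
        if pkt.protocol != "TCP":
            continue  # non-TCP packets are ignored when sessionizing
        sessions[session_key(pkt)].append(pkt)
    return dict(sessions)


if __name__ == "__main__":
    pkts = [
        Packet("TCP", "10.0.0.1", 51000, "10.0.0.2", 80, 0.0),
        Packet("TCP", "10.0.0.2", 80, "10.0.0.1", 51000, 0.1),
        Packet("UDP", "10.0.0.1", 5353, "224.0.0.251", 5353, 0.2),
    ]
    for key, group in group_tcp_sessions(pkts).items():
        print(key, len(group))
```

      With this input the UDP packet is dropped and the two TCP packets, sent in opposite directions, share one session.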

      You will get the following columns:

      • session_start_time: The start time of the session
      • session_end_time: The ending time of the session
      • session_duration: The duration of the session. This will be a Drill PERIOD datatype.
      • total_packet_count: The number of packets in the session
      • connection_time: The time taken to complete the TCP handshake; useful for network diagnostics
      • src_ip: The IP address of the initiating machine
      • dst_ip: The IP address of the remote machine
      • src_port: The port of the originating machine
      • dst_port: The port of the remote machine
      • src_mac_address: The MAC address of the originating machine
      • dst_mac_address: The MAC address of the remote machine
      • tcp_session: The session hash for the TCP session (Long)
      • is_corrupt: Whether the session contains corrupted packets (true/false)
      • data_from_originator: The data sent from the originator
      • data_from_remote: The data sent from the remote machine
      • data_volume_from_remote: The number of bytes sent from the remote host
      • data_volume_from_origin: The number of bytes sent from the originating machine
      • packet_count_from_origin: The number of packets sent from the originating machine
      • packet_count_from_remote: The number of packets sent from the remote machine
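      How several of these columns could be derived from one session's packets is sketched below in Python. This is an illustration, not Drill's code: the TcpPacket record and the summarize_session helper are hypothetical. Several assumptions are made. The sender of the first packet is treated as the originator. connection_time is measured from the initial SYN to the originator's ACK that follows the remote SYN/ACK. Byte volumes count payload bytes only. Payloads are concatenated in capture order, whereas real stream reassembly orders segments by sequence number and discards retransmissions.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class TcpPacket:
    # Hypothetical simplified record; field names are illustrative only.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    timestamp: float   # seconds since the epoch
    syn: bool = False
    ack: bool = False
    payload: bytes = b""


def summarize_session(packets):
    """Compute a subset of the session columns for one TCP session.

    Assumes all packets belong to one session, listed in capture order.
    """
    first = packets[0]
    origin = (first.src_ip, first.src_port)
    from_origin = [p for p in packets if (p.src_ip, p.src_port) == origin]
    from_remote = [p for p in packets if (p.src_ip, p.src_port) != origin]

    # Handshake: SYN (origin) -> SYN/ACK (remote) -> ACK (origin).
    # Stays None if the capture does not contain a complete handshake.
    connection_time = None
    if first.syn and not first.ack:
        synack_seen = False
        for p in packets[1:]:
            sender = (p.src_ip, p.src_port)
            if sender != origin and p.syn and p.ack:
                synack_seen = True
            elif synack_seen and sender == origin and p.ack and not p.syn:
                connection_time = timedelta(seconds=p.timestamp - first.timestamp)
                break

    start = min(p.timestamp for p in packets)
    end = max(p.timestamp for p in packets)
    return {
        "session_start_time": start,
        "session_end_time": end,
        "session_duration": timedelta(seconds=end - start),
        "total_packet_count": len(packets),
        "connection_time": connection_time,
        "src_ip": first.src_ip,
        "dst_ip": first.dst_ip,
        "src_port": first.src_port,
        "dst_port": first.dst_port,
        "data_from_originator": b"".join(p.payload for p in from_origin),
        "data_from_remote": b"".join(p.payload for p in from_remote),
        "data_volume_from_origin": sum(len(p.payload) for p in from_origin),
        "data_volume_from_remote": sum(len(p.payload) for p in from_remote),
        "packet_count_from_origin": len(from_origin),
        "packet_count_from_remote": len(from_remote),
    }


if __name__ == "__main__":
    pkts = [
        TcpPacket("10.0.0.1", 51000, "10.0.0.2", 80, 0.0, syn=True),
        TcpPacket("10.0.0.2", 80, "10.0.0.1", 51000, 0.25, syn=True, ack=True),
        TcpPacket("10.0.0.1", 51000, "10.0.0.2", 80, 0.5, ack=True),
        TcpPacket("10.0.0.1", 51000, "10.0.0.2", 80, 1.0, ack=True, payload=b"GET /"),
        TcpPacket("10.0.0.2", 80, "10.0.0.1", 51000, 2.0, ack=True, payload=b"200 OK"),
    ]
    print(summarize_session(pkts))
```

      Keeping the per-direction lists separate makes the paired columns (data_from_originator/data_from_remote, packet_count_from_origin/packet_count_from_remote) fall out of the same pass.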





            Charles Givre (cgivre)
            Arina Ielchiieva



