One common task in network forensics is reassembling TCP streams from captured network data. This PR adds this capability to Drill.
To enable TCP sessionization, set the sessionizeTCPStreams option to true in the configuration for the PCAP reader.
The option can also be set at query time by using the table() function:
SELECT * FROM table(dfs.test.`attack-trace.pcap` (type => 'pcap', sessionizeTCPStreams => true))
When this option is enabled, Drill will ignore all packets that are not TCP packets.
Executing a query with this option enabled changes the results Drill returns from PCAP files. Instead of one row per packet, each row describes one TCP session.
You will get the following columns:
- session_start_time: The start time of the session
- session_end_time: The ending time of the session
- session_duration: The duration of the session. This will be a Drill PERIOD datatype.
- total_packet_count: The number of packets in the session
- connection_time: The time it took to complete the TCP handshake. Useful for network diagnostics.
- src_ip: The IP address of the initiating machine
- dst_ip: The IP address of the remote machine
- src_port: The port of the originating machine
- dst_port: The port of the remote machine
- src_mac_address: The MAC address of the originating machine
- dst_mac_address: The MAC address of the remote machine
- tcp_session: The session hash for the TCP session (Long)
- is_corrupt: True if the session contains corrupted packets, false otherwise
- data_from_originator: The data sent from the originator
- data_from_remote: The data sent from the remote machine
- data_volume_from_remote: The number of bytes sent from the remote host
- data_volume_from_origin: The number of bytes sent from the originating machine
- packet_count_from_origin: The number of packets sent from the originating machine
- packet_count_from_remote: The number of packets sent from the remote machine
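
As an illustration of how these columns might be used together, the sketch below lists the ten largest uncorrupted sessions in a capture. It reuses the attack-trace.pcap file and dfs.test workspace from the example above and assumes that is_corrupt can be compared as a boolean. The choice of columns and the limit of 10 are only for demonstration.

```sql
-- Sketch: the ten sessions with the most packets, skipping corrupted ones.
-- Assumes the dfs.test workspace and attack-trace.pcap file from the earlier example.
SELECT src_ip, src_port, dst_ip, dst_port,
       session_start_time, session_duration, total_packet_count
FROM table(dfs.test.`attack-trace.pcap` (type => 'pcap', sessionizeTCPStreams => true))
WHERE is_corrupt = false
ORDER BY total_packet_count DESC
LIMIT 10
```

Sorting by connection_time in place of total_packet_count would be one way to surface sessions with slow handshakes.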