PCAP Parser

Parse PCAP files and visualize network data

Project maintained by Hersh500

An Open-Source R Package to Parse through Packet Capture Files and Display Graphs

Use Wireshark or another packet capture utility to capture IP traffic data, and use PCAP Parser to analyze traffic.

How it Works

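Packet capture files in the libpcap format store data in this order: a global file header, then, for each captured packet, a per-packet record header followed by the packet data.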
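A minimal sketch of that layout in C may help make it concrete. The two structures follow the classic libpcap file format (a 24-byte global header, then a 16-byte record header in front of each packet). The helper count_pcap_records is a hypothetical name used only for illustration and is not part of PCAP Parser. It assumes the capture was written in the reader's byte order with microsecond timestamps.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 24-byte global header at the start of a classic libpcap file. */
struct pcap_file_hdr {
    uint32_t magic_number;   /* 0xa1b2c3d4 when written in the reader's byte order */
    uint16_t version_major;
    uint16_t version_minor;
    int32_t  thiszone;       /* GMT to local time correction, usually 0 */
    uint32_t sigfigs;        /* timestamp accuracy, usually 0 */
    uint32_t snaplen;        /* maximum number of bytes stored per packet */
    uint32_t network;        /* link-layer type, 1 = Ethernet */
};

/* 16-byte record header stored in front of each captured packet. */
struct pcap_rec_hdr {
    uint32_t ts_sec;         /* capture time, seconds */
    uint32_t ts_usec;        /* capture time, microseconds */
    uint32_t incl_len;       /* bytes of packet data that follow in the file */
    uint32_t orig_len;       /* length of the packet on the wire */
};

/* Hypothetical helper: walk an in-memory capture and return the number of
 * complete packet records, or -1 if the global header is missing or the
 * magic number is not the expected one. */
int count_pcap_records(const uint8_t *buf, size_t len)
{
    struct pcap_file_hdr fh;
    struct pcap_rec_hdr rh;
    size_t pos = sizeof fh;
    int count = 0;

    if (len < sizeof fh)
        return -1;
    memcpy(&fh, buf, sizeof fh);
    if (fh.magic_number != 0xa1b2c3d4u)
        return -1;           /* byte-swapped or nanosecond captures are not handled */

    while (len - pos >= sizeof rh) {
        memcpy(&rh, buf + pos, sizeof rh);
        pos += sizeof rh;
        if (rh.incl_len > len - pos)
            break;           /* truncated final record */
        pos += rh.incl_len;  /* skip the packet data */
        count++;
    }
    return count;
}
```

Copying each header out of the buffer with memcpy avoids unaligned reads, which matters because packet data of odd length can leave the next record header at an odd offset.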

PCAP Parser follows this format to find where each packet originates and where it is going. The packet data carries this information in its IPv4 header:

struct ipv4_hdr_s {
    uint8 vers_hdrlen;
    uint8 dscp_ecn;
    uint16 total_len;         /* NETWORK ORDER */
    uint16 identification;         /* NETWORK ORDER */
    uint16 flags_frag_ofs;        /* NETWORK ORDER */
    uint8 ttl;
    uint8 proto; 
    uint16 hdr_checksum;         /* NETWORK ORDER */
    uint32 src_ip;         /* NETWORK ORDER */
    uint32 dst_ip;         /* NETWORK ORDER */
};
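The vers_hdrlen field packs two 4-bit values into one byte, and the addresses are plain 32-bit integers. The following sketch shows one way to unpack them. The function names are hypothetical, and ipv4_to_string assumes the address has already been converted to host byte order.

```c
#include <stdint.h>
#include <stdio.h>

/* The upper 4 bits of vers_hdrlen hold the IP version (4 for IPv4). */
unsigned ipv4_version(uint8_t vers_hdrlen)
{
    return vers_hdrlen >> 4;
}

/* The lower 4 bits hold the header length in 32-bit words;
 * multiply by 4 to get the length in bytes. */
unsigned ipv4_hdr_bytes(uint8_t vers_hdrlen)
{
    return (vers_hdrlen & 0x0Fu) * 4u;
}

/* Format an address (already in host byte order) as dotted-quad text.
 * The output buffer must hold at least 16 bytes. */
void ipv4_to_string(uint32_t ip, char out[16])
{
    snprintf(out, 16, "%u.%u.%u.%u",
             (unsigned)((ip >> 24) & 0xFFu),
             (unsigned)((ip >> 16) & 0xFFu),
             (unsigned)((ip >> 8) & 0xFFu),
             (unsigned)(ip & 0xFFu));
}
```

A header without options has vers_hdrlen equal to 0x45: version 4 and a length of 5 words, or 20 bytes.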

The proto field identifies the protocol of the packet payload (1 for ICMP, 6 for TCP, 17 for UDP). Based on this value, the data that follows the IPv4 header is copied into a TCP, UDP, or ICMP header structure using the copy_bytes function:

/* Copy num bytes from _from to _to, one byte at a time. */
void copy_bytes (void *_from, void *_to, int num)
{
    int i;
    uint8 *from = (uint8 *)_from;
    uint8 *to = (uint8 *)_to;

    for (i = 0; i < num; i++) {
        to[i] = from[i];
    }
}
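As a rough sketch of the dispatch step, the code below maps the protocol number to a name and copies the transport header that follows the IPv4 header. It uses the standard memcpy, which does the same job as a byte-by-byte copy. The names proto_name and copy_transport_hdr are hypothetical and the bounds checks are an assumption about how a parser might guard against short packets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* IANA-assigned protocol numbers carried in the IPv4 proto field. */
enum { PROTO_ICMP = 1, PROTO_TCP = 6, PROTO_UDP = 17 };

const char *proto_name(uint8_t proto)
{
    switch (proto) {
    case PROTO_ICMP: return "ICMP";
    case PROTO_TCP:  return "TCP";
    case PROTO_UDP:  return "UDP";
    default:         return "other";
    }
}

/* Copy hdr_size bytes of the transport header that follows the IPv4 header
 * starting at ip. Returns 0 on success, or -1 if the header length is
 * invalid or the buffer is too short. */
int copy_transport_hdr(const uint8_t *ip, size_t ip_len,
                       void *out, size_t hdr_size)
{
    size_t ihl;

    if (ip_len < 20)
        return -1;                       /* shorter than a minimal IPv4 header */
    ihl = (size_t)(ip[0] & 0x0Fu) * 4u;  /* IPv4 header length in bytes */
    if (ihl < 20 || ihl > ip_len || hdr_size > ip_len - ihl)
        return -1;
    memcpy(out, ip + ihl, hdr_size);
    return 0;
}
```

Using the header length from the packet, rather than a fixed 20 bytes, keeps the offset correct when the IPv4 header carries options.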

The structure of the TCP Header is as follows:

struct tcp_hdr_s {
    uint16 src_port;        /* NETWORK ORDER */
    uint16 dst_port;         /* NETWORK ORDER */
    uint32 seq_num;         /* NETWORK ORDER */
    uint32 ack_num;        /* NETWORK ORDER */
    uint16 ofs_ctrl;        /* NETWORK ORDER */        
    uint16 window_size;         /* NETWORK ORDER */
    uint16 checksum;         /* NETWORK ORDER */
    uint16 urgent_pointer;         /* NETWORK ORDER */
};
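The ofs_ctrl field combines the TCP header length with the control flags that mark the start and end of a connection. A small sketch of how it can be decoded follows. The function names are hypothetical, and the value passed in is assumed to have been converted to host byte order already.

```c
#include <stdint.h>

/* TCP control flags, stored in the low bits of ofs_ctrl. */
enum {
    TCP_FIN = 0x01,
    TCP_SYN = 0x02,
    TCP_RST = 0x04,
    TCP_PSH = 0x08,
    TCP_ACK = 0x10
};

/* The top 4 bits hold the header length in 32-bit words. */
unsigned tcp_hdr_bytes(uint16_t ofs_ctrl)
{
    return (unsigned)(ofs_ctrl >> 12) * 4u;
}

/* The low 8 bits hold the control flags. */
unsigned tcp_flags(uint16_t ofs_ctrl)
{
    return ofs_ctrl & 0x00FFu;
}

/* A connection request carries SYN without ACK. */
int tcp_is_first_syn(uint16_t ofs_ctrl)
{
    return (tcp_flags(ofs_ctrl) & (TCP_SYN | TCP_ACK)) == TCP_SYN;
}
```

For example, 0x5002 decodes to a 20-byte header with only SYN set, while 0x5012 is the SYN+ACK reply.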

TCP packets exchanged between two machines (SYN, SYN+ACK, ..., FIN) are tracked as flows, which are kept in a linked list. Each node of the list has this structure:

struct flow_s {
    uint32 flow_id;
    uint32 src_ip; 
    uint32 dst_ip; 
    uint16 src_port; 
    uint16 dst_port; 
    uint32 num_pkts;
    uint32 seq_num; 
    uint8 is_open;
    uint32 num_bytes1; /* from initiator */
    uint32 num_bytes2; /* from responder */
    uint32 start_time; /* first syn */
    uint32 end_time; /* fin_ack or ack */
    uint8 closed; 
    uint32 num_init_pkts;
    uint32 num_resp_pkts;
    uint32 src_timestamps [MAX_NUM_PACKETS]; /* timestamps on pkts from initiator i.e. who sent first syn */
    uint32 dst_timestamps [MAX_NUM_PACKETS]; /* timestamps on pkts from responder */
    uint32 src_seq_nums [MAX_NUM_PACKETS];
    uint32 src_ack_nums [MAX_NUM_PACKETS];
    uint32 dst_seq_nums [MAX_NUM_PACKETS];
    uint32 dst_ack_nums [MAX_NUM_PACKETS];
    uint32 packets [MAX_NUM_PACKETS];
    struct flow_s *next;
};

Each flow is marked closed once its FIN packet has been seen. The result is a linked list of flows for the R module to visualize.
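To illustrate how packets might be matched to flows, here is a minimal sketch using a reduced, hypothetical structure (mini_flow) that keeps only the addresses, ports, and a packet counter. It is not the parser's actual lookup code. It assumes a flow is identified by its address and port pairs, and that packets in either direction belong to the same flow.

```c
#include <stdint.h>
#include <stdlib.h>

/* Reduced, hypothetical version of a flow node, for illustration only. */
struct mini_flow {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint32_t num_pkts;
    struct mini_flow *next;
};

/* Return the flow that matches the packet in either direction, or NULL. */
struct mini_flow *find_flow(struct mini_flow *head,
                            uint32_t sip, uint32_t dip,
                            uint16_t sport, uint16_t dport)
{
    for (; head != NULL; head = head->next) {
        int fwd = head->src_ip == sip && head->dst_ip == dip &&
                  head->src_port == sport && head->dst_port == dport;
        int rev = head->src_ip == dip && head->dst_ip == sip &&
                  head->src_port == dport && head->dst_port == sport;
        if (fwd || rev)
            return head;
    }
    return NULL;
}

/* Count a packet against its flow, creating the flow at the front of the
 * list if it is new. Returns the (possibly new) list head; if allocation
 * fails, the list is returned unchanged. */
struct mini_flow *record_packet(struct mini_flow *head,
                                uint32_t sip, uint32_t dip,
                                uint16_t sport, uint16_t dport)
{
    struct mini_flow *f = find_flow(head, sip, dip, sport, dport);

    if (f != NULL) {
        f->num_pkts++;
        return head;
    }
    f = calloc(1, sizeof *f);
    if (f == NULL)
        return head;
    f->src_ip = sip;
    f->dst_ip = dip;
    f->src_port = sport;
    f->dst_port = dport;
    f->num_pkts = 1;
    f->next = head;
    return f;
}
```

The lookup is linear in the number of flows, which is simple but slows down on large captures; a hash table keyed on the same four values would be the usual next step.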

Network Order

Read about Endianness here

Certain fields in the header structures are commented "NETWORK ORDER". These values are stored in big-endian format and must be converted to little-endian format, which is how multi-byte values are stored on Intel x86 systems. For 32-bit integers, this is done with the following function:

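/* Reverse the four bytes of *x in place and return the result.
   Assumes unsigned int is 4 bytes wide. */
unsigned int _int_switcher(unsigned int *x)
{
    char *b1;
    char temp;

    /* swap the outer bytes (0 and 3) */
    b1 = (char *) x;
    temp = *b1;
    *b1 = *(b1+3);
    *(b1+3) = temp;

    /* swap the inner bytes (1 and 2) */
    temp = *(b1+1);
    *(b1+1) = *(b1+2);
    *(b1+2) = temp;
    return (*x);
}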
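The same conversion can also be written with shifts, which extends naturally to the 16-bit fields such as ports. The sketch below is an alternative formulation, not the parser's code, and all four function names are hypothetical. The read_be variants assemble a value directly from the bytes in the file, so they give the right answer on both little-endian and big-endian hosts.

```c
#include <stdint.h>

/* Reverse the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}

/* Read a big-endian 16-bit value from a byte buffer. */
uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Read a big-endian 32-bit value from a byte buffer. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

On systems with BSD sockets, the standard ntohs and ntohl functions perform the network-to-host conversion and do nothing on big-endian hosts.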

Definitions

typedef unsigned int uint32;
typedef unsigned short uint16;
typedef signed int int32;
typedef unsigned char uint8;
typedef unsigned char mac_addr_t [6];
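These typedefs rely on int being 32 bits and short being 16 bits, which holds on common desktop platforms but is not guaranteed by the C standard. A possible alternative, shown here only as a sketch, builds the same names on the fixed-width types from <stdint.h> and checks the sizes at compile time. It assumes a C11 compiler for _Static_assert.

```c
#include <stdint.h>

/* Same names as above, defined in terms of fixed-width standard types. */
typedef uint32_t uint32;
typedef uint16_t uint16;
typedef int32_t  int32;
typedef uint8_t  uint8;
typedef uint8_t  mac_addr_t[6];

/* Compile-time checks: the build fails if a size is not what the
 * packet structures expect. */
_Static_assert(sizeof(uint32) == 4, "uint32 must be 4 bytes");
_Static_assert(sizeof(uint16) == 2, "uint16 must be 2 bytes");
_Static_assert(sizeof(uint8) == 1, "uint8 must be 1 byte");
_Static_assert(sizeof(mac_addr_t) == 6, "mac_addr_t must be 6 bytes");
```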

Visualization

The R visualization package is still in progress. To maximize readability, the ArcDiagrams R package will be used to visualize flows. The R program and the parser exchange data through R's .Call interface, which invokes the parser's compiled functions from R. For example, this function creates a flow table:

get_flow_table <- function () {
    src_ips <- .Call("get_src_ipaddr_vector")
    dst_ips <- .Call("get_dst_ipaddr_vector")
    src_ports <- .Call("get_src_port_vector")
    dst_ports <- .Call("get_dst_port_vector")
    start_time <- .Call("get_start_time_vector")
    flow_id <- .Call("get_flow_id_vector")

    flow_table <- data.frame (SrcIP = src_ips, DstIP = dst_ips, SrcPort = src_ports, DstPort = dst_ports, 
                                    StartTime = start_time, FlowId = flow_id)
    return(flow_table)
}

In Progress

GeoIP Support: Based on the IP address, the parser will assemble a table of locations. The goal is to show connections on a world map.
Company Identification Based on MAC Address
Visualization
IPv6, UDP, ICMP Support