During the Spring 2021 semester at Northeastern University, I worked with Dr. David Choffnes and Dr. Daniel Dubois as a Research Apprentice in the IoT lab on the Boston campus. The goal was to produce a data visualization tool, titled VizIoT, that depicts network traffic in real time. I inherited the project from several other students and, under the guidance of Dr. Choffnes and Dr. Dubois, expanded the existing application with new ways of visualizing network traffic. The project may be viewed at the following IP address:
The repository for the project can be found here:
The Stack
The project consists of four main components: a frontend written in React, a backend written in Node.js, a MongoDB instance, and Python scripts that capture, categorize, and store packet data mirrored by a network interface at the IoT lab. Throughout my research, I modified existing functionality and introduced new functionality across all four components of the application.
In the frontend, I implemented three new visualizations using React: a connection table depicting sent and received traffic for each individual connection of each device in the lab, a live line graph categorizing network traffic as sent or received, and a live line graph categorizing network traffic by transport layer and application layer protocols. I also implemented both push and pull models for retrieving and storing data for each of these visualizations.
In the backend, I used Node.js to handle data processing and communication with the frontend client. I implemented an API that lets the frontend request two things: mappings between MAC addresses and human-friendly device names, and packet data aggregated over specified intervals, either per connection or per device depending on the needs of the visualization. I also used existing data models, via the Mongoose library, to retrieve packet and device data stored in a local MongoDB instance.
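The interval aggregation described above can be sketched roughly as follows. The backend itself is written in Node.js; Python is used here only to illustrate the logic, and the field names (`timestamp`, `size`, `src_mac`, `dst_ip`) and the function name are hypothetical, not taken from the project.

```python
from collections import defaultdict


def aggregate_by_interval(packets, interval_s, key_fields):
    """Sum packet sizes into fixed-width time buckets, grouped by key_fields.

    All field names are assumptions made for this sketch.
    """
    totals = defaultdict(int)
    for pkt in packets:
        # Snap the timestamp down to the start of its bucket.
        bucket_start = int(pkt["timestamp"] // interval_s) * interval_s
        group = tuple(pkt[field] for field in key_fields)
        totals[(bucket_start, group)] += pkt["size"]
    return dict(totals)


packets = [
    {"timestamp": 0.4, "src_mac": "aa:aa", "dst_ip": "192.0.2.1", "size": 100},
    {"timestamp": 0.9, "src_mac": "aa:aa", "dst_ip": "192.0.2.1", "size": 50},
    {"timestamp": 1.2, "src_mac": "aa:aa", "dst_ip": "192.0.2.2", "size": 70},
]

# Same data, grouped per device and per connection.
per_device = aggregate_by_interval(packets, 1, ["src_mac"])
per_connection = aggregate_by_interval(packets, 1, ["src_mac", "dst_ip"])
```

Passing the grouping fields as a parameter is one way to serve both the per-device and per-connection views from a single routine.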
The MongoDB instance queried by the backend contains collections for packet data, MAC-to-device mappings, and IP-to-destination mappings. While device data is persistent, the packet data collection has an index on the timestamp of each packet with an associated TTL, so that only recent data is retained and the storage used by the database stays limited. Since the application is interested in live data, this keeps the footprint on the server relatively small while still providing the data needed to populate the different visualizations.
The Python scripts responsible for capturing, formatting, and inserting data into the MongoDB instance use the Scapy library to capture packet data over a local network interface. The code plugs in a function of our own design that performs the necessary data processing before insertion via the PyMongo library. The scripts also ensure that the aforementioned index on the timestamp field exists in the packet data collection, creating it if necessary. Separate Python scripts insert MAC-to-device mappings from local text files, also via PyMongo.
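As a minimal sketch of the index step, assuming a PyMongo collection object, a field named `timestamp`, and a five-minute expiry (all three are assumptions for illustration, not values from the project), a TTL index can be requested like this:

```python
def ensure_packet_ttl_index(collection, ttl_seconds=300, field="timestamp"):
    """Create (or confirm) a TTL index so old packet documents expire.

    `collection` is expected to behave like a pymongo Collection.
    The default field name and TTL are hypothetical.
    """
    # MongoDB only expires documents whose indexed field holds a date value.
    # Requesting an index that already exists with the same options is a no-op.
    return collection.create_index([(field, 1)], expireAfterSeconds=ttl_seconds)


# Hypothetical usage (database and collection names are made up):
# from pymongo import MongoClient
# packets = MongoClient()["viziot"]["packets"]
# ensure_packet_ttl_index(packets)
```

MongoDB removes expired documents in a background task, so documents may linger briefly past their TTL.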
Connection Table
One of the three visualizations I designed and implemented is what I have dubbed the Connection Table. This visualization delineates the different IP addresses with which a device communicates. In many instances, a single device transmits data to multiple destinations, so a single device may have multiple entries in the same table. While the table does not distinguish how data is being transmitted, each entry contains a graph that shows how much data is being sent and received over that connection: the top half of the graph (in red) is sent data, and the bottom half (in blue) is received data. Each entry also contains two hard metrics describing the total data sent over the connection in the last 5 seconds and the last 30 seconds. Connections are sorted in descending order of overall data (the sum of sent and received data) over the last 30 seconds. Using DNS lookups and a local IP-to-country database, each entry also lists the IP address, or the public hostname if one exists, and the country in which that IP address is listed.
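The metrics and ordering described above could be computed along these lines. This is an illustrative sketch rather than the project's code: the sample layout (`t`, `sent`, `received`) and all function names are hypothetical.

```python
def in_window(sample, now, window_s):
    """True if the sample falls within the last window_s seconds."""
    return now - window_s < sample["t"] <= now


def summarize(name, samples, now):
    """Build one table row: sent totals for 5 s and 30 s, plus a sort key."""
    recent_5 = [s for s in samples if in_window(s, now, 5)]
    recent_30 = [s for s in samples if in_window(s, now, 30)]
    return {
        "connection": name,
        "sent_5s": sum(s["sent"] for s in recent_5),
        "sent_30s": sum(s["sent"] for s in recent_30),
        # Sort key: sent plus received over the last 30 seconds.
        "total_30s": sum(s["sent"] + s["received"] for s in recent_30),
    }


def rank_connections(connections, now):
    """Return rows ordered from most to least overall data in the last 30 s."""
    rows = [summarize(name, samples, now) for name, samples in connections.items()]
    return sorted(rows, key=lambda row: row["total_30s"], reverse=True)


connections = {
    "plug -> 192.0.2.20": [
        {"t": 99, "sent": 10, "received": 10},
        {"t": 60, "sent": 9000, "received": 0},  # older than 30 s, ignored
    ],
    "cam -> 192.0.2.10": [
        {"t": 98, "sent": 100, "received": 20},
        {"t": 80, "sent": 500, "received": 0},
    ],
}
rows = rank_connections(connections, now=100)
```

Here the camera connection ranks first because its 30-second total exceeds the plug's, even though the plug sent far more data before the window began.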
The visualization also contains a table on the left side of the page which allows the user to select which devices they would like to view. This allows the user to ignore certain connections if they are not relevant to the device(s) being observed. While the table displays five entries currently, the application may be configured to allow for any number of entries to be shown.
I would also like to note that I did not use a graphing library or API for the graphs in the connection table; I designed the graph myself using Scalable Vector Graphics and good old-fashioned arithmetic.
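To give a sense of the arithmetic involved, here is a rough sketch of a mirrored sent/received graph built as a raw SVG string. It is not the component used in VizIoT (which lives in the React frontend); the function name, dimensions, and scaling choices are assumptions made for illustration.

```python
def mirrored_svg(sent, received, width=200, height=60):
    """Return an SVG string: sent above the midline (red), received below (blue).

    Assumes `sent` and `received` hold the same number of samples.
    """
    mid = height / 2
    # Scale both series against a shared peak; the 1 avoids division by zero.
    peak = max(max(sent, default=0), max(received, default=0), 1)
    step = width / max(len(sent) - 1, 1)

    def area(values, direction):
        # direction -1 draws upward (SVG y grows downward), +1 draws downward.
        points = [f"0,{mid:g}"]
        for i, value in enumerate(values):
            y = mid + direction * (value / peak) * mid
            points.append(f"{i * step:g},{y:g}")
        # Close the shape back on the midline.
        points.append(f"{max(len(values) - 1, 0) * step:g},{mid:g}")
        return " ".join(points)

    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polygon points="{area(sent, -1)}" fill="red"/>'
        f'<polygon points="{area(received, 1)}" fill="blue"/>'
        "</svg>"
    )


svg = mirrored_svg([0, 50, 100], [100, 0, 0])
```

Each sample maps to an x position by index and a y offset from the midline proportional to its share of the peak value, which is all the "graphing library" such a chart needs.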
Sent/Received
The Sent/Received visualization is a live line graph that uses a rolling x-axis to display data coming into and going out of devices in the IoT lab. The main graph on this page shows the cumulative sent and received traffic for all devices in the lab. The metrics above the graph show the total traffic sent and received by those devices over the last sixty seconds.
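One common way to back a rolling x-axis and its sixty-second totals is a fixed-span window that discards samples as they scroll out of view. The sketch below shows the idea under that assumption; the class and its fields are hypothetical and not drawn from the VizIoT code base.

```python
from collections import deque


class RollingWindow:
    """Keep only the samples from the last span_s seconds."""

    def __init__(self, span_s=60):
        self.span_s = span_s
        self.samples = deque()  # entries are (timestamp, sent_bytes, received_bytes)

    def add(self, t, sent, received):
        self.samples.append((t, sent, received))
        # Drop samples that have scrolled off the left edge of the x-axis.
        while self.samples and self.samples[0][0] <= t - self.span_s:
            self.samples.popleft()

    def totals(self):
        """Return (total_sent, total_received) over the current window."""
        total_sent = sum(sent for _, sent, _ in self.samples)
        total_received = sum(received for _, _, received in self.samples)
        return total_sent, total_received


window = RollingWindow(span_s=60)
window.add(0, 10, 5)
window.add(30, 20, 0)
window.add(61, 1, 1)  # pushes the sample at t=0 out of the window
```

The samples that remain in the deque are exactly the points to plot, and `totals()` yields the headline metrics for the same span.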
Since the main graph only shows the cumulative data across all devices, it does not paint a clear picture of any single device. To provide lower-level data, I also implemented a live line graph with metrics for each device. However, this data is not specific to a particular connection; it shows the cumulative sent and received traffic over all connections for that device.
I should note that the live line graph had already been implemented by one of the previous students from whom I inherited the project. However, the original implementation only supported a single line, as can be seen in the image at the top of the page, so I again used Scalable Vector Graphics to draw additional lines over the existing implementation. I also implemented custom coloring of these lines and allowed a variable number of lines to be displayed in the graph, as can be seen below in the Protocol visualization.
Protocol
The Protocol visualization is very similar to the Sent/Received visualization: it consists of a live line graph with a rolling x-axis, but instead of tracking whether packets are incoming or outgoing, it tracks the transport layer and application layer protocols used to send each packet.
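As a loose illustration of this kind of categorization, the sketch below labels each packet with its transport protocol and, where a well-known port matches, an application protocol. The port table, field names, and the port-based approach itself are assumptions; the project's capture scripts may identify protocols differently.

```python
from collections import Counter

# Hypothetical lookup table; port-based labels are only a heuristic.
WELL_KNOWN_PORTS = {53: "DNS", 80: "HTTP", 123: "NTP", 443: "HTTPS"}


def protocol_labels(pkt):
    """Return the transport label plus an application label when one matches."""
    labels = [pkt["transport"]]
    app = WELL_KNOWN_PORTS.get(pkt["dst_port"]) or WELL_KNOWN_PORTS.get(pkt["src_port"])
    if app:
        labels.append(app)
    return labels


def bytes_per_protocol(packets):
    """Total bytes per label; a packet counts toward each of its labels."""
    totals = Counter()
    for pkt in packets:
        for label in protocol_labels(pkt):
            totals[label] += pkt["size"]
    return totals


packets = [
    {"transport": "TCP", "src_port": 50000, "dst_port": 443, "size": 100},
    {"transport": "UDP", "src_port": 53, "dst_port": 40000, "size": 60},
    {"transport": "TCP", "src_port": 50001, "dst_port": 9999, "size": 10},
]
totals = bytes_per_protocol(packets)
```

Each label would correspond to one line in the graph. Adding 1 instead of `pkt["size"]` would count packets rather than bytes.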
Again, since the main graph is a visualization of the total packets sent via a particular protocol across all devices, individual device cards corresponding to devices in the IoT lab were implemented in this visualization as well.
Both the Sent/Received and Protocol visualizations use a React component that generates both the device cards and the main live line graph from simple data structures outlined in the project documentation. They are very similar because they are, in part, meant to exemplify how easy it is to reuse the existing code base to produce new visualizations for a particular purpose. It is likely that I will not be the last student to work on this project, and it is my hope that other researchers will be able to use the VizIoT application to inform their understanding of the behavior of IoT devices. Thus, one of the goals of this project was to create a visualization that lets the user customize a live representation of something that would be difficult to create from scratch.