Get DuckDB swimming in your Packet Pond in 10 Minutes!

Matt Franz
6 min readAug 31, 2024

--

At some point over the last few years I discovered DuckDB as an alternative lightweight SQL database that is super fast and has support for a wide variety of file formats, but until now I’d only used CSV and Parquet, but recently a pcap extension was released, so I thought I’d try it out! Fortunately now there are binaries available so you don’t have to go through the headache of a building at DuckDB extension.

Installation

If you don’t already have it installed, install DuckDB 1.0 on on your platform.

Following the instructions from the updated blog post at the bottom, you then install the pre-compiled extension.

$ duckdb -unsigned
v1.0.0 1f98600c2c
Enter ".help" for usage hints.
D SET custom_extension_repository='https://w3c2.c20.e2-5.dev/ppcap/latest';
D install ppcap;
D load ppcap;

If you want to see all the extensions available, use the following query:

D select extension_name from duckdb_extensions() where loaded = True;
┌────────────────┐
│ extension_name │
│ varchar │
├────────────────┤
│ autocomplete │
│ fts │
│ icu │
│ inet │
│ jemalloc │
│ json │
│ parquet │
│ ppcap │
│ shell │
│ tpch │
├────────────────┤
│ 10 rows │
└────────────────┘

This creates the following files in ~/.duckdb

$ ls -alR
.:
total 12
drwxr-xr-x 3 mfranz mfranz 4096 Aug 31 12:54 .
drwxr-x--- 41 mfranz mfranz 4096 Aug 31 12:56 ..
drwxr-xr-x 3 mfranz mfranz 4096 Aug 31 12:54 extensions
./extensions:
total 12
drwxr-xr-x 3 mfranz mfranz 4096 Aug 31 12:54 .
drwxr-xr-x 3 mfranz mfranz 4096 Aug 31 12:54 ..
drwxr-xr-x 3 mfranz mfranz 4096 Aug 31 12:54 v1.0.0
./extensions/v1.0.0:
total 12
drwxr-xr-x 3 mfranz mfranz 4096 Aug 31 12:54 .
drwxr-xr-x 3 mfranz mfranz 4096 Aug 31 12:54 ..
drwxr-xr-x 2 mfranz mfranz 4096 Aug 31 12:55 linux_amd64_gcc4
./extensions/v1.0.0/linux_amd64_gcc4:
total 22980
drwxr-xr-x 2 mfranz mfranz 4096 Aug 31 12:55 .
drwxr-xr-x 3 mfranz mfranz 4096 Aug 31 12:54 ..
-rw-rw-r-- 1 mfranz mfranz 19712350 Aug 31 12:54 httpfs.duckdb_extension
-rw-rw-r-- 1 mfranz mfranz 131 Aug 31 12:54 httpfs.duckdb_extension.info
-rw-rw-r-- 1 mfranz mfranz 3797542 Aug 31 12:55 ppcap.duckdb_extension
-rw-rw-r-- 1 mfranz mfranz 144 Aug 31 12:55 ppcap.duckdb_extension.info

Loading your PCAP

You don’t have to install the extension each time so the next time you run, you just load it it then create a table from your pcap

$ duckdb -unsigned
v1.0.0 1f98600c2c
Enter ".help" for usage hints.
D
D load ppcap;
D
D create table bridged as from read_pcap("~/bridged.pcap");

All your normal SQL commands work

D describe bridged;
┌─────────────┬─────────────┬──────┬─────┬─────────┬───────┐
│ column_name │ column_type │ null │ key │ default │ extra │
├─────────────┼─────────────┼──────┼─────┼─────────┼───────┤
│ timestamp │ TIMESTAMP │ YES │ │ │ │
│ source_ip │ VARCHAR │ YES │ │ │ │
│ dest_ip │ VARCHAR │ YES │ │ │ │
│ source_port │ INTEGER │ YES │ │ │ │
│ dest_port │ INTEGER │ YES │ │ │ │
│ length │ INTEGER │ YES │ │ │ │
│ tcp_session │ VARCHAR │ YES │ │ │ │
│ source_mac │ VARCHAR │ YES │ │ │ │
│ dest_mac │ VARCHAR │ YES │ │ │ │
│ protocols │ VARCHAR[] │ YES │ │ │ │
│ payload │ BLOB │ YES │ │ │ │
│ tcp_flags │ VARCHAR[] │ YES │ │ │ │
│ tcp_seq_num │ UINTEGER │ YES │ │ │ │
└─────────────┴─────────────┴──────┴─────┴─────────┴───────┘
D select count(*) from bridged;
┌──────────────┐
│ count_star() │
├──────────────┤
│ 104851 │
└──────────────┘
D .mode line
D pragma database_size;
database_name = memory
database_size = 0 bytes
block_size = 0
total_blocks = 0
used_blocks = 0
free_blocks = 0
wal_size = 0 bytes
memory_usage = 114.6 MiB
memory_limit = 37.5 GiB

Slicing and Dicing with SQL

What are the IP protocols active on the network. Mostly UDP! (I do wish this field had been an INTEGER instead of a VARCHAR.

D select protocols, count(*) as cnt from bridged group by protocols order by cnt desc;
┌─────────────────────────┬────────┐
│ protocols │ cnt │
├─────────────────────────┼────────┤
│ [Ethernet, IP, UDP] │ 101644 │
│ [Ethernet, IP, TCP] │ 2831 │
│ [Ethernet, IP, Unknown] │ 346 │
│ [Ethernet, IP, ICMP] │ 30 │
└─────────────────────────┴────────┘

Let’s find all the IPs on my local network

D SELECT distinct (source_ip) from bridged where source_ip like '192.168.12%';
┌────────────────┐
│ source_ip │
├────────────────┤
│ 192.168.12.131 │
│ 192.168.12.159 │
│ 192.168.12.113 │
│ 192.168.12.1 │
└────────────────┘

Let’s go down to the link layer and see where most of the frames are going from!

D SELECT DISTINCT (source_mac, dest_mac) from bridged;
┌────────────────────────────────────────┐
│ main.row(source_mac, dest_mac) │
├────────────────────────────────────────┤
│ (00:16:3e:9b:21:27, 33:33:00:00:00:16) │
│ (00:16:3e:7d:5c:ca, 33:33:00:00:00:16) │
│ (d6:06:10:3e:e9:ed, 56:1c:4b:ac:c3:15) │
│ (3a:99:83:4e:f6:9f, 33:33:00:00:00:16) │
│ (c8:99:b2:e1:47:17, 01:80:c2:00:00:13) │
│ (3a:99:83:4e:f6:9f, 33:33:00:00:00:02) │
│ (56:1c:4b:ac:c3:15, d6:06:10:3e:e9:ed) │
│ (56:1c:4b:ac:c3:15, 01:00:5e:7f:ff:fa) │
│ (c8:99:b2:e1:47:17, 33:33:00:00:00:01) │
│ (3a:99:83:4e:f6:9f, 01:00:5e:00:00:fb) │
│ (84:47:09:19:c3:16, 33:33:00:00:00:16) │
│ (00:16:3e:53:19:f5, 33:33:00:00:00:16) │
│ (00:16:3e:aa:1b:f3, 33:33:00:00:00:16) │
│ (52:54:00:8b:d8:ac, 33:33:00:00:00:16) │
│ (56:1c:4b:ac:c3:15, c8:99:b2:e1:47:17) │
│ (d6:06:10:3e:e9:ed, 33:33:00:00:00:16) │
│ (c8:99:b2:e1:47:1c, 01:80:c2:00:00:13) │
│ (c8:99:b2:e1:47:17, 56:1c:4b:ac:c3:15) │
│ (3a:99:83:4e:f6:9f, 33:33:00:00:00:fb) │
│ (3a:99:83:4e:f6:9f, 01:00:5e:00:00:16) │
│ (c8:99:b2:e1:47:17, 33:33:ff:45:46:08) │
└────────────────────────────────────────┘
D select dest_mac, count(*) as cnt from bridged group by dest_mac order by cnt desc;
┌───────────────────┬───────┐
│ dest_mac │ cnt │
├───────────────────┼───────┤
│ 56:1c:4b:ac:c3:15 │ 52469 │
│ d6:06:10:3e:e9:ed │ 51857 │
| c8:99:b2:e1:47:17 │ 492 │
│ 33:33:00:00:00:16 │ 16 │
│ 01:00:5e:7f:ff:fa │ 4 │
│ 33:33:00:00:00:fb │ 3 │
│ 01:00:5e:00:00:fb │ 3 │
│ 01:80:c2:00:00:13 │ 2 │
│ 01:00:5e:00:00:16 │ 2 │
│ 33:33:00:00:00:02 │ 1 │
│ 33:33:ff:45:46:08 │ 1 │
│ 33:33:00:00:00:01 │ 1 │
└───────────────────┴───────┘

The first entry is my workstation so it makes sense most of the comms are with my router.

D select source_ip, count(*) as cnt from bridged where dest_mac = '56:1c:4b:ac:c3:15' and source_ip like '192.168.12%' group by source_ip order by cnt desc limit 25;
┌────────────────┬───────┐
│ source_ip │ cnt │
├────────────────┼───────┤
│ 192.168.12.159 │ 51977 │
│ 192.168.12.1 │ 36 │
└────────────────┴───────┘

Let’s see the unique ports

D select distinct(dest_port) from bridged where dest_mac = 'c8:99:b2:e1:47:17' and source_port != 0;
┌───────────┐
│ dest_port │
├───────────┤
│ 1900 │
│ 3478 │
│ 443 │
│ 80 │
│ 53 │
│ 5351 │
└───────────┘
D select dest_port, count(*) as cnt from bridged where dest_mac = 'c8:99:b2:e1:47:17' and source_port != 0 group by dest_port order by cnt desc;
┌───────────┬─────┐
│ dest_port │ cnt │
├───────────┼─────┤
│ 443 │ 181 │
│ 3478 │ 71 │
│ 80 │ 37 │
│ 53 │ 26 │
│ 5351 │ 8 │
│ 1900 │ 2 │

Looking at the packets!

Getting used to switching output formats is useful depending on your screen resolution.

D select * from bridged where length < 15 limit 3;
timestamp = 2024-08-31 16:56:25
source_ip = 192.168.12.131
dest_ip = 209.177.145.120
source_port = 46017
dest_port = 443
length = 14
tcp_session = 192.168.12.131:46017 -> 209.177.145.120:443
source_mac = 56:1c:4b:ac:c3:15
dest_mac = c8:99:b2:e1:47:17
protocols = [Ethernet, IP, TCP]
payload = \x00\x00\x01\x01\x08\x0A\xF5\xB3K.\x8B\xFB\x96\x1C
tcp_flags = [ACK]
tcp_seq_num = 3986401156
  timestamp = 2024-08-31 16:56:25
source_ip = 192.168.12.159
dest_ip = 192.168.12.131
source_port = 24679
dest_port = 6443
length = 14
tcp_session = 192.168.12.159:24679 -> 192.168.12.131:6443
source_mac = d6:06:10:3e:e9:ed
dest_mac = 56:1c:4b:ac:c3:15
protocols = [Ethernet, IP, TCP]
payload = \x00\x00\x01\x01\x08\x0A\xAE\x87H\xD5\xBC\xF6d\x12
tcp_flags = [ACK]
tcp_seq_num = 1817817319
timestamp = 2024-08-31 16:56:25
source_ip = 192.168.12.159
dest_ip = 192.168.12.131
source_port = 24679
dest_port = 6443
length = 14
tcp_session = 192.168.12.159:24679 -> 192.168.12.131:6443
source_mac = d6:06:10:3e:e9:ed
dest_mac = 56:1c:4b:ac:c3:15
protocols = [Ethernet, IP, TCP]
payload = \x00\x00\x01\x01\x08\x0A\xAE\x87H\xD5\xBC\xF6d\x12
tcp_flags = [ACK]
tcp_seq_num = 1817817319

Originally published on https://blog.mdfranz.com/

<script defer data-domain=”mdfranz.medium.com” src=”https://plausible.io/js/script.js"></script>

--

--

Matt Franz
Matt Franz

Written by Matt Franz

Occasional Thoughts on History, Books, Tech and Engineering Management

No responses yet