Migrating from Fathom Lite to Umami
For years I’ve been running Fathom Lite to track visitors on this website. By “track” I don’t mean in the creepy way; just some privacy-respecting stats on which posts get the most traffic. But Fathom Lite has been archived for years (I hadn’t noticed). With a recent homelab overhaul (more on that in a later post) I replaced it with Umami. It’s MIT-licensed, self-hosted, and lightweight. The challenge: migrating six years of historical data from Fathom over to Umami.
Full disclosure: Claude Code helped me a lot with writing, testing, and running the script below. I use Claude extensively these days to manage almost all aspects of my homelab setup.
The problem
Fathom stores hourly aggregates in SQLite: “42 pageviews on /blog/post/ between 14:00-15:00”. Umami stores individual events: one website_event row per pageview, linked to a session. There’s no aggregate import.
My dataset: 277K hourly rows, 666K pageviews, five sites, dating back to March 2020.
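To make the mismatch concrete: one hourly Fathom row has to fan out into one session row per visitor and one event row per pageview. Roughly (field names as the migration script below uses them, values invented for illustration):
# One Fathom page_stats row (hourly aggregate, after joining pathnames/hostnames):
fathom_row = {"ts": "2020-03-14 14:00:00", "pathname": "/some-post/",
              "pageviews": 42, "visitors": 17}
# ...becomes 17 session rows and 42 website_event rows in Umami, each shaped like:
umami_event = {"event_id": "<uuid>", "session_id": "<uuid>", "visit_id": "<uuid>",
               "created_at": "2020-03-14 14:23:51+00:00",
               "url_path": "/some-post/", "event_type": 1}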
The migration script expands each aggregate back into discrete event rows:
- Creates N website_event rows per hourly bucket, with timestamps spread randomly within the hour
- Generates deterministic session UUIDs for the reported visitor count
- Distributes pageviews across sessions round-robin
- Attaches referrer domains from referrer_stats proportionally
The result is synthetic but re-aggregates correctly in Umami’s dashboard. Page paths, pageview counts, visitor counts, and referrers are preserved exactly. Browser/OS/device data is NULL because Fathom never tracked it.
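To illustrate the expansion itself, here is a standalone sketch that mirrors the logic of the full script further down (made-up numbers): an hour with 5 pageviews and 2 visitors becomes 2 sessions, with pageviews dealt out round-robin and a random second offset inside the hour.
import random
import uuid

NAMESPACE_UUID = uuid.UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890")

def det_uuid(namespace: str, *parts: str) -> str:
    return str(uuid.uuid5(NAMESPACE_UUID, f"{namespace}:{'|'.join(parts)}"))

ts, path, pageviews, visitors = "2020-03-14 14:00:00", "/some-post/", 5, 2
rng = random.Random(42)

# One deterministic session per reported visitor
sessions = [det_uuid("session", "<website-uuid>", ts, path, str(v)) for v in range(visitors)]

for pv in range(pageviews):
    sid = sessions[pv % visitors]      # round-robin across visitor sessions
    offset_s = rng.randint(0, 3599)    # somewhere within the hour
    print(pv, sid[:8], f"{ts} +{offset_s}s")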
Key design decisions
Deterministic UUIDs for idempotency
All IDs use uuid5 with a fixed namespace, so re-running produces identical rows. Combined with ON CONFLICT DO NOTHING, the script is safe to re-run after crashes.
NAMESPACE_UUID = uuid.UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890")
def det_uuid(namespace: str, *parts: str) -> str:
return str(uuid.uuid5(NAMESPACE_UUID, f"{namespace}:{'|'.join(parts)}"))
Streaming batches to avoid OOM
The first version buffered all 666K events in memory and OOM-killed the 1GB LXC container that runs Umami. The fix: stream rows from SQLite, flush to PostgreSQL every 2,000 events. Peak memory dropped from ~500MB to ~50MB.
for row in iter_page_stats(fathom_db, site_id):
# expand row into events, append to batch
if len(event_batch) >= batch_size:
flush_batch(pg_cur, session_batch, event_batch)
pg_conn.commit()
session_batch.clear()
event_batch.clear()
Running it
Prerequisites: Python 3.11+, psycopg2, access to both the Fathom SQLite DB and Umami’s PostgreSQL.
# Grab the Fathom database (mine ran in Docker)
scp root@<old-host>:/var/lib/docker/volumes/fathom/_data/fathom.db /tmp/fathom.db
# Explore what you have
sqlite3 /tmp/fathom.db "SELECT * FROM sites;"
sqlite3 /tmp/fathom.db "SELECT MIN(ts), MAX(ts), SUM(pageviews) FROM page_stats WHERE site_id = 1;"
Create a website in Umami’s UI and grab its UUID, then:
# Dry run -- reads data, reports what it would insert
python3 migrate-fathom-to-umami.py \
--fathom-db /tmp/fathom.db \
--fathom-site-id 1 \
--umami-website-id <umami-uuid> \
--umami-dsn "postgresql://umami:<password>@<host>:5432/umami" \
--dry-run
# Run for real
python3 migrate-fathom-to-umami.py \
--fathom-db /tmp/fathom.db \
--fathom-site-id 1 \
--umami-website-id <umami-uuid> \
--umami-dsn "postgresql://umami:<password>@<host>:5432/umami"
Counting rows in /tmp/fathom.db (site_id=1)...
277,894 hourly rows, 666,664 total pageviews to expand
Streaming page_stats -> Umami events...
100,639 events / 90,669 sessions (15,231/277,894 rows)
...
Done! 666,664 events, 611,077 sessions written.
Next, verify in Umami by setting the date range to cover your historical data.
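If you want a harder check than eyeballing the dashboard, count what landed in Postgres and compare it against the Fathom sums from the sqlite3 queries earlier. A quick psycopg2 one-off (not part of the migration script; substitute your own DSN and website UUID):
import psycopg2

conn = psycopg2.connect("postgresql://umami:<password>@<host>:5432/umami")
cur = conn.cursor()
cur.execute(
    """SELECT COUNT(*), COUNT(DISTINCT session_id), MIN(created_at), MAX(created_at)
         FROM website_event
        WHERE website_id = %s""",
    ("<umami-uuid>",),
)
print(cur.fetchone())  # events, sessions, first/last timestamp -- should match the Fathom totals
conn.close()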
The full script
#!/usr/bin/env python3
"""
Migrate Fathom Lite (SQLite) page_stats and referrer_stats into Umami v3 (PostgreSQL).
Fathom stores hourly aggregated stats (pageviews, visitors per path per hour).
This script expands those aggregates back into individual website_event + session
rows in Umami's schema. Since Fathom didn't track browser/OS/device, those
fields are left NULL.
Streams data in batches to keep memory usage low (~50MB).
Usage:
python3 scripts/migrate-fathom-to-umami.py \\
--fathom-db /tmp/fathom.db \\
--fathom-site-id 1 \\
--umami-website-id 9bb62e2b-202b-4d4e-a135-965d95fa037b \\
--umami-dsn "postgresql://umami:<password>@10.1.10.75:5432/umami" \\
[--batch-size 2000] [--dry-run]
"""
import argparse
import random
import sqlite3
import uuid
import psycopg2
import psycopg2.extras
NAMESPACE_UUID = uuid.UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890")
def det_uuid(namespace: str, *parts: str) -> str:
"""Deterministic UUID from namespace + parts (idempotent across re-runs)."""
return str(uuid.uuid5(NAMESPACE_UUID, f"{namespace}:{'|'.join(parts)}"))
def parse_referrer_domain(hostname: str) -> str | None:
"""'https://www.google.com' -> 'www.google.com'"""
if not hostname:
return None
h = hostname.strip()
for prefix in ("https://", "http://"):
if h.startswith(prefix):
h = h[len(prefix) :]
return h.rstrip("/") or None
def load_referrer_lookup(fathom_db: str, site_id: int) -> dict:
"""Load referrer_stats indexed by (ts, pathname)."""
conn = sqlite3.connect(fathom_db)
conn.row_factory = sqlite3.Row
rows = conn.execute(
"""
SELECT rs.ts, p.name AS pathname, h.name AS ref_hostname, rs.pageviews
FROM referrer_stats rs
JOIN hostnames h ON rs.hostname_id = h.id
JOIN pathnames p ON rs.pathname_id = p.id
WHERE rs.site_id = ?
""",
(site_id,),
).fetchall()
conn.close()
lookup: dict[tuple[str, str], list[tuple[str | None, int]]] = {}
for row in rows:
key = (row["ts"], row["pathname"])
lookup.setdefault(key, []).append(
(
parse_referrer_domain(row["ref_hostname"]),
row["pageviews"],
)
)
return lookup
def iter_page_stats(fathom_db: str, site_id: int):
"""Yield page_stats rows one at a time."""
conn = sqlite3.connect(fathom_db)
conn.row_factory = sqlite3.Row
cur = conn.execute(
"""
SELECT ps.ts, h.name AS hostname, p.name AS pathname,
ps.pageviews, ps.visitors
FROM page_stats ps
JOIN hostnames h ON ps.hostname_id = h.id
JOIN pathnames p ON ps.pathname_id = p.id
WHERE ps.site_id = ?
ORDER BY ps.ts
""",
(site_id,),
)
for row in cur:
yield dict(row)
conn.close()
def count_page_stats(fathom_db: str, site_id: int) -> tuple[int, int]:
"""Return (row_count, total_pageviews) for progress reporting."""
conn = sqlite3.connect(fathom_db)
cur = conn.execute(
"SELECT COUNT(*), COALESCE(SUM(pageviews), 0) FROM page_stats WHERE site_id = ?",
(site_id,),
)
row_count, total_pv = cur.fetchone()
conn.close()
return row_count, total_pv
def flush_batch(cur, session_batch: list[tuple], event_batch: list[tuple]):
"""Insert a batch of sessions and events."""
if session_batch:
psycopg2.extras.execute_values(
cur,
"""INSERT INTO session (session_id, website_id, created_at)
VALUES %s ON CONFLICT (session_id) DO NOTHING""",
session_batch,
template="(%s, %s, %s::timestamptz)",
)
if event_batch:
psycopg2.extras.execute_values(
cur,
"""INSERT INTO website_event
(event_id, website_id, session_id, visit_id, created_at,
url_path, referrer_domain, event_type, hostname)
VALUES %s ON CONFLICT (event_id) DO NOTHING""",
event_batch,
template="(%s, %s, %s, %s, %s::timestamptz + interval '1 second' * %s, %s, %s, %s, %s)",
)
def migrate(args):
rng = random.Random(42)
print(f"Counting rows in {args.fathom_db} (site_id={args.fathom_site_id})...")
row_count, total_pv = count_page_stats(args.fathom_db, args.fathom_site_id)
print(f" {row_count:,} hourly rows, {total_pv:,} total pageviews to expand")
print("Loading referrer_stats lookup...")
ref_lookup = load_referrer_lookup(args.fathom_db, args.fathom_site_id)
print(f" {len(ref_lookup):,} referrer entries loaded")
pg_conn = None
pg_cur = None
if not args.dry_run:
pg_conn = psycopg2.connect(args.umami_dsn)
pg_cur = pg_conn.cursor()
seen_sessions: set[str] = set()
session_batch: list[tuple] = []
event_batch: list[tuple] = []
events_written = 0
sessions_written = 0
rows_processed = 0
print("Streaming page_stats -> Umami events...")
for row in iter_page_stats(args.fathom_db, args.fathom_site_id):
ts = row["ts"]
pathname = row["pathname"]
hostname = parse_referrer_domain(row["hostname"])
pageviews = row["pageviews"]
visitors = row["visitors"]
rows_processed += 1
if pageviews == 0:
continue
# Build referrer pool for this (ts, pathname)
ref_pool = []
for domain, count in ref_lookup.get((ts, pathname), []):
ref_pool.extend([domain] * count)
num_visitors = max(visitors, 1)
visitor_sids = []
for v in range(num_visitors):
sid = det_uuid("session", args.umami_website_id, ts, pathname, str(v))
visitor_sids.append(sid)
if sid not in seen_sessions:
seen_sessions.add(sid)
session_batch.append((sid, args.umami_website_id, f"{ts}+00:00"))
sessions_written += 1
created_at = f"{ts}+00:00"
for pv in range(pageviews):
sid = visitor_sids[pv % num_visitors]
vid = det_uuid("visit", sid, ts, pathname, str(pv))
eid = det_uuid("event", args.umami_website_id, ts, pathname, str(pv))
ref_domain = ref_pool[pv % len(ref_pool)] if ref_pool else None
offset_s = rng.randint(0, 3599)
event_batch.append(
(
eid,
args.umami_website_id,
sid,
vid,
created_at,
offset_s,
pathname,
ref_domain,
1,
hostname,
)
)
events_written += 1
# Flush when batch is full
if len(event_batch) >= args.batch_size:
if not args.dry_run:
flush_batch(pg_cur, session_batch, event_batch)
pg_conn.commit()
session_batch.clear()
event_batch.clear()
if events_written % 50_000 < args.batch_size:
print(
f" {events_written:,} events / {sessions_written:,} sessions "
f"({rows_processed:,}/{row_count:,} rows)"
)
# Final flush
if session_batch or event_batch:
if not args.dry_run:
flush_batch(pg_cur, session_batch, event_batch)
pg_conn.commit()
if pg_conn:
pg_cur.close()
pg_conn.close()
print(
f"\nDone! {events_written:,} events, {sessions_written:,} sessions "
f"{'(dry-run)' if args.dry_run else 'written'}."
)
def main():
parser = argparse.ArgumentParser(
description="Migrate Fathom Lite stats to Umami v3"
)
parser.add_argument(
"--fathom-db", required=True, help="Path to Fathom SQLite database"
)
parser.add_argument(
"--fathom-site-id", type=int, default=1, help="Fathom site ID to migrate"
)
parser.add_argument("--umami-website-id", required=True, help="Umami website UUID")
parser.add_argument("--umami-dsn", required=True, help="PostgreSQL DSN")
parser.add_argument(
"--batch-size", type=int, default=2000, help="Insert batch size"
)
parser.add_argument(
"--dry-run", action="store_true", help="Don't write to database"
)
args = parser.parse_args()
migrate(args)
if __name__ == "__main__":
main()
Takeaways
- Dry-run first when generating 666K INSERTs against a production database.
- Stream, don’t buffer. Hundreds of thousands of Python dicts will OOM small containers.
- Deterministic UUIDs = safe re-runs. Crashes mid-migration are a non-issue.
- Aggregates can be expanded. You lose per-session browsing paths, but pageview/visitor/referrer totals survive the round-trip. Good enough for historical trends.