Migrating from Fathom Lite to Umami

For years I’ve been running Fathom Lite to track visitors on this website. By “track” I don’t mean anything creepy, just some privacy-respecting stats on which posts get the most traffic. But Fathom Lite has been archived for years (I hadn’t noticed). During a recent homelab overhaul (more on that in a later post) I replaced it with Umami. It’s MIT-licensed, self-hosted, and lightweight. The challenge: migrating six years of historical data from Fathom over to Umami.

Full disclosure: Claude Code helped me a lot with writing, testing, and running the script below. I use Claude extensively these days to manage almost all aspects of my homelab setup.

The problem

Fathom stores hourly aggregates in SQLite: “42 pageviews on /blog/post/ between 14:00-15:00”. Umami stores individual events: one website_event row per pageview, linked to a session. There’s no aggregate import.

My dataset: 277K hourly rows, 666K pageviews, five sites, dating back to March 2020.

The migration script expands each aggregate back into discrete event rows:

  1. Creates N website_event rows per hourly bucket, with timestamps spread randomly within the hour
  2. Generates deterministic session UUIDs for the reported visitor count
  3. Distributes pageviews across sessions round-robin
  4. Attaches referrer domains from referrer_stats proportionally

The result is synthetic but re-aggregates correctly in Umami’s dashboard. Page paths, pageview counts, visitor counts, and referrers are preserved exactly. Browser/OS/device data is NULL because Fathom never tracked it.
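
To make the expansion concrete, here’s a simplified sketch of what happens to a single hourly row (made-up numbers and path; the real script below also handles referrers, deterministic IDs, and batching):

import random

# One Fathom aggregate row: 5 pageviews by 3 visitors on one path in one hour.
row = {"ts": "2021-05-03 14:00:00", "pathname": "/blog/post/",
       "pageviews": 5, "visitors": 3}

# One session per reported visitor (the real script uses deterministic UUIDs).
sessions = [f"session-{v}" for v in range(row["visitors"])]

# One event per pageview, assigned to sessions round-robin, with a random
# second offset so events spread out within the 14:00-15:00 bucket.
events = [
    {"session_id": sessions[pv % len(sessions)],
     "url_path": row["pathname"],
     "offset_seconds": random.randint(0, 3599)}
    for pv in range(row["pageviews"])
]
print(len(events), "events across", len(sessions), "sessions")  # 5 events across 3 sessions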

Key design decisions

Deterministic UUIDs for idempotency

All IDs use uuid5 with a fixed namespace, so re-running produces identical rows. Combined with ON CONFLICT DO NOTHING, the script is safe to re-run after crashes.

NAMESPACE_UUID = uuid.UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890")

def det_uuid(namespace: str, *parts: str) -> str:
    return str(uuid.uuid5(NAMESPACE_UUID, f"{namespace}:{'|'.join(parts)}"))
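
Because uuid5 is a pure hash of its inputs, the same arguments give the same ID on every run and on any machine. A quick check (placeholder arguments, using the definition above plus import uuid):

a = det_uuid("event", "site-1", "2021-05-03 14:00:00", "/blog/post/", "0")
b = det_uuid("event", "site-1", "2021-05-03 14:00:00", "/blog/post/", "0")
assert a == b  # identical UUID, so a re-run hits ON CONFLICT and is skipped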

Streaming batches to avoid OOM

The first version buffered all 666K events in memory and OOM-killed the 1GB LXC container that runs Umami. The fix: stream rows from SQLite and flush to PostgreSQL every 2,000 events. Peak memory dropped from ~500MB to ~50MB.

for row in iter_page_stats(fathom_db, site_id):
    # expand row into events, append to batch
    if len(event_batch) >= batch_size:
        flush_batch(pg_cur, session_batch, event_batch)
        pg_conn.commit()
        session_batch.clear()
        event_batch.clear()

Running it

Prerequisites: Python 3.11+, psycopg2, access to both the Fathom SQLite DB and Umami’s PostgreSQL.

# Grab the Fathom database (mine ran in Docker)
scp root@<old-host>:/var/lib/docker/volumes/fathom/_data/fathom.db /tmp/fathom.db

# Explore what you have
sqlite3 /tmp/fathom.db "SELECT * FROM sites;"
sqlite3 /tmp/fathom.db "SELECT MIN(ts), MAX(ts), SUM(pageviews) FROM page_stats WHERE site_id = 1;"

Create a website in Umami’s UI and grab its UUID, then:

# Dry run -- reads data, reports what it would insert
python3 migrate-fathom-to-umami.py \
    --fathom-db /tmp/fathom.db \
    --fathom-site-id 1 \
    --umami-website-id <umami-uuid> \
    --umami-dsn "postgresql://umami:<password>@<host>:5432/umami" \
    --dry-run

# Run for real
python3 migrate-fathom-to-umami.py \
    --fathom-db /tmp/fathom.db \
    --fathom-site-id 1 \
    --umami-website-id <umami-uuid> \
    --umami-dsn "postgresql://umami:<password>@<host>:5432/umami"
Counting rows in /tmp/fathom.db (site_id=1)...
  277,894 hourly rows, 666,664 total pageviews to expand
Streaming page_stats -> Umami events...
  100,639 events / 90,669 sessions (15,231/277,894 rows)
  ...
Done! 666,664 events, 611,077 sessions written.

Next, verify in Umami by setting the date range to cover your historical data.
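
For an extra sanity check at the database level, a quick count against Umami’s PostgreSQL should line up with the script’s summary (same placeholders as above; anything Umami has tracked live since the cutover comes on top of the migrated total):

import psycopg2

conn = psycopg2.connect("postgresql://umami:<password>@<host>:5432/umami")
cur = conn.cursor()
cur.execute(
    "SELECT COUNT(*) FROM website_event WHERE website_id = %s",
    ("<umami-uuid>",),
)
print(cur.fetchone()[0])  # 666,664 migrated events for the site above
conn.close()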

The full script

#!/usr/bin/env python3
"""
Migrate Fathom Lite (SQLite) page_stats and referrer_stats into Umami v3 (PostgreSQL).

Fathom stores hourly aggregated stats (pageviews, visitors per path per hour).
This script expands those aggregates back into individual website_event + session
rows in Umami's schema. Since Fathom didn't track browser/OS/device, those
fields are left NULL.

Streams data in batches to keep memory usage low (~50MB).

Usage:
    python3 scripts/migrate-fathom-to-umami.py \\
        --fathom-db /tmp/fathom.db \\
        --fathom-site-id 1 \\
        --umami-website-id 9bb62e2b-202b-4d4e-a135-965d95fa037b \\
        --umami-dsn "postgresql://umami:<password>@10.1.10.75:5432/umami" \\
        [--batch-size 2000] [--dry-run]
"""

import argparse
import random
import sqlite3
import uuid

import psycopg2
import psycopg2.extras


NAMESPACE_UUID = uuid.UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890")


def det_uuid(namespace: str, *parts: str) -> str:
    """Deterministic UUID from namespace + parts (idempotent across re-runs)."""
    return str(uuid.uuid5(NAMESPACE_UUID, f"{namespace}:{'|'.join(parts)}"))


def parse_referrer_domain(hostname: str) -> str | None:
    """'https://www.google.com' -> 'www.google.com'"""
    if not hostname:
        return None
    h = hostname.strip()
    for prefix in ("https://", "http://"):
        if h.startswith(prefix):
            h = h[len(prefix) :]
    return h.rstrip("/") or None


def load_referrer_lookup(fathom_db: str, site_id: int) -> dict:
    """Load referrer_stats indexed by (ts, pathname)."""
    conn = sqlite3.connect(fathom_db)
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        """
        SELECT rs.ts, p.name AS pathname, h.name AS ref_hostname, rs.pageviews
        FROM referrer_stats rs
        JOIN hostnames h ON rs.hostname_id = h.id
        JOIN pathnames p ON rs.pathname_id = p.id
        WHERE rs.site_id = ?
    """,
        (site_id,),
    ).fetchall()
    conn.close()

    lookup: dict[tuple[str, str], list[tuple[str | None, int]]] = {}
    for row in rows:
        key = (row["ts"], row["pathname"])
        lookup.setdefault(key, []).append(
            (
                parse_referrer_domain(row["ref_hostname"]),
                row["pageviews"],
            )
        )
    return lookup


def iter_page_stats(fathom_db: str, site_id: int):
    """Yield page_stats rows one at a time."""
    conn = sqlite3.connect(fathom_db)
    conn.row_factory = sqlite3.Row
    cur = conn.execute(
        """
        SELECT ps.ts, h.name AS hostname, p.name AS pathname,
               ps.pageviews, ps.visitors
        FROM page_stats ps
        JOIN hostnames h ON ps.hostname_id = h.id
        JOIN pathnames p ON ps.pathname_id = p.id
        WHERE ps.site_id = ?
        ORDER BY ps.ts
    """,
        (site_id,),
    )
    for row in cur:
        yield dict(row)
    conn.close()


def count_page_stats(fathom_db: str, site_id: int) -> tuple[int, int]:
    """Return (row_count, total_pageviews) for progress reporting."""
    conn = sqlite3.connect(fathom_db)
    cur = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(pageviews), 0) FROM page_stats WHERE site_id = ?",
        (site_id,),
    )
    row_count, total_pv = cur.fetchone()
    conn.close()
    return row_count, total_pv


def flush_batch(cur, session_batch: list[tuple], event_batch: list[tuple]):
    """Insert a batch of sessions and events."""
    if session_batch:
        psycopg2.extras.execute_values(
            cur,
            """INSERT INTO session (session_id, website_id, created_at)
               VALUES %s ON CONFLICT (session_id) DO NOTHING""",
            session_batch,
            template="(%s, %s, %s::timestamptz)",
        )
    if event_batch:
        psycopg2.extras.execute_values(
            cur,
            """INSERT INTO website_event
               (event_id, website_id, session_id, visit_id, created_at,
                url_path, referrer_domain, event_type, hostname)
               VALUES %s ON CONFLICT (event_id) DO NOTHING""",
            event_batch,
            template="(%s, %s, %s, %s, %s::timestamptz + interval '1 second' * %s, %s, %s, %s, %s)",
        )


def migrate(args):
    rng = random.Random(42)

    print(f"Counting rows in {args.fathom_db} (site_id={args.fathom_site_id})...")
    row_count, total_pv = count_page_stats(args.fathom_db, args.fathom_site_id)
    print(f"  {row_count:,} hourly rows, {total_pv:,} total pageviews to expand")

    print("Loading referrer_stats lookup...")
    ref_lookup = load_referrer_lookup(args.fathom_db, args.fathom_site_id)
    print(f"  {len(ref_lookup):,} referrer entries loaded")

    pg_conn = None
    pg_cur = None
    if not args.dry_run:
        pg_conn = psycopg2.connect(args.umami_dsn)
        pg_cur = pg_conn.cursor()

    seen_sessions: set[str] = set()
    session_batch: list[tuple] = []
    event_batch: list[tuple] = []
    events_written = 0
    sessions_written = 0
    rows_processed = 0

    print("Streaming page_stats -> Umami events...")
    for row in iter_page_stats(args.fathom_db, args.fathom_site_id):
        ts = row["ts"]
        pathname = row["pathname"]
        hostname = parse_referrer_domain(row["hostname"])
        pageviews = row["pageviews"]
        visitors = row["visitors"]
        rows_processed += 1

        if pageviews == 0:
            continue

        # Build referrer pool for this (ts, pathname)
        ref_pool = []
        for domain, count in ref_lookup.get((ts, pathname), []):
            ref_pool.extend([domain] * count)

        num_visitors = max(visitors, 1)
        visitor_sids = []
        for v in range(num_visitors):
            sid = det_uuid("session", args.umami_website_id, ts, pathname, str(v))
            visitor_sids.append(sid)
            if sid not in seen_sessions:
                seen_sessions.add(sid)
                session_batch.append((sid, args.umami_website_id, f"{ts}+00:00"))
                sessions_written += 1

        created_at = f"{ts}+00:00"
        for pv in range(pageviews):
            sid = visitor_sids[pv % num_visitors]
            vid = det_uuid("visit", sid, ts, pathname, str(pv))
            eid = det_uuid("event", args.umami_website_id, ts, pathname, str(pv))
            ref_domain = ref_pool[pv % len(ref_pool)] if ref_pool else None
            offset_s = rng.randint(0, 3599)

            event_batch.append(
                (
                    eid,
                    args.umami_website_id,
                    sid,
                    vid,
                    created_at,
                    offset_s,
                    pathname,
                    ref_domain,
                    1,
                    hostname,
                )
            )
            events_written += 1

        # Flush when batch is full
        if len(event_batch) >= args.batch_size:
            if not args.dry_run:
                flush_batch(pg_cur, session_batch, event_batch)
                pg_conn.commit()
            session_batch.clear()
            event_batch.clear()

            if events_written % 50_000 < args.batch_size:
                print(
                    f"  {events_written:,} events / {sessions_written:,} sessions "
                    f"({rows_processed:,}/{row_count:,} rows)"
                )

    # Final flush
    if session_batch or event_batch:
        if not args.dry_run:
            flush_batch(pg_cur, session_batch, event_batch)
            pg_conn.commit()

    if pg_conn:
        pg_cur.close()
        pg_conn.close()

    print(
        f"\nDone! {events_written:,} events, {sessions_written:,} sessions "
        f"{'(dry-run)' if args.dry_run else 'written'}."
    )


def main():
    parser = argparse.ArgumentParser(
        description="Migrate Fathom Lite stats to Umami v3"
    )
    parser.add_argument(
        "--fathom-db", required=True, help="Path to Fathom SQLite database"
    )
    parser.add_argument(
        "--fathom-site-id", type=int, default=1, help="Fathom site ID to migrate"
    )
    parser.add_argument("--umami-website-id", required=True, help="Umami website UUID")
    parser.add_argument("--umami-dsn", required=True, help="PostgreSQL DSN")
    parser.add_argument(
        "--batch-size", type=int, default=2000, help="Insert batch size"
    )
    parser.add_argument(
        "--dry-run", action="store_true", help="Don't write to database"
    )
    args = parser.parse_args()
    migrate(args)


if __name__ == "__main__":
    main()

Takeaways

  • Dry-run first when generating 666K INSERTs against a production database.
  • Stream, don’t buffer. Hundreds of thousands of Python dicts will OOM small containers.
  • Deterministic UUIDs = safe re-runs. Crashes mid-migration are a non-issue.
  • Aggregates can be expanded. You lose per-session browsing paths, but pageview/visitor/referrer totals survive the round-trip. Good enough for historical trends.