Migrating from Fathom Lite to Umami

For years I’ve been running Fathom Lite to track visitors on this website. By “track” I don’t mean anything creepy, just some privacy-respecting stats on which posts get the most traffic. But Fathom Lite has been archived for years (I hadn’t noticed). During a recent homelab overhaul (more on that in a later post) I replaced it with Umami. It’s MIT-licensed, self-hosted, and lightweight. The challenge: migrating six years of historical data from Fathom over to Umami.

Full disclosure: Claude Code helped me a lot with writing, testing, and running the script below. I use Claude extensively these days to manage almost all aspects of my homelab setup.

The problem

Fathom stores hourly aggregates in SQLite: “42 pageviews on /blog/post/ between 14:00-15:00”. Umami stores individual events: one website_event row per pageview, linked to a session. There’s no aggregate import.

My dataset: 277K hourly rows, 666K pageviews, five sites, dating back to March 2020.

The migration script expands each aggregate back into discrete event rows:

  1. Creates N website_event rows per hourly bucket, with timestamps spread randomly within the hour
  2. Generates deterministic session UUIDs for the reported visitor count
  3. Distributes pageviews across sessions round-robin
  4. Attaches referrer domains from referrer_stats proportionally

The result is synthetic but re-aggregates correctly in Umami’s dashboard. Page paths, pageview counts, visitor counts, and referrers are preserved exactly. Browser/OS/device data is NULL because Fathom never tracked it.
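
To make the expansion concrete, here’s a simplified sketch of what happens to a single hourly row (made-up numbers and path; the real script below also handles referrers, deterministic IDs, and batching):

import random

# One Fathom aggregate row: 5 pageviews by 3 visitors on one path in one hour.
row = {"ts": "2021-05-03 14:00:00", "pathname": "/blog/post/",
       "pageviews": 5, "visitors": 3}

# One session per reported visitor (the real script uses deterministic UUIDs).
sessions = [f"session-{v}" for v in range(row["visitors"])]

# One event per pageview, assigned to sessions round-robin, with a random
# second offset so events spread out within the 14:00-15:00 bucket.
events = [
    {"session_id": sessions[pv % len(sessions)],
     "url_path": row["pathname"],
     "offset_seconds": random.randint(0, 3599)}
    for pv in range(row["pageviews"])
]
print(len(events), "events across", len(sessions), "sessions")  # 5 events across 3 sessions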

Key design decisions

Deterministic UUIDs for idempotency

All IDs use uuid5 with a fixed namespace, so re-running produces identical rows. Combined with ON CONFLICT DO NOTHING, the script is safe to re-run after crashes.

NAMESPACE_UUID = uuid.UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890")

def det_uuid(namespace: str, *parts: str) -> str:
    return str(uuid.uuid5(NAMESPACE_UUID, f"{namespace}:{'|'.join(parts)}"))
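
Because uuid5 is a pure hash of its inputs, the same arguments give the same ID on every run and on any machine. A quick check (placeholder arguments, using the definition above plus import uuid):

a = det_uuid("event", "site-1", "2021-05-03 14:00:00", "/blog/post/", "0")
b = det_uuid("event", "site-1", "2021-05-03 14:00:00", "/blog/post/", "0")
assert a == b  # identical UUID, so a re-run hits ON CONFLICT and is skipped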

Streaming batches to avoid OOM

The first version buffered all 666K events in memory and OOM-killed the 1GB LXC container that runs Umami. The fix: stream rows from SQLite and flush to PostgreSQL every 2,000 events. Peak memory dropped from ~500MB to ~50MB.

for row in iter_page_stats(fathom_db, site_id):
    # expand row into events, append to batch
    if len(event_batch) >= batch_size:
        flush_batch(pg_cur, session_batch, event_batch)
        pg_conn.commit()
        session_batch.clear()
        event_batch.clear()

Running it

Prerequisites: Python 3.11+, psycopg2, access to both the Fathom SQLite DB and Umami’s PostgreSQL.

# Grab the Fathom database (mine ran in Docker)
scp root@<old-host>:/var/lib/docker/volumes/fathom/_data/fathom.db /tmp/fathom.db

# Explore what you have
sqlite3 /tmp/fathom.db "SELECT * FROM sites;"
sqlite3 /tmp/fathom.db "SELECT MIN(ts), MAX(ts), SUM(pageviews) FROM page_stats WHERE site_id = 1;"

Create a website in Umami’s UI and grab its UUID, then:

# Dry run -- reads data, reports what it would insert
python3 migrate-fathom-to-umami.py \
    --fathom-db /tmp/fathom.db \
    --fathom-site-id 1 \
    --umami-website-id <umami-uuid> \
    --umami-dsn "postgresql://umami:<password>@<host>:5432/umami" \
    --dry-run

# Run for real
python3 migrate-fathom-to-umami.py \
    --fathom-db /tmp/fathom.db \
    --fathom-site-id 1 \
    --umami-website-id <umami-uuid> \
    --umami-dsn "postgresql://umami:<password>@<host>:5432/umami"
Counting rows in /tmp/fathom.db (site_id=1)...
  277,894 hourly rows, 666,664 total pageviews to expand
Streaming page_stats -> Umami events...
  100,639 events / 90,669 sessions (15,231/277,894 rows)
  ...
Done! 666,664 events, 611,077 sessions written.

Next, verify in Umami by setting the date range to cover your historical data.
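
For an extra sanity check at the database level, a quick count against Umami’s PostgreSQL should line up with the script’s summary (same placeholders as above; anything Umami has tracked live since the cutover comes on top of the migrated total):

import psycopg2

conn = psycopg2.connect("postgresql://umami:<password>@<host>:5432/umami")
cur = conn.cursor()
cur.execute(
    "SELECT COUNT(*) FROM website_event WHERE website_id = %s",
    ("<umami-uuid>",),
)
print(cur.fetchone()[0])  # 666,664 migrated events for the site above
conn.close()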

The full script

#!/usr/bin/env python3
"""
Migrate Fathom Lite (SQLite) page_stats and referrer_stats into Umami v3 (PostgreSQL).

Fathom stores hourly aggregated stats (pageviews, visitors per path per hour).
This script expands those aggregates back into individual website_event + session
rows in Umami's schema. Since Fathom didn't track browser/OS/device, those
fields are left NULL.

Streams data in batches to keep memory usage low (~50MB).

Usage:
    python3 scripts/migrate-fathom-to-umami.py \\
        --fathom-db /tmp/fathom.db \\
        --fathom-site-id 1 \\
        --umami-website-id 9bb62e2b-202b-4d4e-a135-965d95fa037b \\
        --umami-dsn "postgresql://umami:<password>@10.1.10.75:5432/umami" \\
        [--batch-size 2000] [--dry-run]
"""

import argparse
import random
import sqlite3
import uuid

import psycopg2
import psycopg2.extras


NAMESPACE_UUID = uuid.UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890")


def det_uuid(namespace: str, *parts: str) -> str:
    """Deterministic UUID from namespace + parts (idempotent across re-runs)."""
    return str(uuid.uuid5(NAMESPACE_UUID, f"{namespace}:{'|'.join(parts)}"))


def parse_referrer_domain(hostname: str) -> str | None:
    """'https://www.google.com' -> 'www.google.com'"""
    if not hostname:
        return None
    h = hostname.strip()
    for prefix in ("https://", "http://"):
        if h.startswith(prefix):
            h = h[len(prefix) :]
    return h.rstrip("/") or None


def load_referrer_lookup(fathom_db: str, site_id: int) -> dict:
    """Load referrer_stats indexed by (ts, pathname)."""
    conn = sqlite3.connect(fathom_db)
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        """
        SELECT rs.ts, p.name AS pathname, h.name AS ref_hostname, rs.pageviews
        FROM referrer_stats rs
        JOIN hostnames h ON rs.hostname_id = h.id
        JOIN pathnames p ON rs.pathname_id = p.id
        WHERE rs.site_id = ?
    """,
        (site_id,),
    ).fetchall()
    conn.close()

    lookup: dict[tuple[str, str], list[tuple[str | None, int]]] = {}
    for row in rows:
        key = (row["ts"], row["pathname"])
        lookup.setdefault(key, []).append(
            (
                parse_referrer_domain(row["ref_hostname"]),
                row["pageviews"],
            )
        )
    return lookup


def iter_page_stats(fathom_db: str, site_id: int):
    """Yield page_stats rows one at a time."""
    conn = sqlite3.connect(fathom_db)
    conn.row_factory = sqlite3.Row
    cur = conn.execute(
        """
        SELECT ps.ts, h.name AS hostname, p.name AS pathname,
               ps.pageviews, ps.visitors
        FROM page_stats ps
        JOIN hostnames h ON ps.hostname_id = h.id
        JOIN pathnames p ON ps.pathname_id = p.id
        WHERE ps.site_id = ?
        ORDER BY ps.ts
    """,
        (site_id,),
    )
    for row in cur:
        yield dict(row)
    conn.close()


def count_page_stats(fathom_db: str, site_id: int) -> tuple[int, int]:
    """Return (row_count, total_pageviews) for progress reporting."""
    conn = sqlite3.connect(fathom_db)
    cur = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(pageviews), 0) FROM page_stats WHERE site_id = ?",
        (site_id,),
    )
    row_count, total_pv = cur.fetchone()
    conn.close()
    return row_count, total_pv


def flush_batch(cur, session_batch: list[tuple], event_batch: list[tuple]):
    """Insert a batch of sessions and events."""
    if session_batch:
        psycopg2.extras.execute_values(
            cur,
            """INSERT INTO session (session_id, website_id, created_at)
               VALUES %s ON CONFLICT (session_id) DO NOTHING""",
            session_batch,
            template="(%s, %s, %s::timestamptz)",
        )
    if event_batch:
        psycopg2.extras.execute_values(
            cur,
            """INSERT INTO website_event
               (event_id, website_id, session_id, visit_id, created_at,
                url_path, referrer_domain, event_type, hostname)
               VALUES %s ON CONFLICT (event_id) DO NOTHING""",
            event_batch,
            template="(%s, %s, %s, %s, %s::timestamptz + interval '1 second' * %s, %s, %s, %s, %s)",
        )


def migrate(args):
    rng = random.Random(42)

    print(f"Counting rows in {args.fathom_db} (site_id={args.fathom_site_id})...")
    row_count, total_pv = count_page_stats(args.fathom_db, args.fathom_site_id)
    print(f"  {row_count:,} hourly rows, {total_pv:,} total pageviews to expand")

    print("Loading referrer_stats lookup...")
    ref_lookup = load_referrer_lookup(args.fathom_db, args.fathom_site_id)
    print(f"  {len(ref_lookup):,} referrer entries loaded")

    pg_conn = None
    pg_cur = None
    if not args.dry_run:
        pg_conn = psycopg2.connect(args.umami_dsn)
        pg_cur = pg_conn.cursor()

    seen_sessions: set[str] = set()
    session_batch: list[tuple] = []
    event_batch: list[tuple] = []
    events_written = 0
    sessions_written = 0
    rows_processed = 0

    print("Streaming page_stats -> Umami events...")
    for row in iter_page_stats(args.fathom_db, args.fathom_site_id):
        ts = row["ts"]
        pathname = row["pathname"]
        hostname = parse_referrer_domain(row["hostname"])
        pageviews = row["pageviews"]
        visitors = row["visitors"]
        rows_processed += 1

        if pageviews == 0:
            continue

        # Build referrer pool for this (ts, pathname)
        ref_pool = []
        for domain, count in ref_lookup.get((ts, pathname), []):
            ref_pool.extend([domain] * count)

        num_visitors = max(visitors, 1)
        visitor_sids = []
        for v in range(num_visitors):
            sid = det_uuid("session", args.umami_website_id, ts, pathname, str(v))
            visitor_sids.append(sid)
            if sid not in seen_sessions:
                seen_sessions.add(sid)
                session_batch.append((sid, args.umami_website_id, f"{ts}+00:00"))
                sessions_written += 1

        created_at = f"{ts}+00:00"
        for pv in range(pageviews):
            sid = visitor_sids[pv % num_visitors]
            vid = det_uuid("visit", sid, ts, pathname, str(pv))
            eid = det_uuid("event", args.umami_website_id, ts, pathname, str(pv))
            ref_domain = ref_pool[pv % len(ref_pool)] if ref_pool else None
            offset_s = rng.randint(0, 3599)

            event_batch.append(
                (
                    eid,
                    args.umami_website_id,
                    sid,
                    vid,
                    created_at,
                    offset_s,
                    pathname,
                    ref_domain,
                    1,
                    hostname,
                )
            )
            events_written += 1

        # Flush when batch is full
        if len(event_batch) >= args.batch_size:
            if not args.dry_run:
                flush_batch(pg_cur, session_batch, event_batch)
                pg_conn.commit()
            session_batch.clear()
            event_batch.clear()

            if events_written % 50_000 < args.batch_size:
                print(
                    f"  {events_written:,} events / {sessions_written:,} sessions "
                    f"({rows_processed:,}/{row_count:,} rows)"
                )

    # Final flush
    if session_batch or event_batch:
        if not args.dry_run:
            flush_batch(pg_cur, session_batch, event_batch)
            pg_conn.commit()

    if pg_conn:
        pg_cur.close()
        pg_conn.close()

    print(
        f"\nDone! {events_written:,} events, {sessions_written:,} sessions "
        f"{'(dry-run)' if args.dry_run else 'written'}."
    )


def main():
    parser = argparse.ArgumentParser(
        description="Migrate Fathom Lite stats to Umami v3"
    )
    parser.add_argument(
        "--fathom-db", required=True, help="Path to Fathom SQLite database"
    )
    parser.add_argument(
        "--fathom-site-id", type=int, default=1, help="Fathom site ID to migrate"
    )
    parser.add_argument("--umami-website-id", required=True, help="Umami website UUID")
    parser.add_argument("--umami-dsn", required=True, help="PostgreSQL DSN")
    parser.add_argument(
        "--batch-size", type=int, default=2000, help="Insert batch size"
    )
    parser.add_argument(
        "--dry-run", action="store_true", help="Don't write to database"
    )
    args = parser.parse_args()
    migrate(args)


if __name__ == "__main__":
    main()

Takeaways

  • Dry-run first when generating 666K INSERTs against a production database.
  • Stream, don’t buffer. Hundreds of thousands of Python dicts will OOM small containers.
  • Deterministic UUIDs = safe re-runs. Crashes mid-migration are a non-issue.
  • Aggregates can be expanded. You lose per-session browsing paths, but pageview/visitor/referrer totals survive the round-trip. Good enough for historical trends.