How to Merge PGN Files in F#: Streaming, Performance, and Discriminated Unions
I needed to merge hundreds of Lichess PGN files into a single database for opening preparation. The existing tools either required Python dependencies, discarded game metadata, or merged games into variation trees rather than concatenating them cleanly. I wanted a standalone CLI that takes a folder of .pgn files and produces one merge.pgn — fast, memory-efficient, and correct.
The result is PGN Merger, a .NET CLI tool built in F# that merges chess PGN files with streaming I/O and ~64 KB memory usage.
Why F# for a CLI Tool
F# is a functional-first language on .NET. Three features make it a strong fit for file-processing CLIs.
Discriminated unions model success and failure at the type level:

    type FileResult =
        | Success of string * int64
        | Failure of string * string

The compiler forces you to handle both cases. No nullable returns, no forgotten error checks.
Pattern matching makes argument parsing readable and exhaustive:

    match args.[i] with
    | "--output" when i + 1 < args.Length ->
        loop (i + 2) { opts with OutputPath = args.[i + 1] }
    | "--recursive" -> loop (i + 1) { opts with Recursive = true }
    | "--help" | "-h" | "-?" -> None
    | arg when arg.StartsWith("--") ->
        printfn "Error: Unknown option '%s'" arg
        None

First-class .NET interop means System.IO, System.Diagnostics.Stopwatch, and high-performance streams are available directly. No FFI overhead. You write F#, you get native .NET performance.
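To show how the argument-parsing fragment might sit inside a complete function, here is a minimal sketch. The field names OutputPath and Recursive come from the fragment; the Options record itself, the InputDir field, the defaults value, the parseArgs name, and the handling of a missing --output value are assumptions made for illustration, not the tool's actual source.

```fsharp
// Hypothetical options record. OutputPath and Recursive appear in the post;
// InputDir and the default values are assumptions for this sketch.
type Options =
    { InputDir: string option
      OutputPath: string
      Recursive: bool }

let defaults =
    { InputDir = None
      OutputPath = "merge.pgn"
      Recursive = false }

/// Returns Some options on success, None when help was requested or parsing failed.
let parseArgs (args: string[]) : Options option =
    let rec loop i (opts: Options) =
        if i >= args.Length then
            Some opts
        else
            match args.[i] with
            | "--output" when i + 1 < args.Length ->
                loop (i + 2) { opts with OutputPath = args.[i + 1] }
            | "--output" ->
                // Assumed behavior: a trailing --output with no value is an error.
                printfn "Error: --output requires a value"
                None
            | "--recursive" -> loop (i + 1) { opts with Recursive = true }
            | "--help" | "-h" | "-?" -> None
            | arg when arg.StartsWith("--") ->
                printfn "Error: Unknown option '%s'" arg
                None
            | dir ->
                // Anything else is treated as the input folder (assumption).
                loop (i + 1) { opts with InputDir = Some dir }
    loop 0 defaults
```

Calling parseArgs [| "./games"; "--recursive"; "--output"; "db.pgn" |] yields Some options with all three fields set. Returning an option keeps the "help or error" path explicit for the caller instead of relying on a sentinel value.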
Streaming I/O: Merging PGN Files Without Loading Them into Memory
The naive merge approach loads entire files into memory before writing. For a folder with 10 GB of PGN files, that’s 10 GB in RAM before a single byte hits disk.
PGN Merger never loads an entire file into memory. It streams through each file with 64 KB buffers:

    use inputStream = new FileStream(pgnFile, FileMode.Open, FileAccess.Read, FileShare.Read, 65536)
    use reader = new StreamReader(inputStream, Encoding.UTF8, true, 65536)

The copy loop operates on a pre-allocated char array, allocated once per file:

    let streamFileToWriter (reader: StreamReader) (writer: StreamWriter) bufferSize =
        let buffer = Array.zeroCreate<char> bufferSize
        let mutable charsRead = 0
        while (charsRead <- reader.Read(buffer, 0, buffer.Length); charsRead > 0) do
            writer.Write(buffer, 0, charsRead)

No strings created during streaming. Memory footprint is ~64 KB per file regardless of input size. You could merge a 50 GB PGN archive on a machine with 512 MB of RAM.
Both input and output streams use 64 KB buffers, which means writes are batched, the OS receives larger sequential requests, and the read-ahead cache can prefetch the next block. The access pattern is purely sequential: front-to-back reads, append-only writes. On a modern NVMe SSD, throughput is hundreds of MB/s, bottlenecked by the storage device, not the code.
Safety comes from F#’s use bindings (deterministic Dispose() even on exceptions) and exhaustive exception handling per file. The tool returns non-zero exit codes on failure, making it script-friendly.
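The pieces described in this section (buffered streams, one reusable char buffer, use bindings, and per-file exception handling) can be combined roughly as follows. This is a sketch under assumptions, not the tool's source: the appendFile and mergeFiles names, the newline written after each file, the BOM-less UTF-8 output, and measuring bytes written via the output stream position are all choices I made for the example.

```fsharp
open System.IO
open System.Text

type FileResult =
    | Success of string * int64 // filename, bytes written
    | Failure of string * string // filename, error message

let bufferSize = 65536

/// Copies everything from reader to writer through one reusable char buffer.
let streamFileToWriter (reader: StreamReader) (writer: StreamWriter) =
    let buffer = Array.zeroCreate<char> bufferSize
    let mutable charsRead = reader.Read(buffer, 0, buffer.Length)
    while charsRead > 0 do
        writer.Write(buffer, 0, charsRead)
        charsRead <- reader.Read(buffer, 0, buffer.Length)

/// Appends one PGN file to the shared writer and reports the outcome.
let appendFile (writer: StreamWriter) (pgnFile: string) : FileResult =
    try
        // Both handles are disposed when this block exits, even on an exception.
        use inputStream =
            new FileStream(pgnFile, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize)
        use reader = new StreamReader(inputStream, Encoding.UTF8, true, bufferSize)
        writer.Flush()
        let before = writer.BaseStream.Position
        streamFileToWriter reader writer
        writer.WriteLine() // assumption: end each file with a newline separator
        writer.Flush()
        Success(pgnFile, writer.BaseStream.Position - before)
    with ex ->
        // One unreadable file becomes a Failure value instead of aborting the batch.
        Failure(pgnFile, ex.Message)

/// Merges the given files into outputPath and returns one result per input file.
let mergeFiles (outputPath: string) (pgnFiles: string list) : FileResult list =
    use outputStream =
        new FileStream(outputPath, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize)
    use writer = new StreamWriter(outputStream, UTF8Encoding(false), bufferSize)
    pgnFiles |> List.map (appendFile writer)
```

Because List.map is eager, every file is processed before the output writer is disposed at the end of mergeFiles. Note that a file failing midway may leave partial content in the output; whether that is acceptable depends on how strict the merged database needs to be.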
Discriminated Unions vs Exceptions for Error Handling
Most CLI tools use try-catch blocks and return -1 on any error. PGN Merger models every outcome explicitly:

    type FileResult =
        | Success of string * int64 // filename, bytes written
        | Failure of string * string // filename, error message

Each file processed produces a FileResult. The merge loop collects these and reports partial success (exit code 2) when some files fail but others succeed. This is critical when you’re merging 10,000 files: one corrupted file shouldn’t kill the entire batch.
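The compiler checks exhaustiveness. If you add a new case to FileResult, every match expression that neither handles it nor has a wildcard branch is flagged at compile time with the incomplete-match warning (FS0025). Treat warnings as errors and the build breaks until you handle it. No silent failures.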
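As an illustration of how collected results could be turned into a report and an exit code, here is a small sketch. Exit code 2 for partial success is taken from the description above; using 0 for full success and 1 for total failure, as well as the exitCode and report function names, are assumptions for this example.

```fsharp
type FileResult =
    | Success of string * int64 // filename, bytes written
    | Failure of string * string // filename, error message

/// Maps the collected results to a process exit code.
/// 2 (partial success) comes from the post; 0 and 1 are assumed conventions.
let exitCode (results: FileResult list) : int =
    let failures =
        results
        |> List.filter (fun r ->
            match r with
            | Failure _ -> true
            | Success _ -> false)
        |> List.length
    if failures = 0 then 0
    elif failures = results.Length then 1
    else 2

/// Prints one line per file; both cases must be handled for the match to be complete.
let report (results: FileResult list) =
    for r in results do
        match r with
        | Success(name, bytes) -> printfn "OK    %s (%d bytes)" name bytes
        | Failure(name, error) -> eprintfn "FAIL  %s: %s" name error
```

Neither match uses a wildcard branch, so adding a third case to FileResult would surface in both functions at compile time.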
Install and Try PGN Merger

    # Install from NuGet (requires .NET 10.0 SDK or later)
    dotnet tool install --global PgnMerger

    # Merge all PGN files in a folder
    pgn-merger ./your_pgn_folder

    # With verbose output and custom filename
    pgn-merger ./your_pgn_folder --verbose --output database.pgn

Browse the NuGet package page for version history, or check the project page for a feature overview, comparison table, and FAQ.
More chess tooling: I also built corentings/chess, a Go library for chess move generation, PGN encoding, and UCI interop. For tournament stories, read about my experience at the Karen Asrian Memorial in Armenia.
Related posts on performance and concurrency: If you enjoyed the streaming I/O patterns here, you might like my posts on worker pools in Go and parallel merge sort with goroutines.