
How to Merge PGN Files in F#: Streaming, Performance, and Discriminated Unions

I needed to merge hundreds of Lichess PGN files into a single database for opening preparation. The existing tools either required Python dependencies, discarded game metadata, or merged games into variation trees rather than concatenating them cleanly. I wanted a standalone CLI that takes a folder of .pgn files and produces one merge.pgn — fast, memory-efficient, and correct.

The result is PGN Merger, a .NET CLI tool built in F# that merges chess PGN files with streaming I/O and ~64 KB memory usage.

Why F# for a CLI Tool

F# is a functional-first language on .NET. Three features make it a strong fit for file-processing CLIs.

Discriminated unions model success and failure at the type level:

type FileResult =
    | Success of string * int64
    | Failure of string * string

The compiler flags any match that skips a case. No nullable returns, no forgotten error checks.
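As a small illustration of that check, the sketch below matches on a FileResult value. The describe helper and its message wording are hypothetical and not taken from PGN Merger; the reading of the int64 as a byte count follows the comments used later in this post.

```fsharp
// Same shape as the FileResult type above.
type FileResult =
    | Success of string * int64   // filename, bytes written
    | Failure of string * string  // filename, error message

// Hypothetical helper: turns a result into a one-line report.
// Deleting either branch makes the compiler report an incomplete
// pattern match (warning FS0025) on this expression.
let describe (result: FileResult) : string =
    match result with
    | Success (file, bytes) -> sprintf "OK   %s (%d bytes)" file bytes
    | Failure (file, error) -> sprintf "FAIL %s: %s" file error

printfn "%s" (describe (Success ("games.pgn", 1024L)))
printfn "%s" (describe (Failure ("broken.pgn", "access denied")))
```

Because both cases carry the filename, a caller can always say which file a result belongs to without extra bookkeeping.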

Pattern matching makes argument parsing readable and exhaustive:

match args.[i] with
| "--output" when i + 1 < args.Length ->
    loop (i + 2) { opts with OutputPath = args.[i + 1] }
| "--recursive" ->
    loop (i + 1) { opts with Recursive = true }
| "--help" | "-h" | "-?" ->
    None
| arg when arg.StartsWith("--") ->
    printfn "Error: Unknown option '%s'" arg
    None
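The excerpt above sits inside a recursive loop. A minimal self-contained sketch of how such a parser might be completed is shown here. The Options record, the defaults value, the parseArgs name, and the rule that any other argument is the input folder are all assumptions made for illustration; only the four match cases come from the excerpt.

```fsharp
// Hypothetical options record; the real tool may carry more fields.
type Options =
    { InputPath: string option
      OutputPath: string
      Recursive: bool }

// Assumed defaults; "merge.pgn" matches the output name mentioned in the intro.
let defaults = { InputPath = None; OutputPath = "merge.pgn"; Recursive = false }

// Returns Some options on success, None when help was requested or parsing failed.
let parseArgs (args: string[]) : Options option =
    let rec loop i (opts: Options) =
        if i >= args.Length then
            Some opts
        else
            match args.[i] with
            | "--output" when i + 1 < args.Length ->
                loop (i + 2) { opts with OutputPath = args.[i + 1] }
            | "--recursive" ->
                loop (i + 1) { opts with Recursive = true }
            | "--help" | "-h" | "-?" ->
                None
            | arg when arg.StartsWith("--") ->
                printfn "Error: Unknown option '%s'" arg
                None
            | arg ->
                // Assumption: any other argument is the input folder.
                loop (i + 1) { opts with InputPath = Some arg }
    loop 0 defaults
```

Note that a trailing "--output" with no value fails the guard on the first case and falls through to the unknown-option branch, so it is rejected rather than read past the end of the array.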

First-class .NET interop means System.IO, System.Diagnostics.Stopwatch, and high-performance streams are available directly. No FFI overhead. You write F#, you get native .NET performance.

Streaming I/O: Merging PGN Files Without Loading Them into Memory

The naive merge approach loads entire files into memory before writing. For a folder with 10 GB of PGN files, that’s 10 GB in RAM before a single byte hits disk.

PGN Merger never loads an entire file into memory. It streams through each file with 64 KB buffers:

use inputStream = new FileStream(pgnFile, FileMode.Open, FileAccess.Read, FileShare.Read, 65536)
use reader = new StreamReader(inputStream, Encoding.UTF8, true, 65536)

The copy loop operates on a pre-allocated char array, allocated once per file:

let streamFileToWriter (reader: StreamReader) (writer: StreamWriter) bufferSize =
    let buffer = Array.zeroCreate<char> bufferSize
    let mutable charsRead = 0
    while (charsRead <- reader.Read(buffer, 0, buffer.Length); charsRead > 0) do
        writer.Write(buffer, 0, charsRead)

No strings are created during streaming. The memory footprint stays fixed at the size of the 64 KB buffers, regardless of input size. You could merge a 50 GB PGN archive on a machine with 512 MB of RAM.

Both input and output streams use 64 KB buffers, which means writes are batched, the OS receives larger sequential requests, and the read-ahead cache can prefetch the next block. The access pattern is purely sequential: front-to-back reads, append-only writes. On a modern NVMe SSD, throughput is hundreds of MB/s, bottlenecked by the storage device, not the code.

Safety comes from F#’s use bindings (deterministic Dispose() even on exceptions) and exhaustive exception handling per file. The tool returns non-zero exit codes on failure, making it script-friendly.
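To show how the pieces above could fit together, here is a hedged sketch of a per-file wrapper. The appendFile and mergeFiles names are hypothetical, the byte count reported on success is the input file's length (an approximation of bytes written), and catching every exception in one handler is a simplification rather than the tool's actual policy.

```fsharp
open System.IO
open System.Text

type FileResult =
    | Success of string * int64   // filename, bytes (input length in this sketch)
    | Failure of string * string  // filename, error message

let bufferSize = 65536

// Copy loop from the section above.
let streamFileToWriter (reader: StreamReader) (writer: StreamWriter) bufferSize =
    let buffer = Array.zeroCreate<char> bufferSize
    let mutable charsRead = 0
    while (charsRead <- reader.Read(buffer, 0, buffer.Length); charsRead > 0) do
        writer.Write(buffer, 0, charsRead)

// Hypothetical wrapper: appends one file to an already-open writer.
// The use bindings dispose the reader and stream even if the copy throws.
let appendFile (writer: StreamWriter) (pgnFile: string) : FileResult =
    try
        use inputStream = new FileStream(pgnFile, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize)
        use reader = new StreamReader(inputStream, Encoding.UTF8, true, bufferSize)
        streamFileToWriter reader writer bufferSize
        Success (pgnFile, inputStream.Length)
    with ex ->
        Failure (pgnFile, ex.Message)

// Hypothetical driver: one output writer shared across all inputs.
let mergeFiles (outputPath: string) (files: string list) : FileResult list =
    use outputStream = new FileStream(outputPath, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize)
    use writer = new StreamWriter(outputStream, UTF8Encoding(false), bufferSize)
    // List.map is eager, so every file is processed before the writer is disposed.
    files |> List.map (appendFile writer)
```

A file that cannot be opened produces a Failure and the loop moves on to the next one, so the output still contains every readable input.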

Discriminated Unions vs Exceptions for Error Handling

Most CLI tools use try-catch blocks and return -1 on any error. PGN Merger models every outcome explicitly:

type FileResult =
    | Success of string * int64   // filename, bytes written
    | Failure of string * string  // filename, error message

Each file processed produces a FileResult. The merge loop collects these and reports partial success (exit code 2) when some files fail but others succeed. This is critical when you’re merging 10,000 files — one corrupted file shouldn’t kill the entire batch.
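One way the collected results might be reduced to an exit code is sketched below. Only the value 2 for partial success comes from this post; the exitCode name, 0 for full success, and 1 for total failure are assumptions.

```fsharp
type FileResult =
    | Success of string * int64   // filename, bytes written
    | Failure of string * string  // filename, error message

// Hypothetical reduction from per-file results to a process exit code.
// 2 = partial success (from the post); 0 and 1 are assumed values.
let exitCode (results: FileResult list) : int =
    let failures =
        results
        |> List.filter (function Failure _ -> true | Success _ -> false)
        |> List.length
    match failures with
    | 0 -> 0                                  // nothing failed
    | n when n = List.length results -> 1     // every file failed
    | _ -> 2                                  // some failed, some succeeded
```

A shell script can then tell "nothing was merged" apart from "merged with gaps" by inspecting the exit status.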

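The compiler checks exhaustiveness. If you add a new case to FileResult, every match expression that doesn’t handle it produces an incomplete-match warning (FS0025) at compile time; with warnings treated as errors, the build fails until you handle it. No silent failures.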

Install and Try PGN Merger

# Install from NuGet (requires .NET 10.0 SDK or later)
dotnet tool install --global PgnMerger
# Merge all PGN files in a folder
pgn-merger ./your_pgn_folder
# With verbose output and custom filename
pgn-merger ./your_pgn_folder --verbose --output database.pgn

Browse the NuGet package page for version history, or check the project page for a feature overview, comparison table, and FAQ.


More chess tooling: I also built corentings/chess, a Go library for chess move generation, PGN encoding, and UCI interop. For tournament stories, read about my experience at the Karen Asrian Memorial in Armenia.

Related posts on performance and concurrency: If you enjoyed the streaming I/O patterns here, you might like my posts on worker pools in Go and parallel merge sort with goroutines.

https://corentings.dev/blog/merge-pgn-files-fsharp/
Author: Corentin Giaufer Saubert
Published at: 2026-06-05
License: CC BY-NC-SA 4.0