libviprs takes blueprint PDFs and images, extracts raster data, optionally geo-references it, and generates tile pyramids (DeepZoom, XYZ, Google Maps) suitable for web-based viewers. It is inspired by libvips and built from scratch.
A pure-Rust image pyramiding library is outperforming the C heavyweight it was designed to replace. In head-to-head benchmarks on a 47-megapixel raster — the kind of image AEC firms tile every day from scanned blueprint PDFs — libviprs generates tiles 1.8× to 2.4× faster than libvips while using up to 10.7× less memory. Those aren't typos. And this isn't a rigged comparison: both libraries receive the same raw pixel buffer in process memory, both produce DeepZoom tile pyramids with identical parameters, and both are measured with the same clock. The difference is architectural.
Read the full benchmark →
Features
- PDF extraction: extract embedded raster images from scanned blueprint PDFs via lopdf (pure Rust, no C dependencies)
- PDF rendering: render vector PDFs (AutoCAD exports, text, paths) via PDFium, with optional memory-budgeted rendering
- Image decoding: JPEG, PNG, TIFF via the `image` crate
- Tile pyramid generation: multi-threaded engine with backpressure, configurable tile size and overlap; optional streaming mode processes images in horizontal strips for bounded peak memory
- True source-side PDF streaming: `PdfiumStripSource` renders one strip of a vector PDF at a time via PDFium's clipped matrix path, including correct `/Rotate` handling, so multi-gigapixel pages stay inside the configured memory budget
- Memory-budget policy: `StreamingConfig.budget_policy` selects `BudgetPolicy::Error` (fail loudly when a strip cannot fit) or `BudgetPolicy::AutoAdjust` (raise the budget to the smallest workable strip); the engine also pre-flights the budget against canvas geometry before any work starts and returns a typed `BudgetExceeded` error
- Layout formats: DeepZoom (`.dzi` + directory tree), XYZ (`z/x/y`), and Google Maps (`z/y/x`, power-of-2 grids)
- Centre support: centre the image within the tile grid with even background padding on all sides
- Tile encoding: PNG, JPEG (configurable quality), or raw pixel output
- Blank tile optimization: emit full tiles or write 1-byte placeholders for uniform-color regions
- Edge tile background: configurable background color for padding partial tiles at image edges
- Geo-referencing: affine transform mapping pixel coordinates to geographic coordinates, GCP support
- Observability: progress events, per-level callbacks, peak memory tracking
- Object-storage sinks: write tiles directly to S3-compatible buckets via `s3://` URIs; enabled with `--features s3`
- Manifest v1 + versioning: `ManifestV1` / `ManifestBuilder` emit a versioned JSON manifest alongside the pyramid; unknown fields are ignored for forward compatibility
- Blank-tile tolerance: `BlankTileStrategy::PlaceholderWithTolerance` collapses near-uniform tiles within a configurable per-channel delta, reducing output for slightly noisy scans
- Tracing spans: structured spans (`libviprs::pipeline`, `libviprs::level`, `libviprs::tile`) for OpenTelemetry-compatible collectors; enabled with `--features tracing`
- Resumable jobs: `EngineBuilder::with_resume(ResumePolicy)` writes a checkpoint file and supports `Overwrite`, `Resume`, and `Verify` modes; interrupted runs restart from the last completed tile
- Retry + failure policies: per-tile retry with configurable back-off; `FailurePolicy::RetryThenSkip` marks a tile as missing rather than aborting the entire job
- Checksums: tile-level Blake3 or SHA-256 digests written into or verified from the manifest; controlled by `ChecksumMode` and `ChecksumAlgo`
- Deduplication: identical tiles are stored once and referenced via symlink, hardlink, or manifest-only pointer; `DedupeStrategy::Blanks` targets only placeholder tiles; `DedupeStrategy::All` dedupes every tile
- Packfile archive sinks: write the whole pyramid into a `.tar`, `.tar.gz`, or `.zip` archive; enabled with `--features packfile`
- Extended EngineResult metrics: `bytes_read`, `bytes_written`, `retry_count`, `queue_pressure_peak`, `duration`, `stage_durations` for observability pipelines
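The layout formats listed above differ mainly in how a tile's level and grid position map to a relative path. The following is a minimal sketch, assuming the conventional DeepZoom `<level>/<column>_<row>` naming and the `z/x/y` and `z/y/x` orders given in the list. `TileLayout` and `tile_path` are hypothetical names used for illustration only, not libviprs API.

```rust
/// Hypothetical layout enum for illustration; not the libviprs `Layout` type.
#[derive(Clone, Copy)]
enum TileLayout {
    DeepZoom,
    Xyz,
    Google,
}

/// Builds a relative tile path for a level and a grid position
/// (`x` is the column, `y` is the row).
fn tile_path(layout: TileLayout, level: u32, x: u32, y: u32, ext: &str) -> String {
    match layout {
        // DeepZoom: <level>/<column>_<row>.<ext>, conventionally stored
        // under a "<name>_files" directory next to the .dzi descriptor.
        TileLayout::DeepZoom => format!("{level}/{x}_{y}.{ext}"),
        // XYZ: z/x/y
        TileLayout::Xyz => format!("{level}/{x}/{y}.{ext}"),
        // Google Maps: z/y/x
        TileLayout::Google => format!("{level}/{y}/{x}.{ext}"),
    }
}
```

The same tile at level 12, column 3, row 5 would land at `12/3_5.png`, `12/3/5.png`, or `12/5/3.png` depending on the layout.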
Quick Start
Try it interactively: the CLI & API generator lets you tick flags and copy a tailored version of this snippet — see the pyramid base setup and toggle features in the generator panel.
use libviprs::{
    extract_page_image,
    BlankTileStrategy,
    EngineBuilder,
    EngineKind,
    FsSink,
    Layout,
    PyramidPlanner,
    TileFormat,
};
use std::path::Path;

// ──────────────────────────────────────────────────────────────────────
// 1. Decode the source.
//    extract_page_image pulls the embedded raster out of a scanned PDF;
//    use libviprs::decode_file for plain image inputs (PNG/JPEG/TIFF).
// ──────────────────────────────────────────────────────────────────────
let raster = extract_page_image(
    Path::new("blueprint.pdf"), // input path (a scanned PDF; see decode_file for plain images)
    1,                          // 1-based PDF page number
).unwrap();

// ──────────────────────────────────────────────────────────────────────
// 2. Plan the pyramid.
//    PyramidPlanner computes per-level dimensions, tile counts, and
//    canvas size. No pixels are touched yet.
// ──────────────────────────────────────────────────────────────────────
let planner = PyramidPlanner::new(
    raster.width(),   // source width in pixels
    raster.height(),  // source height in pixels
    256,              // tile size (DeepZoom default; 512 for HiDPI)
    0,                // pixel overlap between adjacent tiles
    Layout::DeepZoom, // tile naming convention (also: Xyz, Google)
).unwrap();
let plan = planner.plan();

// ──────────────────────────────────────────────────────────────────────
// 3. Configure where the tiles get written.
//    FsSink writes to a local directory; libviprs also ships
//    ObjectStoreSink (S3) and PackfileSink (.tar/.tar.gz/.zip).
// ──────────────────────────────────────────────────────────────────────
let sink = FsSink::new(
    "output_tiles", // output directory (created if missing)
    plan.clone(),   // pyramid plan tile paths are derived from
)
.with_format(TileFormat::Png); // also: TileFormat::Jpeg { quality: u8 } | Raw

// ──────────────────────────────────────────────────────────────────────
// 4. Run the engine.
//    EngineKind::Auto picks monolithic / streaming / mapreduce based on
//    the source kind and any with_memory_budget value supplied.
// ──────────────────────────────────────────────────────────────────────
let result = EngineBuilder::new(
    &raster, // source raster from step 1
    plan,    // pyramid plan from step 2
    sink,    // tile sink from step 3
)
.with_engine(EngineKind::Auto)                       // auto-select engine
.with_concurrency(4)                                 // worker threads for tile extraction
.with_blank_strategy(BlankTileStrategy::Placeholder) // collapse uniform tiles into 1-byte placeholders
.run()
.unwrap();

println!(
    "{} tiles across {} levels ({} blank tiles skipped)",
    result.tiles_produced,   // total tile files written
    result.levels_processed, // number of pyramid levels
    result.tiles_skipped,    // tiles emitted as blank placeholders
);
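Step 2 above says the planner computes per-level dimensions and tile counts without touching pixels. As a rough illustration of that arithmetic, here is a sketch of the standard DeepZoom level rule (each level halves the one above it, rounding up, down to 1×1). It is not the `PyramidPlanner` implementation: `LevelInfo` and `plan_levels` are hypothetical names, and overlap is ignored because it does not change the grid size.

```rust
/// One pyramid level: pixel dimensions plus the tile grid covering them.
/// Hypothetical type for illustration; not part of libviprs.
#[derive(Debug, PartialEq)]
struct LevelInfo {
    level: u32,
    width: u32,
    height: u32,
    cols: u32,
    rows: u32,
}

/// Computes DeepZoom-style levels from the 1x1 level 0 up to full resolution.
fn plan_levels(width: u32, height: u32, tile_size: u32) -> Vec<LevelInfo> {
    assert!(width > 0 && height > 0 && tile_size > 0);
    // ceil(log2(longest side)): the number of halvings needed to reach 1 pixel.
    let max_dim = width.max(height);
    let top = 32 - (max_dim - 1).leading_zeros();
    (0..=top)
        .map(|level| {
            let shift = top - level;
            // Ceiling division by 2^shift, done in u64 to avoid overflow.
            let scale = |v: u32| (((v as u64) + (1u64 << shift) - 1) >> shift) as u32;
            let (w, h) = (scale(width), scale(height));
            LevelInfo {
                level,
                width: w,
                height: h,
                cols: w.div_ceil(tile_size),
                rows: h.div_ceil(tile_size),
            }
        })
        .collect()
}
```

For a 16820×11888 source with 256-pixel tiles, this rule gives 16 levels, the last one covered by a 66×47 tile grid.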
Modules
| Module | Description |
|---|---|
| `source` | Image decoding (JPEG, PNG, TIFF) into canonical Raster |
| `pdf` | PDF parsing (lopdf) and optional rendering (PDFium) |
| `raster` | Pixel buffer, region views, format normalization |
| `pixel` | Pixel format definitions (Gray8, RGB8, RGBA8, 16-bit) |
| `planner` | Tile math, level computation, layout generation |
| `resize` | Downscaling for pyramid levels |
| `engine` | Multi-threaded tile extraction with backpressure |
| `streaming` | Strip-based streaming engine for memory-bounded pyramid generation |
| `streaming_mapreduce` | Parallel MapReduce engine: concurrent strip rendering within a memory budget |
| `sink` | Tile output (filesystem, memory, slow sink for testing) |
| `geo` | Affine geo-transform, GCP solving, bounding box |
| `observe` | Progress events, memory tracking |
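The `engine` row mentions backpressure: when the sink is slower than tile extraction, producers should stall rather than queue tiles without limit. One common way to get that behaviour with the standard library is a bounded channel. The sketch below shows the general technique only and makes no claim about how the libviprs engine is built internally; `run_with_backpressure` is a hypothetical function.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Produces `tile_count` dummy tiles on a worker thread and drains them
/// through a bounded queue. Returns how many tiles the consumer received.
fn run_with_backpressure(tile_count: usize, queue_depth: usize) -> usize {
    // A sync_channel holds at most `queue_depth` pending tiles; `send`
    // blocks once it is full, which is what throttles the producer.
    let (tx, rx) = sync_channel::<Vec<u8>>(queue_depth);

    let producer = thread::spawn(move || {
        for i in 0..tile_count {
            let tile = vec![(i % 256) as u8; 16]; // stand-in for encoded tile bytes
            tx.send(tile).expect("consumer hung up");
        }
        // `tx` is dropped here, which closes the channel and ends the loop below.
    });

    let mut written = 0;
    for _tile in rx {
        // A real sink would write the tile to disk or object storage here.
        written += 1;
    }
    producer.join().expect("producer panicked");
    written
}
```

With a queue depth of 4, at most four tiles wait in memory at any time, no matter how slow the consumer is.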
Streaming Engine
Large raster images — scanned blueprints at 300 DPI, aerial surveys, architectural sheets — can easily exceed available RAM when the monolithic engine materialises the full canvas. The streaming engine solves this by processing the pyramid in horizontal strips, reducing peak memory by orders of magnitude while producing pixel-exact output.
Unlike libvips, which implements a fully lazy demand-driven pipeline where each pixel is computed on demand through a complex DAG of operations, libviprs takes a pragmatic middle ground: strips are processed eagerly within each band, but the full canvas is never materialised. This keeps the pipeline architecture simple and auditable while delivering the memory savings that matter for AEC-scale images.
| | libvips | libviprs (monolithic) | libviprs (streaming) |
|---|---|---|---|
| Peak memory | O(tile_size²) | O(canvas²) | O(canvas_w × strip_h) |
| 16820×11888 Google+centre | ~1 MB | ~5.1 GB | ~50 MB |
| Evaluation model | Fully lazy (per-region) | Fully eager | Semi-lazy (per-strip) |
| Downscale | On-demand per-region | Full-level passes | Strip passes (same box filter) |
| Trade-off | Complex pipeline plumbing | Simple but memory-hungry | Middle ground |
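The "same box filter" entry is what makes strip-wise output pixel-exact: a 2×2 box filter only ever reads pairs of adjacent rows, so downscaling strips with an even number of rows gives the same pixels as downscaling the whole image. The sketch below demonstrates this on a single-channel buffer. The rounding rule and the function names are assumptions made for illustration; the library's own filter may differ in detail.

```rust
/// Halves a row-major, single-channel image with a 2x2 box filter.
/// Assumes an even `width` and an even number of rows.
fn box_downscale(pixels: &[u8], width: usize) -> Vec<u8> {
    assert!(width >= 2 && width % 2 == 0 && pixels.len() % (2 * width) == 0);
    let height = pixels.len() / width;
    let mut out = Vec::with_capacity((width / 2) * (height / 2));
    for y in (0..height).step_by(2) {
        for x in (0..width).step_by(2) {
            let sum = pixels[y * width + x] as u16
                + pixels[y * width + x + 1] as u16
                + pixels[(y + 1) * width + x] as u16
                + pixels[(y + 1) * width + x + 1] as u16;
            out.push(((sum + 2) / 4) as u8); // rounded average of the 2x2 block
        }
    }
    out
}

/// Downscales the same image one strip at a time and concatenates the results.
/// `strip_rows` must be even so that no 2x2 block straddles two strips.
fn box_downscale_in_strips(pixels: &[u8], width: usize, strip_rows: usize) -> Vec<u8> {
    assert!(width >= 2 && strip_rows > 0 && strip_rows % 2 == 0);
    pixels
        .chunks(strip_rows * width)
        .flat_map(|strip| box_downscale(strip, width))
        .collect()
}
```

Only one strip and its half-size result need to be resident at a time, which is where the memory saving comes from.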
The caller sets a memory budget; the engine maximises strip height within that budget to balance memory and throughput. When the budget is large enough for the full canvas, the monolithic engine is used automatically — no code changes needed.
For vector PDFs, PdfiumStripSource renders one strip at a time directly from the PDF using PDFium's clipped matrix path (with full /Rotate support), so the full page bitmap is never materialised. BudgetPolicy::Error fails loudly if the chosen budget cannot fit the smallest workable strip; BudgetPolicy::AutoAdjust raises the budget and continues. The engine also pre-flights the budget against canvas geometry before allocating, surfacing a typed BudgetExceeded error rather than running until OOM.
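The pre-flight check and the two budget policies can be pictured with a little arithmetic: a strip costs roughly canvas width × strip height × bytes per pixel. The sketch below is written under that assumption. `Policy`, `BudgetTooSmall`, and `preflight` are hypothetical stand-ins for the behaviour described above, not the library's `BudgetPolicy` or `BudgetExceeded` types, and the real engine may account for additional buffers.

```rust
/// Hypothetical stand-in for the two budget policies described above.
#[derive(Clone, Copy)]
enum Policy {
    Error,
    AutoAdjust,
}

/// Hypothetical error carrying the numbers a caller needs to fix the budget.
#[derive(Debug, PartialEq)]
struct BudgetTooSmall {
    required_bytes: u64,
    budget_bytes: u64,
}

/// Returns `(strip_height, effective_budget)` before any pixels are allocated.
fn preflight(
    canvas_w: u64,
    bytes_per_pixel: u64,
    min_strip_h: u64,
    budget_bytes: u64,
    policy: Policy,
) -> Result<(u64, u64), BudgetTooSmall> {
    assert!(canvas_w > 0 && bytes_per_pixel > 0 && min_strip_h > 0);
    let row_bytes = canvas_w * bytes_per_pixel;
    let required = row_bytes * min_strip_h; // cost of the smallest workable strip
    if budget_bytes < required {
        return match policy {
            // Fail loudly instead of running until the process is out of memory.
            Policy::Error => Err(BudgetTooSmall { required_bytes: required, budget_bytes }),
            // Raise the budget just enough for one minimal strip and continue.
            Policy::AutoAdjust => Ok((min_strip_h, required)),
        };
    }
    // Tallest strip that fits, rounded down to a multiple of the minimum height.
    let strip_h = (budget_bytes / row_bytes) / min_strip_h * min_strip_h;
    Ok((strip_h, budget_bytes))
}
```

With a 32768-pixel-wide canvas, 3 bytes per pixel, and a 512-row minimum strip (all assumed values), the smallest workable strip needs 32768 × 3 × 512 = 50,331,648 bytes, so a 10 MB budget fails under `Policy::Error` and is raised to that figure under `Policy::AutoAdjust`.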
Phase 3 Hardening
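Phase 3 adds production-hardening features: durable storage sinks, resumable jobs, retry policies, checksums, deduplication, structured tracing, and versioned manifests. All of them are opt-in: the object-storage sink, tracing spans, and packfile sinks sit behind the Cargo feature flags s3, tracing, and packfile, and the remaining features are switched on through the builder methods shown in each subsection below.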
Object-storage sinks (--features s3)
Write tiles directly to an S3-compatible bucket via ObjectStoreSink. The sink runs against any caller-provided ObjectStore backend.
use libviprs::{EngineBuilder, ObjectStoreConfig, ObjectStoreSink, TileFormat};

// `store` is the caller-provided ObjectStore backend.
let sink = ObjectStoreSink::new(store, ObjectStoreConfig::default(), plan.clone())
    .with_format(TileFormat::Png);
EngineBuilder::new(&raster, plan, sink).run().unwrap();
Manifest v1 + versioning
ManifestBuilder produces a ManifestV1 JSON sidecar next to the pyramid root. The manifest carries a schema_version field; readers ignore unknown keys, so older consumers continue working as new fields are added. Attach the builder to an FsSink via with_manifest.
use libviprs::{ChecksumAlgo, FsSink, ManifestBuilder, TileFormat};

let manifest = ManifestBuilder::new(&plan)
    .checksum_algo(ChecksumAlgo::Blake3);
let sink = FsSink::new("output_tiles", plan.clone())
    .with_format(TileFormat::Png)
    .with_manifest(manifest);
// serialises to: {"schema_version":1, "tiles": [...], ...}
Blank-tile tolerance
Use BlankTileStrategy::PlaceholderWithTolerance to treat tiles whose every channel varies by at most max_channel_delta as blank. Useful for slightly noisy scans where pure-uniform detection misses near-white regions.
use libviprs::{BlankTileStrategy, EngineBuilder};

let result = EngineBuilder::new(&raster, plan, sink)
    .with_blank_strategy(BlankTileStrategy::PlaceholderWithTolerance {
        max_channel_delta: 2,
    })
    .run()
    .unwrap();
CLI equivalent: --blank-tolerance 2
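To make the tolerance rule concrete, here is one plausible reading of "every channel varies by at most `max_channel_delta`": for each channel, the spread between the smallest and largest sample in the tile must not exceed the delta. This is an interpretation sketched for illustration, with a hypothetical `is_near_uniform` helper; the library may define the comparison differently (for example, relative to the first pixel).

```rust
/// Returns true when, for every channel, (max - min) across the tile is at
/// most `max_channel_delta`. `pixels` is interleaved (for example RGBRGB...)
/// with `channels` samples per pixel.
fn is_near_uniform(pixels: &[u8], channels: usize, max_channel_delta: u8) -> bool {
    assert!(channels > 0 && pixels.len() % channels == 0);
    let mut lo = vec![u8::MAX; channels];
    let mut hi = vec![u8::MIN; channels];
    for px in pixels.chunks_exact(channels) {
        for (c, &v) in px.iter().enumerate() {
            lo[c] = lo[c].min(v);
            hi[c] = hi[c].max(v);
            // Stop at the first channel whose spread exceeds the tolerance.
            if hi[c] - lo[c] > max_channel_delta {
                return false;
            }
        }
    }
    true
}
```

With a delta of 2, a tile mixing `(255, 255, 255)` and `(254, 255, 253)` pixels counts as blank, while one containing `(250, 255, 255)` does not.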
Tracing spans (--features tracing)
When the tracing feature is enabled, the engine emits structured spans compatible with any tracing-subscriber, including OpenTelemetry exporters. Span names: libviprs::pipeline, libviprs::level, libviprs::tile.
# Cargo.toml
[dependencies]
libviprs = { version = "0.3", features = ["tracing"] }
CLI: --trace-level info (values: error, warn, info, debug, trace).
Resumable jobs
EngineBuilder::with_resume(ResumePolicy) writes a checkpoint file (.libviprs-job.json) at the output root after each tile. A re-run with ResumePolicy::resume() skips already-written tiles; ResumePolicy::verify() re-reads them and asserts their checksums.
use libviprs::{EngineBuilder, ResumePolicy};

EngineBuilder::new(&raster, plan, sink)
    .with_resume(ResumePolicy::resume())
    .run()
    .unwrap();
CLI: --resume / --overwrite / --verify
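The resume behaviour boils down to keeping a record of finished tiles and skipping them on the next run. The sketch below models that with an in-memory set. It is only an illustration of the idea: the on-disk format of `.libviprs-job.json` is not described here, and `TileKey` and `run_resumable` are hypothetical names.

```rust
use std::collections::HashSet;

/// Tile coordinate (level, column, row) used as the checkpoint key.
type TileKey = (u32, u32, u32);

/// Writes every tile in `all_tiles` that is not already recorded in
/// `completed`. Returns the number of tiles written during this run.
fn run_resumable<F>(
    all_tiles: &[TileKey],
    completed: &mut HashSet<TileKey>,
    mut write_tile: F,
) -> usize
where
    F: FnMut(TileKey) -> Result<(), String>,
{
    let mut written = 0;
    for &key in all_tiles {
        if completed.contains(&key) {
            continue; // finished in an earlier run
        }
        if write_tile(key).is_err() {
            break; // interrupted: the checkpoint keeps what was completed
        }
        // A real implementation would persist the checkpoint to disk here.
        completed.insert(key);
        written += 1;
    }
    written
}
```

If a first run stops after three of five tiles, a second run over the same tile list writes only the remaining two.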
Retry + failure policies
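Tile writes that fail transiently (network blip, lock contention) can be retried automatically. FailurePolicy::RetryThenSkip records the missing tile in the manifest instead of aborting the job. The example below selects FailurePolicy::RetryThenFail instead, which, going by its name, fails the job once the retries are exhausted.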
use libviprs::{EngineBuilder, FailurePolicy, RetryPolicy};

let result = EngineBuilder::new(&raster, plan, sink)
    .with_failure_policy(FailurePolicy::RetryThenFail)
    .with_retry(RetryPolicy { max_attempts: 3, backoff_ms: 200 })
    .run()
    .unwrap();
CLI: --retry-max 3 --retry-backoff 200 --failure-policy retry-then-fail
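For readers who want to see the retry loop itself, here is a generic sketch of "try up to N times, waiting between attempts". It assumes a fixed delay between attempts, which may not match how `backoff_ms` is applied in the library, and `retry_with_backoff` is a hypothetical helper rather than libviprs API.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Calls `op` up to `max_attempts` times, sleeping `backoff` after each
/// failed attempt. Returns the first success, or the last error once the
/// attempts are exhausted. `op` receives the 1-based attempt number.
fn retry_with_backoff<T, E, F>(max_attempts: u32, backoff: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0);
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: hand the last error back to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(backoff);
                attempt += 1;
            }
        }
    }
}
```

What the caller does with the final `Err` is the failure policy: abort the job, or record the tile as missing and carry on.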
Checksums
The engine can compute a digest for every tile and embed it in the manifest (EmitOnly) or verify existing digests on re-run (Verify).
use libviprs::{ChecksumAlgo, ChecksumMode, FsSink, TileFormat};

let sink = FsSink::new("output_tiles", plan.clone())
    .with_format(TileFormat::Png)
    .with_checksums(ChecksumAlgo::Blake3)
    .with_checksum_mode(ChecksumMode::EmitOnly);
CLI: --manifest-emit-checksums --checksum-algo blake3
Deduplication
Identical tile content is stored once. The engine tries symlink first, then hardlink, then falls back to a manifest-only reference (for sinks that do not support links, such as S3).
use libviprs::{ChecksumAlgo, DedupeStrategy, EngineBuilder};

// Option A: dedupe only blank placeholder tiles (fast, low overhead)
EngineBuilder::new(&raster, plan.clone(), sink)
    .with_dedupe(DedupeStrategy::Blanks)
    .run()
    .unwrap();

// Option B: dedupe every tile using Blake3 content hashes.
// Use this instead of option A: the builder takes `sink` by value,
// so a second run needs a freshly constructed sink.
EngineBuilder::new(&raster, plan, sink)
    .with_dedupe(DedupeStrategy::All { algo: ChecksumAlgo::Blake3 })
    .run()
    .unwrap();
CLI: --dedupe-blanks / --dedupe-all
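Content-based deduplication can be sketched as a map from tile digest to the first path that carried those bytes. The example below uses the standard library's `DefaultHasher` purely as a dependency-free stand-in; it is not cryptographic and is not what the Blake3 or SHA-256 options compute. `Stored`, `digest`, and `dedupe` are hypothetical names for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Where a tile ends up: stored under its own path, or pointing at the
/// first tile that had identical bytes.
#[derive(Debug, PartialEq)]
enum Stored {
    Unique(String),
    LinkTo(String),
}

/// Non-cryptographic stand-in for a Blake3 or SHA-256 digest.
fn digest(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Marks each (path, bytes) tile as `Unique` the first time its content is
/// seen and as `LinkTo(first_path)` on every later occurrence.
fn dedupe(tiles: &[(&str, Vec<u8>)]) -> Vec<Stored> {
    let mut seen: HashMap<u64, String> = HashMap::new();
    let mut out = Vec::with_capacity(tiles.len());
    for (path, bytes) in tiles {
        let key = digest(bytes);
        if let Some(first) = seen.get(&key) {
            out.push(Stored::LinkTo(first.clone()));
        } else {
            seen.insert(key, path.to_string());
            out.push(Stored::Unique(path.to_string()));
        }
    }
    out
}
```

A `LinkTo` entry is what a symlink, hardlink, or manifest-only pointer would then represent, depending on what the sink supports.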
Packfile archive sinks (--features packfile)
Archive the entire pyramid into a single file instead of a directory tree. Useful for cold storage, artifact upload, or reproducible builds.
# .tar, .tar.gz, and .zip are all supported
viprs pyramid blueprint.pdf --sink packfile://output.tar.gz
Extended EngineResult metrics
The EngineResult struct now carries detailed I/O and timing counters for integration with monitoring systems.
let result = EngineBuilder::new(&raster, plan, sink).run().unwrap();
println!(
    "read {} bytes, wrote {} bytes, {} retries, peak queue {}, {:?}",
    result.bytes_read, result.bytes_written,
    result.retry_count, result.queue_pressure_peak,
    result.duration,
);
Feature Flags
| Feature | Default | Description |
|---|---|---|
| `default` | none | No features enabled by default; core engine, filesystem sink, DeepZoom/XYZ/Google layouts, PNG/JPEG/raw encoding always available |
| `pdfium` | no | Vector PDF rendering via a dynamically-linked libpdfium |
| `pdfium-static` | no | Same as pdfium but links statically (larger binary, no runtime dep) |
| `s3` | no | ObjectStoreSink against a caller-injected ObjectStore |
| `tracing` | no | Structured spans via the tracing crate for OpenTelemetry and other subscribers |
| `packfile` | no | Archive sink: write the pyramid into a .tar, .tar.gz, or .zip file |
Enable multiple features in Cargo.toml:
[dependencies]
libviprs = { version = "0.3", features = ["s3", "tracing", "packfile"] }
Requirements
- Rust 1.85+ (edition 2024)
- libpdfium shared library or static archive (only if using the `pdfium` / `pdfium-static` feature); prebuilt in libviprs-dep releases
Native Dependencies
PDFium is built from source and published as GitHub Releases under libviprs/libviprs-dep. Every release tag ships four archives covering the full matrix of {linux, musl} × {x64, arm64}, and each archive contains both the shared library (libpdfium.so) and the static archive (libpdfium.a):
| Archive | libc | Use when |
|---|---|---|
| `pdfium-linux-x64.tgz` | glibc | Debian, Ubuntu, RHEL, mainstream distros on x86_64 |
| `pdfium-linux-arm64.tgz` | glibc | Debian, Ubuntu, … on aarch64 |
| `pdfium-musl-x64.tgz` | musl | Alpine / musl-based distroless on x86_64 |
| `pdfium-musl-arm64.tgz` | musl | Alpine / musl-based distroless on aarch64 |
Loading a glibc .so from a musl process — or vice versa — fails at dlopen time, so match the archive libc to the runtime you deploy against. Fully-static musl binaries that cannot dlopen should link libpdfium.a via pdfium-render/static.
The build tooling is documented in full at libviprs-dep/MANUAL.md (man-page-style reference: CLI, options, artifact layout, environment, exit statuses, troubleshooting). A pipeline-level overview lives at libviprs-dep/pdfium/README.md.
Related Crates
| Crate / Repo | Description |
|---|---|
| `libviprs-cli` | Command-line interface (`viprs` binary); CLI reference |
| `libviprs-tests` | Integration tests and fixtures |
| `libviprs-dep` | Prebuilt native dependencies (PDFium today); build manual · releases |