# Performance Benchmarks
How GetWebP CLI performs under various workloads, and why.
See also: README | Commands | Getting Started
## Architecture Overview
Understanding GetWebP's performance characteristics requires understanding its processing pipeline.
### WASM Codec Pipeline
GetWebP uses jSquash WASM codecs compiled from the same C/C++ libraries that power Google's Squoosh:
| Codec | Origin | Purpose |
|---|---|---|
| `@jsquash/jpeg` | MozJPEG | JPEG decoding |
| `@jsquash/png` | Squoosh PNG (Rust via wasm-bindgen) | PNG decoding |
| `@jsquash/webp` | libwebp | WebP decoding and encoding |
| `bmp-js` | Pure JavaScript | BMP decoding |
Each image goes through a four-stage pipeline:

```
Read file (I/O) --> Decode to RGBA (WASM) --> Encode to WebP (WASM) --> Write file (I/O)
```
Key implication: The WASM codecs run in the main thread (not native C). This means:
- Decode + encode is CPU-bound and runs at roughly 60--80% of native `cwebp` speed for equivalent operations.
- WASM initialization is a one-time cost. Four WASM modules (PNG decoder, JPEG decoder, WebP decoder, WebP encoder) are compiled from embedded binary blobs on first use, then cached for the session. This adds ~100--300ms to the first invocation.
- Memory overhead per image equals the raw RGBA bitmap (width x height x 4 bytes) plus the encoded WebP buffer. A 4000x3000 photo requires ~48 MB of transient memory.
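The per-image figure comes from a simple formula. A sketch (the helper name is illustrative, not part of GetWebP's API):

```typescript
// A decoded image is held as a raw RGBA bitmap: 4 bytes per pixel.
function rgbaBytes(width: number, height: number): number {
  return width * height * 4;
}

// A 4000x3000 photo needs ~48 MB of transient decode memory,
// before adding the (much smaller) encoded WebP output buffer.
console.log(rgbaBytes(4000, 3000)); // 48,000,000 bytes = ~48 MB
```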
### Concurrency Model
GetWebP uses async concurrency with `p-limit`, not OS-level threads or worker threads.
```
                 +--> [decode + encode file 1] --+
Main thread -----+--> [decode + encode file 2] --+--> results
(event loop)     +--> [decode + encode file N] --+
```
- Starter/Pro plans default to `os.cpus().length - 1` concurrent tasks (capped at 32). Configurable via `--concurrency`.
- Free plan is forced serial (1 task at a time) with a 3-second delay between files.
Because WASM execution blocks the JavaScript event loop during decode/encode, true parallelism is limited. However, I/O operations (file reads and writes) overlap with CPU work in other tasks, yielding measurable throughput gains up to approximately the number of CPU cores.
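The scheduling pattern can be sketched with a hand-rolled limiter standing in for `p-limit` (illustrative only; GetWebP's internals may differ):

```typescript
// Run async tasks with at most `limit` in flight at once.
// CPU-bound WASM calls still serialize on the event loop, but the
// awaited file I/O overlaps across tasks, which is where the gains come from.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next index (safe: single-threaded event loop)
      results[i] = await task(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}
```

In practice each `task` would read a file, run the WASM decode/encode, and write the output; only the awaited I/O portions actually interleave.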
Note: A `worker_threads` implementation exists in the codebase but is not active. The jSquash WASM modules do not currently initialize reliably inside Node.js worker threads. If this is resolved upstream, thread-level parallelism would unlock near-linear scaling.
## Benchmark Methodology
### Test Environment
Benchmark date: 2026-04-05. Results vary by hardware, OS, disk speed, and image content.
| Parameter | Value |
|---|---|
| CPU | Apple Silicon (arm64) |
| OS | macOS (Darwin 25.3.0) |
| Runtime | Bun-compiled binary |
| Quality | -q 80 unless noted |
| Runs | 5 per test, median reported |
### Test Dataset
| Category | Count | Avg Size | Resolution Range |
|---|---|---|---|
| Small JPEG | 50 | ~200 KB | 800x600 -- 1200x900 |
| Large JPEG | 50 | ~3 MB | 4000x3000 -- 6000x4000 |
| PNG (photos) | 50 | ~5 MB | 3000x2000 -- 4000x3000 |
| PNG (graphics/screenshots) | 50 | ~800 KB | 1920x1080 |
| BMP | 20 | ~10 MB | 3000x2000 |
| WebP (re-encode) | 20 | ~400 KB | 2000x1500 |
### Measurement Method
```bash
# Single file
time getwebp convert photo.jpg -o /tmp/out

# Batch (wall-clock time)
time getwebp convert ./dataset -o /tmp/out --concurrency 4
```

All times are wall-clock. Each test is run 5 times; the median is reported.
## Single-File Benchmarks
Time to convert a single image at default quality (`-q 80`).
### JPEG
| Input Size | Resolution | Time | Output Size | Savings |
|---|---|---|---|---|
| 200 KB | 1200x900 | [benchmark pending] (~0.3s) | [pending] | [pending] (~25--40%) |
| 1 MB | 2400x1800 | [benchmark pending] (~0.6s) | [pending] | [pending] (~30--45%) |
| 3 MB | 4000x3000 | [benchmark pending] (~1.2s) | [pending] | [pending] (~30--50%) |
| 8 MB | 6000x4000 | [benchmark pending] (~2.5s) | [pending] | [pending] (~35--55%) |
### PNG
| Input Size | Resolution | Type | Time | Output Size | Savings |
|---|---|---|---|---|---|
| 300 KB | 1920x1080 | Screenshot | [benchmark pending] (~0.4s) | [pending] | [pending] (~70--85%) |
| 2 MB | 3000x2000 | Photo | [benchmark pending] (~1.0s) | [pending] | [pending] (~60--75%) |
| 5 MB | 4000x3000 | Photo | [benchmark pending] (~2.0s) | [pending] | [pending] (~65--80%) |
| 15 MB | 6000x4000 | Photo (alpha) | [benchmark pending] (~4.0s) | [pending] | [pending] (~55--70%) |
PNG-to-WebP conversions typically yield the highest savings because PNG is lossless while WebP at quality 80 applies lossy compression.
### BMP
| Input Size | Resolution | Time | Output Size | Savings |
|---|---|---|---|---|
| 5 MB | 1920x1080 | [benchmark pending] (~0.5s) | [pending] | [pending] (~95%+) |
| 36 MB | 4000x3000 | [benchmark pending] (~2.5s) | [pending] | [pending] (~97%+) |
BMP files are uncompressed bitmaps. Conversion to WebP yields dramatic file-size reductions. Note that BMP decoding uses a pure JavaScript library (bmp-js) rather than WASM, which is slower for very large files but adequate for typical use.
### WebP (Re-encode)
| Input Size | Resolution | Time | Output Size | Savings |
|---|---|---|---|---|
| 200 KB | 2000x1500 | [benchmark pending] (~0.4s) | [pending] | [pending] (varies) |
Re-encoding WebP-to-WebP is useful for adjusting quality. Savings depend on the quality gap between input and output.
## Batch Throughput
### Concurrency Scaling (Starter/Pro)
50 JPEG files, average 3 MB each, on an 8-core machine:
| `--concurrency` | Wall Time | Throughput (files/sec) | Speedup vs Serial |
|---|---|---|---|
| 1 | [benchmark pending] (~60s) | [pending] (~0.8) | 1.0x |
| 2 | [benchmark pending] (~35s) | [pending] (~1.4) | ~1.7x |
| 4 | [benchmark pending] (~20s) | [pending] (~2.5) | ~3.0x |
| 7 (default on 8-core) | [benchmark pending] (~14s) | [pending] (~3.6) | ~4.3x |
| 8 | [benchmark pending] (~13s) | [pending] (~3.8) | ~4.6x |
| 16 | [benchmark pending] (~12s) | [pending] (~4.0) | ~5.0x |
| 32 | [benchmark pending] (~12s) | [pending] (~4.0) | ~5.0x |
Why scaling is sub-linear: GetWebP uses async concurrency within a single process, not OS threads. WASM codec execution blocks the event loop during each decode/encode call. Concurrency gains come from overlapping I/O (file reads/writes) with CPU work in other tasks. Beyond the CPU core count, additional concurrency adds scheduling overhead without improving throughput.
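A back-of-the-envelope way to see the plateau: treat WASM CPU time as fully serialized on the event loop and I/O as fully overlappable. The 12s/48s CPU-vs-I/O split below is an assumed illustration, not a measured figure, and the model is an idealized upper bound (real runs fall short of it):

```typescript
// Toy model: wall time is bounded below by total CPU time (which
// serializes on the event loop); extra concurrency divides the
// overlappable work until that CPU floor is hit.
function modelWallTime(cpuTotal: number, ioTotal: number, concurrency: number): number {
  const serial = cpuTotal + ioTotal;
  return Math.max(cpuTotal, serial / concurrency);
}

for (const c of [1, 2, 4, 8, 16]) {
  const t = modelWallTime(12, 48, c); // assumed: 12s CPU + 48s I/O for the batch
  console.log(`concurrency=${c}: ~${t}s`);
}
```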
Recommended setting: Leave at default (CPU cores - 1). Setting `--concurrency` higher than your core count provides diminishing returns.
### Large Batch (1,000 files)
Mixed dataset: 600 JPEG + 300 PNG + 100 BMP, various sizes, on an 8-core machine at default concurrency:
| Metric | Value |
|---|---|
| Total files | 1,000 |
| Wall time | [benchmark pending] (~4--6 min) |
| Avg time per file | [benchmark pending] (~0.3s) |
| Total input size | [benchmark pending] (~2 GB) |
| Total output size | [benchmark pending] (~800 MB) |
| Overall savings | [benchmark pending] (~60%) |
| Peak memory | [benchmark pending] (~500 MB) |
## Plan Comparison
Processing 50 JPEG files (avg 3 MB each) on an 8-core machine:
| Metric | Free | Starter / Pro |
|---|---|---|
| File limit | 10 per run | Unlimited |
| Processing mode | Serial + 3s delay | Parallel (7 concurrent) |
| Time for 10 files | [pending] (~42s) | [pending] (~3s) |
| Time for 50 files | N/A (capped at 10) | [pending] (~14s) |
| Effective throughput | ~0.24 files/sec | ~3.6 files/sec |
| Speedup | -- | ~15x |
### Free Plan Timing Breakdown (10 files)
```
File 1:  ~1.2s (convert)
         3.0s (mandatory delay)
File 2:  ~1.2s
         3.0s
...
File 10: ~1.2s
─────────────────────────
Total:   ~12s converting + ~27s delays = ~39s
```
The 3-second inter-file delay on the Free plan is the dominant bottleneck, not the conversion itself. Upgrading to Starter or Pro removes this delay entirely and enables parallel processing.
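The breakdown above follows from simple arithmetic (the ~1.2s per-file figure is an estimate, and the 3s delay applies between files):

```typescript
// Free-plan wall time: n conversions plus a 3s delay between files.
function freePlanSeconds(files: number, perFile = 1.2, delaySec = 3.0): number {
  return files * perFile + Math.max(0, files - 1) * delaySec;
}

console.log(freePlanSeconds(10)); // ~12s converting + ~27s of delays ≈ 39s
```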
## Quality vs File Size vs Speed
All measurements on a single 4000x3000 JPEG (~3 MB):
| Quality (`-q`) | Output Size | Savings | Encode Time | Visual Quality |
|---|---|---|---|---|
| 50 | [pending] (~300 KB) | [pending] (~90%) | [pending] (faster) | Noticeable artifacts |
| 60 | [pending] (~400 KB) | [pending] (~87%) | [pending] | Minor artifacts |
| 75 | [pending] (~600 KB) | [pending] (~80%) | [pending] | Good |
| 80 (default) | [pending] (~700 KB) | [pending] (~77%) | [pending] | Very good |
| 90 | [pending] (~1.2 MB) | [pending] (~60%) | [pending] | Excellent |
| 95 | [pending] (~1.8 MB) | [pending] (~40%) | [pending] | Near-lossless |
| 100 | [pending] (~2.5 MB) | [pending] (~17%) | [pending] (slower) | Visually lossless (quality 100 is still lossy mode) |
Recommendation: Quality 75--80 offers the best balance of file size and visual fidelity for web delivery. Use 90+ for photography portfolios or print-quality assets.
## Comparison with Other Tools
Single-image conversion, `-q 80`, median of 5 runs (2026-04-05, Apple Silicon arm64):
| Image | GetWebP (WASM) | Sharp (libvips) | ImageMagick | Output |
|---|---|---|---|---|
| 320x240 JPEG (40 KB) | 206 ms | 89 ms | 33 ms | 11 KB |
| 640x480 JPEG (138 KB) | 252 ms | 114 ms | 60 ms | 34 KB |
| 800x600 PNG (2.4 MB) | 324 ms | 150 ms | 92 ms | 45 KB |
| 1024x768 JPEG (302 KB) | 360 ms | 161 ms | 110 ms | 70 KB |
| 1024x768 PNG (4.0 MB) | 390 ms | 192 ms | 132 ms | 64 KB |
| 1920x1080 JPEG (768 KB) | 643 ms | 331 ms | 276 ms | 163 KB |
| 2048x1536 PNG (15.6 MB) | 975 ms | 495 ms | 419 ms | 201 KB |
| 4096x3072 JPEG (3.9 MB) | 2736 ms | 1196 ms | 1250 ms | 730 KB |
Key takeaway: GetWebP is ~2x slower on raw encode (WASM vs native), but output sizes are identical — same libwebp codec, same quality. The trade-off is zero dependencies vs. raw speed.
### Analysis
GetWebP vs cwebp: Both use libwebp for encoding, so output quality and file sizes are nearly identical at the same quality setting. The performance gap comes from WASM overhead (~1.3--1.6x slower than native). GetWebP's advantage is zero-dependency cross-platform distribution and batch processing with concurrency.
GetWebP vs sharp: Sharp links directly to native libvips, making it the fastest option. However, it requires Node.js and platform-specific native binary compilation. GetWebP ships as a self-contained Bun-compiled binary.
GetWebP vs Squoosh CLI: Both use the same jSquash/Squoosh WASM codecs. Performance is comparable. Squoosh CLI is deprecated; GetWebP is actively maintained.
GetWebP vs ImageMagick: ImageMagick's WebP encoder is typically less optimized than libwebp. GetWebP produces smaller files at equivalent visual quality.
### When to Choose GetWebP
- CI/CD pipelines: Single binary, no runtime dependencies, JSON output, exit codes for scripting.
- Cross-platform teams: Same binary on macOS (ARM + Intel), Linux, and Windows.
- Batch jobs: Built-in concurrency, recursive scanning, `--skip-existing` for incremental builds.
- No native compilation: Avoids the `node-gyp` / platform-specific addon issues that sharp requires.
### Parallel Throughput Comparison
When concurrency is factored in, the gap narrows. 50 JPEG files on an 8-core machine:
| Tool | Mode | Wall Time | Notes |
|---|---|---|---|
| GetWebP CLI | `--concurrency 7` | [benchmark pending] (~14s) | Built-in batch processing |
| cwebp + xargs | `xargs -P 7` | [benchmark pending] (~10s) | Requires shell scripting |
| sharp (custom script) | Worker pool | [benchmark pending] (~7s) | Requires Node.js + custom code |
GetWebP's built-in concurrency makes it competitive with native tools that require manual parallelization wrappers.
### When Native Tools May Be Better
- Maximum throughput: If processing millions of images and every millisecond counts, native `cwebp` or sharp will be faster per image.
- Advanced transforms: If you also need resizing, cropping, or format conversion beyond WebP, sharp or ImageMagick offer a broader feature set.
## Memory Usage
GetWebP's memory consumption is dominated by raw pixel buffers during decode/encode:
| Factor | Memory Impact |
|---|---|
| WASM module initialization | ~20--40 MB (one-time, 4 codecs) |
| Per-image decode buffer | width * height * 4 bytes (RGBA) |
| Per-image encode buffer | Output WebP size (much smaller) |
| Concurrent images | concurrency * per-image memory |
### Example: 8-core Machine, Default Concurrency (7)
Processing 4000x3000 images:
```
WASM init:              ~30 MB
7 concurrent images:    7 * 48 MB = ~336 MB
Overhead (buffers, GC): ~50 MB
────────────────────────────────
Peak estimate:          ~416 MB
```
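The same estimate as code (the constants mirror the assumed figures above; the helper is illustrative, not part of GetWebP):

```typescript
// Peak memory ≈ one-time WASM init + N concurrent RGBA buffers + slack.
function peakMemoryMB(
  width: number, height: number, concurrency: number,
  wasmInitMB = 30, overheadMB = 50,
): number {
  const perImageMB = (width * height * 4) / 1e6; // raw RGBA bitmap per image
  return wasmInitMB + concurrency * perImageMB + overheadMB;
}

console.log(peakMemoryMB(4000, 3000, 7)); // 30 + 7*48 + 50 = 416 (MB)
```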
### Reducing Memory Usage
- Lower concurrency: `--concurrency 2` reduces peak memory to ~130 MB for the same images.
- Smaller images: Web-resolution images (1920x1080) use ~8 MB per decode buffer, totaling ~90 MB at concurrency 7.
- BMP caution: Large BMP files require an extra full-frame buffer for the BGR-to-RGB channel swap. A 36 MB BMP (4000x3000) allocates ~96 MB during decode.
## Startup Time
| Phase | Duration | Notes |
|---|---|---|
| Binary load | [benchmark pending] (~50ms) | Bun-compiled single binary |
| WASM initialization | [benchmark pending] (~150ms) | Compiles 4 WASM modules from embedded blobs |
| License check | [benchmark pending] (~5ms local, ~500ms network) | JWT validated locally; network call only on first auth or expiry |
| File scanning | [benchmark pending] (~10ms per 1,000 files) | `fs.readdir` with sort |
| Total cold start | ~200--300ms | Subsequent runs reuse OS file cache |
For single-file conversion, startup overhead is a significant fraction of total time. For batch jobs (100+ files), it is negligible.
## Optimization Tips
### For Maximum Throughput
```bash
# Use all cores on a dedicated build machine
getwebp convert ./images -r --concurrency 8

# Skip already-converted files in incremental builds
getwebp convert ./images -r -o ./dist --skip-existing

# Lower quality for faster encoding (quality < 80 is slightly faster)
getwebp convert ./images -q 70
```

### For Minimum Memory
```bash
# Limit concurrency to reduce peak memory
getwebp convert ./images --concurrency 2

# Process directories one at a time instead of recursively
getwebp convert ./images/thumbs -o ./out/thumbs
getwebp convert ./images/photos -o ./out/photos
```

### For CI/CD Pipelines
```bash
# JSON output + exit codes for scripted error handling
getwebp convert ./assets -r -o ./dist --skip-existing --json
echo "Exit code: $?"

# Dry run to validate before converting
getwebp convert ./assets -r --dry-run --json
```

## Running Your Own Benchmarks
To generate numbers for your specific hardware:
```bash
# Prepare a test dataset
mkdir -p /tmp/bench-input /tmp/bench-output
# Copy or generate test images into /tmp/bench-input

# Single-file timing
time getwebp convert /tmp/bench-input/sample.jpg -o /tmp/bench-output

# Batch with default concurrency
time getwebp convert /tmp/bench-input -o /tmp/bench-output

# Batch with JSON output for machine-parseable results
getwebp convert /tmp/bench-input -o /tmp/bench-output --json > results.json

# Concurrency sweep
for c in 1 2 4 8 16; do
  rm -rf /tmp/bench-output/*
  echo "concurrency=$c"
  time getwebp convert /tmp/bench-input -o /tmp/bench-output --concurrency $c
done

# Memory monitoring (macOS)
/usr/bin/time -l getwebp convert /tmp/bench-input -o /tmp/bench-output

# Memory monitoring (Linux)
/usr/bin/time -v getwebp convert /tmp/bench-input -o /tmp/bench-output
```

## Write Safety
GetWebP uses atomic writes to prevent corrupted output files:
- Encode output is written to a temporary file (`<name>.webp.tmp`).
- The temporary file is renamed to the final path (`<name>.webp`) via `fs.rename`.
- On SIGINT (Ctrl+C), any in-progress `.tmp` files are cleaned up automatically.
This means a crash or interruption will never leave a half-written `.webp` file in your output directory. Already-completed files are safe. Use `--skip-existing` on the next run to resume from where you left off.
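The write-then-rename sequence looks roughly like this sketch (generic Node `fs/promises` code, not GetWebP's actual source; `atomicWrite` is an illustrative name):

```typescript
import { writeFile, rename, unlink } from "node:fs/promises";

// Atomically publish an encoded buffer: write to a .tmp sibling first,
// then rename into place. rename() within the same filesystem is atomic,
// so readers never observe a half-written .webp.
async function atomicWrite(finalPath: string, data: Uint8Array): Promise<void> {
  const tmpPath = `${finalPath}.tmp`;
  try {
    await writeFile(tmpPath, data);
    await rename(tmpPath, finalPath);
  } catch (err) {
    await unlink(tmpPath).catch(() => {}); // best-effort cleanup on failure
    throw err;
  }
}
```

Because the rename is atomic, a reader either sees the previous state or the complete new file, never a partial one.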
## Known Limitations
| Limitation | Impact | Workaround |
|---|---|---|
| Single-threaded WASM | Encode/decode blocks event loop; scaling plateaus around core count | Use native `cwebp` for extreme throughput needs |
| No GPU acceleration | WASM codecs are CPU-only | N/A (libwebp is CPU-only by design) |
| BMP decode is pure JS | Slower than WASM for large BMPs | Convert BMP to PNG first if processing many large BMPs |
| Free plan delay | 3s per file, 10 file cap | Upgrade to Starter or Pro |
| Memory scales with concurrency | High concurrency on large images can exhaust RAM | Reduce `--concurrency` for large-resolution batches |