libvips · Dec 6, 2025 · 7 min read

Why libvips Pipelines Fail in CI and How to Reduce Risk

libvips is a powerful image processing library, and many teams use it through Sharp in Node.js projects. It can be fast and capable, but CI pipelines built around native image tooling can fail for reasons that have little to do with the images themselves: platform differences, missing codecs, package manager behavior, architecture mismatches, or cache problems.

The solution is not to avoid libvips in every project. The solution is to treat the image pipeline as production infrastructure and reduce the number of assumptions it makes about the runner.

"libvips is hard, use another tool" is not a serious diagnosis. libvips and Sharp are good choices when the project needs their capabilities. The useful question is narrower: which failures are environment failures, which are image failures, and which ones should block a build?

Identify the Failure Category

When a CI image job fails, first separate dependency failure from conversion failure.

Dependency failures happen before the image logic is exercised at all:

  • package install fails
  • native binary cannot load
  • required system library is missing
  • architecture is unsupported
  • container image lacks a needed runtime component

Conversion failures happen after the tool runs:

  • unsupported input file
  • corrupt image
  • permission issue
  • output path missing
  • memory or disk pressure

These categories need different fixes. Retrying a corrupt input will not fix it. Changing conversion settings will not fix a missing native library.

Use a table in the incident note:

| Failure category | Typical signal | Better next action |
| --- | --- | --- |
| Install failure | Package manager exits before the converter runs | Fix lockfile, install flags, registry, or runner image |
| Native load failure | The process starts but cannot load Sharp or libvips | Check platform, architecture, bundled binaries, and system libraries |
| Codec failure | Only certain formats fail | Verify input support and decide whether that format belongs in the job |
| Per-file decode failure | One image path repeats in the error | Replace or repair the source file |
| Output failure | Conversion succeeds until write time | Check output directory, permissions, and available disk |
| Resource pressure | Jobs fail only on large batches or high concurrency | Reduce scope, lower concurrency, or split the batch |

This classification keeps the team from applying the wrong fix and avoids treating every CI failure as if it has the same root cause.
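The classification can even live in the pipeline itself. A minimal sketch, assuming the incident categories above; the error-message patterns here are illustrative guesses, not an official list from npm, Sharp, or libvips:

```javascript
// Hypothetical triage helper: map raw error text to the failure
// categories from the table above. Patterns are illustrative only.
function classifyFailure(message) {
  const rules = [
    { category: "install", pattern: /npm ERR!|ERESOLVE|registry/i },
    { category: "native-load", pattern: /cannot find module 'sharp'|invalid ELF header|GLIBC/i },
    { category: "codec", pattern: /unsupported image format|no decoding plugin/i },
    { category: "per-file", pattern: /input file is missing|corrupt|premature end/i },
    { category: "output", pattern: /EACCES|ENOSPC/i },
  ];
  for (const rule of rules) {
    if (rule.pattern.test(message)) return rule.category;
  }
  return "unclassified"; // surface unknown failures instead of guessing
}

console.log(classifyFailure("Cannot find module 'sharp'")); // native-load
```

Even a rough classifier like this pushes the "which fix applies" decision into the CI log instead of leaving it to whoever reads the failure first.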

Pin the Runtime Environment

CI failures often appear when the local environment differs from the runner. Pin the parts that matter:

  • Node.js version
  • package manager version
  • lockfile
  • base Docker image if used
  • CPU architecture
  • operating system family

If the pipeline relies on Sharp, follow Sharp's own installation documentation for the platform and package manager in use. Do not assume a fix from one runner applies to every environment.

Capture the environment in the failing build, not after the fact:

node --version
npm --version
node -p "process.platform + ' ' + process.arch"
node -e "try { console.log(require('sharp').versions) } catch (error) { console.error(error.message); process.exit(1) }"

Keep that output with the CI log. If the failure appears only on linux arm64, only after a cache rebuild, or only after a package-manager upgrade, the evidence will be visible.

Keep Image Jobs Narrow

The broader the job, the harder it is to debug. A CI step that installs dependencies, builds the site, converts every image, uploads artifacts, and deploys the result gives you too many possible failure points.

Split the workflow:

  1. install dependencies
  2. run a small image conversion check
  3. run the site build
  4. upload artifacts or reports

This makes failures easier to locate. It also lets you decide whether image optimization should block every build or only release workflows.

For pull requests, a small representative fixture set is often enough:

| Fixture | Why include it |
| --- | --- |
| Large photo | Exercises memory and encoder behavior |
| Transparent PNG | Checks alpha handling and edge quality |
| Screenshot | Checks text and UI-line clarity |
| Already compressed JPEG | Exposes cases where re-encoding saves little |
| Known bad file | Verifies that per-file errors are reported cleanly |

Do not make every PR reprocess a historical media folder unless that is the product requirement. Use release jobs for the full pass and keep PR jobs focused on reproducibility.

Avoid Rebuilding Native Stacks Unnecessarily

If the image conversion environment is stable, cache dependencies carefully. But do not let the cache hide changes. A stale cache can make a pipeline pass until it is rebuilt from scratch, then fail during a release.

Run periodic clean builds or test the Docker image without cache before major releases. CI should prove the setup can be recreated, not only that yesterday's cache still works.

A healthy cache policy has both paths:

Daily PR build: use dependency cache
Weekly scheduled build: clean install, no dependency cache
Before release: rebuild Docker image without cache
After image-tool upgrade: run fixture set and full batch

If only the cached path is tested, the pipeline is not proving that a new runner can reproduce the image stack.

Use Structured Logs

Image processing failures can be noisy. Structured output helps separate per-file failures from environment failures. If the conversion tool supports JSON or NDJSON output, save it as a CI artifact.

For libvips or Sharp pipelines, also capture the version information that matters: Node.js version, Sharp version, libvips version, runner OS, and architecture. That information turns a vague "works on my machine" bug into a reproducible issue.
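A small sketch of capturing that record as data, so it can be attached to a bug report or CI artifact. The `sharp.versions` lookup matches the command shown earlier; the record shape itself is this sketch's own choice:

```javascript
// Minimal sketch: capture the version facts that make an image bug
// reproducible. The sharp require is wrapped so a native load failure
// becomes data in the record instead of crashing the step.
function buildEnvironmentRecord() {
  const record = {
    node: process.version,
    platform: process.platform,
    arch: process.arch,
  };
  try {
    // sharp.versions reports the bundled libvips version when sharp loads.
    record.sharp = require("sharp").versions;
  } catch (error) {
    record.sharpLoadError = error.message;
  }
  return record;
}

console.log(JSON.stringify(buildEnvironmentRecord(), null, 2));
```

Saving this JSON next to the conversion log means a failure on one runner can be compared field by field against a passing run elsewhere.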

When the job is routine WebP or AVIF conversion, GetWebP CLI gives you a simpler structured report:

npx -y getwebp ./src/images \
  -o ./dist/images \
  --recursive \
  --format webp \
  --json > getwebp-conversion.ndjson

The GetWebP JSON output reference documents newline-delimited JSON, a first-line version preamble, and conversion events such as convert.completed, convert.truncated, and convert.failed. Successful per-file records include file, outputPath, originalSize, newSize, savedRatio, quality, qualityMode, and status.

That matters in CI because the build can make decisions from records instead of console prose:

jq -r 'select(.type == "convert.completed") | .data.results[] | select(.status == "error") | [.file, .error] | @tsv' getwebp-conversion.ndjson

Do not count the job as healthy just because stdout ended with a friendly line. Check the event type, counts, and per-file statuses.
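The same check can run in Node.js instead of jq. A sketch, using the field names from the GetWebP JSON output reference cited above; the sample NDJSON lines are fabricated for illustration, not captured from a real run:

```javascript
// Illustrative NDJSON lines shaped like the documented GetWebP events.
const sample = [
  '{"type":"version","data":{"version":"0.0.0"}}',
  '{"type":"convert.completed","data":{"results":[' +
    '{"file":"a.png","outputPath":"dist/a.webp","status":"success"},' +
    '{"file":"b.png","status":"error","error":"premature end of data"}]}}',
].join("\n");

// Collect per-file error records from convert.completed events.
function collectFailures(ndjson) {
  const failures = [];
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue;
    const event = JSON.parse(line);
    if (event.type !== "convert.completed") continue;
    for (const result of event.data.results) {
      if (result.status === "error") failures.push(result);
    }
  }
  return failures;
}

const failures = collectFailures(sample);
console.log(`per-file failures: ${failures.length}`);
// A real CI step would set a non-zero exit code when failures.length > 0.
```

The point is the same as the jq version: the build decides from event types and per-file statuses, not from the shape of the console output.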

Consider a Focused CLI for Simple Jobs

If the project only needs routine WebP or AVIF conversion, a focused CLI may reduce maintenance compared with a broader native image-processing stack. GetWebP CLI is designed for local conversion workflows with output directories, dry runs, JSON output, and clear exit codes.

That does not make it a replacement for every libvips use case. If your pipeline needs complex transformations, compositing, or deep integration with Node.js, Sharp may still be the right tool. The point is to match the tool to the job.

Use a fit table instead of a tool debate:

| Need | Better fit |
| --- | --- |
| Resize, composite, crop, transform, or integrate deeply in Node.js | Sharp / libvips pipeline |
| Convert a source folder to WebP or AVIF with an output directory | Focused CLI workflow |
| Parse per-file conversion records in CI | CLI with NDJSON output |
| Keep complex image logic inside application code | Sharp / libvips pipeline |
| Let an AI agent scan and convert local images safely | MCP tool workflow |

The GetWebP CLI commands, LLM context document, and CI integration docs are the references to use when the focused CLI path fits the job, especially when CI logic depends on current exit-code behavior.

Review Outputs, Not Just Builds

A passing CI build does not prove visual quality. After changing native dependencies, encoder settings, or runtime images, review a representative sample:

  • screenshots
  • product images
  • transparent assets
  • gradients
  • dark images
  • responsive variants

Google's WebP documentation explains the format side, but project-specific review is still required.

Keep the visual review tied to asset roles:

| Asset type | What to inspect |
| --- | --- |
| Screenshot | Text, icons, borders, and thin lines |
| Product image | Texture, color, edge detail, and zoom view |
| Transparent asset | Edges on light and dark backgrounds |
| Gradient or dark photo | Banding, blocking, and shadow detail |
| Responsive variant | Crop and selected file at each breakpoint |

This is the difference between a build check and a publishing-quality check.

Reduce Risk With a Fallback Plan

Before a release, know what happens if the image job fails:

  • can the build use previously approved outputs?
  • can conversion run locally and commit artifacts?
  • can the job be skipped for documentation-only changes?
  • who owns dependency upgrades?

Make the policy explicit:

| Failure | Release decision |
| --- | --- |
| Native dependency install fails | Block release unless approved outputs already exist |
| One corrupt unused image fails | Remove from batch or mark non-blocking with evidence |
| Product or hero image fails conversion | Block until fixed or original delivery is approved |
| JSON report shows partial failure | Parse failed records before deciding |
| Visual review fails | Do not publish the converted output even if CI passed |
| Network/auth failure in licensing step | Retry or fix credentials; do not hide the failure in a generic cache change |
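The policy table can also be expressed as a small release gate. A hypothetical sketch: the input shape, the idea of a "critical asset" list, and the blocking rules are all assumptions of this example, not a GetWebP or Sharp contract:

```javascript
// Hypothetical release gate: a code version of the policy table above.
// Input shape and rules are this sketch's assumptions.
function releaseDecision({ installFailed, failedFiles, criticalAssets }) {
  if (installFailed) {
    return { release: false, reason: "native dependency install failed" };
  }
  const criticalFailures = failedFiles.filter((f) => criticalAssets.has(f));
  if (criticalFailures.length > 0) {
    return { release: false, reason: `critical asset failed: ${criticalFailures.join(", ")}` };
  }
  if (failedFiles.length > 0) {
    // Non-blocking, but the evidence is recorded rather than hidden.
    return { release: true, reason: `non-blocking failures recorded: ${failedFiles.length}` };
  }
  return { release: true, reason: "all conversions succeeded" };
}

console.log(releaseDecision({
  installFailed: false,
  failedFiles: ["archive/old-banner.png"],
  criticalAssets: new Set(["hero.png"]),
}));
```

Writing the policy as code forces the team to decide the blocking rules before release day, instead of debating them while a deploy is waiting.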

libvips-based pipelines can be excellent when maintained deliberately. CI risk drops when the environment is pinned, jobs are narrow, logs are structured, and the team has a fallback path for release pressure.


Jack

GetWebP Editor

Jack writes GetWebP guides about local-first image conversion, WebP workflows, browser compatibility, and practical performance checks for teams that publish images on the web.