
Image Optimization · Nov 12, 2025 · 7 min read

Disk Space and Temporary Files in Image Pipelines

Image conversion can fail for an unglamorous reason: the machine runs out of space. A pipeline may keep originals, resized working files, WebP outputs, AVIF outputs, fallback JPEGs, reports, and temporary files all at the same time. On a developer laptop or CI runner, that can be enough to stop a job halfway through.

Disk planning is part of production quality. It prevents partial output folders, confusing retries, and emergency cleanup during release work.

"Make sure you have enough disk space" does not define a production image pipeline. The runbook has to explain peak usage, temporary write behavior, cleanup boundaries, output completeness, and the evidence a reviewer should keep after the run.

Estimate Peak Disk Use

Do not estimate only final output size. Peak disk use can be much higher because several versions may coexist:

  • original source images
  • resized working copies
  • generated WebP files
  • generated AVIF files
  • fallback files
  • temporary encoder files
  • conversion reports
  • failed or rejected outputs

If a folder contains 5 GB of original images, the job may need more than 5 GB free even if the final WebP outputs are much smaller. The safest estimate includes the source folder, all generated formats, and room for temporary work.

Use a simple preflight record before large jobs:

du -sh ./images/originals
find ./images/originals -type f | wc -l
getwebp ./images/originals --recursive --dry-run

The --dry-run step does not write outputs. It gives the team a chance to catch the wrong input folder, missing --recursive, unexpected file counts, and obvious scope mistakes before the job starts consuming disk.
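The same preflight can include a free-space check. A minimal sketch using only `du` and `df` (the 3x multiplier and the paths are illustrative assumptions, not GetWebP guidance):

```shell
# Preflight sketch: compare free space against a rough peak estimate.
# The paths and the 3x safety factor are example values.
SRC=./images/originals
mkdir -p "$SRC"   # ensure the path exists for this sketch

src_kb=$(du -sk "$SRC" | awk '{print $1}')          # source size, KB
free_kb=$(df -Pk "$SRC" | awk 'NR==2 {print $4}')   # free space, KB

# Budget roughly 3x the source: originals + outputs + temporary work
need_kb=$((src_kb * 3))

if [ "$free_kb" -lt "$need_kb" ]; then
  status=ABORT
else
  status=OK
fi
echo "$status: ${free_kb} KB free, estimated peak ${need_kb} KB"
```

Recording this output alongside the dry-run log gives the reviewer a before-the-run snapshot to compare against.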

For a practical peak estimate, budget for:

Item | Why it exists
Source folder | Originals should remain available for rollback and re-encoding
Output folder | Generated WebP or AVIF files are written separately when --output is used
Temporary output files | Encoders may write a temporary file before the final rename
Reports and manifests | NDJSON logs and manifests record what happened
Rejected review candidates | Quality experiments often create several versions before one is approved
Build artifacts | CI jobs may also hold dependencies, static-site output, caches, and uploaded artifacts

Keep Output Separate From Source

Separate output folders make disk use easier to inspect:

images/
  originals/
  working/
  webp-output/
  avif-output/
  reports/

When outputs are mixed into the source tree, it becomes harder to see how much space generated files consume and which files are safe to delete.

Separation also helps cleanup. Generated folders can be removed and rebuilt if originals, settings, and review notes are preserved.
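Creating that layout up front keeps every run writing into predictable places. A trivial sketch (folder names follow the example above):

```shell
# Create the separated layout before any conversion runs
for d in originals working webp-output avif-output reports; do
  mkdir -p "images/$d"
done
```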

Use an explicit output directory:

getwebp ./images/originals \
  -o ./images/webp-output \
  --recursive \
  --json > ./images/reports/conversion.ndjson

GetWebP CLI preserves original files. If --output is omitted, converted files are written next to source files, which can be convenient for small manual jobs but makes cleanup and auditing harder in production pipelines. For batch work, a separate output directory is easier to inspect, archive, or remove.

Also watch for output-name collisions. If photo.jpg and photo.png are converted into the same output directory, both can map to photo.webp. The CLI warns about these conflicts, but the pipeline still needs a naming rule before publishing.
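That check can run before the batch does. A sketch that flags basenames which would map to the same `.webp` name in one flat output directory (detection only; the naming rule itself is project policy):

```shell
# Print basenames that would collide when mapped to .webp in one
# flat output directory (different extensions, same stem).
find_webp_collisions() {
  find "$1" -type f \( -name '*.jpg' -o -name '*.jpeg' -o -name '*.png' \) |
    while read -r f; do
      base=$(basename "$f")
      printf '%s\n' "${base%.*}.webp"
    done | sort | uniq -d
}

# Example: photo.jpg and photo.png both map to photo.webp
mkdir -p /tmp/collision-demo
: > /tmp/collision-demo/photo.jpg
: > /tmp/collision-demo/photo.png
find_webp_collisions /tmp/collision-demo
```

Running this during the dry-run phase means the naming rule is decided before any disk is consumed.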

Clean QA Candidates Deliberately

Quality testing often creates many candidates:

hero-q76.webp
hero-q82.webp
hero-q88.webp

These are useful during review, but they should not live forever in the production output folder. After the team chooses the approved setting, move rejected candidates to a short-term review folder or delete them according to the project policy.

Do not delete the original source file. Delete generated candidates only when the team can recreate them if needed.

For quality trials, keep candidates in a named review folder:

review-candidates/
  hero-q76.webp
  hero-q82.webp
  hero-q88.webp
approved-output/
  hero.webp
reports/
  hero-quality-decision.md

The decision note can be short: source filename, chosen output, reason, reviewer, and any known tradeoff. That is more useful than retaining every rejected file indefinitely.
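The triage itself can be scripted once the decision is made. A sketch, assuming `hero-q82.webp` was the approved candidate (names follow the example; the retention policy is the team's):

```shell
# Move rejected quality candidates out of the production output folder.
# "hero" and the approved quality value come from the example above.
demo=$(mktemp -d)
mkdir -p "$demo/out" "$demo/review-candidates" "$demo/approved-output"
: > "$demo/out/hero-q76.webp"
: > "$demo/out/hero-q82.webp"
: > "$demo/out/hero-q88.webp"

approved=hero-q82.webp
for f in "$demo/out"/hero-q*.webp; do
  if [ "$(basename "$f")" = "$approved" ]; then
    mv "$f" "$demo/approved-output/hero.webp"   # rename to published name
  else
    mv "$f" "$demo/review-candidates/"
  fi
done
```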

Watch CI Runner Limits

CI runners may have less disk space than developer machines. A job that works locally can fail in CI when it downloads dependencies, checks out the repository, builds the site, and converts images in the same workspace.

For CI, consider:

  • converting only changed folders
  • using a dry run for pull-request checks
  • uploading reports as artifacts instead of keeping large generated folders
  • cleaning temporary folders before the image step
  • splitting very large image jobs into explicit stages

GitHub's Actions documentation is useful for understanding workflow jobs and artifacts.

For CI, store small evidence and discard rebuildable output:

Keep | Usually discard after artifact upload
NDJSON conversion report | Temporary encoder files
Manifest with output fingerprints | Rejected quality candidates
Approved generated images, if they are deploy artifacts | Local build caches that can be recreated
Error log for failed files | Partial output from failed attempts

If the site build and image conversion happen in the same job, schedule cleanup before the image step and after artifact upload. Do not assume the runner has enough free space just because the repository checkout succeeded.
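A guard step before the image conversion can fail fast instead of failing halfway. A sketch; the 2 GB threshold is an arbitrary example value, and a real pipeline step would exit nonzero instead of only printing:

```shell
# CI guard sketch: fail the job early if the workspace looks too
# small for the image step. The 2 GB threshold is an example value.
min_free_kb=$((2 * 1024 * 1024))   # 2 GB expressed in KB
free_kb=$(df -Pk . | awk 'NR==2 {print $4}')

if [ "$free_kb" -lt "$min_free_kb" ]; then
  echo "Not enough disk for image conversion: ${free_kb} KB free" >&2
  # exit 1   # uncomment in a real pipeline step
fi
echo "free_kb=${free_kb}"
```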

Avoid Half-Written Output Sets

Running out of space can leave partial output. Some files exist, others do not, and timestamps may make the folder look newer than it really is. Treat disk failure like a batch failure, not a normal completion.

After a space-related failure:

  1. preserve the error log
  2. clean only generated or temporary files
  3. confirm originals are intact
  4. rerun the job after freeing space
  5. review output completeness before publishing

Do not publish a folder just because some files were generated.

The GetWebP CLI writes converted output through a temporary *.tmp file and then renames it to the final output path. That reduces the chance of accepting a half-written final image, but it does not remove the need to handle disk failures. If the process is interrupted, verify whether any temporary files remain before rerunning.

For watch workflows, stale *.webp.tmp files are cleaned at watcher startup. That is helpful, but it is not a publishing signal. A restarted watcher still needs a review of processed, skipped, and failed files.
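Before rerunning, a quick audit can show what the interruption left behind. A sketch using only `find` (the simulated files stand in for a real interrupted run):

```shell
# After an interrupted run: look for leftover temp files and compare
# source vs output counts before deciding anything is publishable.
demo=$(mktemp -d)
mkdir -p "$demo/originals" "$demo/webp-output"
: > "$demo/originals/a.jpg"
: > "$demo/originals/b.jpg"
: > "$demo/webp-output/a.webp"
: > "$demo/webp-output/b.webp.tmp"   # simulated half-written output

tmp_count=$(find "$demo/webp-output" -name '*.tmp' | wc -l | tr -d ' ')
src_count=$(find "$demo/originals" -type f | wc -l | tr -d ' ')
out_count=$(find "$demo/webp-output" -type f ! -name '*.tmp' | wc -l | tr -d ' ')

echo "leftover tmp files: $tmp_count"
echo "outputs: $out_count of $src_count sources"
```

A mismatch between the two counts, or any leftover `*.tmp` file, means the folder is not a complete output set.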

Monitor Large Source Types

Some source folders are more likely to cause disk pressure:

  • camera originals
  • HEIC or HEIF phone photo batches
  • high-resolution product photography
  • design exports with multiple artboards
  • screenshots captured at large desktop sizes
  • duplicated CMS media exports

For these folders, run a dry run first and estimate output needs before generating every variant.

Large HEIC, HEIF, and AVIF inputs deserve special care because decoding can be resource-heavy even before the final file is written. If a folder mixes phone originals, screenshots, and already compressed WebP files, split it into smaller batches so disk and memory pressure are easier to understand.
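Splitting by source type can be scripted with a plain shell loop. A sketch that sorts a mixed folder into per-extension batches (the file names are illustrative):

```shell
# Split a mixed folder into per-type batches so each batch can be
# sized, converted, and reported separately.
demo=$(mktemp -d)
mkdir -p "$demo/mixed"
: > "$demo/mixed/IMG_0001.heic"
: > "$demo/mixed/shot.png"
: > "$demo/mixed/already.webp"

for f in "$demo/mixed"/*; do
  ext=$(printf '%s' "${f##*.}" | tr '[:upper:]' '[:lower:]')
  mkdir -p "$demo/batches/$ext"
  mv "$f" "$demo/batches/$ext/"
done
find "$demo/batches" -type f
```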

Use separate reports for large batches:

getwebp ./phone-originals \
  -o ./webp-output/phone-originals \
  --recursive \
  --json > ./reports/phone-originals.ndjson

getwebp ./screenshots \
  -o ./webp-output/screenshots \
  --recursive \
  --json > ./reports/screenshots.ndjson

That structure makes it easier to see whether one source type is causing failures or unusual output growth.

Keep Reports Lightweight

Structured reports are useful, but they should not become another storage problem. NDJSON or JSON reports should record paths, status, sizes, and errors, not embed image data.

Google's WebP documentation provides format background, but disk planning is a workflow concern. The format may reduce final transfer size while the pipeline still needs temporary storage to get there.

The GetWebP JSON output reference documents the fields worth keeping:

Field | Why it helps disk and cleanup decisions
results[].file | Identifies the source file
results[].outputPath | Identifies the generated file to review or delete
results[].originalSize | Shows how much source storage was involved
results[].newSize | Shows output storage cost
results[].savedRatio | Flags outputs that became larger than the original
results[].status | Separates successful, skipped, and failed files
results[].error | Explains per-file failures

For repeatable builds, also consider a manifest:

getwebp ./images/originals \
  -o ./images/webp-output \
  --recursive \
  --manifest ./images/reports/image-manifest.json \
  --json > ./images/reports/conversion.ndjson

A manifest is much smaller than the image folder and can record the relationship between source files and generated outputs. It should support cleanup and rollback, not replace visual review.

Make Cleanup Part of the Pipeline

A good image pipeline ends with a cleanup decision:

  • keep originals
  • keep approved outputs
  • keep the conversion report
  • remove temporary files
  • remove rejected candidates after review
  • document anything retained for rollback

Make the rule explicit:

File class | Keep? | Reason
Originals | Yes | Required for future formats, quality changes, and rollback
Approved outputs | Yes, if deployed or published | They are the files the site actually uses
Conversion report | Yes, at least through release review | It explains what changed and what failed
Manifest | Yes for repeatable builds | It records output fingerprints and paths
Temporary *.tmp files | No after the run is complete | They are incomplete working files
Rejected candidates | Short retention only | Keep long enough for review, then delete by policy
Partial output from failed runs | Usually no | Rebuild after fixing the failure
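The "No" and "short retention" rows can be enforced mechanically. A sketch using `find` (the 7-day window and paths are example policy values, and the demo files simulate a finished run):

```shell
# Apply the retention table: delete temp files outright and expire
# rejected candidates past the retention window (7 days, an example).
demo=$(mktemp -d)
mkdir -p "$demo/webp-output" "$demo/review-candidates"
: > "$demo/webp-output/hero.webp"
: > "$demo/webp-output/hero.webp.tmp"
: > "$demo/review-candidates/hero-q76.webp"

# Temporary files: always removed once the run is complete
find "$demo/webp-output" -name '*.tmp' -type f -delete

# Rejected candidates: short retention only (age a file to simulate)
touch -d '10 days ago' "$demo/review-candidates/hero-q76.webp" 2>/dev/null ||
  touch -t 202401010000 "$demo/review-candidates/hero-q76.webp"
find "$demo/review-candidates" -type f -mtime +7 -delete
```

Running this as the last pipeline step turns the table from a policy document into enforced behavior.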

Disk space problems are avoidable when teams plan for peak usage, not just final output size. That planning keeps conversion jobs predictable on laptops, CI runners, and production build machines.


Jack

GetWebP Editor

Jack writes GetWebP guides about local-first image conversion, WebP workflows, browser compatibility, and practical performance checks for teams that publish images on the web.