
Image Optimization · Nov 12, 2025 · 7 min read

Disk Space and Temporary Files in Image Pipelines

Image conversion can fail for an unglamorous reason: the machine runs out of space. A pipeline may keep originals, resized working files, WebP outputs, AVIF outputs, fallback JPEGs, reports, and temporary files all at the same time. On a developer laptop or CI runner, that can be enough to stop a job halfway through.

Disk planning is part of production quality. It prevents partial output folders, confusing retries, and emergency cleanup during release work.

"Make sure you have enough disk space" does not define a production image pipeline. The runbook has to explain peak usage, temporary write behavior, cleanup boundaries, output completeness, and the evidence a reviewer should keep after the run.

Estimate Peak Disk Use

Do not estimate only final output size. Peak disk use can be much higher because several versions may coexist:

  • original source images
  • resized working copies
  • generated WebP files
  • generated AVIF files
  • fallback files
  • temporary encoder files
  • conversion reports
  • failed or rejected outputs

If a folder contains 5 GB of original images, the job may need more than 5 GB free even if the final WebP outputs are much smaller. The safest estimate includes the source folder, all generated formats, and room for temporary work.

Use a simple preflight record before large jobs:

du -sh ./images/originals
find ./images/originals -type f | wc -l
getwebp ./images/originals --recursive --dry-run

The --dry-run step does not write outputs. It gives the team a chance to catch the wrong input folder, missing --recursive, unexpected file counts, and obvious scope mistakes before the job starts consuming disk.
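The same preflight can include a free-space check. A minimal sketch using only `du` and `df` (the 3x multiplier and the paths are illustrative assumptions, not GetWebP guidance):

```shell
# Preflight sketch: compare free space against a rough peak estimate.
# The paths and the 3x safety factor are example values.
SRC=./images/originals
mkdir -p "$SRC"   # ensure the path exists for this sketch

src_kb=$(du -sk "$SRC" | awk '{print $1}')          # source size, KB
free_kb=$(df -Pk "$SRC" | awk 'NR==2 {print $4}')   # free space, KB

# Budget roughly 3x the source: originals + outputs + temporary work
need_kb=$((src_kb * 3))

if [ "$free_kb" -lt "$need_kb" ]; then
  status=ABORT
else
  status=OK
fi
echo "$status: ${free_kb} KB free, estimated peak ${need_kb} KB"
```

Recording this output alongside the dry-run log gives the reviewer a before-the-run snapshot to compare against.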

For a practical peak estimate, budget for:

Item | Why it exists
Source folder | Originals should remain available for rollback and re-encoding
Output folder | Generated WebP or AVIF files are written separately when --output is used
Temporary output files | Encoders may write a temporary file before the final rename
Reports and manifests | NDJSON logs and manifests record what happened
Rejected review candidates | Quality experiments often create several versions before one is approved
Build artifacts | CI jobs may also hold dependencies, static-site output, caches, and uploaded artifacts

Keep Output Separate From Source

Separate output folders make disk use easier to inspect:

images/
  originals/
  working/
  webp-output/
  avif-output/
  reports/

When outputs are mixed into the source tree, it becomes harder to see how much space generated files consume and which files are safe to delete.

Separation also helps cleanup. Generated folders can be removed and rebuilt if originals, settings, and review notes are preserved.
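Creating that layout up front keeps every run writing into predictable places. A trivial sketch (folder names follow the example above):

```shell
# Create the separated layout before any conversion runs
for d in originals working webp-output avif-output reports; do
  mkdir -p "images/$d"
done
```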

Use an explicit output directory:

getwebp ./images/originals \
  -o ./images/webp-output \
  --recursive \
  --json > ./images/reports/conversion.ndjson

GetWebP CLI preserves original files. If --output is omitted, converted files are written next to source files, which can be convenient for small manual jobs but makes cleanup and auditing harder in production pipelines. For batch work, a separate output directory is easier to inspect, archive, or remove.

Also watch for output-name collisions. If photo.jpg and photo.png are converted into the same output directory, both can map to photo.webp. The CLI warns about these conflicts, but the pipeline still needs a naming rule before publishing.
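That check can run before the batch does. A sketch that flags basenames which would map to the same `.webp` name in one flat output directory (detection only; the naming rule itself is project policy):

```shell
# Print basenames that would collide when mapped to .webp in one
# flat output directory (different extensions, same stem).
find_webp_collisions() {
  find "$1" -type f \( -name '*.jpg' -o -name '*.jpeg' -o -name '*.png' \) |
    while read -r f; do
      base=$(basename "$f")
      printf '%s\n' "${base%.*}.webp"
    done | sort | uniq -d
}

# Example: photo.jpg and photo.png both map to photo.webp
mkdir -p /tmp/collision-demo
: > /tmp/collision-demo/photo.jpg
: > /tmp/collision-demo/photo.png
find_webp_collisions /tmp/collision-demo
```

Running this during the dry-run phase means the naming rule is decided before any disk is consumed.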

Clean QA Candidates Deliberately

Quality testing often creates many candidates:

hero-q76.webp
hero-q82.webp
hero-q88.webp

These are useful during review, but they should not live forever in the production output folder. After the team chooses the approved setting, move rejected candidates to a short-term review folder or delete them according to the project policy.

Do not delete the original source file. Delete generated candidates only when the team can recreate them if needed.

For quality trials, keep candidates in a named review folder:

review-candidates/
  hero-q76.webp
  hero-q82.webp
  hero-q88.webp
approved-output/
  hero.webp
reports/
  hero-quality-decision.md

The decision note can be short: source filename, chosen output, reason, reviewer, and any known tradeoff. That is more useful than retaining every rejected file indefinitely.
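The triage itself can be scripted once the decision is made. A sketch, assuming `hero-q82.webp` was the approved candidate (names follow the example; the retention policy is the team's):

```shell
# Move rejected quality candidates out of the production output folder.
# "hero" and the approved quality value come from the example above.
demo=$(mktemp -d)
mkdir -p "$demo/out" "$demo/review-candidates" "$demo/approved-output"
: > "$demo/out/hero-q76.webp"
: > "$demo/out/hero-q82.webp"
: > "$demo/out/hero-q88.webp"

approved=hero-q82.webp
for f in "$demo/out"/hero-q*.webp; do
  if [ "$(basename "$f")" = "$approved" ]; then
    mv "$f" "$demo/approved-output/hero.webp"   # rename to published name
  else
    mv "$f" "$demo/review-candidates/"
  fi
done
```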

Watch CI Runner Limits

CI runners may have less disk space than developer machines. A job that works locally can fail in CI when it downloads dependencies, checks out the repository, builds the site, and converts images in the same workspace.

For CI, consider:

  • converting only changed folders
  • using a dry run for pull-request checks
  • uploading reports as artifacts instead of keeping large generated folders
  • cleaning temporary folders before the image step
  • splitting very large image jobs into explicit stages

GitHub's Actions documentation is useful for understanding workflow jobs and artifacts.

For CI, store small evidence and discard rebuildable output:

Keep | Usually discard after artifact upload
NDJSON conversion report | Temporary encoder files
Manifest with output fingerprints | Rejected quality candidates
Approved generated images, if they are deploy artifacts | Local build caches that can be recreated
Error log for failed files | Partial output from failed attempts

If the site build and image conversion happen in the same job, schedule cleanup before the image step and after artifact upload. Do not assume the runner has enough free space just because the repository checkout succeeded.
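A guard step before the image conversion can fail fast instead of failing halfway. A sketch; the 2 GB threshold is an arbitrary example value, and a real pipeline step would exit nonzero instead of only printing:

```shell
# CI guard sketch: fail the job early if the workspace looks too
# small for the image step. The 2 GB threshold is an example value.
min_free_kb=$((2 * 1024 * 1024))   # 2 GB expressed in KB
free_kb=$(df -Pk . | awk 'NR==2 {print $4}')

if [ "$free_kb" -lt "$min_free_kb" ]; then
  echo "Not enough disk for image conversion: ${free_kb} KB free" >&2
  # exit 1   # uncomment in a real pipeline step
fi
echo "free_kb=${free_kb}"
```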

Avoid Half-Written Output Sets

Running out of space can leave partial output. Some files exist, others do not, and timestamps may make the folder look newer than it really is. Treat disk failure like a batch failure, not a normal completion.

After a space-related failure:

  1. preserve the error log
  2. clean only generated or temporary files
  3. confirm originals are intact
  4. rerun the job after freeing space
  5. review output completeness before publishing

Do not publish a folder just because some files were generated.

The GetWebP CLI writes converted output through a temporary *.tmp file and then renames it to the final output path. That reduces the chance of accepting a half-written final image, but it does not remove the need to handle disk failures. If the process is interrupted, verify whether any temporary files remain before rerunning.

For watch workflows, stale *.webp.tmp files are cleaned at watcher startup. That is helpful, but it is not a publishing signal. A restarted watcher still needs a review of processed, skipped, and failed files.
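Before rerunning, a quick audit can show what the interruption left behind. A sketch using only `find` (the simulated files stand in for a real interrupted run):

```shell
# After an interrupted run: look for leftover temp files and compare
# source vs output counts before deciding anything is publishable.
demo=$(mktemp -d)
mkdir -p "$demo/originals" "$demo/webp-output"
: > "$demo/originals/a.jpg"
: > "$demo/originals/b.jpg"
: > "$demo/webp-output/a.webp"
: > "$demo/webp-output/b.webp.tmp"   # simulated half-written output

tmp_count=$(find "$demo/webp-output" -name '*.tmp' | wc -l | tr -d ' ')
src_count=$(find "$demo/originals" -type f | wc -l | tr -d ' ')
out_count=$(find "$demo/webp-output" -type f ! -name '*.tmp' | wc -l | tr -d ' ')

echo "leftover tmp files: $tmp_count"
echo "outputs: $out_count of $src_count sources"
```

A mismatch between the two counts, or any leftover `*.tmp` file, means the folder is not a complete output set.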

Monitor Large Source Types

Some source folders are more likely to cause disk pressure:

  • camera originals
  • HEIC or HEIF phone photo batches
  • high-resolution product photography
  • design exports with multiple artboards
  • screenshots captured at large desktop sizes
  • duplicated CMS media exports

For these folders, run a dry run first and estimate output needs before generating every variant.

Large HEIC, HEIF, and AVIF inputs deserve special care because decoding can be resource-heavy even before the final file is written. If a folder mixes phone originals, screenshots, and already compressed WebP files, split it into smaller batches so disk and memory pressure are easier to understand.
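Splitting by source type can be scripted with a plain shell loop. A sketch that sorts a mixed folder into per-extension batches (the file names are illustrative):

```shell
# Split a mixed folder into per-type batches so each batch can be
# sized, converted, and reported separately.
demo=$(mktemp -d)
mkdir -p "$demo/mixed"
: > "$demo/mixed/IMG_0001.heic"
: > "$demo/mixed/shot.png"
: > "$demo/mixed/already.webp"

for f in "$demo/mixed"/*; do
  ext=$(printf '%s' "${f##*.}" | tr '[:upper:]' '[:lower:]')
  mkdir -p "$demo/batches/$ext"
  mv "$f" "$demo/batches/$ext/"
done
find "$demo/batches" -type f
```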

Use separate reports for large batches:

getwebp ./phone-originals \
  -o ./webp-output/phone-originals \
  --recursive \
  --json > ./reports/phone-originals.ndjson

getwebp ./screenshots \
  -o ./webp-output/screenshots \
  --recursive \
  --json > ./reports/screenshots.ndjson

That structure makes it easier to see whether one source type is causing failures or unusual output growth.

Keep Reports Lightweight

Structured reports are useful, but they should not become another storage problem. NDJSON or JSON reports should record paths, status, sizes, and errors, not embed image data.

Google's WebP documentation provides format background, but disk planning is a workflow concern. The format may reduce final transfer size while the pipeline still needs temporary storage to get there.

The GetWebP JSON output reference documents the fields worth keeping:

Field | Why it helps disk and cleanup decisions
results[].file | Identifies the source file
results[].outputPath | Identifies the generated file to review or delete
results[].originalSize | Shows how much source storage was involved
results[].newSize | Shows output storage cost
results[].savedRatio | Flags outputs that became larger than the original
results[].status | Separates successful, skipped, and failed files
results[].error | Explains per-file failures

For repeatable builds, also consider a manifest:

getwebp ./images/originals \
  -o ./images/webp-output \
  --recursive \
  --manifest ./images/reports/image-manifest.json \
  --json > ./images/reports/conversion.ndjson

A manifest is much smaller than the image folder and can record the relationship between source files and generated outputs. It should support cleanup and rollback, not replace visual review.

Make Cleanup Part of the Pipeline

A good image pipeline ends with a cleanup decision:

  • keep originals
  • keep approved outputs
  • keep the conversion report
  • remove temporary files
  • remove rejected candidates after review
  • document anything retained for rollback

Make the rule explicit:

File class | Keep? | Reason
Originals | Yes | Required for future formats, quality changes, and rollback
Approved outputs | Yes, if deployed or published | They are the files the site actually uses
Conversion report | Yes, at least through release review | It explains what changed and what failed
Manifest | Yes for repeatable builds | It records output fingerprints and paths
Temporary *.tmp files | No after the run is complete | They are incomplete working files
Rejected candidates | Short retention only | Keep long enough for review, then delete by policy
Partial output from failed runs | Usually no | Rebuild after fixing the failure
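The "No" and "short retention" rows can be enforced mechanically. A sketch using `find` (the 7-day window and paths are example policy values, and the demo files simulate a finished run):

```shell
# Apply the retention table: delete temp files outright and expire
# rejected candidates past the retention window (7 days, an example).
demo=$(mktemp -d)
mkdir -p "$demo/webp-output" "$demo/review-candidates"
: > "$demo/webp-output/hero.webp"
: > "$demo/webp-output/hero.webp.tmp"
: > "$demo/review-candidates/hero-q76.webp"

# Temporary files: always removed once the run is complete
find "$demo/webp-output" -name '*.tmp' -type f -delete

# Rejected candidates: short retention only (age a file to simulate)
touch -d '10 days ago' "$demo/review-candidates/hero-q76.webp" 2>/dev/null ||
  touch -t 202401010000 "$demo/review-candidates/hero-q76.webp"
find "$demo/review-candidates" -type f -mtime +7 -delete
```

Running this as the last pipeline step turns the table from a policy document into enforced behavior.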

Disk space problems are avoidable when teams plan for peak usage, not just final output size. That planning keeps conversion jobs predictable on laptops, CI runners, and production build machines.


Jack

GetWebP Editor

Jack writes GetWebP guides about local-first image conversion, WebP workflows, browser compatibility, and practical performance checks for teams that publish images on the web.