GitLab Pipeline – cache – What is cache in GitLab CI/CD?

The cache keyword in GitLab CI/CD is used to specify a list of files and directories that should be saved (cached) after a job finishes successfully and restored before the next run of that job (or other jobs with the same cache key). This helps to speed up your pipelines by reusing downloaded dependencies or other files that take time to generate, rather than fetching or rebuilding them every time.

This example will demonstrate caching node_modules for a Node.js project, which is a very common use case.


Example .gitlab-ci.yml with cache:

YAML

# .gitlab-ci.yml

# 1. Global cache definition (applies to every job unless the job defines its own cache:)
#    Top-level cache: is deprecated in current GitLab versions; prefer placing this under 'default:cache:'
cache:
  key: "$CI_COMMIT_REF_SLUG" # Creates a cache per branch/tag
  paths:
    - node_modules/ # Cache the node_modules directory
  policy: pull-push # Default policy: pull the cache at the start, push (update) at the end
  when: on_success # Default: save cache only if the job succeeds

stages:
  - setup
  - build
  - test

# Job 1: Installs dependencies and builds the project
# On its first successful run, this job creates and pushes the cache.
# Later runs on the same branch pull that cache before the script starts.
install_and_build:
  stage: setup
  image: node:18-alpine
  script:
    - echo "--- Install and Build Job ---"
    - echo "CI_COMMIT_REF_SLUG is $CI_COMMIT_REF_SLUG (this job uses the global cache, so this value is its cache key)."
    - echo "Checking if node_modules directory exists from a previous cache restore..."
    - if [ -d "node_modules" ]; then echo "node_modules found, restored from cache."; else echo "node_modules not found, will run full npm install."; fi
    - npm install # This will be faster if node_modules is restored from cache
    - echo "Finished npm install."
    - echo "Running build command..."
    - npm run build # Assuming you have a 'build' script in your package.json
    - echo "Build complete."

# Job 2: Uses the cache created by the 'install_and_build' job (if on the same branch)
# This job might run tests and would benefit from not re-installing all dependencies.
run_tests:
  stage: test
  image: node:18-alpine
  # This job will inherit the global cache settings by default.
  # It will attempt to 'pull' the cache (including node_modules/) before running.
  script:
    - echo "--- Test Job ---"
    - echo "Checking if node_modules directory exists from cache..."
    - if [ -d "node_modules" ]; then echo "node_modules found, restored from cache for testing."; else echo "node_modules not found (install_and_build may have failed or the cache key differs); running npm install."; npm install; fi
    - echo "Running tests..."
    - npm test # Assuming you have a 'test' script in your package.json
    - echo "Tests complete."

# Job 3: Demonstrates a more specific cache key based on a dependency file
# This is often a better strategy for dependency caching.
lint_code:
  stage: test
  image: node:18-alpine
  cache: # Job-level cache overrides the global cache setting for this job
    key:
      files:
        - package-lock.json # Key changes whenever package-lock.json changes
      prefix: "$CI_PROJECT_NAME" # Optional prefix for the key
    paths:
      - node_modules/
    policy: pull # Only downloads the cache, never uploads it (useful for linters/tests). Another job must push under this same key.
  script:
    - echo "--- Lint Job ---"
    - echo "Cache key for this job is based on package-lock.json."
    - if [ ! -d "node_modules" ]; then npm install; else echo "node_modules found from cache."; fi
    - echo "Running linter..."
    - npm run lint # Assuming you have a 'lint' script

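The comment at the top of the example notes that the global settings can also live under default:. Current GitLab documentation recommends that form because top-level cache: is deprecated. The sketch below moves the example's cache block under default: and shortens the scripts. The job names are carried over from the example, and the sketch assumes package.json defines build and test scripts.

```yaml
# .gitlab-ci.yml (sketch): the example's global cache moved under default:
default:
  image: node:18-alpine          # every job inherits this image unless it sets its own
  cache:
    key: "$CI_COMMIT_REF_SLUG"   # one cache per branch or tag
    paths:
      - node_modules/
    policy: pull-push
    when: on_success

stages:
  - setup
  - test

install_and_build:
  stage: setup
  script:
    - npm install
    - npm run build              # assumes a "build" script in package.json

run_tests:
  stage: test
  script:
    - npm install                # fast when node_modules/ was restored from the cache
    - npm test                   # assumes a "test" script in package.json
```

A job that declares its own cache: (as lint_code does) replaces the default cache settings entirely. Job-level values are not merged key by key with default:.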

Explanation:

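  1. Global cache: block:
    • This top-level cache: definition sets the default caching behavior for every job in the pipeline. Top-level cache: is deprecated in current GitLab versions, so new configurations should place the same block under default:cache:, which works the same way.
    • key: "$CI_COMMIT_REF_SLUG":
      • The key identifies the cache. Jobs that use the same cache key share the same cache.
      • $CI_COMMIT_REF_SLUG is a predefined GitLab variable containing the branch or tag name in a URL-friendly form (lowercased, with characters other than letters and digits replaced by -). As a result, each branch or tag gets its own independent cache.
      • This key is simple but has trade-offs. A new branch starts with an empty cache even when its dependencies match those of main, and the cache is not invalidated when dependencies change within a branch.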
    • paths:
      • An array of files or directories to be cached.
      • - node_modules/: This tells GitLab to cache the contents of the node_modules directory.
    • policy: pull-push: (Default behavior if not specified)
      • pull: At the beginning of the job, GitLab attempts to download and extract the cache.
      • push: At the end of a successful job, GitLab creates a new cache archive with the specified paths and uploads it.
    • when: on_success: (Default behavior if not specified)
      • The cache will only be saved (pushed) if the job finishes successfully. Other options include on_failure and always.
  2. install_and_build Job:
    • This job inherits the global cache settings.
    • First Run (or if cache doesn’t exist/key changes):
      • npm install will download all dependencies and create node_modules/.
      • After the script successfully completes, the node_modules/ directory will be archived and uploaded as a cache associated with the key (e.g., main if on the main branch).
    • Subsequent Runs (on the same branch, with the same cache key):
      • Before the script starts, GitLab will download and extract the node_modules/ cache.
      • npm install will then run. It should be much faster because most, if not all, dependencies are already present. It will only fetch updates or new packages.
  3. run_tests Job:
    • This job also inherits the global cache settings.
    • If it runs on the same branch (and therefore with the same cache key) as a previously successful install_and_build job, it pulls the node_modules/ cache and can run npm test without reinstalling dependencies from scratch. If the cache is missing, node_modules will not exist. This can happen, for example, because install_and_build failed or ran on a runner that does not share its cache. In that case the job needs an npm install fallback, like the one in lint_code, for npm test to succeed.
  4. lint_code Job (Job-Level Cache):
    • This job defines its own cache: block, which overrides the global cache settings specifically for this job.
    • key:
      • files: - package-lock.json: This is a more robust way to key a dependency cache. GitLab computes the key from the most recent commit that changed package-lock.json (you can list one or two files). The cache is therefore reused only while the locked dependencies are unchanged. When package-lock.json changes, the key changes and a new cache is created.
      • prefix: "$CI_PROJECT_NAME": (Optional) Prepends a fixed string to the computed key. Caches are already kept separate per project, so a prefix is mainly useful for separating caches that would otherwise share the same file-based key. For example, $CI_JOB_NAME gives each job its own cache.
    • paths: - node_modules/: Still caching node_modules.
    • policy: pull:
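      • This job only attempts to download (pull) an existing cache that matches its key.
      • It never uploads (pushes) or updates the cache, even if its node_modules directory changes. This suits jobs like linters or tests that consume dependencies but should not update the shared cache.
      • A pull-only job relies on another job pushing a cache under the same key. In this example, install_and_build pushes under $CI_COMMIT_REF_SLUG, not under the package-lock.json-based key. As a result, lint_code will not find a matching cache and falls back to npm install on every run.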
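A pull-only consumer benefits only when some other job pushes a cache under exactly the same key. The sketch below shows a minimal version of that producer/consumer pairing. The install_deps job and the hidden .node_cache key are hypothetical names. The YAML anchor (&node_cache) ensures both jobs use identical key and paths settings.

```yaml
# Hypothetical hidden key holding the shared cache definition.
# Top-level keys starting with "." are not run as jobs.
.node_cache: &node_cache
  key:
    files:
      - package-lock.json
    prefix: "$CI_PROJECT_NAME"
  paths:
    - node_modules/

stages:
  - setup
  - test

# Producer (hypothetical): pulls any existing cache, installs, then pushes the result.
install_deps:
  stage: setup
  image: node:18-alpine
  cache:
    <<: *node_cache
    policy: pull-push
  script:
    - npm install

# Consumer: restores the same cache but never uploads it.
lint_code:
  stage: test
  image: node:18-alpine
  cache:
    <<: *node_cache
    policy: pull
  script:
    - if [ ! -d "node_modules" ]; then npm install; fi  # fallback on a cache miss
    - npm run lint  # assumes a "lint" script in package.json
```

Without a distributed cache, lint_code only finds the cache that install_deps pushed when both jobs run on the same runner.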

Key Concepts and Behavior:

  • Purpose: To speed up pipeline execution by reducing the need to re-download dependencies or re-generate files on every job run.
  • How it Works:
    1. Restore (Pull): At the start of a job, GitLab Runner checks for a cache matching the job’s cache key. If found, it’s downloaded and extracted into the job’s working directory before the before_script runs.
    2. Save (Push): If the job’s cache policy includes push and the when condition is met (e.g., on_success), after the after_script completes, the runner archives the directories/files specified in paths and uploads them to the cache storage, associated with the cache key.
  • Cache key is Critical:
    • A good key ensures cache validity and effectiveness.
    • key: files: is highly recommended for dependency caches as it ties the cache to the state of your dependency lock files.
    • You can also use predefined variables like $CI_COMMIT_REF_SLUG, $CI_JOB_IMAGE, or create composite keys.
  • paths: Defines what gets cached. These paths are relative to the project directory ($CI_PROJECT_DIR).
  • policy:
    • pull-push: Downloads at start, uploads at end.
    • pull: Only downloads at start.
    • push: Only uploads at end (useful for jobs that only produce a cache but don’t need to download one).
  • when:
    • on_success: (Default) Cache is saved only if the job succeeds.
    • on_failure: Cache is saved only if the job fails.
    • always: Cache is saved regardless of job success or failure.
  • Cache vs. Artifacts:
    • Cache: Meant for temporary storage to speed up jobs (e.g., node_modules, downloaded libraries). Not guaranteed to be available (runners might clear caches). Shared between subsequent runs of the same job or different jobs sharing the same cache key.
    • Artifacts: Meant for passing build outputs between stages/jobs or for downloading after a pipeline completes (e.g., compiled binaries, test reports). Guaranteed to be available for the duration defined by artifact expiry settings.
  • Cache Scope:
    • By default, each GitLab Runner stores caches locally, so a job that runs on a different runner will not see them.
    • For distributed caching (e.g., using S3), an administrator needs to configure it. This allows different runners to share the same cache.
  • Clearing Cache: You can clear a project's caches manually by selecting Clear runner caches on the project's Pipelines page in the GitLab UI. The next jobs then start with empty caches. Changing the cache key in .gitlab-ci.yml also effectively starts a new cache.
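The sketch below makes the cache-versus-artifacts distinction concrete. It caches node_modules/ purely to save time, while the build output travels to the test stage as an artifact. The job names are illustrative, and dist/ is an assumption about where npm run build writes its output.

```yaml
stages:
  - build
  - test

build:
  stage: build
  image: node:18-alpine
  cache:                         # speed-up only: may be missing, so the job must cope
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - node_modules/
  script:
    - npm install
    - npm run build
  artifacts:                     # build output that later jobs rely on
    paths:
      - dist/                    # assumed build output directory
    expire_in: 1 week

test:
  stage: test
  image: node:18-alpine
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - node_modules/
    policy: pull                 # reuse dependencies without re-uploading them
  script:
    - npm install                # quick when node_modules/ was restored
    - npm test                   # artifacts from earlier stages (dist/) are downloaded by default
```

If the cache is lost, both jobs still succeed, only more slowly. If the artifact were missing, the test job would lack the build output it depends on.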

Using cache effectively can significantly reduce the runtime of your GitLab CI/CD pipelines, especially for projects with many dependencies.
