GitLab Pipeline – parallel – What is parallel in GitLab CI/CD?

The parallel keyword in a GitLab CI/CD job definition allows you to run multiple instances of the same job concurrently. This is extremely useful for:

  • Speeding up test suites: By splitting tests across several parallel jobs (test sharding/splitting).
  • Building for multiple targets/architectures: Running the same build script with different configuration variables.
  • Processing data in chunks: Dividing a large dataset or a list of items among parallel jobs.

GitLab provides two main ways to use parallel:

  1. Fixed number of parallel jobs: parallel: <number>
  2. Matrix of parallel jobs: parallel:matrix: (using variable combinations)

Example .gitlab-ci.yml with parallel:

YAML

# .gitlab-ci.yml

stages:
  - test
  - build

default:
  image: alpine:latest

# Job 1: Demonstrates a fixed number of parallel jobs for sharding tests
parallel_test_execution:
  stage: test
  parallel: 3 # Run this job 3 times in parallel
  script:
    - echo "--- Parallel Test Execution ---"
    # CI_NODE_INDEX is 1-based (e.g., 1, 2, 3)
    # CI_NODE_TOTAL is the total number of parallel jobs (e.g., 3)
    - echo "This is test job instance $CI_NODE_INDEX of $CI_NODE_TOTAL."
    - echo "Simulating test execution for a subset of tests..."
    # Example: Logic to select a subset of tests based on CI_NODE_INDEX
    - |
      if [ "$CI_NODE_INDEX" -eq 1 ]; then
        echo "Running test group 1 (e.g., unit tests part 1)"
        sleep 5
      elif [ "$CI_NODE_INDEX" -eq 2 ]; then
        echo "Running test group 2 (e.g., unit tests part 2)"
        sleep 7
      elif [ "$CI_NODE_INDEX" -eq 3 ]; then
        echo "Running test group 3 (e.g., integration tests part 1)"
        sleep 6
      fi
    - echo "Test instance $CI_NODE_INDEX finished."

# Job 2: Demonstrates a matrix of parallel jobs for different configurations
build_multi_target:
  stage: build
  script:
    - echo "--- Building for Target: $TARGET_OS, Arch: $TARGET_ARCH ---"
    - echo "This is job instance $CI_NODE_INDEX of $CI_NODE_TOTAL in the matrix."
    - echo "Fetching dependencies for $TARGET_OS-$TARGET_ARCH..."
    - sleep 10 # Simulate build time
    - echo "Build complete for $TARGET_OS-$TARGET_ARCH."
    - mkdir -p "build_output/${TARGET_OS}_${TARGET_ARCH}"
    - echo "binary_for_${TARGET_OS}_${TARGET_ARCH}" > "build_output/${TARGET_OS}_${TARGET_ARCH}/app"
  parallel:
    matrix:
      # First set of combinations
      - TARGET_OS: ["linux", "windows"]
        TARGET_ARCH: ["amd64"]
      # Second set of combinations (can have different variables or values)
      - TARGET_OS: ["darwin"]
        TARGET_ARCH: ["amd64", "arm64"]
      # A single specific combination
      - TARGET_OS: ["linux"]
        TARGET_ARCH: ["arm64"]
        EXTRA_FLAG: ["--optimized"] # You can add other variables
  artifacts:
    paths:
      - build_output/
    expire_in: 1 hour

Explanation:

  1. The parallel keyword:
    • Used within a job’s definition to specify that the job should be run multiple times in parallel.
  2. parallel_test_execution Job (Fixed Number):
    • parallel: 3:
      • This tells GitLab to create and run 3 instances of the parallel_test_execution job concurrently (assuming enough runners are available).
      • In the GitLab UI, these jobs will appear as parallel_test_execution 1/3, parallel_test_execution 2/3, and parallel_test_execution 3/3.
    • Predefined CI/CD Variables:
      • $CI_NODE_INDEX: A 1-based index for the current parallel job instance (e.g., 1, 2, or 3 in this case).
      • $CI_NODE_TOTAL: The total number of parallel jobs in this set (e.g., 3 in this case).
    • Script Logic: The script uses $CI_NODE_INDEX to simulate running different groups of tests. This is a common pattern for test sharding, where a large test suite is divided among parallel jobs to reduce overall execution time.
  3. build_multi_target Job (Matrix):
    • parallel:matrix:: This allows you to define a list of variable combinations. GitLab will create a separate parallel job instance for each unique combination.
    • Variable Combinations:
      • The first matrix entry, TARGET_OS: ["linux", "windows"] with TARGET_ARCH: ["amd64"], generates 2 jobs:
        1. TARGET_OS=linux, TARGET_ARCH=amd64
        2. TARGET_OS=windows, TARGET_ARCH=amd64
      • The second matrix entry, TARGET_OS: ["darwin"] with TARGET_ARCH: ["amd64", "arm64"], generates 2 more jobs:
        1. TARGET_OS=darwin, TARGET_ARCH=amd64
        2. TARGET_OS=darwin, TARGET_ARCH=arm64
      • The third matrix entry, TARGET_OS: ["linux"] with TARGET_ARCH: ["arm64"] and EXTRA_FLAG: ["--optimized"], generates 1 job:
        1. TARGET_OS=linux, TARGET_ARCH=arm64, EXTRA_FLAG=--optimized
      • Total Jobs: In this example, the matrix generates 2 + 2 + 1 = 5 parallel instances of build_multi_target. Each instance receives the variables of its own combination: TARGET_OS and TARGET_ARCH are set in all five, while EXTRA_FLAG is set only in the instance generated by the third entry.
    • Script Logic: The script uses these environment variables (e.g., $TARGET_OS, $TARGET_ARCH) to perform actions specific to that combination, like building for a particular operating system and architecture.
    • Artifacts: Each parallel job instance can produce its own artifacts. The example shows creating artifacts specific to the build target.
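
The if/elif chain in parallel_test_execution hard-codes three groups, so it has to be edited whenever the parallel count changes. A minimal sketch of a more general split follows. It assumes, hypothetically, that the tests are shell scripts matching tests/*_test.sh; the job name sharded_tests and the file shard.txt are illustrative as well. Each instance keeps every CI_NODE_TOTAL-th file from a sorted list, offset by its own CI_NODE_INDEX:

```yaml
# Sketch only: the tests/ layout, the job name, and shard.txt are assumptions.
sharded_tests:
  stage: test
  parallel: 3
  script:
    - |
      # Sort the file list so every instance sees the same order, then keep
      # the lines whose 0-based position modulo CI_NODE_TOTAL equals this
      # instance's 0-based index (CI_NODE_INDEX is 1-based, hence idx - 1).
      find tests -name '*_test.sh' | sort |
        awk -v idx="$CI_NODE_INDEX" -v total="$CI_NODE_TOTAL" \
          '(NR - 1) % total == idx - 1' > shard.txt
    - |
      echo "Instance $CI_NODE_INDEX of $CI_NODE_TOTAL runs:"
      cat shard.txt
    - |
      # Run each selected test and stop at the first failure.
      while read -r test_file; do
        sh "$test_file" || exit 1
      done < shard.txt
```

With seven test files, instance 1 would take files 1, 4, and 7, instance 2 files 2 and 5, and instance 3 files 3 and 6. Round-robin assignment balances the number of files per instance, not their run time, so suites with a few very slow tests may need a smarter split.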

Key Concepts and Behavior:

  • Purpose: To significantly speed up pipelines by running identical or similar tasks concurrently or to build/test across a range of configurations.
  • Runner Availability: The actual level of parallelism depends on how many jobs your GitLab Runners can pick up at once. If you define parallel: 10 but your runners can only process 2 jobs concurrently, only 2 instances run at a time and the rest wait in the queue.
  • Job Naming:
    • For parallel: <number>, jobs are named job_name X/Y.
    • For parallel:matrix:, jobs are named job_name: [value1, value2, ...], listing the matrix values of that instance instead of an X/Y suffix (for example, build_multi_target: [linux, amd64]).
  • CI_NODE_INDEX and CI_NODE_TOTAL: These are crucial for dividing work among the parallel instances, especially when using parallel: <number>. For parallel:matrix:, you typically rely on the matrix variables themselves to differentiate the work.
  • Resource Consumption: Using parallel can consume more CI/CD minutes and runner resources, so use it judiciously where the speed benefits outweigh the costs.
  • Dependencies and Artifacts: Each parallel job instance is treated as a separate job. A job in a later stage waits for all instances to finish, as does a job that lists the parallelized job under needs. Depending on one particular instance is possible in recent GitLab versions with needs:parallel:matrix. Because every instance uploads its own artifacts, give the output paths unique names (as build_output/${TARGET_OS}_${TARGET_ARCH} does in the example) or aggregate them in a later stage.
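
The dependency behavior described above can be sketched as follows. This assumes the build_multi_target job from the example file, a hypothetical package stage appended to stages, and a GitLab version that supports needs:parallel:matrix, which the GitLab documentation lists as available from version 16.3. The job names package_all and package_linux_amd64 are illustrative.

```yaml
# Sketch only: the "package" stage and both job names are assumptions.

# Waits for every instance of build_multi_target and fetches their artifacts.
package_all:
  stage: package
  needs:
    - job: build_multi_target
      artifacts: true
  script:
    - ls -R build_output/

# Waits only for the linux/amd64 instance of the matrix.
package_linux_amd64:
  stage: package
  needs:
    - job: build_multi_target
      parallel:
        matrix:
          - TARGET_OS: linux
            TARGET_ARCH: amd64
  script:
    - ls build_output/linux_amd64/
```

The second job can start as soon as its single matrix instance finishes, which is the main reason to target one instance instead of the whole set. The variables under needs:parallel:matrix should match one combination that the matrix actually generates.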

Using parallel effectively can drastically improve the efficiency of your CI/CD pipelines, especially for time-consuming tasks like extensive test suites or multi-platform builds.