I stumbled over the announcement of the Pipeline Artifacts tasks, which are supposed to be superior to the classic Build Artifacts. Azure Pipeline Artifacts are meant to be the next generation of Build Artifacts and will eventually replace them.
From several short documents and blog posts I could gather the following information about Pipeline Artifacts:
- Pipeline Artifacts make use of the same technology as Universal Packages
- Packages can be created across multiple phases within the scope of a build
- The efficiency gains show mostly with large build outputs
- Universal Packages make use of client- and server-side deduplication, so I expect the same for Pipeline Artifacts
- The speed of upload and download of files is optimized with parallelization
- Files are only uploaded / downloaded once in the scope of the whole organization, so identical files only get uploaded once (see the conceptual sketch after this list)
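Deduplication of this kind is usually content-addressed: the client splits files into chunks, hashes them, and only transfers chunks the service has not stored yet. The following is purely my own conceptual sketch in PowerShell, not the actual Pipeline Artifacts client; the chunk size and hash algorithm are assumptions.

# Conceptual sketch only: illustrates content-addressed chunking, the idea behind deduplication.
# Identical chunks hash to the same value and would only need to be uploaded/stored once.
function Get-ChunkHashes {
    param(
        [string]$Path,
        [int]$ChunkSize = 64KB   # assumption: the real chunk size and algorithm differ
    )
    $sha    = [System.Security.Cryptography.SHA256]::Create()
    $stream = [System.IO.File]::OpenRead($Path)
    try {
        $buffer = New-Object byte[] $ChunkSize
        while (($read = $stream.Read($buffer, 0, $ChunkSize)) -gt 0) {
            $hash = $sha.ComputeHash($buffer, 0, $read)
            [BitConverter]::ToString($hash).Replace('-', '')
        }
    }
    finally {
        $stream.Dispose()
        $sha.Dispose()
    }
}

# Only chunks whose hash the service has not seen before would have to be uploaded
$allHashes    = Get-ChildItem "C:\MyPath" -File | ForEach-Object { Get-ChunkHashes -Path $_.FullName }
$uniqueHashes = $allHashes | Sort-Object -Unique
"{0} chunks total, {1} unique" -f $allHashes.Count, $uniqueHashes.Count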
All this sounds promising, so I went out to test the new technology. I also wanted to make this public, so I created a public project on Azure DevOps (which also gives me 10 free parallel pipelines to try this out).
Test Summary
This article describes the tests for the new Pipeline Artifacts. The test setup consists of:
- Two repositories, one with small files and one with larger files, both around 1.5-2 GB in total size.
- Four pipelines, each a combination of large versus small files with either Build Artifacts or Pipeline Artifacts.
- The pipelines use the corresponding task pairs: the classic PublishBuildArtifacts / DownloadBuildArtifacts tasks or the new PublishPipelineArtifact / DownloadPipelineArtifact tasks.
- Each pipeline consists of seven phases, where each phase runs the same tasks but on one of the agent pools below:
  - Hosted
  - Hosted macOS
  - Hosted Ubuntu 1604
  - Hosted VS2017
  - Hosted Windows 2019 with VS2019
  - Hosted Windows Container
  - Default (which includes one private agent running on Windows 10)
- Triggering 100 builds on each of the four pipelines, so that every agent pool is tested with every pipeline.
The results were interesting, even though I actually expected them, at least in some of the cases. They show the different times needed to download and to publish artifacts.
![](https://i0.wp.com/www.razorspoint.com/wp-content/uploads/2019/03/image-2.png?resize=674%2C376&ssl=1)
As you can see, the best performance boost comes with lots of small files in your build output.
Since I tested this for all available agent pools, I cross-checked whether one agent is faster than another. The charts below take the average duration per agent for the upload and download tasks of Build and Pipeline Artifacts. What can easily be seen is that the macOS-based agent seems to handle the files the fastest. Why that is, I can only speculate.
Generating Test Data
This project has two repositories, one with large files and one with small files. I used a basic script from Stephane van Gulick (PowerShell MVP) but changed it so it can create random file sizes: I can either choose the total size of all files created, or choose a minimum and maximum size per file and how many files should be created with a random size within that range. The script for creating the files is shown below.
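This is a minimal sketch rather than the original script verbatim; the parameter names match the commands used further down, while the file-name pattern and the way the content is generated are my own assumptions.

# Sketch of the file generator; the real, modified script from Stephane van Gulick does more
function Create-RandomFiles {
    param(
        [int]$NumberOfFiles,
        [string]$Path,
        [int]$MinRandomSize,   # minimum file size in bytes
        [int]$MaxRandomSize    # maximum file size in bytes
    )
    if (-not (Test-Path $Path)) { New-Item -ItemType Directory -Path $Path | Out-Null }
    $rng = New-Object System.Random
    1..$NumberOfFiles | ForEach-Object {
        # Pick a random size within the range and fill the file with random bytes
        $size  = Get-Random -Minimum $MinRandomSize -Maximum $MaxRandomSize
        $bytes = New-Object byte[] $size
        $rng.NextBytes($bytes)
        # The file name pattern is an assumption
        [System.IO.File]::WriteAllBytes((Join-Path $Path ("random-{0}.bin" -f $_)), $bytes)
    }
}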
Those files are then pushed to the two repositories so we can work with them. I used the script to create the large files with the command below, which creates 500 files of 5-12 MB each.
Import-Module .\Create-RandomFiles.ps1
Create-RandomFiles -NumberOfFiles 500 -Path "C:\MyPath" -MinRandomSize 5000000 -MaxRandomSize 12000000
The second command creates lots of small files of 5-300 KB each.
Import-Module .\Create-RandomFiles.ps1
Create-RandomFiles -NumberOfFiles 10000 -Path "C:\MyPath" -MinRandomSize 5000 -MaxRandomSize 300000
Why these two size ranges? Because we know from the claims above that the gains should come from deduplication and parallel transfers, so I wanted to see how that plays out for many small files versus fewer large files.
Both commands produced something between 1.5-2 GB of data, which is committed to the repositories.
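To verify roughly how much data was generated, the total size of the target folder can be summed up like this (same path as used above):

# Sum up the size of all generated files and print it in GB
"{0:N2} GB" -f ((Get-ChildItem -Path "C:\MyPath" -File | Measure-Object -Property Length -Sum).Sum / 1GB)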
Test Plan
My test plan is basically to run build pipelines that publish and download both Pipeline Artifacts and Build Artifacts.
Four different pipelines were used: two pipelines for small files, one per approach, and two more for large files, again one per approach. Each pipeline has one phase per agent pool. There are currently six different hosted agent pools plus a default one, which contains one private agent hosted by me in Azure. Below you can see the YAML template used for the Pipeline Artifact tasks:
jobs:
- job: ${{parameters.jobname}}
  workspace:
    clean: all
  pool: ${{parameters.pool}}
  dependsOn: ${{parameters.dependsOnJobName}}
  steps:
  - task: PublishPipelineArtifact@0
    displayName: 'Publish Pipeline Artifact'
    inputs:
      artifactName: PipelineArtifact-${{parameters.jobname}}-$(Build.BuildNumber)
      targetPath: '$(Build.SourcesDirectory)'
    continueOnError: true
  - task: DownloadPipelineArtifact@0
    displayName: 'Download Pipeline Artifact'
    inputs:
      artifactName: PipelineArtifact-${{parameters.jobname}}-$(Build.BuildNumber)
      targetPath: '$(System.ArtifactsDirectory)'
    continueOnError: true
And here is the pipeline that calls the template above:
trigger: none
jobs:
- template: pipeline-artifact-tasks.yml
  parameters:
    jobname: Hosted
    dependsOnJobName: ''
    pool:
      vmImage: 'vs2015-win2012r2'
- template: pipeline-artifact-tasks.yml
  parameters:
    jobname: 'Hosted_VS2017'
    dependsOnJobName: 'Hosted'
    pool:
      vmImage: 'vs2017-win2016'
- template: pipeline-artifact-tasks.yml
  parameters:
    jobname: 'Hosted_Windows_2019_with_VS2019'
    dependsOnJobName: 'Hosted_VS2017'
    pool:
      vmImage: 'windows-2019'
- template: pipeline-artifact-tasks.yml
  parameters:
    jobname: 'Hosted_macOS'
    dependsOnJobName: 'Hosted_Windows_2019_with_VS2019'
    pool:
      vmImage: 'macOS-10.13'
- template: pipeline-artifact-tasks.yml
  parameters:
    jobname: 'Hosted_Ubuntu_1604'
    dependsOnJobName: 'Hosted_macOS'
    pool:
      vmImage: 'ubuntu-16.04'
- template: pipeline-artifact-tasks.yml
  parameters:
    jobname: 'Hosted_Windows_Container'
    dependsOnJobName: 'Hosted_Ubuntu_1604'
    pool:
      vmImage: 'win1803'
- template: pipeline-artifact-tasks.yml
  parameters:
    jobname: 'Private'
    dependsOnJobName: 'Hosted_Windows_Container'
    pool:
      name: 'default'
You can also check out these pipelines in both repositories. If you want to know more about YAML pipelines, check out the introduction (part 1 and part 2) from Damian Brady (Microsoft Cloud Developer Advocate specialized in Azure DevOps), or check the official documentation.
To run the test I made use of PowerShell and the VSTeam PowerShell module maintained by Donovan Brown. It basically calls the REST API of your Azure DevOps organization and makes things more convenient for you. So, I used the script below to trigger 100 builds for each build definition.
Each build also has 7 phases, which means 400 builds, 2,800 jobs and 5,600 artifact tasks (one publish and one download per job) in total.
$buildDefinitionIds = (15,16,17,19) # Include an array of the build definition IDs
$projectName = "#Your Project Name#"
1..100 | ForEach-Object {
    foreach($buildDefId in $buildDefinitionIds){
        Add-VSTeamBuild -BuildDefinitionId $buildDefId -ProjectName $projectName
    }
}
Note: You still have to log in to your tenant to be able to use this script.
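Logging in works with a personal access token, for example roughly like this (organization name and token are placeholders):

# Connect VSTeam to your Azure DevOps organization before triggering builds
Set-VSTeamAccount -Account "your-organization" -PersonalAccessToken "<your PAT>"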
Getting and Evaluating Test Results
So now I had the results of 400 builds, each with 7 phases, where each phase has a publish and a download task for the artifacts. Getting statistics efficiently as a developer is, of course, done via scripting. Here I mostly used VSTeam as well.
Since the script is a bit long, I put it into a Gist:
https://gist.github.com/SebastianSchuetze/83b7320b77e829b0164af776d0b939e6
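In essence, and heavily simplified, the evaluation boils down to something like the sketch below: it walks over the builds with VSTeam, pulls each build's timeline via the REST API and extracts the duration of the artifact tasks. The property and column names are assumptions and may differ from the real script in the Gist.

# Simplified sketch of the evaluation; assumes Set-VSTeamAccount was already called as shown above
$orgUrl      = "https://dev.azure.com/your-organization"   # placeholder
$projectName = "PipelineArtifactsTest"                     # placeholder: your project name
$pat         = "<your PAT>"                                # placeholder
$headers     = @{ Authorization = "Basic " + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(":$pat")) }
$results = foreach ($build in (Get-VSTeamBuild -ProjectName $projectName)) {
    # The timeline contains one record per phase/job/task, including start and finish times
    $timeline = Invoke-RestMethod -Uri "$orgUrl/$projectName/_apis/build/builds/$($build.id)/timeline?api-version=5.0" -Headers $headers
    foreach ($record in ($timeline.records | Where-Object { $_.type -eq 'Task' -and $_.name -match 'Artifact' })) {
        [pscustomobject]@{
            BuildId  = $build.id
            Task     = $record.name
            Agent    = $record.workerName
            Duration = ([datetime]$record.finishTime - [datetime]$record.startTime).TotalSeconds
        }
    }
}
$results | Export-Csv -Path .\artifact-task-durations.csv -NoTypeInformation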
The script itself creates a CSV file, which of course I then evaluated with Excel. You can download the Excel file if you want and look at my evaluation of the statistics together with the raw data.
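If you prefer to stay in PowerShell instead of Excel for a quick look, a per-agent average can also be computed straight from such a CSV (the column names match the sketch above and are assumptions about the actual file):

# Average artifact task duration per agent
Import-Csv .\artifact-task-durations.csv |
    Group-Object -Property Agent |
    ForEach-Object {
        [pscustomobject]@{
            Agent          = $_.Name
            AverageSeconds = ($_.Group | ForEach-Object { [double]$_.Duration } | Measure-Object -Average).Average
        }
    } |
    Sort-Object -Property AverageSeconds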
There is not much left to say, other than: try out Pipeline Artifacts yourself! 🙂