Automatic setup of new builds on switching stable branch
Closed, Resolved · Public

Description

When switching the stable branch of a bigger product (i.e. one with lots of repos, like "KDE Applications"), getting the CI to pick that up is quite complicated.

After updating the kde-build-metadata, one has to

  1. kick off a new run of the "DSL Job Seed" to trigger CI to pick up the new kde-build-metadata
  2. wait for that
  3. kick off a new run of the product Dependency builds
  4. wait for them to be done
  5. manually kick off an initial build for each and every project in the product

Especially 5) is a PITA.

Ideally all of this could somehow be automated, so one just has to pull a single trigger and the rest is done automatically.

What is the CI designers' idea of how this use case should be solved? Could you document it somewhere, so release maintainers know where to look up what to do with the latest CI setup?

At least 5) needs some automated solution. For 17.08 I did this manually by going through the Web UI, and I will never do that again. There should be a way to automate this, and it would surely be quicker to implement than clicking through the Web UI for the billions of projects.

Restricted Application added a subscriber: sysadmin. · Nov 13 2017, 10:04 PM

For 5) I have https://paste.kde.org/p8mbzmloi, not sure whether it still works though.
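
For reference, step 5 can in principle be scripted against the Jenkins REST API. The following is only a rough sketch and not the script from the paste above; the job-name pattern is inferred from the build.kde.org URLs later in this thread, and the credentials, product, branch group and platform values are placeholder assumptions.

```python
#!/usr/bin/env python3
# Hedged sketch: queue an initial build for every project job of one
# product/branch group/platform combination via the Jenkins REST API.
# Credentials and the exact job-name pattern are assumptions.
import requests

JENKINS = "https://build.kde.org"
AUTH = ("username", "api-token")      # assumed: a Jenkins user with an API token
PRODUCT = "Applications"
BRANCH_GROUP = "stable-kf5-qt5"
PLATFORM = "SUSEQt5.9"

# Fetch the names of all jobs, then keep those matching
# "<Product> <project> <branch group> <platform>".
jobs = requests.get(f"{JENKINS}/api/json?tree=jobs[name]", auth=AUTH).json()["jobs"]
wanted = [j["name"] for j in jobs
          if j["name"].startswith(f"{PRODUCT} ")
          and j["name"].endswith(f" {BRANCH_GROUP} {PLATFORM}")]

for name in wanted:
    # POST /job/<name>/build queues a build; depending on the Jenkins
    # configuration a CSRF crumb header may additionally be required.
    response = requests.post(f"{JENKINS}/job/{requests.utils.quote(name)}/build", auth=AUTH)
    print(name, response.status_code)
```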

bcooksley closed this task as Resolved. · Nov 14 2017, 7:22 AM
bcooksley claimed this task.
bcooksley added a subscriber: bcooksley.

I've now set up jobs under the name 'Global Rebuild', which contain Pipeline scripts that will:

  1. Trigger the rebuild of the Dependency Rebuild job, and wait for it to complete
  2. Then trigger the rebuild of all other jobs in that Product/Branch Group/Platform combination, not waiting for them to complete

If the Dependency Rebuild fails it won't proceed and the job will stop there.
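
To make the ordering concrete: the actual jobs are Jenkins Pipeline scripts, but the same logic can be expressed roughly as below against the Jenkins REST API. The dependency job name follows the URLs later in this thread; the project job names, credentials and the simple polling approach are illustrative assumptions, not the real implementation.

```python
# Rough sketch of the 'Global Rebuild' logic described above, expressed against
# the Jenkins REST API rather than as the actual Pipeline script.
import time
import requests

JENKINS = "https://build.kde.org"
AUTH = ("username", "api-token")      # assumed credentials

def trigger(job):
    """Queue a build of the given job."""
    requests.post(f"{JENKINS}/job/{requests.utils.quote(job)}/build",
                  auth=AUTH).raise_for_status()

def wait_for(job, poll=60):
    """Poll the job's last build until it finishes and return its result.
    (Simplified: assumes the queued build has already become the last build.)"""
    while True:
        info = requests.get(f"{JENKINS}/job/{requests.utils.quote(job)}/lastBuild/api/json",
                            auth=AUTH).json()
        if not info["building"]:
            return info["result"]
        time.sleep(poll)

dependency_job = "Dependency Build Applications stable-kf5-qt5 SUSEQt5.9"
project_jobs = [
    "Applications dolphin stable-kf5-qt5 SUSEQt5.9",
    "Applications okular stable-kf5-qt5 SUSEQt5.9",
    # ... plus every other project job of the product
]

# 1. Rebuild the dependencies and wait for the result.
trigger(dependency_job)
if wait_for(dependency_job) != "SUCCESS":
    raise SystemExit("Dependency Build failed, not triggering the project builds")

# 2. Kick off every project build, without waiting for them to complete.
for job in project_jobs:
    trigger(job)
```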

You'll still have to kick off the "DSL Job Seed", but that can't be avoided (because it is the ultimate parent of every other job on the CI system).

That should reduce your workload to a total of 4 clicks with a little bit of waiting between click 1 and 2.

Given the CI system is usually quite busy during the week, I've not triggered any of them yet to test them. I'd advise you to save that for a weekend, as something like Applications will take a good 12 hours or so to churn through and will occupy the system completely during that time.

In regards to design and architecture: The reason why the rebuilds are needed is because Jenkins is only aware of what a Pipeline does when it is run. When you run the DSL Job Seed job, the Pipeline scripts are regenerated, replacing the old branch name with the new branch name in the case of an Applications release. This is also why all new jobs have to be run before they start triggering automatically.
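
As a minimal illustration of that point (with a purely hypothetical metadata file and template, not how the DSL Job Seed is actually written): the branch that a branch group points to gets baked into each generated Pipeline at seed time, so after switching the stable branch the seed has to run again before any job will check out the new branch.

```python
# Hypothetical illustration only: the seed expands a template into concrete
# Pipeline scripts, fixing the branch of each branch group at generation time.
import json

# Assumed shape of a branch-group mapping, e.g.
# {"okular": {"stable-kf5-qt5": "Applications/17.08", "kf5-qt5": "master"}}
with open("branch-mapping.json") as handle:
    branches = json.load(handle)

PIPELINE_TEMPLATE = """\
// generated job: Applications {project} {branch_group} SUSEQt5.9
checkoutBranch = '{branch}'   // fixed at seed time -> reseed after a branch switch
"""

for project, groups in branches.items():
    for branch_group, branch in groups.items():
        script = PIPELINE_TEMPLATE.format(project=project,
                                          branch_group=branch_group,
                                          branch=branch)
        # The real seed hands generated job definitions to Jenkins; here we only print.
        print(script)
```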

The only way to "work around" this as such would be for git.kde.org to know what the job names are on Jenkins, so it could trigger the appropriate ones directly. This limits us strictly to the branches defined by branch groups and creates a strong interdependency between git.kde.org and Jenkins, which isn't ideal.

It also means that if for some reason Jenkins is down for a bit, we have no easy way of asking it to catch up on changes short of rebuilding everything (with our current approach we can just ask it to poll everything, and it'll only rebuild those repositories with changes).

Thanks! That seems like a satisfactory solution to me, given this will only be needed once every few weeks.

Given the CI seems non-busy today, I just kicked off a "Global Rebuild KDevelop stable-kf5-qt5 SUSEQt5.9" for an exploration run :)

Looks like that worked as intended, good job :)

One triggered project build failed while retrieving dependencies (https://build.kde.org/job/KDevelop%20kdevelop-pg-qt%20stable-kf5-qt5%20SUSEQt5.9/2/console), but that might have been the result of some commits to ci-tooling while that job was running? The other builds went fine.

I would like to propose kicking off the Global Rebuilds for KA-stable (or at least for one platform) at CET midnight tonight though, as the beta release snapshotting is in 2 days and it would be good to have a clear picture of the build status.

In regards to the "tarfile.ReadError: unexpected end of data" errors, I'm not sure what causes these. In theory the system should be completely atomic (downloading to a scratch file and then moving it over the top of the final file). The error means this isn't working properly for some reason (my guess would be that the scratch file is in /tmp, which the system thinks is a separate file system as it's part of the Docker-provided overlayfs).

These errors are infrequent though and happen sparingly (much less than the rsync errors we used to hit!)
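
For what it's worth, the scratch-file-then-move pattern described above is only atomic when the scratch file sits on the same filesystem as the destination. Below is a sketch of that idea in plain Python; it is not the actual ci-tooling code, just an illustration of the technique being discussed.

```python
# Hedged sketch: download to a scratch file in the *same directory* as the
# destination, then replace it in a single rename. If the scratch file lived in
# /tmp on a different (overlayfs) filesystem instead, the final move would
# degrade to copy + delete and readers could see a truncated archive, which
# would explain the "unexpected end of data" errors above.
import os
import tempfile
import urllib.request

def atomic_fetch(url, destination):
    """Download url to destination so readers never see a partially written file."""
    dest_dir = os.path.dirname(os.path.abspath(destination))
    fd, scratch = tempfile.mkstemp(dir=dest_dir)   # scratch file on the same filesystem
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as response:
            while chunk := response.read(1 << 20):
                out.write(chunk)
        os.replace(scratch, destination)           # atomic rename on the same filesystem
    except BaseException:
        os.unlink(scratch)
        raise
```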

Looks like you've triggered the Applications stable rebuilds already and they've completed successfully, which is nice to know.

On the run of Applications stable-kf5-qt5 SUSEQt5.9 I observed that, right after the dependency build was done and the project builds were started, the first set of builds (almost?) all failed while retrieving the dependencies:
https://build.kde.org/job/Applications%20libkdegames%20stable-kf5-qt5%20SUSEQt5.9/2/[2]
https://build.kde.org/job/Applications%20kimap%20stable-kf5-qt5%20SUSEQt5.9/2/[2]
https://build.kde.org/job/Applications%20akregator%20stable-kf5-qt5%20SUSEQt5.9/2/[2]
https://build.kde.org/job/Applications%20dolphin%20stable-kf5-qt5%20SUSEQt5.9/2/[2]
https://build.kde.org/job/Applications%20kalzium%20stable-kf5-qt5%20SUSEQt5.9/2/[2]
(URLs as passed in the notification emails; that [2] seems to be some non-replaced variable?)
When I later restarted those builds manually, they all ran fine.

So given the coincidence in timing, it seems the dependency build products were not yet completely synced to wherever the Docker builds try to fetch them from? Just a guess, it's your playground :)
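
One conceivable mitigation for that race (not something proposed in this thread, just a sketch): have the fetch step retry with backoff and validate the downloaded archive before using it, so a build that starts before the sync has finished doesn't fail outright. Function name and parameters below are hypothetical.

```python
# Hypothetical mitigation sketch: retry the dependency-archive download with
# backoff and reject truncated archives, in case the build races the sync.
import tarfile
import time
import urllib.error
import urllib.request

def fetch_dependency_archive(url, destination, attempts=5, delay=30):
    for attempt in range(1, attempts + 1):
        try:
            urllib.request.urlretrieve(url, destination)
            with tarfile.open(destination) as archive:
                archive.getmembers()          # raises ReadError on a truncated file
            return
        except (urllib.error.URLError, tarfile.ReadError) as error:
            if attempt == attempts:
                raise
            print(f"fetch failed ({error}), retrying in {delay}s")
            time.sleep(delay)
            delay *= 2
```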

Yes, I triggered the FreeBSD and the openSUSE ones. The FreeBSD one actually went quite quickly, IIRC the whole rebuild was done after 3-4 hours? No idea about the openSUSE one, I went to bed while it was running.
I did not dare to trigger the Windows one, as there was a longer queue at the Windows build shop the whole time. Will have a look again at the weekend, when there might be quieter times.

Another piece of feedback from the use case of running the Global Rebuild after changing the product branch:
for FreeBSD the Dependency build did not work on the first run, due to one thing which still needed fixing in the build-metadata (some dependency from Extragear had an unusable branch registered for stable). The run failing the first time is to be expected. But I did not get a notification about the Dependency build failing, I only saw it by accident when looking at the web UI.
For the project builds I did get notifications, which seems fine.

So follow-up request: could the person triggering the Global Rebuild also get a notification about the success of the Dependency build?

Example:
https://build.kde.org/view/CI%20Management/job/Dependency%20Build%20Applications%20stable-kf5-qt5%20FreeBSDQt5.7/39/
https://build.kde.org/view/CI%20Management/job/Dependency%20Build%20Applications%20stable-kf5-qt5%20SUSEQt5.9/7/

In regards to the 5 failures you saw: yes, that will be a consequence of the builds all starting at exactly the same time (for all intents and purposes, at least), which unsurprisingly triggers the lack of atomicity much more than it otherwise would.

I've now incorporated email notifications into the Dependency Build jobs.