As we can have different contentsSuffixes, we store this value in the metadata and read the correct value when we retrieve/publish the package.
Details
- Reviewers: bcooksley
- Maniphest Tasks: T3689: Add abi compliance checker to CI
- Commits: R857:39e3f7a6078a: Handle different Archive contentsSuffixes.

Diff Detail
- Repository: R857 CI System Tooling
- Branch: master
- Lint: No Linters Available
- Unit: No Unit Test Coverage
- Build Status: Buildable 4719, Build 4737: arc lint + arc unit
helpers/helperslib/Packages.py

| Line | Comment |
|---|---|
| 84 | Rather than writing this as a function, could we keep this as a variable to reduce the number of changes below? |
| 91 | Not sure why this needed to be functionised? |
| 106 | Why was this removed? The code below will try to download something which we know will never exist without this, and urllib throws exceptions I believe (which will cause failures elsewhere). |
| 115–118 | Newline removed? |
| 130 | The server manifest always includes the full content of the individual archive manifest files, so this isn't needed. |
| 158 | Whitespace change? |
| 240 | `cententSuffix`? |
| 244 | Instead of `try ... except`, wouldn't it be safer/nicer to do `if 'contentsSuffix' in metadata`? |
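Regarding the comment on line 106: a minimal sketch of the concern that urllib raises an exception for a missing resource, so membership in the server manifest should be checked before attempting a download. All names here (`retrieve`, `server_manifest`, `base_url`) are illustrative, not the actual Packages.py API.

```python
import urllib.error
import urllib.request

def retrieve(server_manifest, base_url, package):
    # If the package is not in the server manifest, the download is
    # known to fail, so skip it entirely instead of letting urllib
    # raise an HTTPError that would cause a failure elsewhere.
    if package not in server_manifest:
        return None
    try:
        with urllib.request.urlopen(f"{base_url}/{package}.tar") as response:
            return response.read()
    except urllib.error.HTTPError:
        # Server answered, but the resource is missing (e.g. 404).
        return None
```

With the manifest check in place, a lookup for an unknown package returns early without ever touching the network.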
For retrievePackage we have a potential race condition:
- serverManifest is only downloaded once, when initializing the Archive, so it may contain outdated information.
- localMetadata and the package are more tightly coupled; IMO that's why it makes sense to always use localMetadata as the reference.

Okay, in our use cases the contentsSuffix does not change very often, so we could also rely on serverManifest, but it feels wrong.

As the localMetadata changes in several steps within the method (l109, l130, l146), I don't want to always update localContentsPath and localContentsFilename. We could skip reading the files at l145, as we already updated the localMetadata at l130 when we downloaded something.
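The point above about the metadata changing mid-method is also why the path helpers were turned into functions (comment 91). A hedged sketch, with a hypothetical `Archive` class that is not the real Packages.py code: a function re-derives the path on every call and so always reflects the current metadata, while a variable computed once at the top would go stale.

```python
class Archive:
    def __init__(self, root):
        self.root = root
        # Illustrative default; the real code reads this from the manifest.
        self.localMetadata = {"contentsSuffix": ".tar"}

    def localContentsFilename(self, package):
        # Re-derived on every call, so it picks up any contentsSuffix
        # update made earlier in the same method.
        suffix = self.localMetadata.get("contentsSuffix", ".tar")
        return f"{self.root}/{package}{suffix}"

archive = Archive("/tmp/cache")
print(archive.localContentsFilename("KCoreAddons"))  # /tmp/cache/KCoreAddons.tar
archive.localMetadata["contentsSuffix"] = ".tar.zst"
print(archive.localContentsFilename("KCoreAddons"))  # /tmp/cache/KCoreAddons.tar.zst
```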
helpers/helperslib/Packages.py

| Line | Comment |
|---|---|
| 84 | Well, Phabricator shows \t as 4 spaces and not as 8. I fixed it to replace everything with \t. |
| 91 | I turned this into a function because the localMetadata changes within this method. Otherwise I would need to update localContentsPath every time we have new metadata. |
| 106 | I moved this part to line 94, before the check for localMetadata. If the package is not inside serverManifest, we don't need to try to find the localMetadata. |
| 130 | But the whole function always works with localMetadata. And I think it is better to use localMetadata, as serverManifest is only read once while initializing the Archive. So localMetadata + content may be newer. |
| 158 | It is 8 spaces -> \t, to be consistent with the rest of the file. I can move this to a separate commit if you want. |
| 244 | This is a more philosophical discussion. The Python way is to use try/except rather than `if "contentsSuffix" in metadata: contentsSuffix = metadata['contentsSuffix']`, but we have something better (I often forget about this) that is exactly for our use case: `contentsSuffix = metadata.get("contentsSuffix", self.contentsSuffix)`. |
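A minimal demonstration of the `dict.get()` fallback agreed on in the thread: read `contentsSuffix` from the metadata when present, otherwise fall back to the archive-wide default (`self.contentsSuffix` in the real code; a plain variable here for illustration).

```python
# Archive-wide default (stands in for self.contentsSuffix).
default_suffix = ".tar"

metadata_with = {"contentsSuffix": ".tar.zst", "timestamp": 1234}
metadata_without = {"timestamp": 1234}

# .get() returns the stored value if the key exists, the default otherwise,
# with no try/except and no explicit membership test.
print(metadata_with.get("contentsSuffix", default_suffix))     # .tar.zst
print(metadata_without.get("contentsSuffix", default_suffix))  # .tar
```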
helpers/helperslib/Packages.py

| Line | Comment |
|---|---|
| 145 | This is replaced by l128, so it's not needed anymore? Okay, there may be issues with a move, but then the error is raised anyway. |
We can rely on the serverManifest being the latest information here, as the window of time between the master manifest file being downloaded and parsed and resources fetched should be less than a minute at the maximum.
In the case of the ABI Reference information, you don't know about build results until they're published in the master manifest in any case, as each result is stored by the latest Git commit hash. This also means that results won't change after they've been published, so for your use case you can depend on the server manifest containing the latest information.
For the dependencies used by the rest of the system, at worst a build uses a slightly out of date result (and to date this hasn't been an issue, and is highly unlikely to occur in any case). The window of time where this would potentially happen is perhaps a maximum of 30-45 seconds.
If we were to fetch the individual metadata files that would defeat part of the reason for caching the files on the nodes locally (reducing master server load). In the case of some projects, they depend on upwards of 20-30 projects, and at least one of our build nodes is located in Canada (which has much greater latency to the master server), so this would cause builds for those projects to take much longer.
helpers/helperslib/Packages.py

| Line | Comment |
|---|---|
| 124 | You're now reading the response twice? |
| 244 | `.get()` does look like the cleanest and most proper solution, yes; let's go with that. |
The metadata not being up to date only becomes an issue if the metadata is important. With the normal tar files, the only field used from the metadata is the timestamp check, and even that is only used internally. Even the checksum is not used to verify that the download completed correctly. As the metadata is not used for anything outside the class, you won't even notice if the metadata does not match the data.

On the other hand, I can fully understand your point of view, as we won't change the contentsSuffix when there is an update of the package.
Sorry, I'm not sure what you're concerned about happening here. While I appreciate your code is leaning on the metadata more, the window of time during which you could potentially have a newer individual metadata file vs. the server manifest is perhaps a couple of seconds at most (because we move the file over first, then update the individual metadata file, and finally rebuild the master manifest file).

I also note that a job would be comparing against prior runs of itself, so you are guaranteed that nobody else will publish a new result (plus, given you put the commit hash in the archive name, once published, files should never change, correct?).

To use KCoreAddons as an example, the only jobs that can add results to it are the jobs for KCoreAddons itself, and because jobs can only run one at a time you'll never have a collision (because you're contained within the branch group and platform constraints as well).
In regards to publishing the updated manifest file, the publishPackage invocation doesn't return until the publishing process completes in full - so the capture command in the build job can't finish and exit until the server manifest is up to date.
I'm quite concerned about this because retrievePackage() is a substantial hot path (being called once per dependency for every build on the CI system, and every application has all of the Frameworks as a dependency) and has the possibility to add substantial load to the build-artifacts.kde.org server (plus additional latency and more points of failure should the network be slightly flaky).
Understood.
> I'm quite concerned about this because retrievePackage() is a substantial hot path (being called once per dependency for every build on the CI system, and every application has all of the Frameworks as a dependency) and has the possibility to add substantial load to the build-artifacts.kde.org server (plus additional latency and more points of failure should the network be slightly flaky).

But what are you concerned about? What load do you see my patch adding? Before, the localMetadata was parsed from disk twice; now this is done only once, and the second time it is read directly from memory.
Accessing the local helper functions:
- If localMetadata is available in the cache and up to date: 2 calls
- If the data needs to be refreshed: 4 calls
- If the metadata is not available: 3 calls
I don't think the time spent executing those functions will make anything slower. And there is no additional load for build-artifacts.kde.org, as we don't download more data from this server; the only difference is that we first download the metadata, parse it, and afterwards download the contents file, instead of downloading the contents and the metadata and parsing the metadata afterwards.

And if the network is flaky it will break in both cases, as we need the same data anyway, and urllib will handle that for us.
If you say that we should trust the serverMetadata anyway, then I would suggest removing localMetadata completely from the server and just writing a localMetadata file within retrievePackage like this:

```python
serverMetadata = self.serverManifest[ package ]
data = {"timestamp": serverMetadata['timestamp']}
with open(localMetadataPath, 'w') as f:
    yaml.dump( data, f, default_flow_style=False, allow_unicode=True )
```

Then we would have only ONE source for the metadata (serverMetadata), and the localMetadata would only be used for checking whether the content is up to date.
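The freshness check that would remain could then be sketched as below. This is a hedged illustration under assumed names (`is_up_to_date` and the manifest layout are hypothetical, not the actual Packages.py code): the server manifest is the single authority, and the local metadata file only records the timestamp used for the comparison.

```python
def is_up_to_date(local_metadata, server_manifest, package):
    # Unknown packages can never be up to date locally.
    if package not in server_manifest:
        return False
    # Compare the locally recorded timestamp against the authoritative
    # one from the server manifest.
    return local_metadata.get("timestamp") == server_manifest[package]["timestamp"]

manifest = {"Frameworks/KCoreAddons": {"timestamp": 1700000000}}
print(is_up_to_date({"timestamp": 1700000000}, manifest, "Frameworks/KCoreAddons"))  # True
print(is_up_to_date({"timestamp": 1600000000}, manifest, "Frameworks/KCoreAddons"))  # False
```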
Okay, I've taken a closer look (now that it isn't a morning where I haven't had my cup of tea yet) and the final changes seem for the most part okay.

I've made some stylization changes and added a few extra comments locally, which I'll land in the repository shortly.