Collect more information from version control systems
Needs ReviewPublic

Authored by thomasfischer on Oct 14 2019, 6:59 PM.



In KBibTeX, there is a getgit.cmake file to collect information from Git regarding the source's history assuming that the source code came from a Git repository. ECMSourceVersionControl.cmake can be expanded making getgit.cmake's functionality available to all KDE projects.
Code from KBibTeX's getgit.cmake was refactored to fit ECMSourceVersionControl.cmake and similar code has been added to support Subversion and other version control systems.

The following information is extracted from version control systems:

  • ECM_SOURCE_VERSION_CONTROL_BRANCH will contain the name of the branch, such as for example master for Git or ^/trunk for Subversion
  • ECM_SOURCE_VERSION_CONTROL_REVISION is the current revision, such as the SHA1 hash in Git or r followed by a number in Subversion
  • ECM_SOURCE_VERSION_CONTROL_COMMIT_COUNT is the direct number of commits from the origin of the history until the current revision, excluding any other branches.

Different version control systems are supported to different degrees:

  • Git is fully supported thanks to the mature code from getgit.cmake
  • Subversion is well supported except for the question wheter to use last-changed-revision or just revision and how to correctly compute the commit count (some svn log ... | grep -c ... won't be available)
  • Bazaar is virtually incomplete. However, this seems to be a dead project anyway.

Due to its incomplete support, this revision requires input from people more familiar with Subversion, Mercurial, and Bazaar.

Test Plan
  1. Create a temporary directory and copy a .git directory from another project into this directory.
  2. Copy/link the patched ECMSourceVersionControl.cmake into this temporary directory.
  3. Create a CMakeLists.txt file which may look like this:
cmake_minimum_required(VERSION 3.7.2)

# Just by including this module, variable
# TRUE or FALSE, depending on if the source directory is
# under a known version control (Git, Subversion,
# Mercurial, Bazaar). Depending on the recognized version
# control, variable ECM_SOURCE_VERSION_CONTROL_WHICH may
# be set to either "git", "svn", "hg", or "bzr".

# The following two functions do not print anything except
# warnings, but set variables
# that no error occurred and the version control system is
# Git.

# The following function will print out a status
# message on the recognized Git revision and branch (so,
# it works only on Git repos). It will in its turn
# invoke ecm_source_version_control_probe_revison() and
# ecm_source_version_control_probe_branch() in order to
# collect the necessary data.

In another temporary directory, run cmake referring to the first temporary directory.

Diff Detail

R240 Extra CMake Modules
Lint Skipped
Unit Tests Skipped
thomasfischer created this revision.Oct 14 2019, 6:59 PM
Restricted Application added projects: Frameworks, Build System. · View Herald TranscriptOct 14 2019, 6:59 PM
Restricted Application added subscribers: kde-buildsystem, kde-frameworks-devel. · View Herald Transcript
thomasfischer requested review of this revision.Oct 14 2019, 6:59 PM
thomasfischer edited the summary of this revision. (Show Details)Oct 14 2019, 7:07 PM
thomasfischer added reviewers: sitter, kossebau.

Do we have a request for this outside kbibtex? My attitude towards adding things to ECM is always "is there more than one user".

ECM_SOURCE_VERSION_CONTROL_COMMIT_COUNT seems very specific and opinionated (what's origin? does it count commits in merges? also the question you raise vis a vis SVN), not necessarily cheap to do based on VCS type and history size and to be honest I am not sure what the use is of knowing how many commits there are in total, specifically because for decentralized systems like git there may be any number of local commits in that count. So I would not add it.

Branch and rev seems plenty useful so long as there's demand for it outside a single project. BZR and HG I would remove on account of being incomplete, possibly also SVN for lack of demand (?).

Lastly, I think that all of the variables should perhaps be individual functions. This file is included by every single KDE software implicitly, so we'd run >= 2 $CMD forks every single time cmake is run but for all we know 99% of all projects do not use the variables we set. What do you think?

I rewrote the patch to address the comments:

  1. Setting ECM_SOURCE_UNDER_VERSION_CONTROL has been refactored to set variable ECM_SOURCE_VERSION_CONTROL_WHICH as well to determine which of three known VCSes are used. ECM_SOURCE_VERSION_CONTROL_WHICH contains a string like "git" or "svn".
  2. The more expensive check where an external program ("git") has to be run got refactored into a macro (ecm_source_version_control_stat). Support for Subversion, Mercurial, and Bazaar got removed. Messages will be printed to show the status of the analysis. It is a macro, not a function, to have changes to variables like ECM_SOURCE_VERSION_CONTROL_BRANCH visible outside of the function.

To clarify ECM_SOURCE_VERSION_CONTROL_COMMIT_COUNT: it counts the number of commits in direct line of succession from the repository's initialization to the current commit. It does not include commits in other branches. Basically the number of commits listed in a plain git log. The commit count gives a quick indication of the progress of a repository (or branch) without requiring to look up the repo's commit messages or hashes.

Hm, how about separate functions though? With a single stat any given build still needs N process forks even when they only want 1 value.

To clarify ECM_SOURCE_VERSION_CONTROL_COMMIT_COUNT: it counts the number of commits in direct line of succession from the repository's initialization to the current commit. It does not include commits in other branches. Basically the number of commits listed in a plain git log. The commit count gives a quick indication of the progress of a repository (or branch) without requiring to look up the repo's commit messages or hashes.

I understand. This is super heavy though. I have just run it on all clones I have on this PC and for some repos that takes upwards of half a second to complete... on an SSD, so I also ran a quick check on our CI servers, which use HDDs and there it takes at least half a second with some repos (e.g. kdevelop) going up to 26 seconds (uncached)! I wouldn't mind terribly if the various things got split into their own functions (ecm_source_version_control_revison, ecm_source_version_control_branch, ecm_source_version_control_commit_count... or some names like that) but even then I have to question the use case behind the commit count information. So, what is the use case? What do you do with this information? It occurs to me that if you know the hash you could look up the commit count of that hash should you need it, but I struggle to imagine a scenario where that number is relevant.


I think you are leaking this variable into the parent scope. I am not super sure how to deal with this but I think I've seen _prefixes, or you use a function and explicitly forward into the PARENT_SCOPE (

Alternatively with a multi-function approach I'd probably just make it ECM_SOURCE_VERSION_CONTROL_EXECUTABLE so it only needs finding on the first call.

thomasfischer edited the test plan for this revision. (Show Details)Nov 3 2019, 6:08 PM

I updated the diff. I was quite surprised about the time it takes to compute the number of commits (26 seconds), thus I removed this functionality completely. As of now, only 'revision' and 'branch' are queried for.
There are three functions now:

  • ecm_source_version_control_probe_revison retrieves the source directory's Git revision and stores it in ECM_SOURCE_VERSION_CONTROL_REVISION. A check is made to return immediately if the revision got retrieved earlier. The Git executable is only searched for once, the path is stored in the private variable _ECM_SOURCE_VERSION_GIT_EXECUTABLE. Two warnings may be shown if source directory is either not under version control or the version control system is not Git. Private boolean variables are used to supress showing the same warning again once ecm_source_version_control_probe_branch (see below) gets invoked.
  • ecm_source_version_control_probe_branch is very similar to the function above, except for that it retrieves the branch and stores it in ECM_SOURCE_VERSION_CONTROL_BRANCH.
  • ecm_source_version_control_status will print out a status line about the current Git revision and branch. Invoking this function is optional. The function will invoke ecm_source_version_control_probe_revison and ecm_source_version_control_probe_branch to retrieve the necessary data. Nothing will be printed if no revision and no branch are known.

This is starting to look really good. All functions will need documenting in the header of that file so they show up on, see other modules for examples.


Do we really need this? It seems to me we could just spam it for every call, in the grand scheme of things it makes no difference but is less logic one has to worry about when extending this module.


I'd move the exec detection and missing reporting into its own helper which gets called by all "public" functions. Currently the logic is duped in both functions.


I think it's more idiomatic if you accept the variable name as a function argument instead of hardcoding one.

For example find_program, but really most helpers work like that off the top of my head.