Collect more information from version control systems
AcceptedPublic
Actions

Authored by thomasfischer on Oct 14 2019, 6:59 PM.

Details

Reviewers

sitter
kossebau

Summary

In KBibTeX, there is a getgit.cmake file to collect information from Git regarding the source's history assuming that the source code came from a Git repository. ECMSourceVersionControl.cmake can be expanded making getgit.cmake's functionality available to all KDE projects.
Code from KBibTeX's getgit.cmake was refactored to fit ECMSourceVersionControl.cmake.

By loading this module, two variables will be set:

A boolean variable ECM_SOURCE_UNDER_VERSION_CONTROL tells whether the current source directory is managed by a known version control system (VCS).
String variable ECM_SOURCE_VERSION_CONTROL_WHICH contains an identifier which VCS is used, e.g. "git".

In case Git is used, functions ecm_source_version_control_probe_revision and ecm_source_version_control_probe_branch allow the retrieve the checkout's revision (8-character hex string) and branch (e.g. 'master'), respectively.

The documentation inserted at the beginning of the module explains the set variables and provided functions in greater detail.

Test Plan

Create a temporary directory and copy a .git directory from another project into this directory in order to have some data to work with.
Copy/link the patched ECMSourceVersionControl.cmake into this temporary directory.
Create a CMakeLists.txt file which may look like this:

cmake_minimum_required(VERSION 3.7.2)
project(ECMSourceVersionControlTest)
find_package(ECM ${KF5_MIN_VERSION} REQUIRED NO_MODULE)
set(CMAKE_MODULE_PATH ${CMAKE_SOURCE_DIR})

# Just by loading this module, variable
# ECM_SOURCE_UNDER_VERSION_CONTROL may be set to either
# TRUE or FALSE, depending on if the source directory is
# under a known version control (Git, Subversion,
# Mercurial, Bazaar). Depending on the recognized version
# control, variable ECM_SOURCE_VERSION_CONTROL_WHICH may
# be set to either "git", "svn", "hg", or "bzr".
include(ECMSourceVersionControl)
message("ECM_SOURCE_UNDER_VERSION_CONTROL=" ${ECM_SOURCE_UNDER_VERSION_CONTROL})
message("ECM_SOURCE_VERSION_CONTROL_WHICH=" ${ECM_SOURCE_VERSION_CONTROL_WHICH})

# In order to retrieve the revision and branch
# of the checkout in the source directory, the
# following two functions may be helpful:
ecm_source_version_control_probe_revision(ECM_SOURCE_VERSION_CONTROL_REVISION)
ecm_source_version_control_probe_branch(ECM_SOURCE_VERSION_CONTROL_BRANCH)
message("ECM_SOURCE_VERSION_CONTROL_REVISION=" ${ECM_SOURCE_VERSION_CONTROL_REVISION})
message("ECM_SOURCE_VERSION_CONTROL_BRANCH=" ${ECM_SOURCE_VERSION_CONTROL_BRANCH})

# The following function will print out a status
# message on the recognized Git revision and branch (so,
# it works only on Git repos). It will in its turn
# invoke ecm_source_version_control_probe_revision() and
# ecm_source_version_control_probe_branch() in order to
# collect the necessary data.
ecm_source_version_control_status()

In another temporary directory, run cmake referring to the first temporary directory.

Diff Detail

Repository

R240 Extra CMake Modules

Lint

Lint Skipped

Unit

Unit Tests Skipped

thomasfischer created this revision.Oct 14 2019, 6:59 PM

Restricted Application added projects: Frameworks, Build System. · View Herald TranscriptOct 14 2019, 6:59 PM

Restricted Application added subscribers: kde-buildsystem, kde-frameworks-devel. · View Herald Transcript

thomasfischer requested review of this revision.Oct 14 2019, 6:59 PM

thomasfischer edited the summary of this revision. (Show Details)Oct 14 2019, 7:07 PM

thomasfischer added reviewers: sitter, kossebau.

Do we have a request for this outside kbibtex? My attitude towards adding things to ECM is always "is there more than one user".

ECM_SOURCE_VERSION_CONTROL_COMMIT_COUNT seems very specific and opinionated (what's origin? does it count commits in merges? also the question you raise vis a vis SVN), not necessarily cheap to do based on VCS type and history size and to be honest I am not sure what the use is of knowing how many commits there are in total, specifically because for decentralized systems like git there may be any number of local commits in that count. So I would not add it.

Branch and rev seems plenty useful so long as there's demand for it outside a single project. BZR and HG I would remove on account of being incomplete, possibly also SVN for lack of demand (?).

Lastly, I think that all of the variables should perhaps be individual functions. This file is included by every single KDE software implicitly, so we'd run >= 2 $CMD forks every single time cmake is run but for all we know 99% of all projects do not use the variables we set. What do you think?

I rewrote the patch to address the comments:

Setting ECM_SOURCE_UNDER_VERSION_CONTROL has been refactored to set variable ECM_SOURCE_VERSION_CONTROL_WHICH as well to determine which of three known VCSes are used. ECM_SOURCE_VERSION_CONTROL_WHICH contains a string like "git" or "svn".
The more expensive check where an external program ("git") has to be run got refactored into a macro (ecm_source_version_control_stat). Support for Subversion, Mercurial, and Bazaar got removed. Messages will be printed to show the status of the analysis. It is a macro, not a function, to have changes to variables like ECM_SOURCE_VERSION_CONTROL_BRANCH visible outside of the function.

To clarify ECM_SOURCE_VERSION_CONTROL_COMMIT_COUNT: it counts the number of commits in direct line of succession from the repository's initialization to the current commit. It does not include commits in other branches. Basically the number of commits listed in a plain git log. The commit count gives a quick indication of the progress of a repository (or branch) without requiring to look up the repo's commit messages or hashes.

Hm, how about separate functions though? With a single stat any given build still needs N process forks even when they only want 1 value.

In D24641#548394, @thomasfischer wrote:

To clarify ECM_SOURCE_VERSION_CONTROL_COMMIT_COUNT: it counts the number of commits in direct line of succession from the repository's initialization to the current commit. It does not include commits in other branches. Basically the number of commits listed in a plain git log. The commit count gives a quick indication of the progress of a repository (or branch) without requiring to look up the repo's commit messages or hashes.

I understand. This is super heavy though. I have just run it on all clones I have on this PC and for some repos that takes upwards of half a second to complete... on an SSD, so I also ran a quick check on our CI servers, which use HDDs and there it takes at least half a second with some repos (e.g. kdevelop) going up to 26 seconds (uncached)! I wouldn't mind terribly if the various things got split into their own functions (ecm_source_version_control_revison, ecm_source_version_control_branch, ecm_source_version_control_commit_count... or some names like that) but even then I have to question the use case behind the commit count information. So, what is the use case? What do you do with this information? It occurs to me that if you know the hash you could look up the commit count of that hash should you need it, but I struggle to imagine a scenario where that number is relevant.

modules/ECMSourceVersionControl.cmake
70	I think you are leaking this variable into the parent scope. I am not super sure how to deal with this but I think I've seen `_`prefixes, or you use a function and explicitly forward into the PARENT_SCOPE (https://cmake.org/cmake/help/v3.0/command/set.html). Alternatively with a multi-function approach I'd probably just make it ECM_SOURCE_VERSION_CONTROL_EXECUTABLE so it only needs finding on the first call.

thomasfischer edited the test plan for this revision. (Show Details)Nov 3 2019, 6:08 PM

I updated the diff. I was quite surprised about the time it takes to compute the number of commits (26 seconds), thus I removed this functionality completely. As of now, only 'revision' and 'branch' are queried for.
There are three functions now:

ecm_source_version_control_probe_revison retrieves the source directory's Git revision and stores it in ECM_SOURCE_VERSION_CONTROL_REVISION. A check is made to return immediately if the revision got retrieved earlier. The Git executable is only searched for once, the path is stored in the private variable _ECM_SOURCE_VERSION_GIT_EXECUTABLE. Two warnings may be shown if source directory is either not under version control or the version control system is not Git. Private boolean variables are used to supress showing the same warning again once ecm_source_version_control_probe_branch (see below) gets invoked.
ecm_source_version_control_probe_branch is very similar to the function above, except for that it retrieves the branch and stores it in ECM_SOURCE_VERSION_CONTROL_BRANCH.
ecm_source_version_control_status will print out a status line about the current Git revision and branch. Invoking this function is optional. The function will invoke ecm_source_version_control_probe_revison and ecm_source_version_control_probe_branch to retrieve the necessary data. Nothing will be printed if no revision and no branch are known.

This is starting to look really good. All functions will need documenting in the header of that file so they show up on api.kde.org, see other modules for examples.

modules/ECMSourceVersionControl.cmake
61	Do we really need this? It seems to me we could just spam it for every call, in the grand scheme of things it makes no difference but is less logic one has to worry about when extending this module.
75	I'd move the exec detection and missing reporting into its own helper which gets called by all "public" functions. Currently the logic is duped in both functions.
87	I think it's more idiomatic if you accept the variable name as a function argument instead of hardcoding one. For example `find_program`, but really most helpers work like that off the top of my head.

I tried to address the most recent comments:

Added documentation at the file's beginning in comments. Not sure how to test if syntax, though. Spelling, grammar, and wording should be checked as well (not a native speaker).
Functions ecm_source_version_control_probe_branch and ecm_source_version_control_probe_revision now have an 'outvar'. When invoking either of the functions, a variable name needs to be passed where the result is stored inside.
Locating the Git binary has been refactored into a 'private' function _ecm_source_version_control_detect_git.

The WARNING tags should be removed or changed to AUTHOR_WARNING I think. A stray call to one of the functions here isn't really something a user building a tarball, for example, can do anything about, so WARNING seems wrong.

Other than that I am happy with this.
Any input from the @kossebau or anyone else @kde-buildsystem

I guess nobody has further input then.
LGTM. Ship it.

This revision is now accepted and ready to land.Feb 24 2020, 4:03 PM

Revision Contents
Changeset List

			Path	Packages
M			modules/ECMSourceVersionControl.cmake (135 lines)

Diff 71145

View Options

modules/ECMSourceVersionControl.cmake

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	67922		Oct 14 2019, 6:40 PM	★	★
Diff 2	68072		Oct 16 2019, 6:11 PM	★	★
Diff 3	69225		Nov 3 2019, 6:09 PM	★	★
Diff 4	71145		Dec 9 2019, 10:37 PM	★	★