Make tar archives reproducible by setting Pax headers
Closed · Public

Authored by aaronpuchert on Nov 23 2019, 6:04 PM.

Details

Summary

When POSIXLY_CORRECT is set, GNU tar will add ctime, atime and the PID
of the tar process that created the archive, as pointed out in [1].
To avoid this, we set the Pax headers manually as recommended there,
but only when SOURCE_DATE_EPOCH is set, i.e. when reproducible builds
are desired.

[1] https://salsa.debian.org/reproducible-builds/reproducible-website/merge_requests/50/diffs
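For illustration, the conditional described in the summary can be sketched as a small Python helper that assembles a GNU tar command line. The function name `tar_command` and the exact flag set are assumptions: the flags mirror the invocation suggested on https://reproducible-builds.org/docs/archives/, not necessarily what this revision's CMake code passes.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Pax option as suggested on reproducible-builds.org: a fixed extended-header
# name without %p (the PID), and no atime/ctime records.
REPRODUCIBLE_PAX_OPTION = (
    "--pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime"
)


def tar_command(archive: str, inputs: list[str],
                env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Build a (hypothetical) GNU tar command line for a template archive."""
    env = os.environ if env is None else env
    cmd = ["tar", "-c", "-f", archive]
    epoch = env.get("SOURCE_DATE_EPOCH")
    if epoch:
        # Only when a reproducible build is requested: pin timestamps,
        # ownership and member order, and strip nondeterministic Pax fields.
        cmd += [
            "--sort=name",
            f"--mtime=@{epoch}",
            "--owner=0", "--group=0", "--numeric-owner",
            REPRODUCIBLE_PAX_OPTION,
        ]
    # "--" ends option parsing, so input names are never taken as options.
    return cmd + ["--", *inputs]
```

Without SOURCE_DATE_EPOCH the command stays a plain `tar -c -f ARCHIVE -- INPUTS`, which matches the summary's choice of only requiring newer tar features when reproducibility is actually requested. The result can be passed directly to `subprocess.run(..., check=True)`.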

Test Plan

Repeated builds now produce identical files. Changing the input files'
timestamps appears to have no effect.
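The test plan relies on comparing repeated builds. As a narrower, complementary check, the hypothetical helper below uses Python's standard `tarfile` module to list any `atime` or `ctime` Pax records left in a built archive. The function name is an assumption, and it only inspects the Pax records attached to each member, not the names of the extended-header entries, so the PID issue is still best caught by comparing two builds byte for byte.

```python
import tarfile

# Pax keywords that the summary identifies as nondeterministic.
UNWANTED_PAX_KEYS = ("atime", "ctime")


def nondeterministic_pax_fields(path: str) -> list[tuple[str, str]]:
    """Return (member name, keyword) pairs for each atime/ctime Pax record."""
    found = []
    # Default mode "r" transparently handles gzip/bzip2/xz compression.
    with tarfile.open(path) as archive:
        for member in archive:
            # pax_headers holds the extended-header records for this member.
            for key in UNWANTED_PAX_KEYS:
                if key in member.pax_headers:
                    found.append((member.name, key))
    return found
```

An empty list means neither keyword appears in any member's Pax records.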

Diff Detail

Repository
R32 KDevelop
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
aaronpuchert created this revision. · Nov 23 2019, 6:04 PM
Restricted Application added a project: KDevelop. · Nov 23 2019, 6:04 PM
aaronpuchert requested review of this revision. · Nov 23 2019, 6:04 PM
kossebau accepted this revision. · Jan 12 2020, 11:28 AM
kossebau added a subscriber: kossebau.

No insight myself, but I trust that you know what you are doing here :) and it also matches what I just read on https://reproducible-builds.org/docs/archives/

This revision is now accepted and ready to land. · Jan 12 2020, 11:28 AM
This revision was automatically updated to reflect the committed changes.

> No insight myself, but I trust that you know what you are doing here :) and it also matches what I just read on https://reproducible-builds.org/docs/archives/

Thanks! I have to admit I'm not a huge expert either. I noticed that our builds were still not reproducible on openSUSE and wrote to @bmwiedemann. He then added the remarks about Pax headers to that page.

This should work fine with a GNU tar >= 1.28 producer, but what will be the consumer of the tar file?

Asking because I remember the case of Alpine Linux and its 'abuild' producer. The Alpine package manager implemented its own tar, and that would only be happy with

--pax-option=exthdr.name=%d/PaxHeaders/%f,atime:=0,ctime:=0

The other thing is POSIXLY_CORRECT: it is only taken into account by recent tar from git (not yet in a release, last time I checked), and only for embedding the PID of the tar process. So I would just not mention it in the commit message, to make it easier to understand:

GNU tar will add ctime, atime and the PID
of the tar process that created the archive

...

> This should work fine with a GNU tar >= 1.28 producer

We use the option only when SOURCE_DATE_EPOCH is set, and we assume that a recent enough tar is available in that case. Basically, we say that if reproducible builds are desired, the proper tools should be available.
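The revision deliberately does not check the tar version. If one wanted such a guard, a minimal sketch could parse the first line of `tar --version` and compare it against the 1.28 threshold mentioned above. The function names are hypothetical, and the sketch assumes the usual GNU tar banner format `tar (GNU tar) X.Y`.

```python
import re
from typing import Optional

# Threshold taken from the review comment above.
MINIMUM_GNU_TAR = (1, 28)


def gnu_tar_version(version_output: str) -> Optional[tuple[int, ...]]:
    """Parse the first line of `tar --version`, e.g. 'tar (GNU tar) 1.30'."""
    first_line = version_output.splitlines()[0] if version_output else ""
    match = re.match(r"tar \(GNU tar\) (\d+(?:\.\d+)*)", first_line)
    if match is None:
        return None  # not GNU tar (e.g. bsdtar), or unrecognized output
    return tuple(int(part) for part in match.group(1).split("."))


def supports_reproducible_pax(version_output: str) -> bool:
    """True if the banner reports GNU tar at or above MINIMUM_GNU_TAR."""
    version = gnu_tar_version(version_output)
    # Tuple comparison orders versions component by component.
    return version is not None and version >= MINIMUM_GNU_TAR
```

In practice the input would come from `subprocess.run(["tar", "--version"], capture_output=True, text=True).stdout`.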

> what will be the consumer of the tar file?

I'm not sure I understand this question. The archive is created as part of the build process and then installed; on most Linux distributions it goes into /usr/share/kdevappwizard/templates.

> Asking because I remember the case of Alpine Linux and its 'abuild' producer. The Alpine package manager implemented its own tar, and that would only be happy with
>
> --pax-option=exthdr.name=%d/PaxHeaders/%f,atime:=0,ctime:=0

Does this mean it won't be happy with delete=atime,delete=ctime? And what does it mean that the package manager has its own tar: is it a different tar from the one used in the distribution itself?

> The other thing is POSIXLY_CORRECT: it is only taken into account by recent tar from git (not yet in a release, last time I checked), and only for embedding the PID of the tar process. So I would just not mention it in the commit message, to make it easier to understand:

The change is already committed, so I can't change the message anymore. I didn't know you were here; otherwise I would have added you as a reviewer earlier.