Description
Proposal:
It would be handy if the shutil module provided a convenient way to opt in to the build artifact reproducibility features described in https://reproducible-builds.org/docs/archives/.
Such an addition would likely make more sense as a new shutil.make_reproducible_archive function, rather than trying to shoehorn the new functionality into the existing shutil.make_archive API.
The specific problem that prompted this feature idea was hitting the following traceback when trying to set owner=0 and group=0 in shutil.make_archive:
Traceback (most recent call last):
  [snip application details]
  File "/home/acoghlan/...[snip]...", line 97, in create_archive
    archive_with_extension = shutil.make_archive(
                             ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/shutil.py", line 1188, in make_archive
    filename = func(base_name, base_dir, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/shutil.py", line 992, in _make_tarball
    uid = _get_uid(owner)
          ^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/shutil.py", line 941, in _get_uid
    result = getpwnam(name)
             ^^^^^^^^^^^^^^
TypeError: getpwnam() argument must be str, not int
tarfile itself does support setting numeric owner and group IDs (via addfile and the filter option on add), but the high level shutil wrapper assumes the owner and group will always be given as names that can be looked up on the current system. It doesn't allow them to be specified numerically.
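For illustration, here is a minimal sketch of the tarfile route mentioned above, using the filter option on add to force numeric owner and group IDs of 0. The helper names (_root_owned, archive_with_numeric_owner) are made up for this example, and clearing uname/gname is an assumption about what a reproducible archive would want rather than anything shutil does today.

```python
import os
import tarfile
import tempfile


def _root_owned(tarinfo: tarfile.TarInfo) -> tarfile.TarInfo:
    """Hypothetical filter: replace local ownership details with uid/gid 0."""
    tarinfo.uid = 0
    tarinfo.gid = 0
    # Assumption: drop the symbolic names too, so nothing from the
    # build machine's user database ends up in the archive.
    tarinfo.uname = ""
    tarinfo.gname = ""
    return tarinfo


def archive_with_numeric_owner(src_dir: str, archive_path: str) -> None:
    """Write src_dir to an uncompressed tar file with every entry owned by 0:0."""
    with tarfile.open(archive_path, "w") as tar:
        # The filter is applied to every member added recursively.
        tar.add(src_dir, arcname=os.path.basename(src_dir), filter=_root_owned)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "payload")
        os.mkdir(src)
        with open(os.path.join(src, "hello.txt"), "w") as f:
            f.write("hello\n")
        out = os.path.join(tmp, "payload.tar")
        archive_with_numeric_owner(src, out)
        with tarfile.open(out) as tar:
            for member in tar.getmembers():
                print(member.name, member.uid, member.gid)
```

This only covers ownership. Timestamps, file modes and entry ordering still come from the local filesystem.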
While supporting numeric uids and gids in the high level API would be mildly helpful, it isn't necessarily the most useful way to address the limitation. The only value anyone would likely ever pass numerically is 0 (which can be worked around on many systems by passing "root" as a symbolic name), and their actual goal would be to indicate that the archive is intended to be a reproducible build artifact, so they actively don't want to include environmental details that are specific to that particular invocation.
As things are now, it isn't a massive burden to copy and paste the _make_tarball code from shutil.py and adapt it for build artifact creation purposes. However, I also think there genuinely are two very different use cases for archive creation: backups, where you want to reproduce the original environment as faithfully as possible, and build artifacts, which you want to make as portable and build system independent as possible. So there's potentially merit in offering a separate high level API for the case that isn't as well served by the existing one.
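To make the build artifact use case concrete, here is a rough sketch of what such a helper might look like. shutil has no make_reproducible_archive today; the signature, the fixed mtime default, the mode normalisation and the gzip-only output are all assumptions chosen for illustration, loosely following the kinds of normalisation described on reproducible-builds.org, not a proposed implementation.

```python
import gzip
import os
import tarfile


def make_reproducible_archive(base_name, root_dir, mtime=0):
    """Hypothetical helper: write root_dir to base_name + '.tar.gz' so that
    identical input trees give byte-identical archives."""
    archive_path = base_name + ".tar.gz"

    def _normalise(tarinfo):
        # Strip details that depend on the build machine or build time.
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = ""
        tarinfo.mtime = mtime
        # Assumption: keep only the "is it executable" distinction.
        if tarinfo.isdir() or tarinfo.mode & 0o100:
            tarinfo.mode = 0o755
        else:
            tarinfo.mode = 0o644
        return tarinfo

    # Collect every entry relative to root_dir, then sort the list so the
    # archive order does not depend on filesystem iteration order.
    entries = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for name in dirnames + filenames:
            full_path = os.path.join(dirpath, name)
            entries.append(os.path.relpath(full_path, root_dir))
    entries.sort()

    with open(archive_path, "wb") as raw:
        # filename="" and mtime=0 keep the output file name and the
        # current time out of the gzip header.
        with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=0) as gz:
            with tarfile.open(fileobj=gz, mode="w") as tar:
                for rel in entries:
                    tar.add(
                        os.path.join(root_dir, rel),
                        arcname=rel,
                        recursive=False,  # entries were already expanded above
                        filter=_normalise,
                    )
    return archive_path
```

A caller that wants a meaningful timestamp could pass mtime=int(os.environ["SOURCE_DATE_EPOCH"]) when that variable is set by the build system, instead of relying on the default of 0.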
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
No response