Zipfile lib weird behaviour with seconds in modified time

Question:

While working with the zipfile module, I found something odd about how it handles timestamps.

I’m zipping one file whose last-modified time is 13:40:31 (HH:MM:SS).
When I zip and then unzip the file, its last-modified time is 13:40:30 (one second lost).

Testing further, I used a ZipInfo object to set the last-modified time to 13:40:31 manually, but I still got 13:40:30.

I also tried setting to 13:40:41 and then I got 13:40:40.

The other seconds values I tried work fine: if I set the time to 13:40:32, it is still 13:40:32 after I unzip the file.

Any clue about this? Am I missing something?

OS: Windows 10 (64 bits)
Python: 3.7

Test
Compress any file, then unzip it and compare the last-modified times.

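import zipfile

file = 'testfile.txt'

# Set the modification time explicitly, with an odd number of seconds.
info = zipfile.ZipInfo(file, date_time=(2020, 9, 23, 13, 40, 31))

# Context managers close the archive and the input file even on error.
with zipfile.ZipFile(file='test.zip', mode='w', compression=zipfile.ZIP_DEFLATED) as zf:
    # Read in binary mode so the file content is stored unchanged.
    with open(file, 'rb') as f:
        zf.writestr(info, f.read(), zipfile.ZIP_DEFLATED, 6)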
Asked By: webbi


Answers:

[EDIT: Updated to document Linux & Windows behaviour]

Legacy Behaviour

By default, zip files store timestamps with 2-second precision. This dates waaay back in time to when DOS ruled the world and every bit counted. Below is the definition of how it works, from the Zip spec (APPNOTE.TXT):

4.4.6 date and time fields: (2 bytes each)
 
The date and time are encoded in standard MS-DOS format.
If input came from standard input, the date and time are
those at which compression was started for this data.
If encrypting the central directory and general purpose bit
flag 13 is set indicating masking, the value stored in the
Local Header will be zero. MS-DOS time format is different
from more commonly used computer time formats such as
UTC. For example, MS-DOS uses year values relative to 1980
and 2 second precision.
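
To make the 2-second granularity concrete, here is a minimal sketch of the MS-DOS time word, assuming the usual layout of 5 bits for hours, 6 bits for minutes and 5 bits for seconds divided by two. The function names `encode_dos_time` and `decode_dos_time` are made up for illustration and are not part of the zipfile API.

```python
def encode_dos_time(hour, minute, second):
    """Pack a time of day into a 16-bit MS-DOS time word."""
    # Only 5 bits are left for the seconds, so they are stored halved.
    return (hour << 11) | (minute << 5) | (second // 2)


def decode_dos_time(value):
    """Unpack a 16-bit MS-DOS time word into (hour, minute, second)."""
    return (value >> 11) & 0x1F, (value >> 5) & 0x3F, (value & 0x1F) * 2


# Odd seconds lose their lowest bit on the round trip; even seconds survive.
for second in (30, 31, 32, 41):
    print(second, "->", decode_dos_time(encode_dos_time(13, 40, second)))
```

Because the seconds are halved with integer division, every odd value comes back as the even value just below it, which matches the 31 to 30 and 41 to 40 results in the question.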

 
Although the legacy 2-second precision timestamp is still present in all zip files, most modern zip implementations also use one (or more) extended attributes to store the timestamp to (at least) one-second precision. In applications that support them, these extended attributes take priority over the legacy timestamp.

It looks like Python’s zipfile module doesn’t currently support these extended attributes. See Issue 49707 for the details.

A well-known exception to this improved datetime support is the Windows right-click/Send-To/Compressed-folder feature, which still supports only the old legacy 2-second granularity.

Linux/MacOS

In Linux (and some Windows) zip applications, the predominant datetime extension is the Extended Timestamp Extra Field. It stores one or more of the modification, access & creation times in standard Unix/Linux format, namely the number of seconds elapsed since 1 January 1970 00:00:00 UTC. See "Extended Timestamp Extra Field" in extrafld.txt for the full details.
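
As a rough sketch of what this field looks like on disk, the code below builds one by hand and attaches it through `ZipInfo.extra`. It assumes the layout described in extrafld.txt (header ID 0x5455, a flags byte, then a 4-byte little-endian modification time) and stores only the modification time. The helpers `build_extended_timestamp` and `parse_extended_timestamp` are hypothetical names, not zipfile functions, and zipfile itself will not use the field when extracting; only tools that understand the field can restore the exact time.

```python
import io
import struct
import zipfile
from datetime import datetime, timezone

EXTENDED_TIMESTAMP_ID = 0x5455  # "UT" header ID, per extrafld.txt


def build_extended_timestamp(mtime):
    """Return an extra-field record carrying only the modification time."""
    # Record: header ID, data size (1 flag byte + 4 time bytes), flags, mtime.
    # Flag bit 0 means "modification time present".
    return struct.pack("<HHBl", EXTENDED_TIMESTAMP_ID, 5, 0x01, mtime)


def parse_extended_timestamp(extra):
    """Return the modification time from an extra-field blob, or None."""
    offset = 0
    while offset + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, offset)
        data = extra[offset + 4:offset + 4 + size]
        if header_id == EXTENDED_TIMESTAMP_ID and size >= 5 and data[0] & 0x01:
            return struct.unpack_from("<l", data, 1)[0]
        offset += 4 + size
    return None


# Assumption for the example: 13:40:31 is treated as UTC.
mtime = int(datetime(2020, 9, 23, 13, 40, 31, tzinfo=timezone.utc).timestamp())

info = zipfile.ZipInfo("testfile.txt", date_time=(2020, 9, 23, 13, 40, 31))
info.extra = build_extended_timestamp(mtime)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr(info, b"hello")

with zipfile.ZipFile(buffer) as zf:
    stored = zf.getinfo("testfile.txt")

print(stored.date_time)                        # legacy field, 2-second steps
print(parse_extended_timestamp(stored.extra))  # extra field, 1-second steps
```

The legacy field still reads back with the seconds rounded down, while the extra field keeps the full Unix timestamp.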

Windows

Some Windows zip implementations use the "NTFS" attributes extension to store the modification, creation and access times as 64-bit values. The definition of the 64-bit value is shown below (taken from §4.5.5, "NTFS Extra Field (0x000a)", in APPNOTE.TXT):

They determine the number of 1.0E-07 seconds (1/10th microseconds!)
past WinNT "epoch", which is "01-Jan-1601 00:00:00 UTC".
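
For illustration, the conversion between Unix seconds and these 100-nanosecond ticks can be sketched as below. This covers only the arithmetic on the 64-bit value, not the layout of the NTFS extra field itself, and the function names `unix_to_nt_ticks` and `nt_ticks_to_unix` are invented for the example.

```python
from datetime import datetime, timedelta

TICKS_PER_SECOND = 10_000_000  # one tick is 1.0E-07 seconds

# Whole seconds between the WinNT epoch (1601) and the Unix epoch (1970).
EPOCH_DELTA_SECONDS = (
    datetime(1970, 1, 1) - datetime(1601, 1, 1)
) // timedelta(seconds=1)


def unix_to_nt_ticks(unix_seconds):
    """Convert whole seconds since 1970 (UTC) to ticks since 1601 (UTC)."""
    return (unix_seconds + EPOCH_DELTA_SECONDS) * TICKS_PER_SECOND


def nt_ticks_to_unix(ticks):
    """Convert ticks since 1601 (UTC) to whole seconds since 1970 (UTC)."""
    return ticks // TICKS_PER_SECOND - EPOCH_DELTA_SECONDS


print(unix_to_nt_ticks(0))
print(nt_ticks_to_unix(unix_to_nt_ticks(1600868431)))
```

Converting back to Unix time drops any sub-second part, since the division keeps whole seconds only.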

One final point – it is valid for zip files to have both the Linux & Windows extended timestamp extensions at the same time. The unzipping application decides which to use based on the OS it is running on.

Answered By: pmqs