Size of an open file object
Question:
Is there a way to find the size of a file object that is currently open?
Specifically, I am working with the tarfile module to create tarfiles, but I don’t want my tarfile to exceed a certain size. As far as I know, tarfile objects are file-like objects, so I imagine a generic solution would work.
Answers:
If you have the file descriptor, you can use fstat to find out the size, if any. A more generic solution is to seek to the end of the file and read the position there.
$ ls -la chardet-1.0.1.tgz
-rwxr-xr-x 1 vinko vinko 179218 2008-10-20 17:49 chardet-1.0.1.tgz
$ python
Python 2.5.1 (r251:54863, Jul 31 2008, 22:53:39)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> f = open('chardet-1.0.1.tgz','rb')
>>> f.seek(0, os.SEEK_END)
>>> f.tell()
179218L
Adding ChrisJY’s idea to the example:
>>> os.fstat(f.fileno()).st_size
179218L
Note: Based on the comments, f.seek(0, os.SEEK_END) must be called before f.tell(); without it, f.tell() on a freshly opened file returns 0. The reason is that f.tell() reports the current position, and f.seek(0, os.SEEK_END) moves the file object’s position to the end of the file.
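For a file that is open for writing, the two approaches see slightly different things: f.tell() counts bytes that are still in Python's write buffer, while os.fstat only sees what has already reached the operating system. Here is a minimal Python 3 sketch, assuming a buffered binary file. The helper name fstat_size and the temporary file are made up for illustration.

```python
import os
import tempfile

def fstat_size(f):
    """Size of an open file as the OS sees it (hypothetical helper)."""
    f.flush()  # without this, recently written bytes may not be counted yet
    return os.fstat(f.fileno()).st_size

with tempfile.TemporaryFile() as f:  # opened in binary read/write mode
    f.write(b"x" * 1000)
    written = f.tell()        # includes bytes still in Python's buffer
    on_disk = fstat_size(f)   # size reported by the OS after flushing
    print(written, on_disk)
```

Flushing first keeps the fstat result in step with what has been written through the file object.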
Well, if the file object supports the tell method, you can do:
current_size = f.tell()
That will tell you where it is currently writing. If you write sequentially, this will be the size of the file.
Otherwise, you can use the file system capabilities, i.e. os.fstat, as suggested by others.
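Applied to the tarfile case from the question, one possible approach is to keep a handle on the file object the archive is written to and check its position after each member. This is only a sketch, assuming an uncompressed archive (mode "w"), where bytes go straight to the underlying file. The helper add_until_limit, the member names and the limit are all hypothetical.

```python
import io
import tarfile

def add_until_limit(tar, raw, members, max_bytes):
    """Add (name, payload) pairs until raw.tell() reaches max_bytes.

    Hypothetical helper: `raw` is the file object the archive is written to.
    """
    added = []
    for name, payload in members:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
        added.append(name)
        if raw.tell() >= max_bytes:   # sequential writes: position == size so far
            break
    return added

raw = io.BytesIO()  # stands in for a real file opened with open(path, 'wb')
tar = tarfile.open(fileobj=raw, mode="w", format=tarfile.USTAR_FORMAT)
members = [("a.txt", b"a" * 100), ("b.txt", b"b" * 100), ("c.txt", b"c" * 100)]
added = add_until_limit(tar, raw, members, max_bytes=2000)
size_before_close = raw.tell()
tar.close()  # closing appends end-of-archive padding, so the final size grows
final_size = raw.tell()
print(added, size_before_close, final_size)
```

Because the check runs after each member is written, the archive can overshoot the limit by up to one member, and closing the archive adds padding on top of that. A stricter limit would need to estimate each member's size before adding it.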
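Another solution is to use StringIO, if you are doing in-memory operations.
from StringIO import StringIO  # Python 2 module
with open(file_path, 'rb') as x:
    body = StringIO()
    body.write(x.read())
    body.seek(0, 0)  # rewind so that reads start at the beginning
Now body behaves like a file object, with methods like body.read(). body.len gives the file size. Note that the len attribute exists only on Python 2's StringIO.StringIO; the io.BytesIO class in Python 3 does not have it.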
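On Python 3 the same in-memory idea can be sketched with io.BytesIO, taking the size from the underlying buffer. The helper name load_into_memory and the throwaway file are assumptions made for illustration.

```python
import io
import os
import tempfile

def load_into_memory(file_path):
    """Read a whole file into an in-memory binary buffer (hypothetical helper)."""
    with open(file_path, "rb") as x:
        body = io.BytesIO(x.read())   # position starts at 0, ready for body.read()
    return body

# Demo with a throwaway file
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as tmp:
    tmp.write(b"hello world")

body = load_into_memory(path)
size = body.getbuffer().nbytes        # size of the buffered data in bytes
os.remove(path)
print(size)
```

This reads the entire file into memory, so it only makes sense for files that comfortably fit in RAM.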
I was curious about the performance implications of both, since once you open a file, the name attribute of the handle gives you the filename (so you can call os.stat on it).
Here’s a function for the seek/tell method:
import io

def seek_size(f):
    pos = f.tell()
    f.seek(0, io.SEEK_END)
    size = f.tell()
    f.seek(pos)  # back to where we were
    return size
With a 65 MiB file on an SSD under Windows 10, this is some 6.5x faster than calling os.stat(f.name).
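A rough way to repeat this comparison on your own machine is sketched below with timeit. The 1 MiB temporary file and the call count are arbitrary choices, not the setup used for the figure above, and the ratio will vary with platform and file system.

```python
import io
import os
import tempfile
import timeit

def seek_size(f):
    pos = f.tell()
    f.seek(0, io.SEEK_END)
    size = f.tell()
    f.seek(pos)  # back to where we were
    return size

# Create a throwaway 1 MiB file to measure against
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as tmp:
    tmp.write(b"\0" * (1024 * 1024))

calls = 1000
with open(path, "rb") as f:
    # All three approaches should agree on the size
    results = {
        "seek/tell": seek_size(f),
        "os.stat": os.stat(f.name).st_size,
        "os.fstat": os.fstat(f.fileno()).st_size,
    }
    timings = {
        "seek/tell": timeit.timeit(lambda: seek_size(f), number=calls),
        "os.stat": timeit.timeit(lambda: os.stat(f.name), number=calls),
        "os.fstat": timeit.timeit(lambda: os.fstat(f.fileno()), number=calls),
    }
os.remove(path)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for {calls} calls")
```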