How to get file extension correctly?

Question:

I know this question has been asked many times on this website, but the existing answers miss an important point: they only consider extensions with a single period, like *.png or *.mp3. How do I deal with filenames whose extension has two periods, like .tar.gz?

The basic code is:

filename = '/home/lancaster/Downloads/a.ppt'
extention = filename.split('.')[-1]

But obviously, this code does not work for a file like a.tar.gz.
How should I deal with it? Thanks.

Asked By: Page David


Answers:

There is a built-in function for this in the os module: os.path.splitext.

In [1]: from os.path import splitext
In [2]: file_name,extension = splitext('/home/lancaster/Downloads/a.ppt')
In [3]: extension
Out[3]: '.ppt'

If you need to find a compound extension such as .tar.gz or .tar.bz2, you have to write a function like this:

from os.path import splitext
def splitext_(path):
    for ext in ['.tar.gz', '.tar.bz2']:
        if path.endswith(ext):
            return path[:-len(ext)], path[-len(ext):]
    return splitext(path)

Result

In [4]: file_name,ext = splitext_('/home/lancaster/Downloads/a.tar.gz')
In [5]: ext
Out[5]: '.tar.gz'
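
The list of compound extensions can be extended as needed. A minimal sketch, assuming a hypothetical KNOWN_COMPOUND tuple and a case-insensitive match (both are illustrative choices, not something os.path provides):

```python
from os.path import splitext

# Hypothetical list of compound extensions; extend it as needed.
KNOWN_COMPOUND = ('.tar.gz', '.tar.bz2', '.tar.xz')

def split_known_ext(path):
    """Split off a known compound extension, else fall back to splitext."""
    lowered = path.lower()
    for ext in KNOWN_COMPOUND:
        if lowered.endswith(ext):
            # Slice the original string so the caller keeps the original case.
            return path[:-len(ext)], path[-len(ext):]
    return splitext(path)
```

Anything not in the tuple falls through to os.path.splitext, so single extensions such as .ppt behave as before.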

Edit

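Generally you can use this function:

from os.path import basename, splitext
def splitext_(path):
    # Look only at the file name so periods in directory names are ignored.
    parts = basename(path).split('.')
    if len(parts) > 2:
        # Treat the last two dot-separated parts as the extension.
        ext = '.' + '.'.join(parts[-2:])
        return path[:-len(ext)], ext
    return splitext(path)

It treats the last two dot-separated parts as the extension whenever the file name contains more than one period, so it also splits a name like a.b.txt into a and .b.txt.

Running it on several files:

In [6]: inputs = ['a.tar.gz', 'b.tar.lzma', 'a.tar.lz', 'a.tar.lzo', 'a.tar.xz', 'a.png']
In [7]: for file_ in inputs:
   ...:     file_name, extension = splitext_(file_)
   ...:     print(extension)
   ...:
.tar.gz
.tar.lzma
.tar.lz
.tar.lzo
.tar.xz
.png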
Answered By: Rahul K P

One possible way is:

  1. Split at "." => tmp_ext = filename.split('.')[1:]

Result is a list: ['tar', 'gz']

  2. Join them together => extention = ".".join(tmp_ext)

Result is your extension as a string: 'tar.gz'

Update: Example:

>>> test = "/test/test/test.tar.gz"
>>> t2 = test.split(".")[1:]
>>> t2
['tar', 'gz']
>>> ".".join(t2)
'tar.gz'
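
Note that splitting the full path gives wrong results when a directory name contains a period. A minimal sketch that splits only the last path component; the function name all_extensions is made up for illustration:

```python
import os.path

def all_extensions(path):
    """Return everything after the first period of the file name."""
    name = os.path.basename(path)   # drop the directories first
    parts = name.split('.')[1:]     # e.g. ['tar', 'gz']
    return '.'.join(parts)          # empty string if there is no period
```

With this, a path such as /test.d/test/test.tar.gz still gives 'tar.gz', because the period in test.d is never looked at.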
Answered By: no11
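filename = '/home/lancaster/Downloads/a.tar.gz'
# Take the last path component (the file name).
extention = filename.split('/')[-1]

if '.' in extention:
  # Keep only the text after the last period.
  extention = extention.split('.')[-1]
  if len(extention) > 0:
    extention = '.'+extention
    print(extention)  # prints .gz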
Answered By: akshay.s.jagtap

Simplest One:

import os.path
print(os.path.splitext("/home/lancaster/Downloads/a.ppt")[1])
# '.ppt'
Answered By: Saket Mittal

The role of a file extension is to tell the viewer (and sometimes the computer) which application to use to handle the file.

Taking your worst-case example in your comments (a.ppt.tar.gz), this is a PowerPoint file that has been tar-balled and then gzipped. So you need to use a gzip-handling program to open it. Using PowerPoint or a tarball-handling program wouldn’t work. OK, a clever program that knew how to handle both .tar and .gz files could understand both operations and work with a .tar.gz file – but note that it would do that even if the extension was simply .gz.

The fact that both tar and gzip add their extensions to the original filename, rather than replace them (as zip does) is a convenience. But the base name of the gzip file is still a.ppt.tar.
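
This layering can be seen by peeling off one extension at a time with os.path.splitext. A minimal sketch; the helper name peel_extensions is hypothetical:

```python
from os.path import splitext

def peel_extensions(name):
    """Strip extensions one by one, outermost first.

    Returns a list of (base name, extension) pairs, one per layer.
    """
    layers = []
    root, ext = splitext(name)
    while ext:
        layers.append((root, ext))
        root, ext = splitext(root)
    return layers

# For 'a.ppt.tar.gz' the first pair is ('a.ppt.tar', '.gz'):
# the gzip layer, whose base name is the tarball.
```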

Answered By: John Burger
>>> import os
>>> import re

>>> filename = os.path.basename('/home/lancaster/Downloads/a.ppt')
>>> re.findall(r'\.([^.]+)', filename)
['ppt']

>>> filename = os.path.basename('/home/lancaster/Downloads/a.ppt.tar.gz')
>>> re.findall(r'\.([^.]+)', filename)
['ppt', 'tar', 'gz']
Answered By: matt

Python 3.4

You can now use Path from pathlib. It has many features; one of them is suffix:

>>> from pathlib import Path
>>> Path('my/library/setup.py').suffix
'.py'
>>> Path('my/library.tar.gz').suffix
'.gz'
>>> Path('my/library').suffix
''

If you want to get more than one suffix, use suffixes:

>>> from pathlib import Path
>>> Path('my/library.tar.gar').suffixes
['.tar', '.gar']
>>> Path('my/library.tar.gz').suffixes
['.tar', '.gz']
>>> Path('my/library').suffixes
[]
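
To get the combined extension as a single string, the suffixes can be joined. A minimal sketch; the function name full_suffix is made up for illustration. Keep in mind that every dot-separated part after the first counts as a suffix, so a name with extra periods (a version number, say) yields more than '.tar.gz':

```python
from pathlib import Path

def full_suffix(path):
    """Join all suffixes of the file name into one string."""
    return ''.join(Path(path).suffixes)
```

For 'my/library.tar.gz' this gives '.tar.gz', and for a name without any period it gives an empty string.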
Answered By: Or Duan
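With re.findall and Python 3.6 (f-strings):

import re

filename = '/home/Downloads/abc.ppt.tar.gz'

# A period followed by one to six word characters.
ext = r'\.\w{1,6}'

# Raw f-string so that \b stays a word boundary; re.X ignores the spaces.
re.findall(rf'{ext}\b | {ext}$', filename, re.X)

['.ppt', '.tar', '.gz']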
Answered By: LetzerWille