How to extract numbers from filename in Python?

Question

I need to extract just the numbers from file names such as:

GapPoints1.shp

GapPoints23.shp

GapPoints109.shp

How can I extract just the numbers from these files using Python? I’ll need to incorporate this into a for loop.

Asked By: Borealis


Answer 1

So, you haven’t left any description of where these files are and how you’re getting them, but I assume you’d get the filenames using the os module.

As for getting the numbers out of the names, you’d be best off using regular expressions with re, something like this:

import re

def get_numbers_from_filename(filename):
    # Return the first run of digits in the filename, as a string
    return re.search(r'\d+', filename).group(0)

Then, to include that in a for loop, you’d run that function on each filename:

import os

# myfiledirectory is the path to the folder that holds the files
for filename in os.listdir(myfiledirectory):
    print(get_numbers_from_filename(filename))

or something along those lines.
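As a minimal sketch of the same idea, the variant below converts the match to an integer and returns None when a filename contains no digits, so one stray file does not stop the loop. The name extract_number and the sample list (which stands in for the os.listdir result) are assumptions made for illustration, not part of the answer above.

```python
import re

def extract_number(filename):
    """Return the first run of digits in filename as an int, or None if there is none."""
    match = re.search(r'\d+', filename)
    return int(match.group(0)) if match else None

# Sample names from the question, plus one hypothetical file without digits
names = ["GapPoints1.shp", "GapPoints23.shp", "GapPoints109.shp", "README.txt"]
numbers = [extract_number(name) for name in names]
print(numbers)  # [1, 23, 109, None]
```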

Answered By: jdotjdot

Answer 2

You can use regular expressions:

import re
regex = re.compile(r'\d+')

Then to get the strings that match:

regex.findall(filename)

This will return a list of strings which contain the numbers. If you actually want integers, you could use int:

[int(x) for x in regex.findall(filename)]

If there’s only one number in each filename, you could use regex.search(filename).group(0), provided you’re certain that it will produce a match. If no match is found, that call raises an AttributeError saying that the 'NoneType' object has no attribute 'group'.
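The difference between findall and search, and the failure when nothing matches, can be seen in a short sketch. The filenames here are hypothetical and were chosen so that one contains two numbers and the other contains none.

```python
import re

regex = re.compile(r'\d+')

# Hypothetical filename containing two separate numbers
filename = "GapPoints109_v2.shp"

all_numbers = [int(x) for x in regex.findall(filename)]  # every run of digits
first_number = regex.search(filename).group(0)           # only the first run, as a string

# search() returns None when nothing matches, so calling .group() on it fails
try:
    regex.search("NoDigitsHere.shp").group(0)
    error_message = None
except AttributeError as error:
    error_message = str(error)

print(all_numbers)   # [109, 2]
print(first_number)  # 109
```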

Answered By: mgilson

Answer 3

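If there is just one number, you can keep only the digit characters (in Python 3, filter returns an iterator, so join the result back into a string):

''.join(filter(str.isdigit, filename))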
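The restriction to a single number matters because this approach keeps every digit character and joins them, wherever they occur in the name. A small sketch, with a hypothetical helper name digits_only and a made-up second filename:

```python
def digits_only(filename):
    """Keep only the digit characters of filename, in their original order."""
    return ''.join(ch for ch in filename if ch.isdigit())

single = digits_only("GapPoints23.shp")  # one number in the name
merged = digits_only("track7.mp3")       # the digit in the extension is kept too

print(single)  # 23
print(merged)  # 73
```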

Answered By: kelwinfc

Answer 4

Here is the code I used to move a paper's publication year to the front of its filename after downloading the file from Google Scholar.
The files are usually named Author+PublishedYear.pdf; after running this code, the filename becomes PublishedYear+Author.pdf.

# Rename PDFs so that the digits of the publication year come first.
# Regular expressions split each filename into its digits and non-digits.

# Import libraries
import os
import re

# Work on the files in the current working directory
address = os.getcwd()


# A class holding one function per filename pattern
class file_name:
    def __init__(self, filename):
        self.filename = filename

    # There are two patterns, so there are two functions.
    # First function, for names such as schrodinger1990.pdf
    @staticmethod
    def number_extraction_pattern_non_digits_first(filename):
        pattern = r'(\D+)(\d+)(\.pdf)'
        match = re.search(pattern, filename, re.IGNORECASE)
        # Returns (digits, non_digits)
        return match.group(2), match.group(1)

    # Second function, for names such as 1993schrodinger.pdf
    @staticmethod
    def number_extraction_pattern_digits_first(filename):
        pattern = r'(\d+)(\D+)(\.pdf)'
        match = re.search(pattern, filename, re.IGNORECASE)
        # Returns (digits, non_digits)
        return match.group(1), match.group(2)


if __name__ == '__main__':

    # Patterns used to check which form a filename has
    pattern_check1 = r'(\D+)(\d+)(\.pdf)'
    pattern_check2 = r'(\d+)(\D+)(\.pdf)'

    for filename in os.listdir(address):

        if filename.endswith('.pdf'):
            if re.search(pattern_check1, filename, re.IGNORECASE):
                digits, non_digits = file_name.number_extraction_pattern_non_digits_first(filename)
                os.rename(filename, digits + non_digits + '.pdf')

            # Otherwise, rename only if the other pattern matches
            elif re.search(pattern_check2, filename, re.IGNORECASE):
                digits, non_digits = file_name.number_extraction_pattern_digits_first(filename)
                os.rename(filename, digits + non_digits + '.pdf')
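Before renaming real files, it may help to preview the new names without touching the disk. The sketch below is not the author's code: the helper year_first_name is hypothetical, and it uses re.fullmatch so that only names made up entirely of non-digits, then digits, then the .pdf extension are changed.

```python
import re

# Non-digits, then digits, then the extension (the dot is escaped so it is literal)
AUTHOR_YEAR = re.compile(r'(\D+)(\d+)(\.pdf)', re.IGNORECASE)

def year_first_name(filename):
    """Return the year-first version of filename, or None if it is not author+year."""
    match = AUTHOR_YEAR.fullmatch(filename)
    if match is None:
        return None
    author, year, extension = match.groups()
    return year + author + extension

for name in ["schrodinger1990.pdf", "1993schrodinger.pdf", "notes.pdf"]:
    print(name, "->", year_first_name(name))
```

Names that are already year-first, or that contain no year at all, come back as None and can simply be skipped before calling os.rename.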

Answered By: StephanSchrodinger

Answer 5

I had a similar problem. What jdotjdot answered works perfectly.

import os
import re

def get_numbers_from_filename(filename):
    return re.search(r'\d+', filename).group(0)

for filename in os.listdir(myfiledirectory):
    print(get_numbers_from_filename(filename))

What if I don't want to print get_numbers_from_filename(filename), but instead want to store the result, say m = get_numbers_from_filename(filename), so that I can use m later on? I have tried to do this, but I always get some kind of error.

I want to be able to write in_file = "something" + m
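The error itself is not shown, so the following is only a sketch of one way this could work. It assumes the function returns a string (as group(0) does), in which case the concatenation is valid as written; if m were converted to an int first, it would need str(m) before being joined. The list of filenames is a stand-in for the os.listdir result.

```python
import re

def get_numbers_from_filename(filename):
    return re.search(r'\d+', filename).group(0)

# Stand-in for os.listdir(myfiledirectory)
filenames = ["GapPoints1.shp", "GapPoints23.shp"]

in_files = []
for filename in filenames:
    m = get_numbers_from_filename(filename)  # m is a string such as '23'
    in_file = "something" + m                # string + string works directly
    in_files.append(in_file)

print(in_files)  # ['something1', 'something23']
```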

Answered By: Aida
