How do I split a string to extract only uppercase string or uppercase followed by float?

Question:

I am using Selenium with Python to scrape some file information. I would like to extract only the file type and, if available, the version number, e.g. GML 3.1.1. I’m looking for a way to split the string to do so. My current output is a list that looks like this:

ESRI Shapefile, (50.7 kB)
GML 3.1.1, (124.9 kB)
Google Earth KML 2.1, (126.5 kB)
MapInfo MIF, (53.5 kB)

The script section is as follows:

for file in files:
    file_format = file.text
    print(file_format)

I’m looking for something like the strip() function that checks whether the word before the comma is uppercase, or uppercase followed by a version number such as 3.1.1. The following is the output I’m looking for:

ESRI
GML 3.1.1
KML 2.1
MIF
Asked By: h1m aga1n


Answers:

Using a regex that finds words of all uppercase letters followed optionally by a space and digits / dots would work here:

s='''ESRI Shapefile, (50.7 kB)
GML 3.1.1, (124.9 kB)
Google Earth KML 2.1, (126.5 kB)
MapInfo MIF, (53.5 kB)'''

import re

re.findall(r'\b[A-Z]+\b(?:\s[\d.]+)?', s)
['ESRI', 'GML 3.1.1', 'KML 2.1', 'MIF']
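
Since the question loops over elements one at a time, the same pattern could also be applied per string rather than to one joined block. The following is a minimal sketch under that assumption; the helper name extract_format and the sample_texts list (standing in for the file.text values) are hypothetical and not part of the original script:

```python
import re

# Uppercase-only word, optionally followed by whitespace and digits/dots.
FORMAT_RE = re.compile(r'\b[A-Z]+\b(?:\s[\d.]+)?')

def extract_format(text):
    """Return the first all-uppercase word (plus version, if any), or None."""
    match = FORMAT_RE.search(text)
    return match.group(0) if match else None

# Stand-in for the `file.text` values from the Selenium loop in the question.
sample_texts = [
    "ESRI Shapefile, (50.7 kB)",
    "GML 3.1.1, (124.9 kB)",
    "Google Earth KML 2.1, (126.5 kB)",
    "MapInfo MIF, (53.5 kB)",
]

for text in sample_texts:
    print(extract_format(text))
```

In the original loop this would amount to calling extract_format(file.text) instead of printing file.text directly. Note that [\d.]+ also accepts a run of dots with no digits; if that matters, a stricter version part such as \d+(?:\.\d+)* may be a better fit.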
Answered By: Mark