Regex to match city names from text with numbers

Question:

I have a string with the names of cities and the number of people living in each. I need to match only the city names using a regex.

city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"

I tried this:

[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$

but it returns "None"

Asked By: gleb


Answers:

Try:

([^-]+?)\s*-\s*([\d\s]+)

Regex demo.


import re

city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"

pat = re.compile(r"([^-]+?)\s*-\s*([\d\s]+)")

for c, n in pat.findall(city):
    print(c, int(n.replace(" ", "")))

Prints:

New York 8468000
Los Angeles 3849000
Berlin 3645000

EDIT: If you don’t need numbers:

import re

city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"

pat = re.compile(r"([^-]+?)\s*-\s*[\d\s]+")

for c in pat.findall(city):
    print(c)

Prints:

New York
Los Angeles
Berlin
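
As a side note on why the pattern in the question returns None: it ends with $, so it can only match at the very end of the string, and the string ends with digits. A minimal sketch, assuming the pattern was used with re.search (the question does not say which function was called); the names anchored and unanchored are only illustrative:

```python
import re

city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"

# Pattern from the question, assumed here to be used with re.search.
anchored = re.compile(r"[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$")
print(anchored.search(city))  # no match: the string ends with digits, not letters

# Same pattern without the `$` anchor, so it may match anywhere in the string.
unanchored = re.compile(r"[a-zA-Z]+(?:[\s-][a-zA-Z]+)*")
print(unanchored.findall(city))
```

Without the anchor, findall returns each run of words, which for this input is the list of city names.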
Answered By: Andrej Kesely

If you want all the cities as a single string, you can use [a-zA-Z]+ to skip the numbers and hyphens and join what is left:

cities = " ".join(re.findall("[a-zA-Z]+", city))

Returning:

'New York Los Angeles Berlin'

Otherwise, if you want them separated, I would split on - first and then apply the same method to each piece in a list comprehension (the last piece contains only a number, so it is dropped):

cities = [" ".join(re.findall("[a-zA-Z]+", x)) for x in city.split('-')[:-1]]

Returning:

['New York', 'Los Angeles', 'Berlin']
Answered By: Celius Stingher

Try this:

[a-zA-Z]+ ?[a-zA-Z]+(?= *-)

See regex demo.
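
A minimal sketch of how this pattern could be used from Python with re.findall. The variable names are illustrative, and the second pattern is a hypothetical variant (not part of this answer) for names with more than two words, tried on a made-up sample string:

```python
import re

city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"

# Pattern from this answer: letters, an optional single space, more letters,
# then a lookahead requiring optional spaces followed by a hyphen.
two_words = re.compile(r"[a-zA-Z]+ ?[a-zA-Z]+(?= *-)")
cities = two_words.findall(city)
print(cities)

# Hypothetical variant: any number of space-separated words before the hyphen.
any_words = re.compile(r"[a-zA-Z]+(?: [a-zA-Z]+)*(?= *-)")
sample = "Rio de Janeiro - 1 000 000 Berlin - 3 645 000"  # made-up numbers
longer = any_words.findall(sample)
print(longer)
```

The original pattern allows at most one space inside a name, so on the made-up sample it would return only "de Janeiro" for the first city; the variant is one way to relax that.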

Answered By: SaSkY