Regex to match city names from text with numbers
Question:
I have a string with the names of cities and the number of people living in them. I need to match only the city names using a regex.
city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"
I tried this:
[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$
but it returns None.
Answers:
Try:
([^-]+?)\s*-\s*([\d\s]+)
import re
city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"
# Group 1: city name (lazy, no hyphens); group 2: the digits and spaces after the dash
pat = re.compile(r"([^-]+?)\s*-\s*([\d\s]+)")
for c, n in pat.findall(city):
    print(c, int(n.replace(" ", "")))
Prints:
New York 8468000
Los Angeles 3849000
Berlin 3645000
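If a lookup by city name is more convenient than printing, the same pairs can be collected into a dict. This is only a minimal sketch; the name populations is illustrative and not part of the answer above.

```python
import re

city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"
pat = re.compile(r"([^-]+?)\s*-\s*([\d\s]+)")

# `populations` is an illustrative name; keys are city names, values are ints
populations = {c: int(n.replace(" ", "")) for c, n in pat.findall(city)}
print(populations["Berlin"])  # 3645000
```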
EDIT: If you don’t need numbers:
import re
city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"
# Only the city name is captured, so findall returns plain strings
pat = re.compile(r"([^-]+?)\s*-\s*[\d\s]+")
for c in pat.findall(city):
    print(c)
Prints:
New York
Los Angeles
Berlin
If you want all cities as a single string, you can use [a-zA-Z]+ to skip the numbers and join the matches:
cities = " ".join(re.findall("[a-zA-Z]+", city))
Returning:
'New York Los Angeles Berlin'
Otherwise, if you want them separated, I would split on - first and then apply the same method to each part in a list comprehension:
# [:-1] drops the last part, which holds only the final population
cities = [" ".join(re.findall("[a-zA-Z]+", x)) for x in city.split('-')[:-1]]
Returning:
['New York', 'Los Angeles', 'Berlin']
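As for why the attempt in the question returns None: assuming the intended character class was [\s-], the likely culprit is the trailing $, which forces the match to end at the end of the string, and the string ends in digits. A small sketch of the difference, with the anchor and without it (the variable names are illustrative):

```python
import re

city = "New York - 8 468 000 Los Angeles - 3 849 000 Berlin - 3 645 000"

# The pattern from the question, assuming [\s-] was intended
anchored = re.compile(r"[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$")
print(anchored.search(city))  # None: no run of letters reaches the end of the string

# Same pattern without the $ anchor
unanchored = re.compile(r"[a-zA-Z]+(?:[\s-][a-zA-Z]+)*")
names = unanchored.findall(city)
print(names)  # ['New York', 'Los Angeles', 'Berlin']
```

Unlike the [^-] approach, this variant also allows a hyphen inside a name, as long as letters sit directly on both sides of it.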