Pandas split column if condition, else null
Question:
I want to split a column. If it has a letter (any letter) at the end, this will be the value for the second column. Otherwise, the second column should be null
import pandas as pd
data = pd.DataFrame({"data": ["0.00I", "0.01E", "99.99", "0.14F"]})
desired result:
a b
0 0.00 I
1 0.01 E
2 99.99 None
3 0.14 F
Answers:
You can use str.extract
with the (d+(?:.d+)?)(D)?
regex:
out = data['data'].str.extract(r'(d+(?:.d+)?)(D)?').set_axis(['a', 'b'], axis=1)
Or, if you want to remove the original ‘data’ column while adding new columns in place:
data[['a', 'b']] = data.pop('data').str.extract('(d+(?:.d+)?)(D)?')
output:
a b
0 0.00 I
1 0.01 E
2 99.99 NaN
3 0.14 F
(d+(?:.d+)?) # capture a number (with optional decimal)
(D)? # optionally capture a non-digit
try using a regular expression and apply
def find_numeric(x):
if re.search("[[0-9]+.[0-9]+", x):
return re.search("[0-9]+.[0-9]+", x).group()
else:
return ""
def find_alpha(x):
if re.search("[a-zA-Z]+", x):
return re.search("[a-zA-Z]+", x).group()
else:
return "None"
data = pd.DataFrame({"data": ["0.00I", "0.01E", "99.99", "0.14F"]})
data["numeric"] = data["data"].apply(find_numeric)
data["alpha"]= data["data"].apply(find_alpha)
print(data)
output:
data numeric alpha
0 0.00I 0.00 I
1 0.01E 0.01 E
2 99.99 99.99 None
3 0.14F 0.14 F
I want to split a column. If it has a letter (any letter) at the end, this will be the value for the second column. Otherwise, the second column should be null
import pandas as pd
data = pd.DataFrame({"data": ["0.00I", "0.01E", "99.99", "0.14F"]})
desired result:
a b
0 0.00 I
1 0.01 E
2 99.99 None
3 0.14 F
You can use str.extract
with the (d+(?:.d+)?)(D)?
regex:
out = data['data'].str.extract(r'(d+(?:.d+)?)(D)?').set_axis(['a', 'b'], axis=1)
Or, if you want to remove the original ‘data’ column while adding new columns in place:
data[['a', 'b']] = data.pop('data').str.extract('(d+(?:.d+)?)(D)?')
output:
a b
0 0.00 I
1 0.01 E
2 99.99 NaN
3 0.14 F
(d+(?:.d+)?) # capture a number (with optional decimal)
(D)? # optionally capture a non-digit
try using a regular expression and apply
def find_numeric(x):
if re.search("[[0-9]+.[0-9]+", x):
return re.search("[0-9]+.[0-9]+", x).group()
else:
return ""
def find_alpha(x):
if re.search("[a-zA-Z]+", x):
return re.search("[a-zA-Z]+", x).group()
else:
return "None"
data = pd.DataFrame({"data": ["0.00I", "0.01E", "99.99", "0.14F"]})
data["numeric"] = data["data"].apply(find_numeric)
data["alpha"]= data["data"].apply(find_alpha)
print(data)
output:
data numeric alpha
0 0.00I 0.00 I
1 0.01E 0.01 E
2 99.99 99.99 None
3 0.14F 0.14 F