Python: strip pair-wise column names
Question:
I have a DataFrame with columns that look like this:
df=pd.DataFrame(columns=['(NYSE_close, close)','(NYSE_close, open)','(NYSE_close, volume)', '(NASDAQ_close, close)','(NASDAQ_close, open)','(NASDAQ_close, volume)'])
df:
(NYSE_close, close) (NYSE_close, open) (NYSE_close, volume) (NASDAQ_close, close) (NASDAQ_close, open) (NASDAQ_close, volume)
I want to remove everything after the underscore and append whatever comes after the comma to get the following:
df:
NYSE_close NYSE_open NYSE_volume NASDAQ_close NASDAQ_open NASDAQ_volume
I tried to strip the column names, but that replaced them with NaN. Any suggestions on how to do this?
Thank you in advance.
Answers:
You could use re.sub to capture the parts of each column name you want to keep and join them:
import re
import pandas as pd
df = pd.DataFrame(columns=['(NYSE_close, close)','(NYSE_close, open)','(NYSE_close, volume)', '(NASDAQ_close, close)','(NASDAQ_close, open)','(NASDAQ_close, volume)'])
# group 1 = prefix up to and including the underscore, group 2 = word after the comma
df.columns = [re.sub(r'\(([^_]+_)\w+, (\w+)\)', r'\1\2', c) for c in df.columns]
Output:
Empty DataFrame
Columns: [NYSE_close, NYSE_open, NYSE_volume, NASDAQ_close, NASDAQ_open, NASDAQ_volume]
Index: []
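To make the two capture groups easier to see, here is a minimal sketch of the same substitution on plain strings. The names PAIR, strip_pair and cols are hypothetical, introduced only for illustration; the pattern is the one used above.

```python
import re

# Hypothetical names for illustration; the pattern is the one from the answer.
PAIR = re.compile(r'\(([^_]+_)\w+, (\w+)\)')

def strip_pair(name):
    # Group 1 keeps the prefix up to and including the underscore ("NYSE_"),
    # group 2 keeps the word after the comma ("open").
    # Names that do not match the pattern are returned unchanged.
    return PAIR.sub(r'\1\2', name)

cols = ['(NYSE_close, close)', '(NYSE_close, open)', '(NASDAQ_close, volume)']
new_cols = [strip_pair(c) for c in cols]
print(new_cols)
```

The same list comprehension can then be assigned to df.columns as in the answer.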
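You could also replace the punctuation with spaces, split on whitespace, and rejoin the pieces you need:
import re
def cvt_col(x):
    # '(NYSE_close, open)' -> ['NYSE', 'close', 'open']
    s = re.sub('[()_,]', ' ', x).split()
    return s[0] + '_' + s[2]
# rename returns a new DataFrame; assign it back to keep the result
df.rename(columns=cvt_col)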
Empty DataFrame
Columns: [NYSE_close, NYSE_open, NYSE_volume, NASDAQ_close, NASDAQ_open, NASDAQ_volume]
Index: []
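One thing to watch with this approach: rename does not change df in place by default. A small sketch, assuming a two-column demo frame (df_demo and renamed are hypothetical names), shows the difference:

```python
import re

import pandas as pd

def cvt_col(x):
    # Turn '(NYSE_close, open)' into ['NYSE', 'close', 'open'], then rejoin.
    s = re.sub('[()_,]', ' ', x).split()
    return s[0] + '_' + s[2]

# Hypothetical demo frame with two of the original column names.
df_demo = pd.DataFrame(columns=['(NYSE_close, close)', '(NYSE_close, open)'])

renamed = df_demo.rename(columns=cvt_col)  # new DataFrame with the new names
print(list(df_demo.columns))               # original names are still here
print(list(renamed.columns))
```

So in practice you would probably write df = df.rename(columns=cvt_col).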
Use a list comprehension, twice:
step1 = [ent.strip('()').split(',') for ent in df]
df.columns = ["_".join([left.split('_')[0], right.strip()])
              for left, right in step1]
df
Empty DataFrame
Columns: [NYSE_close, NYSE_open, NYSE_volume, NASDAQ_close, NASDAQ_open, NASDAQ_volume]
Index: []
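If it helps to see what the intermediate step holds, here is a rough sketch of the same two comprehensions on a plain list of names (cols and new_cols are hypothetical names; iterating over a DataFrame yields its column labels, so the logic is the same):

```python
# Hypothetical sample of the original column names.
cols = ['(NYSE_close, close)', '(NASDAQ_close, volume)']

# Step 1: drop the surrounding parentheses and split on the comma.
step1 = [c.strip('()').split(',') for c in cols]
print(step1)

# Step 2: keep the text before the underscore on the left,
# and the whitespace-stripped word on the right.
new_cols = ['_'.join([left.split('_')[0], right.strip()])
            for left, right in step1]
print(new_cols)
```

Note that the right-hand piece still carries a leading space after the split, which is why right.strip() is needed.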