How to separate a column in a pandas DataFrame
Question:
I have a DataFrame containing values about shops and categories in one column.
Date | Spent | … | Category/Shop |
---|---|---|---|
2022-08-04 | 126.98 | … | Supermarkets |
2022-08-04 | NaN | … | ShopName |
2022-08-04 | 119.70 | … | Supermarkets |
2022-08-04 | NaN | … | ShopName |
…
I need to separate the last column into two columns:
Date | Spent | … | Category | Shop |
---|---|---|---|---|
2022-08-04 | 126.98 | … | Supermarkets | ShopName |
2022-08-04 | 119.70 | … | Supermarkets | ShopName |
How can this be done?
We can assume that every second row in the Category/Shop column contains the name of the store that needs to be moved to a new column.
Answers:
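Based on the sample, and expecting similar behavior in future data, I would do it with groupby:
# forward-fill Spent so each shop row shares the (Date, Spent) key of the category row above it
df = df.ffill().groupby(['Date', 'Spent'])['Category/Shop'].apply(list).reset_index()
# the first list element is the category, the second is the shop
df['Category'], df['Shop'] = df['Category/Shop'].str[0], df['Category/Shop'].str[1]
df = df.drop(columns='Category/Shop')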
Outputting:
Date Spent Category Shop
0 2022-08-04 119.70 Supermarkets ShopName
1 2022-08-04 126.98 Supermarkets ShopName
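For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question with the elided "…" columns left out, and the names filled, pairs and result are only illustrative. Grouping on (Date, Spent) would merge two purchases that share the same date and amount, so this assumes each such pair is unique.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (assumption: the elided "…" columns are omitted).
df = pd.DataFrame({
    "Date": ["2022-08-04"] * 4,
    "Spent": [126.98, np.nan, 119.70, np.nan],
    "Category/Shop": ["Supermarkets", "ShopName", "Supermarkets", "ShopName"],
})

# Forward-fill so each shop row gets the Spent value of the category row above it.
filled = df.ffill()

# Collect the Category/Shop values of each (Date, Spent) pair into a list.
pairs = (
    filled.groupby(["Date", "Spent"])["Category/Shop"]
    .apply(list)
    .reset_index()
)

# Element 0 of each list is the category, element 1 is the shop.
pairs["Category"] = pairs["Category/Shop"].str[0]
pairs["Shop"] = pairs["Category/Shop"].str[1]
result = pairs.drop(columns="Category/Shop")
print(result)
```

groupby sorts by its keys by default, which is why the 119.70 row comes out first.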
Apply the pandas Series.str.split() function on the "Address" column and pass the delimiter (a comma in this case) on which you want to split the column. Also, make sure to pass expand=True so that the pieces come back as separate columns.
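Note that str.split fits the case where both values sit in the same cell, separated by a delimiter, which is not how the question's data is laid out. A small sketch of what it looks like, assuming a hypothetical "Address" column of "street,city" strings (the data and the Street/City names are made up for illustration):

```python
import pandas as pd

# Hypothetical data: one "Address" column holding comma-separated values.
addresses = pd.DataFrame(
    {"Address": ["1 Main St,Springfield", "2 Oak Ave,Shelbyville"]}
)

# expand=True returns a DataFrame with one column per piece
# instead of a Series of lists.
parts = addresses["Address"].str.split(",", expand=True)
parts.columns = ["Street", "City"]  # hypothetical column names

# Attach the new columns to the original frame (indexes match).
result = addresses.join(parts)
print(result)
```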
I would go for iloc to retrieve every second row, and build a new DataFrame with pd.concat.
Given df is your source DataFrame, it would look like this:
import pandas as pd

pd.concat(
    [
        # start at the first row and use every 2nd row of df (the category rows)
        df.iloc[::2].reset_index(drop=True).rename(columns={"Category/Shop": "Category"}),
        # start at the second row and use every 2nd row of df, but only the last column (the shop rows)
        df.iloc[1::2]["Category/Shop"].reset_index(drop=True).rename("Shop"),
    ],
    # concatenate along columns
    axis=1,
)
Output:
Date Spent Category Shop
0 2022-08-04 126.98 Supermarkets ShopName
1 2022-08-04 119.70 Supermarkets ShopName
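The same idea can be wrapped in a small function, sketched below. split_alternating is a hypothetical helper name, the sample frame is rebuilt from the question without the elided "…" columns, and the even-length check is my own addition to guard the every-second-row assumption.

```python
import numpy as np
import pandas as pd


def split_alternating(df, column, first_name, second_name):
    """Hypothetical helper: split `column` into two columns, taking
    even-positioned rows as `first_name` and odd-positioned rows as `second_name`."""
    if len(df) % 2 != 0:
        raise ValueError("expected an even number of rows (category/shop pairs)")
    # Rows 0, 2, 4, ... keep every column; the split column is renamed.
    first = df.iloc[::2].reset_index(drop=True).rename(columns={column: first_name})
    # Rows 1, 3, 5, ... contribute only the split column, under its new name.
    second = df.iloc[1::2][column].reset_index(drop=True).rename(second_name)
    # Concatenate along columns; the reset indexes line the pairs up.
    return pd.concat([first, second], axis=1)


# Sample frame rebuilt from the question (assumption: "…" columns omitted).
df = pd.DataFrame({
    "Date": ["2022-08-04"] * 4,
    "Spent": [126.98, np.nan, 119.70, np.nan],
    "Category/Shop": ["Supermarkets", "ShopName", "Supermarkets", "ShopName"],
})

result = split_alternating(df, "Category/Shop", "Category", "Shop")
print(result)
```

Unlike the groupby approach, this keeps the original row order and does not depend on the Spent values being unique.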
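One can use a list comprehension as follows (note that it hard-codes the sample values 'Supermarkets' and 'ShopName', so it only applies to data with exactly these values):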
df['Category'], df['Shop'] = zip(*[('Supermarkets', 'ShopName') if x == 'Supermarkets' else ('ShopName', 'ShopName') for x in df['Category/Shop']])
[Out]:
Date Spent Category/Shop Category Shop
0 2022-08-04 126.98 Supermarkets Supermarkets ShopName
1 2022-08-04 NaN ShopName ShopName ShopName
2 2022-08-04 119.70 Supermarkets Supermarkets ShopName
3 2022-08-04 NaN ShopName ShopName ShopName
Then, as one doesn't need the rows where Spent is NaN, nor the column Category/Shop, one can drop both with pandas.DataFrame.dropna and pandas.DataFrame.drop:
df = df.dropna(subset=['Spent']).drop(columns=['Category/Shop'])
[Out]:
Date Spent Category Shop
0 2022-08-04 126.98 Supermarkets ShopName
2 2022-08-04 119.70 Supermarkets ShopName
Or, if one also wants to reset the index, chain pandas.DataFrame.reset_index as well:
df = df.dropna(subset=['Spent']).drop(columns=['Category/Shop']).reset_index(drop=True)
[Out]:
Date Spent Category Shop
0 2022-08-04 126.98 Supermarkets ShopName
1 2022-08-04 119.70 Supermarkets ShopName
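If the category and shop names are not known in advance, a variant that does not hard-code them is to pull each shop name up from the following row with Series.shift. This is a different technique from the list comprehension above, sketched under the question's assumption that every category row is immediately followed by its shop row; the sample frame again leaves out the elided "…" columns.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (assumption: "…" columns omitted).
df = pd.DataFrame({
    "Date": ["2022-08-04"] * 4,
    "Spent": [126.98, np.nan, 119.70, np.nan],
    "Category/Shop": ["Supermarkets", "ShopName", "Supermarkets", "ShopName"],
})

# shift(-1) moves every value up one row, so each category row
# receives the shop name stored in the row directly below it.
df["Shop"] = df["Category/Shop"].shift(-1)

result = (
    df.dropna(subset=["Spent"])                        # keep only the category rows
    .rename(columns={"Category/Shop": "Category"})     # what remains there is the category
    .reset_index(drop=True)
)
print(result)
```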