How to separate column in dataframe pandas

Question:

I have a DataFrame in which shop names and categories are stored in the same column.

Date Spent Category/Shop
2022-08-04 126.98 Supermarkets
2022-08-04 NaN ShopName
2022-08-04 119.70 Supermarkets
2022-08-04 NaN ShopName

I need to separate the last column into two columns:

Date Spent Category Shop
2022-08-04 126.98 Supermarkets ShopName
2022-08-04 119.70 Supermarkets ShopName

How can this be done?

We can assume that every second row in the Category/Shop column contains the name of the store that needs to be moved to a new column.

Asked By: Egor


Answers:

Based on the sample, and assuming future data behaves the same way, I would do it with groupby:

df = df.ffill().groupby(['Date','Spent'])['Category/Shop'].apply(list).reset_index()
df['Category'],df['Shop'] = df['Category/Shop'].str[0],df['Category/Shop'].str[1]
df = df.drop(columns='Category/Shop')

Outputting:

         Date   Spent      Category      Shop
0  2022-08-04  119.70  Supermarkets  ShopName
1  2022-08-04  126.98  Supermarkets  ShopName
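
Note that the groupby approach relies on each (Date, Spent) pair being unique, and that groupby sorts the result by key, which is why the row order differs from the input. A minimal sketch of an alternative that avoids both, assuming the strict category-row-then-shop-row alternation described in the question (the sample frame is rebuilt here purely for illustration):

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question, for illustration only
df = pd.DataFrame({
    "Date": ["2022-08-04"] * 4,
    "Spent": [126.98, np.nan, 119.70, np.nan],
    "Category/Shop": ["Supermarkets", "ShopName", "Supermarkets", "ShopName"],
})

# Pull each shop name up one row so it sits next to its category
df["Shop"] = df["Category/Shop"].shift(-1)

# Keep only the category rows (those with a Spent value), then tidy up
result = (
    df.dropna(subset=["Spent"])
      .rename(columns={"Category/Shop": "Category"})
      .reset_index(drop=True)
)
print(result)
```

This keeps the original row order, but it would give wrong pairings if a shop row were ever missing, so it is only as reliable as the alternation assumption.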
Answered By: Celius Stingher

Apply the pandas Series.str.split() function to the “Address” column and pass the delimiter (a comma in this case) on which you want to split the column. Also, make sure to pass True to the expand parameter.
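
As a rough illustration of what this describes, here is a sketch on a hypothetical frame with an "Address" column; the data and the resulting column names "Street" and "City" are invented for the example and do not come from the question:

```python
import pandas as pd

# Hypothetical data: an "Address" column holding comma-separated parts
addr = pd.DataFrame({"Address": ["12 Main St,Springfield", "7 Oak Ave,Riverton"]})

# expand=True returns a DataFrame with one column per split piece
parts = addr["Address"].str.split(",", expand=True)
parts.columns = ["Street", "City"]  # illustrative names

# Attach the new columns next to the original one
addr = pd.concat([addr, parts], axis=1)
print(addr)
```

Note that str.split works within a single cell. In the question the category and the shop sit in separate rows, so a delimiter split alone would not pair them up.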

Answered By: RUTURAJ KALMEGH

I would go for iloc to retrieve every second row, and build a new dataframe with pd.concat.

Given df is your source DataFrame, it would look like this:

pd.concat(
    [
        # start at the first row and use every 2nd row of df
        df.iloc[::2].reset_index(drop=True),
        # start at the second row and use every 2nd row of df, but only the last column
        df.iloc[1::2]["category/shop"].reset_index(drop=True)
    ],
    # concatenate along columns
    axis=1
)

Output:

         date   spent category/shop category/shop
0  2022-08-04  126.98  Supermarkets      ShopName
1  2022-08-04  119.70  Supermarkets      ShopName
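
The result above ends up with two columns of the same name. A possible variant, sketched here with the column naming from the question (Date, Spent, Category/Shop) rather than the lowercase names used above, renames each half before concatenating:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question, for illustration only
df = pd.DataFrame({
    "Date": ["2022-08-04"] * 4,
    "Spent": [126.98, np.nan, 119.70, np.nan],
    "Category/Shop": ["Supermarkets", "ShopName", "Supermarkets", "ShopName"],
})

# Rows 0, 2, 4, ... hold the category; rename the shared column accordingly
categories = (
    df.iloc[::2]
      .rename(columns={"Category/Shop": "Category"})
      .reset_index(drop=True)
)

# Rows 1, 3, 5, ... hold the shop; take only that column and name it "Shop"
shops = df.iloc[1::2]["Category/Shop"].rename("Shop").reset_index(drop=True)

# Both pieces now share a 0..n-1 index, so they align row by row
result = pd.concat([categories, shops], axis=1)
print(result)
```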
Answered By: pkeilbach

One can use a list comprehension as follows

df['Category'], df['Shop'] = zip(*[('Supermarkets', 'ShopName') if x == 'Supermarkets' else ('ShopName', 'ShopName') for x in df['Category/Shop']])

[Out]:

         Date   Spent Category/Shop      Category      Shop
0  2022-08-04  126.98  Supermarkets  Supermarkets  ShopName
1  2022-08-04     NaN      ShopName      ShopName  ShopName
2  2022-08-04  119.70  Supermarkets  Supermarkets  ShopName
3  2022-08-04     NaN      ShopName      ShopName  ShopName

Then, as one doesn’t need the rows where Spent is NaN, nor the column Category/Shop, one can drop both with pandas.DataFrame.dropna and pandas.DataFrame.drop

df = df.dropna(subset=['Spent']).drop(columns=['Category/Shop'])

[Out]:

         Date   Spent      Category      Shop
0  2022-08-04  126.98  Supermarkets  ShopName
2  2022-08-04  119.70  Supermarkets  ShopName

Or, to reset the index as well, chain pandas.DataFrame.reset_index:

df = df.dropna(subset=['Spent']).drop(columns=['Category/Shop']).reset_index(drop=True)

[Out]:

         Date   Spent      Category      Shop
0  2022-08-04  126.98  Supermarkets  ShopName
1  2022-08-04  119.70  Supermarkets  ShopName
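
The comprehension above hardcodes the literal values 'Supermarkets' and 'ShopName', so it only fits this exact sample. A sketch of a comprehension that pairs values by position instead, assuming the rows strictly alternate category, shop (the category and shop names below are made up to show that distinct values are handled):

```python
import numpy as np
import pandas as pd

# Hypothetical data with distinct categories and shops
df = pd.DataFrame({
    "Date": ["2022-08-04"] * 4,
    "Spent": [126.98, np.nan, 119.70, np.nan],
    "Category/Shop": ["Supermarkets", "ShopA", "Pharmacies", "ShopB"],
})

values = df["Category/Shop"].tolist()

# Pair each even-positioned value (category) with the value after it (shop)
pairs = [(cat, shop) for cat, shop in zip(values[::2], values[1::2])]

# Keep Date and Spent from the category rows and attach the paired values
result = (
    df.iloc[::2][["Date", "Spent"]]
      .reset_index(drop=True)
      .assign(
          Category=[cat for cat, _ in pairs],
          Shop=[shop for _, shop in pairs],
      )
)
print(result)
```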
Answered By: Gonçalo Peres