How to separate column in dataframe pandas

Question

I have a DataFrame in which one column holds both shop names and categories.

Date	Spent	…	Category/Shop
2022-08-04	126.98	…	Supermarkets
2022-08-04	NaN	…	ShopName
2022-08-04	119.70	…	Supermarkets
2022-08-04	NaN	…	ShopName

…

I need to separate the last column into two columns:

Date	Spent	…	Category	Shop
2022-08-04	126.98	…	Supermarkets	ShopName
2022-08-04	119.70	…	Supermarkets	ShopName

How can this be done?

We can assume that every second row in the Category/Shop column contains the name of the store that needs to be moved to a new column.

Asked By: Egor

Answer 1

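Based on the sample, and expecting similar data in the future, I would do it with groupby:

# Forward-fill the NaN values so each shop row carries the Spent value of the category row above it
# (df.ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas versions)
df = df.ffill().groupby(['Date','Spent'])['Category/Shop'].apply(list).reset_index()
# The first list element is the category, the second is the shop
df['Category'], df['Shop'] = df['Category/Shop'].str[0], df['Category/Shop'].str[1]
df = df.drop(columns='Category/Shop')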

Outputting:

         Date   Spent      Category      Shop
0  2022-08-04  119.70  Supermarkets  ShopName
1  2022-08-04  126.98  Supermarkets  ShopName
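
For reference, here is a self-contained sketch of the same groupby idea that can be run as-is. The sample frame only mirrors the rows shown in the question (the elided "…" columns are left out), and the names pairs and result are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; the elided "…" columns are omitted.
df = pd.DataFrame({
    "Date": ["2022-08-04"] * 4,
    "Spent": [126.98, np.nan, 119.70, np.nan],
    "Category/Shop": ["Supermarkets", "ShopName", "Supermarkets", "ShopName"],
})

# Forward-fill Spent so each shop row shares a key with the category row above it.
df["Spent"] = df["Spent"].ffill()

# Collect the two labels of every (Date, Spent) pair into a list.
pairs = (
    df.groupby(["Date", "Spent"])["Category/Shop"]
    .apply(list)
    .reset_index()
)

# First list element is the category, second is the shop.
pairs["Category"] = pairs["Category/Shop"].str[0]
pairs["Shop"] = pairs["Category/Shop"].str[1]
result = pairs.drop(columns="Category/Shop")
print(result)
```

Note that groupby sorts by the group keys, so the row order can differ from the input. Two purchases with the same Date and Spent would also fall into the same group, so this approach relies on those pairs being unique.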

Answered By: Celius Stingher

Answer 2

Apply the pandas Series str.split() function to the “Address” column, passing the delimiter (a comma in this case) on which to split the column. Also pass expand=True so that the pieces are returned as separate columns.
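
A minimal sketch of that call follows. The "Address" column, its values, and the Street and City names are hypothetical and serve only as illustration. This pattern applies when both values sit in the same cell, separated by a delimiter, which is not the layout shown in the question, where they sit on alternating rows.

```python
import pandas as pd

# Hypothetical data: the "Address" column and its values are illustrative only.
addresses = pd.DataFrame(
    {"Address": ["12 Main St,Springfield", "5 Oak Ave,Shelbyville"]}
)

# expand=True returns a DataFrame with one column per split piece.
parts = addresses["Address"].str.split(",", expand=True)
addresses["Street"] = parts[0]
addresses["City"] = parts[1]
print(addresses)
```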

Answered By: RUTURAJ KALMEGH

Answer 3

I would go for iloc to retrieve every second row, and build a new dataframe with pd.concat.

Given that df is your source DataFrame (with lowercase column names in this example), it would look like this:

pd.concat(
    [
        # start at the first row and use every 2nd row of df
        df.iloc[::2].reset_index(drop=True),
        # start at the second row and use every 2nd row of df, but only the last column
        df.iloc[1::2]["category/shop"].reset_index(drop=True)
    ],
    # concatenate along columns
    axis=1
)

Output:

         date   spent category/shop category/shop
0  2022-08-04  126.98  Supermarkets      ShopName
1  2022-08-04  119.70  Supermarkets      ShopName
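
The output above ends up with two columns of the same name. A possible variation, sketched here with the column names from the question, renames each piece before concatenating. The names category_rows, shop_rows and result are illustrative, and the sample frame only mirrors the rows shown in the question.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; the elided "…" columns are omitted.
df = pd.DataFrame({
    "Date": ["2022-08-04"] * 4,
    "Spent": [126.98, np.nan, 119.70, np.nan],
    "Category/Shop": ["Supermarkets", "ShopName", "Supermarkets", "ShopName"],
})

# Rows 0, 2, 4, ... hold the category; keep all columns and rename the last one.
category_rows = (
    df.iloc[::2]
    .reset_index(drop=True)
    .rename(columns={"Category/Shop": "Category"})
)

# Rows 1, 3, 5, ... hold the shop; keep only that column as a Series named "Shop".
shop_rows = df.iloc[1::2]["Category/Shop"].reset_index(drop=True).rename("Shop")

# Concatenate along columns; both pieces share the same 0..n-1 index.
result = pd.concat([category_rows, shop_rows], axis=1)
print(result)
```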

Answered By: pkeilbach

Answer 4

One can use a list comprehension as follows:

df['Category'], df['Shop'] = zip(*[('Supermarkets', 'ShopName') if x == 'Supermarkets' else ('ShopName', 'ShopName') for x in df['Category/Shop']])

[Out]:

         Date   Spent Category/Shop      Category      Shop
0  2022-08-04  126.98  Supermarkets  Supermarkets  ShopName
1  2022-08-04     NaN      ShopName      ShopName  ShopName
2  2022-08-04  119.70  Supermarkets  Supermarkets  ShopName
3  2022-08-04     NaN      ShopName      ShopName  ShopName

Then, as one needs neither the rows where Spent is NaN nor the column Category/Shop, one can drop both with pandas.DataFrame.dropna and pandas.DataFrame.drop:

df = df.dropna(subset=['Spent']).drop(columns=['Category/Shop'])

[Out]:

         Date   Spent      Category      Shop
0  2022-08-04  126.98  Supermarkets  ShopName
2  2022-08-04  119.70  Supermarkets  ShopName

Or, if one also wants to reset the index, chain pandas.DataFrame.reset_index:

df = df.dropna(subset=['Spent']).drop(columns=['Category/Shop']).reset_index(drop=True)

[Out]:

         Date   Spent      Category      Shop
0  2022-08-04  126.98  Supermarkets  ShopName
1  2022-08-04  119.70  Supermarkets  ShopName
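
The list comprehension above hardcodes the two labels from the sample. As an alternative technique (not the one used in this answer), the shop can be read from the following row with pandas.Series.shift, which should also work when the category and shop names vary. This is a sketch that relies on the question's assumption that every second row holds the shop name; the name result is illustrative.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; the elided "…" columns are omitted.
df = pd.DataFrame({
    "Date": ["2022-08-04"] * 4,
    "Spent": [126.98, np.nan, 119.70, np.nan],
    "Category/Shop": ["Supermarkets", "ShopName", "Supermarkets", "ShopName"],
})

# shift(-1) moves every value up one row, so each category row sees the shop below it.
df["Shop"] = df["Category/Shop"].shift(-1)

# Keep only the category rows (those with a Spent value) and tidy up the names.
result = (
    df.dropna(subset=["Spent"])
    .rename(columns={"Category/Shop": "Category"})
    .reset_index(drop=True)
)
print(result)
```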

Answered By: Gonçalo Peres
