How do I split a column to many colum by row quantity

Question:

I got a long single column DataFrame as following table:

Column A
Cell 1
Cell 2
Cell 3
Cell 4
Cell 5
Cell 6
Cell 7
Cell 8

I want to split column A in order with specify row quantity and add to others new columns
If I give 2 row quantity for each column

Column A Column B Column C Column D
Cell 1 Cell 3 Cell 5 Cell 7
Cell 2 Cell 4 Cell 6 Cell 8

split long a single column to new adding column by given row quantity.

Asked By: Tom

||

Answers:

You could divide the rows into groups and pivot the groups into columns.

Short Version

number_rows = 2
df['cols'] = np.ceil(df['Column A'].expanding().count()/number_rows)
df.index = pd.Series(range(len(df))) % number_rows
df = df.pivot(columns='cols', values='Column A')
cols     1.0     2.0     3.0     4.0
0     Cell 0  Cell 2  Cell 4  Cell 6
1     Cell 1  Cell 3  Cell 5  Cell 7

The code will also work if your number of rows to split by (number_rows) are not a multiple of length of your DataFrame, and missing values (np.nan) will be added.

Long Version with Explanation of Steps

Create demo data

import pandas as pd
import numpy as np
df = pd.DataFrame({'Column A': [f'Cell {i}' for i in range(4)]})
  Column A
0   Cell 0
1   Cell 1
2   Cell 2
3   Cell 3

Create columns for later columns and rows

number_rows = 3 # 3 instead of 2 to illustrate filling with missing values
df['cols'] = np.ceil(df['Column A'].expanding().count()/number_rows)
df['cols'] = 'Column '+df['cols'].astype(int).astype(str)
df['rows'] = pd.Series(range(len(df))) % number_rows
  Column A      cols  rows
0   Cell 0  Column 1     0
1   Cell 1  Column 1     1
2   Cell 2  Column 1     2
3   Cell 3  Column 2     0

Pivot table

df = df.pivot(columns='cols', index='rows', values='Column A')
cols Column 1 Column 2
rows                  
0      Cell 0   Cell 3
1      Cell 1      NaN
2      Cell 2      NaN

You can remove the column and index names with:

df.columns.name = df.index.name = None
Answered By: Benjamin Ziepert

You can use the underlying numpy array to reshape in Fortran order (rows, then columns):

from string import ascii_uppercase

N = 2

out = (pd.DataFrame(df['Column A'].to_numpy().reshape(N, -1, order='F'))
        # the line below is optional, just to have the column names
         .rename(columns=dict(enumerate(ascii_uppercase))).add_prefix('Column ')
      )

Output:

  Column A Column B Column C Column D
0   Cell 1   Cell 3   Cell 5   Cell 7
1   Cell 2   Cell 4   Cell 6   Cell 8

If you want to handle N that are non multiples of len(df), you can add a reindex step to pad the DataFrame with NaNs:

N = 3

out = (pd.DataFrame(df['Column A'].reindex(range(int(np.ceil(len(df)/N)*N)))
                                  .to_numpy().reshape(N, -1, order='F'))
         .rename(columns=dict(enumerate(ascii_uppercase))).add_prefix('Column ')
      )

Output:

  Column A Column B Column C
0   Cell 1   Cell 4   Cell 7
1   Cell 2   Cell 5   Cell 8
2   Cell 3   Cell 6      NaN
Answered By: mozway

You should Create a new row, with the name of new columns corresponding to each value and also change the correpoding index. Finally you can use df.pivot()
to pivot your dataframe to the new format.

n_rows = 2
df['new_col'] = "Column "
df['new_col']=df['new_col']+pd.Series(df.index%n_rows).ne(1).cumsum().astype(str)
df.index=df.index%n_rows
print(df.pivot(columns='new_col', values='Column A'))

new_col Column 1 Column 2 Column 3 Column 4
0         Cell 1   Cell 3   Cell 5   Cell 7
1         Cell 2   Cell 4   Cell 6   Cell 8
Answered By: ali bakhtiari
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.