Convert values in a column to column headers in pandas
Question:
I have the following code, which takes the values in one column of a pandas dataframe and makes them the columns of a new data frame. The values in the first column of the dataframe become the index of the new dataframe.
In a sense, I want to turn an adjacency list into an adjacency matrix. Here’s the code so far:
import numpy as np
import pandas as pa
# Create a dataframe
oldcols = {'col1':['a','a','b','b'], 'col2':['c','d','c','d'], 'col3':[1,2,3,4]}
a = pa.DataFrame(oldcols)
# The columns of the new data frame will be the values in col2 of the original
newcols = list(set(oldcols['col2']))
rows = list(set(oldcols['col1']))
# Create the new data matrix
data = np.zeros((len(rows), len(newcols)))
# Iterate over each row and fill in the new matrix
for row in zip(a['col1'], a['col2'], a['col3']):
    rowindex = rows.index(row[0])
    colindex = newcols.index(row[1])
    data[rowindex][colindex] = row[2]
newf = pa.DataFrame(data)
newf.columns = newcols
newf.index = rows
This works for this particular instance as follows. The original DataFrame
  col1 col2  col3
0    a    c     1
1    a    d     2
2    b    c     3
3    b    d     4
is transformed to a new DataFrame that looks like
   c  d
a  1  2
b  3  4
It will fail if the values in col3 are not numbers. My question is, is there a more elegant/robust way of doing this?
Answers:
This looks like a job for pivot:
import pandas as pd
oldcols = {'col1':['a','a','b','b'], 'col2':['c','d','c','d'], 'col3':[1,2,3,4]}
a = pd.DataFrame(oldcols)
newf = a.pivot(index='col1', columns='col2')
print(newf)
yields
     col3
col2    c  d
col1
a       1  2
b       3  4
If you don’t want a MultiIndex column, you can drop the col3 level using:
newf.columns = newf.columns.droplevel(0)
which would then yield
col2  c  d
col1
a     1  2
b     3  4
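As @unutbu mentioned, you can reshape the dataframe using pivot. Passing values='col3' selects the value column directly, so the result has plain (non-MultiIndex) columns: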
res = a.pivot(index='col1', columns='col2', values='col3')
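The question's concern was that the loop-based version breaks when col3 is not numeric. As a minimal sketch (the string values below are made up for illustration and are not from the original post), pivot only rearranges the values, so non-numeric data should go through unchanged:

```python
import pandas as pd

# Hypothetical variant of the question's data: col3 holds strings, not numbers
a = pd.DataFrame({
    'col1': ['a', 'a', 'b', 'b'],
    'col2': ['c', 'd', 'c', 'd'],
    'col3': ['w', 'x', 'y', 'z'],
})

# pivot moves each col3 value to the cell addressed by (col1, col2);
# no arithmetic is performed, so the dtype of col3 does not matter
res = a.pivot(index='col1', columns='col2', values='col3')
print(res)
```

Note that pivot raises a ValueError if the same (col1, col2) pair appears more than once, since it would not know which value to keep.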
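An even terser way is to unpack the column labels as positional args. This relies on the columns being in index, columns, values order, and it only works on pandas versions before 2.0; from 2.0 on, pivot's arguments are keyword-only and this call raises a TypeError.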
res = a.pivot(*a).rename_axis(index=None, columns=None)
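A keyword-based variant of the same idea, as a rough sketch: it zips pivot's parameter names with the frame's column labels, so it should also run on pandas versions where positional arguments are not accepted. It assumes the columns are already ordered as index, columns, values; the `kwargs` name is just for illustration.

```python
import pandas as pd

a = pd.DataFrame({
    'col1': ['a', 'a', 'b', 'b'],
    'col2': ['c', 'd', 'c', 'd'],
    'col3': [1, 2, 3, 4],
})

# Map the column labels, in order, onto pivot's keyword arguments.
# Assumption: the columns are ordered as (index, columns, values).
kwargs = dict(zip(['index', 'columns', 'values'], a.columns))

# rename_axis(index=None, columns=None) removes the 'col1'/'col2' axis names
res = a.pivot(**kwargs).rename_axis(index=None, columns=None)
print(res)
```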
Another method is to explicitly construct a graph object (using the popular graph library networkx) and construct an adjacency matrix. It’s probably too verbose for a simple pivot operation but if the given data is already in graph form, it could be useful.
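import networkx as nx
g = nx.Graph()
# Remember the row and column labels in their original order
col1 = a['col1'].unique()
col2 = a['col2'].unique()
# Each row (col1, col2, col3) becomes a weighted edge between two nodes
g.add_weighted_edges_from(list(map(tuple, a.values)))
# The full adjacency matrix covers all nodes; keep only the col1 x col2 block
res = nx.to_pandas_adjacency(g).loc[col1, col2]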
Yet another way is to assign the first two columns as MultiIndex and then unstack the second column:
df = pd.DataFrame({'col1':['a','a','b','b'], 'col2':['c','d','c','d'], 'col3':[1,2,3,4]})
df.set_index(['col1', 'col2']).squeeze().unstack('col2')
results in
col2  c  d
col1
a     1  2
b     3  4
The squeeze() method converts a DataFrame with a single column into a Series.
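To make the intermediate steps visible, here is a small sketch that splits the chain apart. The variable names `indexed`, `s`, and `wide` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'a', 'b', 'b'],
    'col2': ['c', 'd', 'c', 'd'],
    'col3': [1, 2, 3, 4],
})

# Step 1: col1 and col2 become a two-level MultiIndex; only col3 remains as a column
indexed = df.set_index(['col1', 'col2'])

# Step 2: squeeze() collapses the single-column DataFrame into a Series
s = indexed.squeeze()

# Step 3: unstack moves the 'col2' index level into the columns
wide = s.unstack('col2')

print(type(indexed).__name__, type(s).__name__)
print(wide)
```

Without the squeeze() step, unstack would be applied to a DataFrame and the result would carry col3 as an extra column level, as in the first pivot example.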