Nested List to Pandas Dataframe with headers
Question:
Basically I am trying to do the opposite of "How to generate a list from a pandas DataFrame with the column name and column values?"
To borrow that example, I want to go from the form:
data = [
['Name','Rank','Complete'],
['one', 1, 1],
['two', 2, 1],
['three', 3, 1],
['four', 4, 1],
['five', 5, 1]
]
which should output:
       Rank  Complete
Name
One       1         1
Two       2         1
Three     3         1
Four      4         1
Five      5         1
However when I do something like:
pd.DataFrame(data)
I get a dataframe in which the header row is treated as ordinary data. Instead, the first list should become the column labels, and the first element of each remaining list should become the index.
Answers:
One way to do this is to pass the first sublist as the column names and only the remaining sublists (from index 1 onward) as the data for pd.DataFrame:
In [8]: data = [['Name','Rank','Complete'],
...: ['one', 1, 1],
...: ['two', 2, 1],
...: ['three', 3, 1],
...: ['four', 4, 1],
...: ['five', 5, 1]]
In [10]: df = pd.DataFrame(data[1:],columns=data[0])
In [11]: df
Out[11]:
    Name  Rank  Complete
0    one     1         1
1    two     2         1
2  three     3         1
3   four     4         1
4   five     5         1
If you want the Name column as the index, call the .set_index() method on the result and pass it the name of the column to use as the index. Example:
In [16]: df = pd.DataFrame(data[1:],columns=data[0]).set_index('Name')
In [17]: df
Out[17]:
       Rank  Complete
Name
one       1         1
two       2         1
three     3         1
four      4         1
five      5         1
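The same split into header and rows can also be done in one step with pd.DataFrame.from_records, which accepts an index argument naming the column to use as the index. This is only a sketch of an alternative spelling, not something the answer above relies on; the names header and rows are just illustrative.

```python
import pandas as pd

data = [
    ['Name', 'Rank', 'Complete'],
    ['one', 1, 1],
    ['two', 2, 1],
    ['three', 3, 1],
    ['four', 4, 1],
    ['five', 5, 1],
]

# Illustrative names: the first sublist holds the labels, the rest hold the rows.
header, rows = data[0], data[1:]

# from_records builds the frame and moves the 'Name' column into the index.
df = pd.DataFrame.from_records(rows, columns=header, index='Name')
print(df)
```

The result should have the same shape as the set_index('Name') version: Name as the index, with Rank and Complete as columns.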
To build the desired dataframe directly at construction, the list can be converted into a NumPy array and sliced into the index, the column labels, and the values.
import numpy as np
import pandas as pd
arr = np.array(data, dtype=object)
df = pd.DataFrame(arr[1:, 1:], index=pd.Index(arr[1:, 0], name=arr[0,0]), columns=arr[0, 1:], dtype=int)
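Note that dtype=int applies to every column, so the array approach as written assumes all value columns are integers. If some columns hold strings, one possible workaround (a sketch, using a made-up Grade column that is not in the original data) is to drop dtype=int and let infer_objects() pick a dtype per column:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a non-numeric column, for illustration only.
data = [
    ['Name', 'Rank', 'Grade'],
    ['one', 1, 'a'],
    ['two', 2, 'b'],
    ['three', 3, 'c'],
]

arr = np.array(data, dtype=object)

df = pd.DataFrame(
    arr[1:, 1:],                              # values: everything but header row and first column
    index=pd.Index(arr[1:, 0], name=arr[0, 0]),  # first column becomes the index
    columns=arr[0, 1:],                       # header row minus the index label
).infer_objects()                             # convert object columns to better dtypes where possible

print(df.dtypes)
```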
Alternatively, since the data looks like a CSV file read into a Python list, it can be written to an in-memory text buffer and passed to pd.read_csv. A nice thing about read_csv is that it can set MultiIndex columns and indices, and it can infer dtypes.
from io import StringIO
import pandas as pd
# join fields with '|' and rows with a newline, then parse the text as CSV
df = pd.read_csv(StringIO('\n'.join(['|'.join(map(str, row)) for row in data])), sep='|', index_col=[0])
A convenience function for the latter method:
from io import StringIO
import pandas as pd

def read_list(data, index_col=None, header=0):
    # build a '|'-separated text buffer, one line per sublist
    sio = StringIO('\n'.join(['|'.join(map(str, row)) for row in data]))
    return pd.read_csv(sio, sep='|', index_col=index_col, header=header)

df = read_list(data, index_col=[0])
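One caveat with joining on '|' by hand: a value that itself contains the separator would be split into extra fields. A possible variant, sketched here with a hypothetical helper name read_list_csv, lets the standard csv module write the buffer so that such values get quoted before read_csv parses them:

```python
import csv
from io import StringIO

import pandas as pd


def read_list_csv(data, **kwargs):
    """Hypothetical helper: write rows with csv.writer, then parse with read_csv."""
    buf = StringIO()
    # csv.writer quotes any field that contains the delimiter or a quote character.
    csv.writer(buf, lineterminator='\n').writerows(data)
    buf.seek(0)  # rewind so read_csv starts from the beginning of the buffer
    return pd.read_csv(buf, **kwargs)


data = [
    ['Name', 'Rank', 'Complete'],
    ['one', 1, 1],
    ['two', 2, 1],
    ['three', 3, 1],
    ['four', 4, 1],
    ['five', 5, 1],
]

df = read_list_csv(data, index_col=0)
print(df)
```

Any keyword arguments are passed straight through to pd.read_csv, so index_col, header and similar options should work as they do for a file on disk.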