Nested List to Pandas Dataframe with headers

Question:

Basically I am trying to do the opposite of "How to generate a list from a pandas DataFrame with the column name and column values?"

To borrow that example, I want to go from the form:

data = [
    ['Name','Rank','Complete'],
    ['one', 1, 1],
    ['two', 2, 1],
    ['three', 3, 1],
    ['four', 4, 1],
    ['five', 5, 1]
]

which should output:

      Rank Complete
 Name
  one    1        1
  two    2        1
three    3        1
 four    4        1
 five    5        1

However when I do something like:

pd.DataFrame(data)

I get a DataFrame in which the header row is treated as ordinary data. Instead, the first list should become my column labels, and the first element of each remaining list should become the index.

Asked By: qwertylpc

||

Answers:

One way to do this is to pass the first inner list as the column names and only the remaining rows (everything from index 1 onward) as the data for pd.DataFrame:

In [8]: data = [['Name','Rank','Complete'],
   ...:                ['one', 1, 1],
   ...:                ['two', 2, 1],
   ...:                ['three', 3, 1],
   ...:                ['four', 4, 1],
   ...:                ['five', 5, 1]]

In [10]: df = pd.DataFrame(data[1:],columns=data[0])

In [11]: df
Out[11]:
    Name  Rank  Complete
0    one     1         1
1    two     2         1
2  three     3         1
3   four     4         1
4   five     5         1

If you want the Name column to be the index, call the .set_index() method and pass it the name of the column to use. Example:

In [16]: df = pd.DataFrame(data[1:],columns=data[0]).set_index('Name')

In [17]: df
Out[17]:
       Rank  Complete
Name
one       1         1
two       2         1
three     3         1
four      4         1
five      5         1
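
For reference, the two steps above can be combined into one self-contained script. This is only a minimal sketch: the helper name nested_list_to_frame and its index_col parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd


def nested_list_to_frame(data, index_col=None):
    """Build a DataFrame from a nested list whose first row holds the column names.

    Hypothetical helper for illustration only; not a pandas function.
    """
    header, *rows = data  # first inner list = labels, the rest = records
    df = pd.DataFrame(rows, columns=header)
    if index_col is not None:
        # Move the chosen column out of the data and into the row index
        df = df.set_index(index_col)
    return df


data = [
    ['Name', 'Rank', 'Complete'],
    ['one', 1, 1],
    ['two', 2, 1],
    ['three', 3, 1],
    ['four', 4, 1],
    ['five', 5, 1],
]

df = nested_list_to_frame(data, index_col='Name')
print(df)
```

Leaving index_col as None keeps the default integer index, which matches the first example above.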
Answered By: Anand S Kumar

To build the desired DataFrame directly at construction time, the list can be converted into a NumPy array and sliced accordingly.

import numpy as np
import pandas as pd
arr = np.array(data, dtype=object)
df = pd.DataFrame(arr[1:, 1:], index=pd.Index(arr[1:, 0], name=arr[0,0]), columns=arr[0, 1:], dtype=int)

Alternatively, since the data looks like a CSV file that has been read into a Python list, it can be written to an in-memory text buffer and parsed with pd.read_csv. A nice thing about read_csv is that it can build MultiIndex columns and indices, and it infers dtypes.

from io import StringIO
df = pd.read_csv(StringIO('\n'.join(['|'.join(map(str, row)) for row in data])), sep='|', index_col=[0])

A convenience function for the latter method:

from io import StringIO
def read_list(data, index_col=None, header=0):
    sio = StringIO('\n'.join(['|'.join(map(str, row)) for row in data]))
    return pd.read_csv(sio, sep='|', index_col=index_col, header=header)

df = read_list(data, index_col=[0])
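
One caveat with joining on '|' by hand is that a value containing '|' (or a newline) would be split into extra fields. A possible variant, sketched here under the assumption that such values can occur, lets the standard-library csv module do the quoting before read_csv parses the buffer. The name read_list_quoted is hypothetical and the sample rows are invented for the demonstration.

```python
import csv
from io import StringIO

import pandas as pd


def read_list_quoted(data, index_col=None, header=0):
    """Variant of read_list that lets csv.writer handle quoting.

    Hypothetical helper; the name is an assumption, not a pandas API.
    """
    sio = StringIO(newline='')
    # csv.writer quotes any field containing the delimiter, a quote or a newline
    csv.writer(sio).writerows(data)
    sio.seek(0)  # rewind so read_csv starts at the first row
    return pd.read_csv(sio, index_col=index_col, header=header)


# Invented sample rows with awkward characters in the first column
data = [
    ['Name', 'Rank', 'Complete'],
    ['one|uno', 1, 1],
    ['two, dos', 2, 1],
]

df = read_list_quoted(data, index_col=[0])
print(df)
print(df.dtypes)
```

The buffer uses the default comma separator, so no sep argument is needed; read_csv still infers the integer dtypes as in the function above.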
Answered By: cottontail