Convert one DataFrame row to a flat list
Question:
I am new to Python, so I'm having trouble converting a row in a DataFrame into a flat list. To do this I use the following code:
Toy DataFrame:
import pandas as pd
d = {
"a": [1, 2, 3, 4, 5],
"b": [9, 8, 7, 6, 5],
"n": ["a", "b", "c", "d", "e"]
}
df = pd.DataFrame(d)
My code:
from functools import reduce  # needed in Python 3, where reduce is no longer a builtin
df_note = df.loc[df.n == "d", ["a", "b"]].values  # convert to array
df_note = df_note.tolist()  # convert to nested list
df_note = reduce(lambda x, y: x + y, df_note)  # convert to flat list
To me this code looks both gross and inefficient. Converting to an array before converting to a list is what causes the problem, i.e. the list ends up nested. Even so, I cannot find a way to convert the row directly to a list. Any advice?
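A minimal sketch of why the list comes out nested, using the toy DataFrame above: a boolean mask plus a column list returns a DataFrame even when only one row matches, so `.values` is a 2-D array and `tolist()` produces one inner list per row. The names `sub` and `arr` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [9, 8, 7, 6, 5],
    "n": ["a", "b", "c", "d", "e"],
})

# Boolean-mask selection with a column list returns a DataFrame, even for one matching row
sub = df.loc[df.n == "d", ["a", "b"]]
arr = sub.values  # 2-D NumPy array of shape (1, 2)

print(type(sub).__name__, arr.shape)  # DataFrame (1, 2)
print(arr.tolist())                   # [[4, 6]] : one inner list per row
```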
This question is not a dupe of this. In my case, I want the list to be flat.
Answers:
You get a nested list because you select a sub-DataFrame, which is two-dimensional.
Selecting a single row instead gives a one-dimensional Series, which can be converted to a list without flattening:
df.loc[0, :].values.tolist()
[1, 9, 'a']
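The same idea applied to the row from the question, as a sketch: it assumes the default integer index (so "d" sits at label 3), and for the mask version it assumes at least one row matches, since `.iloc[0]` raises `IndexError` on an empty selection.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [9, 8, 7, 6, 5],
    "n": ["a", "b", "c", "d", "e"],
})

# A scalar row label plus a column list yields a 1-D Series, so no flattening is needed
by_label = df.loc[3, ["a", "b"]].tolist()

# With a boolean mask, take the first matching row of the sub-DataFrame as a Series
by_mask = df.loc[df.n == "d", ["a", "b"]].iloc[0].tolist()

print(by_label)  # [4, 6]
print(by_mask)   # [4, 6]
```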
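How about indexing into the nested list (here df_note is the sub-DataFrame, not the array from the question):
df_note = df.loc[df.n == "d", ["a", "b"]]
df_note.values.tolist()[0]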
[4, 6]
The values are already stored in a NumPy array, so you are not really converting them. Pandas uses a lot of NumPy under the hood; the attribute access df_note.values
just exposes the NumPy data behind that part of the DataFrame.
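A small sketch to check the claim above, assuming pandas 0.24 or later, where `DataFrame.to_numpy()` is available and is the spelling the pandas documentation recommends over `.values`. Indexing row 0 of the 2-D array gives a 1-D array, which is why the `[0]` trick yields a flat list.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [9, 8, 7, 6, 5],
    "n": ["a", "b", "c", "d", "e"],
})
sub = df.loc[df.n == "d", ["a", "b"]]

# .values hands back NumPy data directly; it is already an ndarray
print(isinstance(sub.values, np.ndarray))  # True

# DataFrame.to_numpy() returns the same values
print(np.array_equal(sub.to_numpy(), sub.values))  # True

# Indexing the first row of the 2-D array gives a 1-D array, hence a flat list
print(sub.to_numpy()[0].tolist())  # [4, 6]
```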
You are almost there: just use flatten instead of reduce to unnest the array (instead of unnesting the list), and chain the operations into a one-liner:
df.loc[df.n == "d", ['a','b']].values.flatten().tolist()
#[4, 6]
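For comparison, a sketch of `flatten()` next to NumPy's `ravel()`: both turn the (1, 2) array into a 1-D array, but `flatten()` always copies while `ravel()` returns a view when it can. If several rows matched the mask, either would concatenate them row by row rather than fail.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [9, 8, 7, 6, 5],
    "n": ["a", "b", "c", "d", "e"],
})
arr = df.loc[df.n == "d", ["a", "b"]].values  # shape (1, 2)

# flatten() always returns a new 1-D copy
flat_copy = arr.flatten().tolist()

# ravel() returns a 1-D view when possible, avoiding a copy
flat_view = arr.ravel().tolist()

print(flat_copy, flat_view)  # [4, 6] [4, 6]
```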
I am assuming you're explicitly selecting columns a and b only to get rid of column n, which you are solely using to select the wanted row.
In that case, you could also use the n column as the index first, using set_index:
>>> dfi = df.set_index('n')
>>> dfi.loc['d'].tolist()  # .ix was removed in pandas 1.0; use .loc
[4, 6]
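The set_index approach can be wrapped in a small function. `row_to_list` below is a hypothetical helper, not a pandas API. It also guards against one pitfall: if the key appears more than once in the key column, `.loc` returns a DataFrame instead of a Series, and a DataFrame has no `tolist()`.

```python
import pandas as pd

def row_to_list(frame, key_col, key):
    """Hypothetical helper: return the row whose key_col equals key as a flat list."""
    row = frame.set_index(key_col).loc[key]  # KeyError if the key is absent
    # With a duplicated key, .loc returns a DataFrame instead of a Series
    if isinstance(row, pd.DataFrame):
        raise ValueError(f"{key!r} matches {len(row)} rows, expected exactly one")
    return row.tolist()

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [9, 8, 7, 6, 5],
    "n": ["a", "b", "c", "d", "e"],
})
print(row_to_list(df, "n", "d"))  # [4, 6]
```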