Convert pandas data frame to series
Question:
I’m somewhat new to pandas. I have a pandas data frame that is 1 row by 23 columns.
I want to convert this into a series? I’m wondering what the most pythonic way to do this is?
I’ve tried pd.Series(myResults)
but it complains ValueError: cannot copy sequence with size 23 to array axis with dimension 1
. It’s not smart enough to realize it’s still a “vector” in math terms.
Thanks!
Answers:
It’s not smart enough to realize it’s still a “vector” in math terms.
Say rather that it’s smart enough to recognize a difference in dimensionality. 🙂
I think the simplest thing you can do is select that row positionally using iloc
, which gives you a Series with the columns as the new index and the values as the values:
>>> df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])
>>> df
a0 a1 a2 a3 a4
0 0 1 2 3 4
>>> df.iloc[0]
a0 0
a1 1
a2 2
a3 3
a4 4
Name: 0, dtype: int64
>>> type(_)
<class 'pandas.core.series.Series'>
You can retrieve the series through slicing your dataframe using one of these two methods:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html
import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.random.randn(1,8))
series1=df.iloc[0,:]
type(series1)
pandas.core.series.Series
You can transpose the single-row dataframe (which still results in a dataframe) and then squeeze the results into a series (the inverse of to_frame
).
df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])
>>> df.squeeze(axis=0)
a0 0
a1 1
a2 2
a3 3
a4 4
Name: 0, dtype: int64
Note: To accommodate the point raised by @IanS (even though it is not in the OP’s question), test for the dataframe’s size. I am assuming that df
is a dataframe, but the edge cases are an empty dataframe, a dataframe of shape (1, 1), and a dataframe with more than one row in which case the use should implement their desired functionality.
if df.empty:
# Empty dataframe, so convert to empty Series.
result = pd.Series()
elif df.shape == (1, 1)
# DataFrame with one value, so convert to series with appropriate index.
result = pd.Series(df.iat[0, 0], index=df.columns)
elif len(df) == 1:
# Convert to series per OP's question.
result = df.T.squeeze()
else:
# Dataframe with multiple rows. Implement desired behavior.
pass
This can also be simplified along the lines of the answer provided by @themachinist.
if len(df) > 1:
# Dataframe with multiple rows. Implement desired behavior.
pass
else:
result = pd.Series() if df.empty else df.iloc[0, :]
Another way –
Suppose myResult is the dataFrame that contains your data in the form of 1 col and 23 rows
# label your columns by passing a list of names
myResult.columns = ['firstCol']
# fetch the column in this way, which will return you a series
myResult = myResult['firstCol']
print(type(myResult))
In similar fashion, you can get series from Dataframe with multiple columns.
data = pd.DataFrame({"a":[1,2,3,34],"b":[5,6,7,8]})
new_data = pd.melt(data)
new_data.set_index("variable", inplace=True)
This gives a dataframe with index as column name of data and all data are present in “values” column
You can also use stack()
df= DataFrame([list(range(5))], columns = [“a{}”.format(I) for I in range(5)])
After u run df, then run:
df.stack()
You obtain your dataframe in series
If you have a one column dataframe df, you can convert it to a series:
df.iloc[:,0] # pandas Series
Since you have a one row dataframe df
, you can transpose it so you’re in the previous case:
df.T.iloc[:,0]
Another way is very simple
df= df.iloc[3].reset_index(drop=True).squeeze()
Squeeze -> is the one that converts to Series.
I’m somewhat new to pandas. I have a pandas data frame that is 1 row by 23 columns.
I want to convert this into a series? I’m wondering what the most pythonic way to do this is?
I’ve tried pd.Series(myResults)
but it complains ValueError: cannot copy sequence with size 23 to array axis with dimension 1
. It’s not smart enough to realize it’s still a “vector” in math terms.
Thanks!
It’s not smart enough to realize it’s still a “vector” in math terms.
Say rather that it’s smart enough to recognize a difference in dimensionality. 🙂
I think the simplest thing you can do is select that row positionally using iloc
, which gives you a Series with the columns as the new index and the values as the values:
>>> df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])
>>> df
a0 a1 a2 a3 a4
0 0 1 2 3 4
>>> df.iloc[0]
a0 0
a1 1
a2 2
a3 3
a4 4
Name: 0, dtype: int64
>>> type(_)
<class 'pandas.core.series.Series'>
You can retrieve the series through slicing your dataframe using one of these two methods:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html
import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.random.randn(1,8))
series1=df.iloc[0,:]
type(series1)
pandas.core.series.Series
You can transpose the single-row dataframe (which still results in a dataframe) and then squeeze the results into a series (the inverse of to_frame
).
df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])
>>> df.squeeze(axis=0)
a0 0
a1 1
a2 2
a3 3
a4 4
Name: 0, dtype: int64
Note: To accommodate the point raised by @IanS (even though it is not in the OP’s question), test for the dataframe’s size. I am assuming that df
is a dataframe, but the edge cases are an empty dataframe, a dataframe of shape (1, 1), and a dataframe with more than one row in which case the use should implement their desired functionality.
if df.empty:
# Empty dataframe, so convert to empty Series.
result = pd.Series()
elif df.shape == (1, 1)
# DataFrame with one value, so convert to series with appropriate index.
result = pd.Series(df.iat[0, 0], index=df.columns)
elif len(df) == 1:
# Convert to series per OP's question.
result = df.T.squeeze()
else:
# Dataframe with multiple rows. Implement desired behavior.
pass
This can also be simplified along the lines of the answer provided by @themachinist.
if len(df) > 1:
# Dataframe with multiple rows. Implement desired behavior.
pass
else:
result = pd.Series() if df.empty else df.iloc[0, :]
Another way –
Suppose myResult is the dataFrame that contains your data in the form of 1 col and 23 rows
# label your columns by passing a list of names
myResult.columns = ['firstCol']
# fetch the column in this way, which will return you a series
myResult = myResult['firstCol']
print(type(myResult))
In similar fashion, you can get series from Dataframe with multiple columns.
data = pd.DataFrame({"a":[1,2,3,34],"b":[5,6,7,8]})
new_data = pd.melt(data)
new_data.set_index("variable", inplace=True)
This gives a dataframe with index as column name of data and all data are present in “values” column
You can also use stack()
df= DataFrame([list(range(5))], columns = [“a{}”.format(I) for I in range(5)])
After u run df, then run:
df.stack()
You obtain your dataframe in series
If you have a one column dataframe df, you can convert it to a series:
df.iloc[:,0] # pandas Series
Since you have a one row dataframe df
, you can transpose it so you’re in the previous case:
df.T.iloc[:,0]
Another way is very simple
df= df.iloc[3].reset_index(drop=True).squeeze()
Squeeze -> is the one that converts to Series.