How do I Pass a List of Series to a Pandas DataFrame?

Question:

I realize DataFrame takes a dict of {'series_name': Series(data, index)}. However, it automatically sorts the keys, even if the dict is an OrderedDict().

Is there a simple way to pass a list of Series(data, index, name=name) such that the order is preserved and the column names are the series.name? Is there an easy way if all the indices are the same for all the series?

I normally do this by passing a numpy column_stack of series.values and specifying the column names. However, this is ugly, and in this particular case the data is strings, not floats.

Asked By: rhaskett


Answers:

You could use pandas.concat:

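import random
import string

import pandas as pd

def rands(n):
    # Stand-in for pandas.util.testing.rands (that module was removed in pandas 2.0):
    # returns a random alphanumeric string of length n.
    return ''.join(random.choices(string.ascii_letters + string.digits, k=n))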

data = [pd.Series([rands(4) for j in range(6)],
                  index=pd.date_range('1/1/2000', periods=6),
                  name='col'+str(i)) for i in range(4)]

df = pd.concat(data, axis=1, keys=[s.name for s in data])
print(df)

yields

            col0  col1  col2  col3
2000-01-01  GqcN  Lwlj  Km7b  XfaA
2000-01-02  lhNC  nlSm  jCYu  XLVb
2000-01-03  sSRz  PFby  C1o5  0BJe
2000-01-04  khZb  Ny9p  crUY  LNmc
2000-01-05  hmLp  4rVp  xF2P  OmD9
2000-01-06  giah  psQb  T5RJ  oLSh
Answered By: unutbu

import pandas as pd

a = pd.Series(data=[1, 2, 3])
b = pd.Series(data=[4, 5, 6])
a.name = 'a'
b.name = 'b'

# list() keeps this working on Python 3, where zip returns an iterator
pd.DataFrame(list(zip(a, b)), columns=[a.name, b.name])

Or just concat the DataFrames:

pd.concat([pd.DataFrame(a), pd.DataFrame(b)], axis=1)

In [53]: %timeit pd.DataFrame(zip(a,b), columns=[a.name, b.name])
1000 loops, best of 3: 362 us per loop

In [54]: %timeit pd.concat([pd.DataFrame(a),pd.DataFrame(b)], axis=1)
1000 loops, best of 3: 808 us per loop
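
One caveat may be worth illustrating: zip pairs values by position and ignores the index, whereas pd.concat aligns on index labels. The sketch below uses two made-up Series whose indices are in a different order (an assumption for illustration, not part of the original answer) to show how the two results can differ.

```python
import pandas as pd

# Hypothetical data: same labels, but b's index is in reverse order
a = pd.Series([1, 2, 3], index=[0, 1, 2], name='a')
b = pd.Series([4, 5, 6], index=[2, 1, 0], name='b')

# zip pairs the first value of a with the first value of b, and so on;
# the original index labels are discarded
by_position = pd.DataFrame(list(zip(a, b)), columns=[a.name, b.name])

# concat lines the values up by index label instead
by_label = pd.concat([a, b], axis=1)

print(by_position.loc[0, 'b'])  # 4 (first value of b)
print(by_label.loc[0, 'b'])     # 6 (value of b labelled 0)
```

If all the Series share the same index in the same order, both approaches should give the same values.
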
Answered By: jassinm

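Check out DataFrame.from_items too. (Note: from_items was deprecated in pandas 0.23 and removed in 1.0; on current versions, DataFrame.from_dict(dict(items)) does the same job.)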

Answered By: Wes McKinney

Simply passing the list of Series to DataFrame then transposing seems to work too. It will also fill in any indices that are missing from one or the other Series.

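import random
import string

import pandas as pd

def rands(n):
    # Stand-in for pandas.util.testing.rands (that module was removed in pandas 2.0):
    # returns a random alphanumeric string of length n.
    return ''.join(random.choices(string.ascii_letters + string.digits, k=n))

data = [pd.Series([rands(4) for j in range(6)],
                  index=pd.date_range('1/1/2000', periods=6),
                  name='col'+str(i)) for i in range(4)]
df = pd.DataFrame(data).T
print(df)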
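
To make the point about missing indices concrete, here is a minimal sketch with two made-up Series (the names, labels, and values are assumptions for illustration) whose indices only partly overlap. Labels present in one Series but not the other should come out as NaN.

```python
import pandas as pd

# Hypothetical data: only the label 'y' is shared
left = pd.Series([1.0, 2.0, 3.0], index=['x', 'y', 'z'], name='left')
right = pd.Series([10.0, 20.0], index=['y', 'w'], name='right')

# Each Series becomes a row; transposing turns them into columns
df = pd.DataFrame([left, right]).T
print(df)

# 'w' exists only in right, so the left column holds NaN there
print(pd.isna(df.loc['w', 'left']))
```
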
Answered By: hgcrpd

Build the list of series:

import pandas as pd
import numpy as np

> series = [pd.Series(np.random.rand(3), name=c) for c in list('abcdefg')]

First method, pd.DataFrame.from_items (deprecated in pandas 0.23 and removed in 1.0, so this requires an older version):

> pd.DataFrame.from_items([(s.name, s) for s in series])
          a         b         c         d         e         f         g
0  0.071094  0.077545  0.299540  0.377555  0.751840  0.879995  0.933399
1  0.538251  0.066780  0.415607  0.796059  0.718893  0.679950  0.502138
2  0.096001  0.680868  0.883778  0.210488  0.642578  0.023881  0.250317

Second method pd.concat:

> pd.concat(series, axis=1)
          a         b         c         d         e         f         g
0  0.071094  0.077545  0.299540  0.377555  0.751840  0.879995  0.933399
1  0.538251  0.066780  0.415607  0.796059  0.718893  0.679950  0.502138
2  0.096001  0.680868  0.883778  0.210488  0.642578  0.023881  0.250317
Answered By: luca

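You can first create an empty DataFrame and then append the list of Series to it. Each Series becomes a row. Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so this only works on older versions.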

df = pd.DataFrame()

then:

df = df.append(list_series)

I also like to make sure the previous script that created list_series won’t mess my dataframe up:

df.drop_duplicates(inplace=True)
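
For pandas versions without DataFrame.append, a rough equivalent is to pass the list straight to the DataFrame constructor, which likewise turns each Series into a row. This is a sketch under that assumption; list_series here is made-up sample data, not the original poster's.

```python
import pandas as pd

# Hypothetical stand-in for the list_series used above
list_series = [
    pd.Series([1, 2, 3], name='r0'),
    pd.Series([4, 5, 6], name='r1'),
    pd.Series([1, 2, 3], name='r2'),  # same values as r0
]

# Each Series becomes one row, labelled by its name
df = pd.DataFrame(list_series)

# Drop rows whose values repeat an earlier row, keeping the first
deduped = df.drop_duplicates()
print(deduped)
```

Transpose the result with .T if the Series are meant to be columns instead.
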
Answered By: Cid Medeiros

This one is simpler:

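import random
import string

import pandas as pd

def rands(n):
    # Stand-in for pandas.util.testing.rands (that module was removed in pandas 2.0):
    # returns a random alphanumeric string of length n.
    return ''.join(random.choices(string.ascii_letters + string.digits, k=n))

data = [pd.Series([rands(4) for j in range(6)],
                  index=pd.date_range('1/1/2000', periods=6),
                  name='col'+str(i)) for i in range(4)]

df = pd.DataFrame(data)
print(df)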

which yields

     2000-01-01 2000-01-02 2000-01-03 2000-01-04 2000-01-05 2000-01-06
col0       oPg5       9Af9       SNfq       vnCb       ArCU       8Bhy
col1       IKmX       xS0c       yqCQ       sVov       92CN       WIyH
col2       1x2s       JBk7       Z5vh       km7k       ed1F       pIDt
col3       m9M3       mxil       1v72       Fkme       YooA       5H5b

Or transpose the result so that each Series becomes a column:

df = pd.DataFrame(data).T
print(df)

to yield

            col0  col1  col2  col3
2000-01-01  6zbm  UfrI  isNy  wVv0
2000-01-02  Kgej  0SN4  thDS  7BP2
2000-01-03  mcTx  BGDI  5BJC  mUdg
2000-01-04  iVSP  6Rim  6gg9  fY2A
2000-01-05  HzEU  giJ6  HFD1  dE98
2000-01-06  wYCi  nWmp  jqLz  GwKz
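
One thing to watch with the transpose approach when the Series have different dtypes: each row of pd.DataFrame(data) then mixes types, so the transposed columns may end up as object dtype. The sketch below uses made-up Series (an assumption for illustration) and compares against pd.concat, which keeps each column's dtype.

```python
import pandas as pd

# Hypothetical data: one integer Series and one string Series
nums = pd.Series([1, 2], name='nums')
text = pd.Series(['x', 'y'], name='text')

# Row-wise construction followed by a transpose
transposed = pd.DataFrame([nums, text]).T

# Column-wise concatenation
concatenated = pd.concat([nums, text], axis=1)

print(transposed.dtypes['nums'])    # object
print(concatenated.dtypes['nums'])  # int64
```
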
Answered By: Hyunwoo.Jung.Henry