How to unpack the columns of a pandas DataFrame to multiple variables

Question:

Lists or numpy arrays can be unpacked to multiple variables if the dimensions match. For a 2xN array, the following will work:

import numpy as np 
a,b =          [[1,2,3],[4,5,6]]
a,b = np.array([[1,2,3],[4,5,6]])
# result: a=[1,2,3],   b=[4,5,6]

How can I achieve a similar behaviour for the columns of a pandas DataFrame? Extending the above example:

import pandas as pd 
df = pd.DataFrame([[1,2,3],[4,5,6]])
df.columns = ['A','B','C']    # Rename cols and
df.index = ['i', 'ii']        # rows for clarity

The following does not work as expected:

a,b = df.T
# result: a='i',   b='ii'
a,b,c = df
# result: a='A',   b='B',   c='C'

However, what I would like to get is the following:

a,b,c = unpack(df)
# result: a=df['A'], b=df['B'], c=df['C']

Is the function unpack already available in pandas? Or can it be mimicked in an easy way?

Asked By: normanius

||

Answers:

I just figured out that the following works, which is already close to what I am trying to achieve:

a,b,c = df.T.values        # Common
a,b,c = df.T.to_numpy()    # Recommended
# a,b,c = df.T.as_matrix() # Deprecated, removed in pandas 1.0

Details: As always, things are a little more complicated than one thinks. Note that a pd.DataFrame stores columns separately in Series. Calling df.values (or better: df.to_numpy()) is potentially expensive, as it combines the columns in a single ndarray, which likely involves copying actions and type conversions. Also, the resulting container has a single dtype able to accommodate all data in the data frame.
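To illustrate the dtype point, here is a minimal sketch. The two-column frame is a made-up example, not the one from the question; it mixes an integer and a float column so the conversion becomes visible.

```python
import pandas as pd

# Hypothetical two-column frame: one integer column, one float column.
df = pd.DataFrame({"A": [1, 2], "B": [0.5, 1.5]})

# Unpacking via a single ndarray: both columns share one common dtype.
a, b = df.T.to_numpy()

# Unpacking column by column: each Series keeps its own dtype.
s_a, s_b = (df[col] for col in df)

print(a.dtype, b.dtype)      # both float64
print(s_a.dtype, s_b.dtype)  # an integer dtype for A, float64 for B
```

The integer column comes back as floats in the first variant because the combined array needs one dtype that can hold every value.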

In summary, the above approach loses the per-column dtype information and is potentially expensive. It is technically cleaner to iterate the columns in one of the following ways (there are more options):

# The following alternatives create VIEWS!
a,b,c = (v for _,v in df.items())      # returns pd.Series
a,b,c = (df[c] for c in df)            # returns pd.Series

Note that the above creates views! Depending on the pandas version, modifying the data will likely trigger a SettingWithCopyWarning.

a.iloc[0] = "blabla"    # may emit SettingWithCopyWarning

If you want to modify the unpacked variables, you have to copy the columns.

# The following alternatives create COPIES!
a,b,c = (v.copy() for _,v in df.items())      # returns pd.Series
a,b,c = (df[c].copy() for c in df)            # returns pd.Series
a,b,c = (df[c].to_numpy() for c in df)        # returns np.ndarray
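
As a quick check of the copy variant, the following sketch rebuilds the frame from the question, modifies one unpacked column, and confirms that the original DataFrame is left untouched. The value 100 is arbitrary.

```python
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                  columns=["A", "B", "C"], index=["i", "ii"])

# Unpack independent copies of the columns.
a, b, c = (df[col].copy() for col in df)

# Modifying the copy does not write back into df.
a.iloc[0] = 100

print(a.tolist())         # [100, 4]
print(df["A"].tolist())   # [1, 4]
```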

While this is cleaner, it requires more characters. I personally do not recommend the above approach for production code. But to avoid typing (e.g., in interactive shell sessions), it is still a fair option…

# More verbose and explicit alternatives
a,b,c = df["the first col"], df["the second col"], df["the third col"]
a,b,c = df.iloc[:,0], df.iloc[:,1], df.iloc[:,2]
Answered By: normanius

The df.values method shown above is indeed a good solution, but it involves building a numpy array.

If you want to keep access to pandas Series methods after unpacking, I personally use a different approach.

For people like me who use a lot of chained methods, I have a solution that adds a custom unpacking method to pandas. Note that this may not be a good fit for production pipelines, but it is very handy in ad-hoc data analyses.

df = pd.DataFrame({
    "lat": [30, 40], 
    "lon": [0, 1],
})

This approach involves returning a generator on a .unpack() call.

from typing import Iterator

def unpack(self: pd.DataFrame) -> Iterator[pd.Series]:
    # Lazily yield each column as a pd.Series, in column order
    return (
        self[col]
        for col in self.columns
    )

pd.DataFrame.unpack = unpack

This can be used in two major ways.

Either directly as a solution to your problem:

lat, lon = df.unpack()

Or it can be used in method chaining.
Imagine a geo function named do_something_geographical(lat, lon) that takes a latitude Series as its first argument and a longitude Series as its second:

df_result = (
    df
        # ... some method chaining ...
        .assign(
            geographic_result=lambda dataframe: do_something_geographical(
                *dataframe[["lat", "lon"]].unpack()  # * spreads the generator into (lat, lon)
            )
        )
        # ... some method chaining ...
)
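
For completeness, here is a self-contained sketch of the chained usage. The body of do_something_geographical is a hypothetical placeholder (it simply adds the two Series); any function taking lat and lon in that order would work the same way.

```python
from typing import Iterator

import pandas as pd


def unpack(self: pd.DataFrame) -> Iterator[pd.Series]:
    # Lazily yield each column as a pd.Series, in column order.
    return (self[col] for col in self.columns)


# Monkey-patch the method onto DataFrame (handy for ad-hoc work only).
pd.DataFrame.unpack = unpack


def do_something_geographical(lat: pd.Series, lon: pd.Series) -> pd.Series:
    # Hypothetical stand-in for a real geo computation.
    return lat + lon


df = pd.DataFrame({"lat": [30, 40], "lon": [0, 1]})

df_result = df.assign(
    # The * spreads the generator into the two positional arguments.
    geographic_result=lambda d: do_something_geographical(
        *d[["lat", "lon"]].unpack()
    )
)

print(df_result["geographic_result"].tolist())  # [30, 41]
```

Selecting the columns with d[["lat", "lon"]] before unpacking fixes both the number and the order of the arguments, regardless of what other columns the frame has.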
Answered By: D Sestu