Creating custom names for columns based on other column names in pandas dataframe

Question:

I have a dataframe whose columns include Origin 1 and Dest 1 (it was shown as an image in the original post).

I want to create new columns from the difference, or any other calculation, between existing columns. However, I want to name each new column so that it reflects the operation performed. For example, I compute the difference between Origin 1 and Dest 1 (the original post showed the result as an image).

How do I create such custom column names, especially when I have to create multiple such columns?

Asked By: Spandan Rout


Answers:

Just iterate over the columns and use an f-string to build each new column name:

cols = list(df.columns)  # snapshot, so newly added columns are not revisited
for col_a in cols:
    for col_b in cols:
        if col_a != col_b:
            df[f'{col_a} - {col_b}'] = df[col_a] - df[col_b]

If you use itertools (part of the Python standard library), you can make it easier to read (as proposed by @MustafaAydın):

import itertools
for col_a, col_b in itertools.permutations(df, 2):
    df[f'{col_a} - {col_b}'] = df[col_a] - df[col_b]
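
Note that permutations yields both (a, b) and (b, a), so every difference appears twice with opposite sign. If one direction is enough, itertools.combinations could be used instead. A minimal sketch, assuming a toy dataframe whose column names are made up for illustration:

```python
import itertools

import pandas as pd

# Toy data; the column names are hypothetical.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 3, 2], "c": [8, 7, 6]})

# combinations() yields each unordered pair once: (a, b), (a, c), (b, c).
# list(df) snapshots the original column labels before new ones are added.
for col_a, col_b in itertools.combinations(list(df), 2):
    df[f"{col_a} - {col_b}"] = df[col_a] - df[col_b]

print(list(df.columns))
```

With three base columns this adds three new columns rather than six.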

If you want to apply multiple operations, just add a line for each one:

import itertools
for col_a, col_b in itertools.permutations(df, 2):
    df[f'{col_a} - {col_b}'] = df[col_a] - df[col_b]
    df[f'{col_a} + {col_b}'] = df[col_a] + df[col_b]
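
When the list of operations grows, adding one line per operation gets repetitive. One possible variation, sketched here under the assumption that every operation takes two columns, is to keep the symbols and functions in a dict. The `ops` mapping and the toy data are hypothetical, not part of the original answer:

```python
import itertools
import operator

import pandas as pd

# Toy data; the column names are hypothetical.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 3, 2]})

# Hypothetical mapping: symbol used in the column name -> function applied.
ops = {"-": operator.sub, "+": operator.add, "*": operator.mul}

# list(df) snapshots the original labels before new columns are added.
for col_a, col_b in itertools.permutations(list(df), 2):
    for symbol, func in ops.items():
        df[f"{col_a} {symbol} {col_b}"] = func(df[col_a], df[col_b])

print(list(df.columns))
```

Adding another operation then only means adding one entry to `ops`.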

If you only want to use subsets of the columns, e.g. only from origin to destination, you can do:

import itertools
origins = [col for col in df.columns if col.startswith('Origin')]
destinations = [col for col in df.columns if col.startswith('Dest')]
for col_a, col_b in itertools.product(origins, destinations):
    df[f'{col_a} - {col_b}'] = df[col_a] - df[col_b]
    df[f'{col_a} + {col_b}'] = df[col_a] + df[col_b]
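
itertools.product pairs every origin with every destination. If only same-numbered columns should be paired (Origin 1 with Dest 1, and so on), something like the following might work. It is a sketch that assumes the columns follow an "Origin N" / "Dest N" naming pattern; the data values are made up:

```python
import pandas as pd

# Toy data; the "Origin N" / "Dest N" pattern is an assumption based on the question.
df = pd.DataFrame({
    "Origin 1": [10, 20], "Dest 1": [7, 5],
    "Origin 2": [3, 4], "Dest 2": [1, 9],
})

origins = [col for col in df.columns if col.startswith("Origin")]
for origin in origins:
    suffix = origin[len("Origin"):]  # e.g. " 1"
    dest = f"Dest{suffix}"
    if dest in df.columns:  # skip origins without a matching destination
        df[f"{origin} - {dest}"] = df[origin] - df[dest]

print(list(df.columns))
```
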
Answered By: Andreas

It is quite simple.

Let’s define a dataframe with two columns a and b:

import pandas as pd

df = pd.DataFrame({"a":[1,2,3,4],"b":[4,3,2,1]})

Output:

    a   b
0   1   4
1   2   3
2   3   2
3   4   1

Now, let’s create the difference between the two columns, as mentioned above:

df["a-b"] = df["a"] - df["b"]

Voila! Now you have a new column.

    a   b   a-b
0   1   4    -3
1   2   3    -1
2   3   2     1
3   4   1     3

For multiple calculations, we can work out a loop-based approach:

df = pd.DataFrame({"a":[1,2,3,4],"b":[4,3,2,1],"c":[8,7,6,5]})

df["a-b"] = df["a"] - df["b"]

# Calculate the difference for every ordered pair of original columns.
# The "-" checks skip difference columns that were created earlier.
for i in df.columns:
    for j in df.columns:
        if i != j and "-" not in j and "-" not in i:
            df[f"{i}-{j}"] = df[i] - df[j]

This approach calculates the difference for every ordered pair of the original columns in a single nested loop.

Output:

    a   b   c   a-b   a-c   b-a   b-c   c-a   c-b
0   1   4   8    -3    -7     3    -4     7     4
1   2   3   7    -1    -5     1    -4     5     4
2   3   2   6     1    -3    -1    -4     3     4
3   4   1   5     3    -1    -3    -4     1     4
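
One caveat: the "-" check in the loop would also skip any original column whose name happens to contain a hyphen. A possible alternative, sketched here with a hypothetical `base_cols` variable, is to snapshot the column labels before the loop so that no check on the names is needed:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [8, 7, 6, 5]})

# Snapshot the labels first, so columns created in the loop are never revisited.
base_cols = list(df.columns)
for i in base_cols:
    for j in base_cols:
        if i != j:
            df[f"{i}-{j}"] = df[i] - df[j]

print(df)
```
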
Answered By: Domagoj

This is very simple in pandas.

data['Dest 1 - Origin 1'] = data['Dest 1'] - data['Origin 1']
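
For many such columns, the same naming idea can be applied to a list of column pairs. The sketch below builds all new columns first and attaches them in one step with pd.concat, since inserting a very large number of columns one at a time may make pandas warn about a fragmented dataframe. The `pairs` list and the data values are hypothetical:

```python
import pandas as pd

# Toy data; the column names follow the question, the values are made up.
data = pd.DataFrame({
    "Origin 1": [10, 20], "Dest 1": [7, 5],
    "Origin 2": [3, 4], "Dest 2": [1, 9],
})

# Hypothetical list of (left, right) pairs to subtract.
pairs = [("Dest 1", "Origin 1"), ("Dest 2", "Origin 2")]

# Build every new column first, then attach them all at once.
new_cols = {f"{left} - {right}": data[left] - data[right] for left, right in pairs}
data = pd.concat([data, pd.DataFrame(new_cols)], axis=1)

print(list(data.columns))
```
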