Append new column to csv based on lookup

Question:

I have two CSV files, lookup.csv and data.csv. I’m converting lookup.csv to a dictionary and need to add a new column to data.csv based on one of its existing columns.

Input:

lookup.csv

   1 first
   2 second
   ...

data.csv

  101 NYC 1
  202 DC  2

Expected output:

data.csv

  col1 col2 col3 col4
  101  NYC  1    first
  202   DC  2    second
  ... 

Here, for the first row, the new column col4 contains first because col3 is 1 and its corresponding value in lookup.csv is first.

I tried the logic below, but it fails:

df = pd.read_csv("lookup.csv",header=None, index_col=0, squeeze=True).to_dict()
df1 = pd.read_csv("data.csv")
df1['col4'] = df.get(df1['col3'])

Error: TypeError: unhashable type: 'Series'

Can someone please help in resolving this issue?

Asked By: praneethh

Answers:

The dict.get method expects a single hashable key, but df1['col3'] is a whole Series, which is unhashable. Use the apply method to look up each value individually:

import pandas as pd

lookup_dict = pd.read_csv("lookup.csv", header=None, index_col=0).squeeze("columns").to_dict()

data_df = pd.read_csv("data.csv", header=None, index_col=False)
data_df.columns = ['col1', 'col2', 'col3']

data_df['col4'] = data_df['col3'].apply(lambda x: lookup_dict.get(x))

print(data_df)

Output:

   col1 col2  col3    col4
0   101  NYC     1   first
1   202   DC     2  second
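
As a possible alternative to apply with a lambda, Series.map accepts a dictionary directly and leaves NaN where a key is missing. The sketch below is only an illustration: it uses in-memory comma-separated stand-ins for lookup.csv and data.csv (an assumption about the file layout), and the third data row (303, LA, 3) is made up to show the unmatched case.

```python
import io
import pandas as pd

# In-memory stand-ins for lookup.csv and data.csv (assumed comma-separated, no header row).
lookup_csv = io.StringIO("1,first\n2,second\n")
data_csv = io.StringIO("101,NYC,1\n202,DC,2\n303,LA,3\n")  # the 303 row is hypothetical

# Build the lookup dictionary: first column becomes the key, second the value.
lookup_dict = (
    pd.read_csv(lookup_csv, header=None, index_col=0).squeeze("columns").to_dict()
)

data_df = pd.read_csv(data_csv, header=None, names=["col1", "col2", "col3"])

# map looks up each col3 value in the dictionary; unmatched keys become NaN.
data_df["col4"] = data_df["col3"].map(lookup_dict)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = data_df.to_csv(index=False)
```

To write the result back to disk, the same call would take a file name, for example data_df.to_csv("data.csv", index=False).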
Answered By: angwrk

You can also use the pandas merge method.

If lookup.csv is:

   Code    Name
0     1   first
1     2  second

and data.csv is:

   Pin Initial  Code
0  101     NYC     1
1  202      DC     2
2  101     NYC     1
3  202      DC     2
4  101     NYC     1
5  202      DC     2
6  101     NYC     1
7  202      DC     2

Then read each CSV file into a dataframe:

import pandas as pd
lookupdf = pd.read_csv('lookup.csv')
datadf = pd.read_csv('data.csv')

Then merge them in a single line. With no on argument, merge joins on the column names the two dataframes have in common (here, Code):

newdf = pd.merge(datadf, lookupdf)

See the result:

print(newdf)

   Pin Initial  Code    Name
0  101     NYC     1   first
1  101     NYC     1   first
2  101     NYC     1   first
3  101     NYC     1   first
4  202      DC     2  second
5  202      DC     2  second
6  202      DC     2  second
7  202      DC     2  second
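
Note that merge performs an inner join by default, so data rows whose Code has no match in the lookup are dropped. If every data row should be kept, a left join with an explicit key is one option. The following is a minimal sketch using the column names from this answer; the inline CSV text and the unmatched 303 row are assumptions added for illustration.

```python
import io
import pandas as pd

# In-memory stand-ins for the two files, with the header rows shown above.
lookupdf = pd.read_csv(io.StringIO("Code,Name\n1,first\n2,second\n"))
datadf = pd.read_csv(
    io.StringIO("Pin,Initial,Code\n101,NYC,1\n202,DC,2\n303,LA,3\n")  # 303 row is hypothetical
)

# how="left" keeps every row of datadf in its original order;
# validate="many_to_one" raises if a Code appears more than once in the lookup.
newdf = datadf.merge(lookupdf, on="Code", how="left", validate="many_to_one")

print(newdf)
```

Rows without a matching Code end up with NaN in Name instead of disappearing from the result.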
Answered By: rnso

First of all, squeeze=True makes pd.read_csv return a Series rather than a DataFrame when the file parses to a single column [read_csv docs], and .to_dict() then turns it into a plain dictionary. The unhashable type error comes from passing the whole df1['col3'] Series to df.get as a single key.

Secondly, instead of converting it to a dictionary, you can just merge or join the dataframes, depending on whether the shared key is a column or the index.

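import pandas as pd
df = pd.read_csv("lookup.csv", header=None, names=['num', 'name'])
# header=0 assumes data.csv has a header row, which names= replaces
df1 = pd.read_csv("data.csv", header=0, names=['foo', 'bar', 'num'])
# merge from the data side so the looked-up name lands in the last column
df_merged = df1.merge(df, on='num')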
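
For the case where the shared key is the index of the lookup rather than a column, DataFrame.join can match a column of the data against that index. This is a rough sketch under the same column names as above, with in-memory comma-separated text standing in for the two files (an assumption about their layout).

```python
import io
import pandas as pd

# Lookup keyed by its index: num becomes the index, name the only column.
lookup = pd.read_csv(
    io.StringIO("1,first\n2,second\n"),
    header=None,
    names=["num", "name"],
    index_col="num",
)

# Data file stand-in, assumed to have no header row here.
data = pd.read_csv(
    io.StringIO("101,NYC,1\n202,DC,2\n"),
    header=None,
    names=["foo", "bar", "num"],
)

# join matches data's num column against lookup's index (a left join by default).
joined = data.join(lookup, on="num")

print(joined)
```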
Answered By: Jthaller