Adding values to existing columns in pandas
Question:
I loop over the CSV files in a directory and read each one with pandas.
Each CSV file has a category and a marketplace.
For each file I then need to get the category id and the marketplace id from the database; these ids are valid for the whole file.
finalDf is a DataFrame containing all the products from all the CSV files, and I need to append the data from the current CSV to it.
The list of products in the current CSV is retrieved using:
df['PRODUCT']
I need to append them to the finalDf and I used:
finalDf['PRODUCT'] = finalDf['PRODUCT'].append(df['PRODUCT'],ignore_index=True)
This seems to work fine, and I now have to insert catid and marketid into the corresponding columns of finalDf. Because catid and marketid are constant across the current CSV file, I just need to add each of them as many times as there are rows in df. This is what I'm trying to accomplish in the code below.
finalDf = pd.DataFrame(columns=['PRODUCT', 'CAT_ID', 'MARKET_ID'])
finalDf['PRODUCT'] = finalDf.PRODUCT.astype('category')
df = pd.read_csv(filename, header=None,
                 names=['PRODUCT', 'URL_PRODUCT', 'RANK', 'URL_IMAGE', 'STARS', 'PRICE', 'NAME', 'SNAPDATE',
                        'CATEGORY', 'MARKETPLACE', 'PARENTCAT', 'LISTTYPE', 'VERSION', 'LEVEL'], sep='\t')
finalDf['PRODUCT'] = finalDf['PRODUCT'].append(df['PRODUCT'],ignore_index=True)
# Here I have a single value to add n times, n corresponding to the number of rows in the dataframe df
catid = 2113
marketid = 13
catids = pd.Series([catid]*len(df.index))
marketids = pd.Series([marketid]*len(df.index))
finalDf['CAT_ID'] = finalDf['CAT_ID'].append(catids, ignore_index=True)
finalDf['MARKET_ID'] = finalDf['MARKET_ID'].append(marketids, ignore_index=True)
print(finalDf.head())
PRODUCT CAT_ID MARKET_ID
0 ABC NaN NaN
1 ABB NaN NaN
2 ABE NaN NaN
3 DCB NaN NaN
4 EFT NaN NaN
As you can see, I just have NaN values instead of the actual values.
Expected output:
PRODUCT CAT_ID MARKET_ID
0 ABC 2113 13
1 ABB 2113 13
2 ABE 2113 13
3 DCB 2113 13
4 EFT 2113 13
After several CSV files, finalDf would look like:
PRODUCT CAT_ID MARKET_ID
0 ABC 2113 13
1 ABB 2113 13
2 ABE 2113 13
3 DCB 2113 13
4 EFT 2113 13
5 SDD 2114 13
6 ERT 2114 13
7 GHJ 2114 13
8 MOD 2114 13
9 GTR 2114 13
10 WLY 2114 13
11 WLO 2115 13
12 KOP 2115 13
Any idea?
Thanks
Answers:
You actually do not need catids and marketids:
finalDf['CAT_ID'] = catid
finalDf['MARKET_ID'] = marketid
This works because assigning a scalar to a column fills every existing row with that value.
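A minimal sketch of the scalar assignment, assuming finalDf already holds the PRODUCT rows (the sample values are copied from the question's output):

```python
import pandas as pd

# Sample products taken from the question's output; the frame already has rows.
finalDf = pd.DataFrame({'PRODUCT': ['ABC', 'ABB', 'ABE', 'DCB', 'EFT']})

catid = 2113
marketid = 13

# A scalar on the right-hand side is broadcast to every existing row.
finalDf['CAT_ID'] = catid
finalDf['MARKET_ID'] = marketid

print(finalDf)
```

Note that this overwrites the whole column, so on a finalDf that already holds rows from earlier CSV files it would replace their ids as well.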
For the rest of the script, I would probably simplify things like this:
finalDf = pd.DataFrame()
finalDf['PRODUCT'] = df['PRODUCT'].reset_index(drop=True)
This assumes you are not interested in df's original index, as your code implies. (drop=True is needed because a bare reset_index() on a Series returns a two-column DataFrame.)
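As for the NaN values in the question, one likely explanation is index alignment. Once PRODUCT has been assigned, finalDf has n rows and CAT_ID is all NaN, so the appended Series holds 2n values, and on assignment only the first n labels (the NaN ones) line up with finalDf's index. Here is a small sketch of that behaviour, using pd.concat in place of Series.append (which was removed in pandas 2.0) and three sample products:

```python
import pandas as pd

finalDf = pd.DataFrame(columns=['PRODUCT', 'CAT_ID', 'MARKET_ID'])

# Assigning a Series to an empty frame gives the frame that Series' index (0..2);
# CAT_ID and MARKET_ID are filled with NaN for those rows.
finalDf['PRODUCT'] = pd.Series(['ABC', 'ABB', 'ABE'])

catids = pd.Series([2113] * 3)

# Six values: three NaN from the existing column, then three real ids.
longer = pd.concat([finalDf['CAT_ID'], catids], ignore_index=True)

# Assignment aligns on the index, so only labels 0..2 are kept: the NaN half.
finalDf['CAT_ID'] = longer

print(finalDf)
```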
I finally found a solution, though I don't know why the original approach didn't work.
This one is simpler:
tempDf = pd.DataFrame(columns=['PRODUCT','CAT_ID','MARKET_ID'])
tempDf['PRODUCT'] = df['PRODUCT']
tempDf['CAT_ID'] = catid
tempDf['MARKET_ID'] = marketid
finalDf = pd.concat([finalDf,tempDf])
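The same idea extended to several files could look like the sketch below. The in-memory `sources` list is a hypothetical stand-in for the CSV files and the database lookup, and the files are reduced to a single PRODUCT column to keep the example short:

```python
import io
import pandas as pd

# Hypothetical stand-ins for the CSV files on disk: (file-like object, catid, marketid).
# In the real script the ids would come from the database lookup.
sources = [
    (io.StringIO("ABC\nABB\nABE\n"), 2113, 13),
    (io.StringIO("SDD\nERT\n"), 2114, 13),
]

frames = []
for handle, catid, marketid in sources:
    df = pd.read_csv(handle, header=None, names=['PRODUCT'], sep='\t')
    tempDf = pd.DataFrame({'PRODUCT': df['PRODUCT']})
    tempDf['CAT_ID'] = catid          # scalar is broadcast to every row
    tempDf['MARKET_ID'] = marketid
    frames.append(tempDf)

# One concat at the end; ignore_index=True renumbers the rows from 0.
finalDf = pd.concat(frames, ignore_index=True)
print(finalDf)
```

Collecting the per-file frames in a list and concatenating once avoids copying finalDf on every iteration, and ignore_index=True gives the continuous 0..n-1 index shown in the question's expected output.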
To set a single value, you can also use:
dataframe.at[index, 'column-name'] = 'new value'
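For illustration, a short sketch of .at on a two-row frame with sample values from the question:

```python
import pandas as pd

finalDf = pd.DataFrame({'PRODUCT': ['ABC', 'ABB'], 'CAT_ID': [2113, 2113]})

# .at addresses exactly one cell by row label and column label.
finalDf.at[1, 'CAT_ID'] = 2114

print(finalDf)
```

Since .at touches one cell at a time, it is better suited to individual corrections than to filling a whole column.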