Pandas – add value at specific iloc into new dataframe column
Question:
I have a large dataframe containing lots of columns.
For each row/index in the dataframe I do some operations, read in some ancillary data, etc., and get a new value. Is there a way to add that new value into a new column at the correct row/index?
I can use .assign to add a new column, but I’m looping over the rows and only generating one value at a time (generating it is quite involved). When a value is generated, I’d like to add it to the dataframe immediately rather than waiting until I’ve generated the entire series.
This doesn’t work and gives a KeyError:
df['new_column_name'].iloc[this_row]=value
Do I need to initialise the column first or something?
Answers:
If you have a dataframe like
import numpy as np
import pandas as pd
# pd.np was removed in pandas 2.0, so NaN is taken from numpy directly
df = pd.DataFrame(data={'X': [1.5, 6.777, 2.444, np.nan], 'Y': [1.111, np.nan, 8.77, np.nan], 'Z': [5.0, 2.333, 10, 6.6666]})
Instead of iloc, you can use .loc with a row label and a column name, like df.loc[row_indexer, column_indexer] = value:
df.loc[[0,3],'Z'] = 3
Output:
       X      Y       Z
0  1.500  1.111   3.000
1  6.777    NaN   2.333
2  2.444  8.770  10.000
3    NaN    NaN   3.000
There are two steps to create and populate a new column using only a row number (this approach does not use iloc).
First, get the row's index label from the row number:
rowIndex = df.index[someRowNumber]
Then use that label with .loc to reference the specific row and set the value in the new column:
df.loc[rowIndex, 'New Column Title'] = "some value"
These two steps can be combined into one line:
df.loc[df.index[someRowNumber], 'New Column Title'] = "some value"
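Applied to the loop described in the question, the two steps might look like the sketch below. The frame, its string index, and the expensive_value helper are hypothetical stand-ins for the real data and the involved per-row computation; the column name is the one from the question. It assumes the index labels are unique, since .loc would otherwise write to every row sharing a label.

```python
import pandas as pd

# Hypothetical frame with a non-default index, so row position and label differ.
df = pd.DataFrame({"X": [1.5, 6.777, 2.444]}, index=["a", "b", "c"])


def expensive_value(row):
    # Placeholder for the real per-row work (assumption, not from the question).
    return row["X"] * 2


for pos in range(len(df)):
    label = df.index[pos]  # row position -> row label
    # The first assignment creates the column; rows not yet visited hold NaN.
    df.loc[label, "new_column_name"] = expensive_value(df.iloc[pos])

print(df)
```

Each value lands in the dataframe as soon as it is computed, so there is no need to collect a full series first.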
You can also use the built-in pandas accessor DataFrame.at.
It sets a single value, addressed by one row label and one column label (to set several rows at once, use .loc with a list of labels instead):
df.at[4, 'B'] = 10
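As a small illustration of scalar access, here is a sketch on a hypothetical four-row frame (not the answerer's data) that contrasts .at, which takes labels, with its positional counterpart .iat, and falls back to .loc for a list of rows.

```python
import pandas as pd

# Hypothetical example frame with the default 0..3 index.
df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1, 2, 3, 4]})

df.at[3, "B"] = 10   # label-based: row label 3, column "B"
df.iat[0, 1] = 20    # position-based: first row, second column

# .at and .iat take exactly one row and one column;
# for several rows at once, .loc accepts a list of labels.
df.loc[[1, 2], "B"] = 0

print(df)
```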
If you want to add values to certain rows in a new column, depending on values in other cells of the dataframe you can do it like this:
import pandas as pd
df = pd.DataFrame(data={"A":[1,1,2,2], "B":[1,2,3,4]})
Add a value in a new column based on the values in column "A":
df.loc[df.A == 2, "C"] = 100
This creates the column "C" and sets it to 100 in every row where column "A" equals 2. The other rows are filled with NaN, which makes the column floating point.
Output:
   A  B      C
0  1  1    NaN
1  1  2    NaN
2  2  3  100.0
3  2  4  100.0
It is not necessary to initialise the column first.
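Initialising is optional, but it can still be useful when you want a default other than NaN or want the column to keep an integer dtype. A minimal sketch of the difference, where the column "D" and its default of 0 are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame(data={"A": [1, 1, 2, 2], "B": [1, 2, 3, 4]})

# Without initialisation: unmatched rows become NaN, so "C" is a float column.
df.loc[df["A"] == 2, "C"] = 100

# With an explicit default: every row has a value and "D" stays an integer column.
df["D"] = 0
df.loc[df["A"] == 2, "D"] = 100

print(df.dtypes)
```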