How to add data entered by the user to a pandas data frame column?
Question:
I have the following dataset:
import pandas as pd
data = {
    'type': ['train', 'train', 'train', 'pool', 'pool', 'pool', 'pool', 'pool'],
    'index': [0, 1, 2, 3, 4, 5, 6, 7],
    'corpus': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
    'labels': [[1, 0, 0], [0, 1, 0], [1, 1, 0], None, None, None, None, None],
}
data = pd.DataFrame(data)
data
What I want to do is display the text in the 'corpus' column for the rows whose 'type' is 'pool', so that a user can add labels to it. After that, my program should insert the labels entered by the user into the dataset, next to each of the texts displayed. With the code below, the program takes the last label entered by the user and overwrites all the labels in the original dataset.
for row, c in data.iterrows():
    if c['type'] == 'pool':
        a = input(f"Please enter your labels for the below text: \n\n {c['corpus']}")
        data['labels'] = a
So my current output is:
type corpus labels
0 train a 0,0,1
1 train b 0,0,1
2 train c 0,0,1
7 pool h 0,0,1
4 pool e 0,0,1
3 pool d 0,0,1
5 pool f 0,0,1
6 pool g 0,0,1
My goal is:
type corpus labels
0 train a [1, 0, 0]
1 train b [0, 1, 0]
2 train c [1, 1, 0]
7 pool h [1, 0, 0]
4 pool e [0, 0, 1]
3 pool d [1, 1, 1]
5 pool f [0, 1, 0]
6 pool g [0, 0, 1]
Answers:
There are two things to fix in the code:
Firstly, if you assign a to data['labels'], you are actually assigning it to the whole column (this is why you get the same value in all rows).
Secondly, assigning the return value of input stores a string, but the other rows contain a list of ints. To solve this we can use split to get the elements, map int over them, and assign the result to a single cell using df.at:
import pandas as pd
data = {
    "type": ["train", "train", "train", "pool", "pool", "pool", "pool", "pool"],
    "index": [0, 1, 2, 3, 4, 5, 6, 7],
    "corpus": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "labels": [[1, 0, 0], [0, 1, 0], [1, 1, 0], None, None, None, None, None],
}
data = pd.DataFrame(data)
print(data)
for idx, row in data.iterrows():
    if row["type"] == "pool":
        a = input(f"Please enter your labels for the below text: \n\n {row['corpus']} ")
        data.at[idx, "labels"] = list(map(int, a.split(",")))
print(data)
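If the user mistypes a label (for example "1;0;0" or too few values), the int conversion above raises a ValueError and the loop stops. One possible way to guard against that is sketched below. The helper names parse_labels and label_pool, the ask parameter, and the assumption that every label is a list of three 0/1 values are illustrative choices, not part of the original question or answer. The replies are scripted here so the example runs without a keyboard; passing nothing for ask falls back to the built-in input.

```python
import pandas as pd


def parse_labels(text, n_labels=3):
    """Turn a string such as "1, 0, 0" into [1, 0, 0].

    Raises ValueError if a part is not an integer, or if the result is not
    exactly n_labels values of 0 or 1 (an assumption made for this sketch).
    """
    labels = [int(part) for part in text.split(",")]
    if len(labels) != n_labels or any(v not in (0, 1) for v in labels):
        raise ValueError(f"expected {n_labels} comma-separated 0/1 values, got {text!r}")
    return labels


def label_pool(df, ask=input):
    """Ask for labels for every 'pool' row and store them one cell at a time."""
    for idx in df.index[df["type"] == "pool"]:
        prompt = f"Please enter your labels for the below text:\n\n{df.at[idx, 'corpus']} "
        while True:
            try:
                # .at writes to a single cell, so the other rows are left untouched
                df.at[idx, "labels"] = parse_labels(ask(prompt))
                break
            except ValueError as err:
                print(f"Invalid input ({err}); please try again.")
    return df


data = pd.DataFrame({
    "type": ["train", "train", "train", "pool", "pool", "pool", "pool", "pool"],
    "index": [0, 1, 2, 3, 4, 5, 6, 7],
    "corpus": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "labels": [[1, 0, 0], [0, 1, 0], [1, 1, 0], None, None, None, None, None],
})

# Scripted replies standing in for a real user; the first one is deliberately malformed.
scripted = iter(["1;1;1", "1,1,1", "0,0,1", "0,1,0", "0,0,1", "1,0,0"])
label_pool(data, ask=lambda prompt: next(scripted))
print(data)
```

Selecting the 'pool' rows with a boolean mask up front also avoids relying on iterrows while the frame is being modified.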