How to add data entered by the user to a pandas data frame column?

Question:

I have the following dataset:

import pandas as pd

data = {
    'type': ['train', 'train', 'train', 'pool', 'pool', 'pool', 'pool', 'pool'],
    'index': [0, 1, 2, 3, 4, 5, 6, 7],
    'corpus': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
    'labels': [[1, 0, 0], [0, 1, 0], [1, 1, 0], None, None, None, None, None],
}


data = pd.DataFrame(data)

data

What I want to do is display the values of the "corpus" column for the rows whose 'type' is 'pool' to a user, who then adds labels to them. After that, my program should insert the labels entered by the user into the dataset, next to each of the texts displayed. With the code below, the program takes the last label entered by the user and overwrites all the labels of the original dataset with it.

for row, c in data.iterrows():
  if c['type'] == 'pool':
    a = input(f"Please enter your labels for the below text: \n\n {c['corpus']}")
    data['labels'] = a

So, my current output is:

        type     corpus labels
   0    train       a   0,0,1
   1    train       b   0,0,1
   2    train       c   0,0,1
   7    pool        h   0,0,1
   4    pool        e   0,0,1
   3    pool        d   0,0,1
   5    pool        f   0,0,1
   6    pool        g   0,0,1

My goal is:

    type    corpus   labels
0   train       a   [1, 0, 0]
1   train       b   [0, 1, 0]
2   train       c   [1, 1, 0]
7   pool        h   [1, 0, 0]
4   pool        e   [0, 0, 1]
3   pool        d   [1, 1, 1]
5   pool        f   [0, 1, 0]
6   pool        g   [0, 0, 1]
Asked By: ForeverLearner

||

Answers:

There are two things to fix with the code:

First, assigning a to data['labels'] assigns it to the whole column, which is why you get the same value in every row.
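
The difference between column-wide and single-cell assignment can be seen in a small sketch. The three-row frame and the names `broadcast` and `single` are made up for illustration and are not part of the question's data:

```python
import pandas as pd

# Small illustrative frame (hypothetical, not the question's dataset).
df = pd.DataFrame(
    {"type": ["train", "pool", "pool"], "labels": [[1, 0, 0], None, None]}
)

# Assigning a scalar to a whole column writes that value into every row.
broadcast = df.copy()
broadcast["labels"] = "0,0,1"

# Assigning through .at writes only to the addressed cell (row label 1 here).
single = df.copy()
single.at[1, "labels"] = "0,0,1"

print(broadcast)
print(single)
```

In `broadcast` every row ends up with the same string, which matches the behaviour described in the question, while in `single` only the addressed row changes.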

Second, input returns a string, whereas the other rows contain lists of ints. To fix this, split the string on commas, map int over the pieces, and assign the resulting list to a single cell with DataFrame.at (data.at in the code below).

import pandas as pd

data = {
    "type": ["train", "train", "train", "pool", "pool", "pool", "pool", "pool"],
    "index": [0, 1, 2, 3, 4, 5, 6, 7],
    "corpus": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "labels": [[1, 0, 0], [0, 1, 0], [1, 1, 0], None, None, None, None, None],
}


data = pd.DataFrame(data)
print(data)

for idx, row in data.iterrows():
    if row["type"] == "pool":
        a = input(f"Please enter your labels for the below text: \n\n {row['corpus']} ")
        data.at[idx, "labels"] = list(map(int, a.split(",")))
print(data)
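
As a possible variant of the loop above, the parsing step can be split from the prompting step so that it can be exercised without typing at a console. This is only a sketch: the helper names `parse_labels` and `label_pool` and the `ask` parameter are hypothetical, and it assumes labels are always entered as comma-separated integers.

```python
import pandas as pd


def parse_labels(text):
    """Turn a string such as "1, 0, 1" into the list [1, 0, 1]."""
    return [int(part) for part in text.split(",")]


def label_pool(df, ask=input):
    """Prompt for labels on every 'pool' row and store them cell by cell.

    `ask` defaults to the built-in input; passing another callable makes
    the function usable in tests or scripts (an assumption of this sketch).
    """
    # Select only the row labels whose type is 'pool'.
    for idx in df.index[df["type"] == "pool"]:
        text = df.at[idx, "corpus"]
        answer = ask(f"Please enter your labels for the below text:\n\n{text} ")
        # .at writes to one cell, so the other rows keep their labels.
        df.at[idx, "labels"] = parse_labels(answer)
    return df
```

Calling `label_pool(data)` prompts interactively, while something like `label_pool(data, ask=lambda prompt: "0,0,1")` fills every pool row with the same labels without user interaction. Invalid input such as an empty string raises a `ValueError` from `int`, which this sketch does not handle.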
Answered By: Matteo Zanoni