Iterate through a list and add a 1 or 0 to the corresponding columns in a data frame

Question:

I am creating a user interface with the code below:

import time

import pandas as pd

data_pool = {'corpus': ['aa', 'bb', 'cc', 'dd', 'ee'],
             'zero_level_name': ['a', 'b', 'c', 'd', 'e'],
             'time': ['', '', '', '', ''],
             'labels': ['', '', '', '', '']}

data_pool = pd.DataFrame(data_pool)

print(data_pool)

data_pool[['label1', 'label2', 'label3', 'label4']] = ''

index = 0
listA = ['label1', 'label2', 'label3', 'label4']
number_of_instances = int(input('Please enter the number of texts you want to annotate today: '))

while index < number_of_instances:
    row = data_pool.loc[index]
    print("Enter labels for the text or enter 2 to go back to previous:")
    print()
    print('index:', index)
    print()
    print('text:\n\n\n', row['corpus'], '\n\n\n')
    start = time.time()
    label = input(": ")
    end = time.time()
    label = label.lower()  # str.lower() returns a new string, so the result must be reassigned
    if label == '2':
        index -= 1
        if index < 0:
            print('There is no previous row')
            index = 0  # stay on the first row instead of looking up row -1
    else:
        label = label.split(',')
        label = [i.strip().lower() for i in label]
        for i in label:
            if i not in listA:
                print('Invalid input, try again')
                index -= 1
            else:
                if i == 'label1':
                    data_pool.loc[index, 'label1'] = 1
                elif i == 'label2':
                    data_pool.loc[index, 'label2'] = 1
                elif i == 'label3':
                    data_pool.loc[index, 'label3'] = 1
                elif i == 'label4':
                    data_pool.loc[index, 'label4'] = 1
                data_pool.loc[index, 'zero_level_name'] = label
                data_pool.loc[index, 'time'] = end-start
                break
        index += 1


print(data_pool)

With this code I get this DataFrame:

  corpus zero_level_name      time labels label1 label2 label3 label4
0     aa        [label1]  5.372776             1                     
1     bb        [label2]  3.291902                    1              
2     cc               c                                             
3     dd               d                                             
4     ee               e   

With this code I can assign 1 to column ‘label1’ every time I enter the string ‘label1’, assign 1 to column ‘label2’ every time I enter ‘label2’, and so on.
However, if I enter two labels at once, my code does not assign 1 to both columns, and I want it to. For instance, if I enter ‘label1, label2’, my output should look something like this:

  corpus          zero_level_name      time labels label1 label2 label3 label4
0     aa         [label1, label2]  5.372776             1      1
1     bb         [label2, label3]  3.291902                    1      1
2     cc [label2, label3, label4]     3.548                    1      1      1
3     dd                        d
4     ee                        e
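The target layout is an indicator (multi-hot) encoding: one 0/1 column per label. For comparison, pandas can build such columns in one step from comma-separated strings with Series.str.get_dummies. A minimal sketch, assuming the raw answers have already been collected into a Series (the `answers` values below are made up to mirror the table above) and that 0 is wanted for unselected labels, as the title suggests:

```python
import pandas as pd

LABEL_COLUMNS = ['label1', 'label2', 'label3', 'label4']

# Made-up raw answers, one comma-separated string per annotated text.
answers = pd.Series(['label1, label2', 'label2, label3', 'label2, label3, label4'])

# Drop spaces and normalise case so 'label1, label2' splits cleanly on ','.
cleaned = answers.str.replace(' ', '', regex=False).str.lower()

# One 0/1 column per distinct label; reindex fixes the column order and adds
# any label that never occurs in `answers`, filled with 0.
indicators = cleaned.str.get_dummies(sep=',').reindex(columns=LABEL_COLUMNS, fill_value=0)
print(indicators)
```

The result shares the index of `answers`, so it could be written back into the matching rows of data_pool; unlike the interactive loop, though, this only works once all answers are known.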

How can I achieve this?

Asked By: ForeverLearner


Answers:

Note that your break exits the for loop, so even if you enter two valid labels, the loop stops after the first one. Try removing it and see if that solves your problem 🙂
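To see the effect in isolation, here is a stripped-down loop in which appending to a list stands in for the data_pool.loc[index, ...] = 1 assignment (the function name collect is made up for the example):

```python
def collect(labels, use_break):
    """Return the labels the loop body actually reaches."""
    processed = []
    for label in labels:
        processed.append(label)  # stands in for data_pool.loc[index, label] = 1
        if use_break:
            break  # leaves the for loop right after the first label
    return processed


print(collect(['label1', 'label2'], use_break=True))   # ['label1']
print(collect(['label1', 'label2'], use_break=False))  # ['label1', 'label2']
```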

(Or maybe you meant the break to exit the while loop, but I'm not sure I understand the logic there.)
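For reference, here is a sketch of the per-row step with the break removed. apply_labels is a hypothetical helper, not part of the original code, and the sketch assumes a few things: the label columns start at 0 rather than '' (matching the "1 or 0" in the title), all labels in an entry are validated before anything is written, and the label list is stored with .at, because assigning a multi-element list to a single cell with .loc can raise a ValueError in some pandas versions.

```python
import pandas as pd

LABEL_COLUMNS = ['label1', 'label2', 'label3', 'label4']


def apply_labels(df, index, raw, elapsed):
    """Write every label in `raw` to row `index`; return False on bad input."""
    # Hypothetical helper. In the original while loop, the else-branch could
    # become `if apply_labels(data_pool, index, label, end - start): index += 1`,
    # so an invalid entry re-prompts for the same row.
    labels = [part.strip().lower() for part in raw.split(',') if part.strip()]
    # Validate the whole entry first so a typo never leaves a half-labelled row.
    invalid = [lab for lab in labels if lab not in LABEL_COLUMNS]
    if not labels or invalid:
        print('Invalid input, try again:', invalid)
        return False
    # No break here: every valid label gets its own 1.
    for lab in labels:
        df.loc[index, lab] = 1
    # An object-dtype column can hold a Python list per cell; .at sets one cell.
    df['zero_level_name'] = df['zero_level_name'].astype(object)
    df.at[index, 'zero_level_name'] = labels
    df.at[index, 'time'] = elapsed
    return True


data_pool = pd.DataFrame({'corpus': ['aa', 'bb', 'cc'],
                          'zero_level_name': ['a', 'b', 'c'],
                          'time': [float('nan')] * 3})
for col in LABEL_COLUMNS:
    data_pool[col] = 0  # unselected labels stay 0

apply_labels(data_pool, 0, 'label1, label2', 5.37)
apply_labels(data_pool, 1, 'Label2, label3', 3.29)
apply_labels(data_pool, 2, 'label2, labl3', 1.0)  # typo: rejected, row 2 unchanged
print(data_pool)
```

Validating before writing is a design choice: with the original per-label check, an entry like ‘label1, labl2’ would set label1 and then complain, leaving the row partly labelled.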

Answered By: lmaayanl