How to make a recurrent list of lists with a while-loop

Question:

I have a tab-separated file with the format turn_index\tsentence\tmetadata. It looks like this, and the length of each dialogue (i.e. its number of turns) varies:

0 hello metadata1
1 hi! metadata2
0 hi there metadata3
1 how are you? metadata4
2 very well meta5
3 I’m so busy today meta6
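Each line can be split on the tab character into its three fields. A minimal sketch, assuming a single tab between fields and no tabs inside the sentence; `parse_turn` is a hypothetical helper name, not something from the question:

```python
def parse_turn(line):
    """Split one 'turn_index<TAB>sentence<TAB>metadata' line into its fields."""
    # strip the trailing newline (and a possible Windows '\r') before splitting
    index, sentence, metadata = line.rstrip("\r\n").split("\t")
    return int(index), sentence, metadata


print(parse_turn("1\thi!\tmetadata2\n"))  # (1, 'hi!', 'metadata2')
```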

I would like to group each pair of consecutive turns into a list, and collect all the pairs from the same dialogue into one big list:
[["hello", "hi!"]]
[["hi there", "how are you?"], ["how are you?", "very well"], ["very well", "I'm so busy today"]]
My attempt at windowing the sentences two at a time is not working, and I can’t even begin to figure out how to group them per dialogue. My code is the following:

turns = data.readlines()
window_size = 2
i = 0
j = 0
dialogue = []
while i < len(turns) - window_size + 1:
    restart = False
    dialogue = []
    for turn in turns:
        sec = turn.rstrip().split("\t")
        double_sent = [sec[0], sec[1]]
        i += 1
Asked By: zazzylele


Answers:

Here is a solution that produces the output from your edit. `dialogues` will hold all the lists of lists you mentioned.

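# turns = data.readlines(), as in the question
dialogues = []    # one list of sentence pairs per dialogue
double_sent = []  # pairs collected for the current dialogue
for line1, line2 in zip(turns[:-1], turns[1:]):
    # consecutive turn indices mean both lines belong to the same dialogue
    if int(line2.split('\t')[0])-int(line1.split('\t')[0]) == 1:
        double_sent.append([line1.split('\t')[1], line2.split('\t')[1]])
    else:
        # the index dropped back, so the current dialogue is finished
        dialogues.append(double_sent)
        double_sent = []
# the last dialogue is never followed by a reset, so append it here
dialogues.append(double_sent.copy())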

Here

zip(turns[:-1], turns[1:])

is a neat idiom for iterating over every pair of consecutive elements of a sequence: turns[:-1] drops the last element, turns[1:] drops the first, and zip pairs them up position by position. It is definitely worth remembering.
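A small demonstration of the idiom on a toy list (the list `items` is made up for illustration):

```python
items = ["a", "b", "c", "d"]

# items[:-1] -> a, b, c   and   items[1:] -> b, c, d
pairs = list(zip(items[:-1], items[1:]))
print(pairs)  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

On Python 3.10 and later, `itertools.pairwise(items)` yields the same pairs without building the two slices.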

The next line

if int(line2.split('\t')[0])-int(line1.split('\t')[0]) == 1

checks whether the turn indices of the two selected lines are consecutive. This condition fails only when the numbering drops back to 0, which indicates that a dialogue has finished and can be appended to the dialogues list. If the numbering contains an error (for example, a skipped index), the output will be wrong.
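If the numbering might be unreliable, one alternative (a sketch, not part of the answer itself) is to start a new dialogue whenever a line's turn index is 0, and only then pair up the sentences inside each dialogue. This assumes every dialogue starts at index 0; `group_dialogues` is a hypothetical name, and a dialogue with a single turn yields an empty list of pairs:

```python
def group_dialogues(lines):
    """Return one list of consecutive sentence pairs per dialogue."""
    dialogues = []  # each dialogue is first collected as a list of sentences
    for line in lines:
        index, sentence, _metadata = line.rstrip("\r\n").split("\t")
        # turn 0 opens a new dialogue; the guard also covers a file that
        # does not start at 0, so dialogues[-1] always exists
        if int(index) == 0 or not dialogues:
            dialogues.append([])
        dialogues[-1].append(sentence)
    # pair every sentence with the one that follows it, per dialogue
    return [[list(pair) for pair in zip(d[:-1], d[1:])] for d in dialogues]


turns = [
    "0\thello\tmetadata1\n",
    "1\thi!\tmetadata2\n",
    "0\thi there\tmetadata3\n",
    "1\thow are you?\tmetadata4\n",
    "2\tvery well\tmeta5\n",
    "3\tI'm so busy today\tmeta6\n",
]
print(group_dialogues(turns))
```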

# Output
>>> dialogues
[[['hello', 'hi!']], [['hi there', 'how are you?'], ['how are you?', 'very well'], ['very well', "I'm so busy today"]]]
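Since the question title asks for a while-loop, the windowing step of the original attempt can also be written that way. A minimal sketch that slides a window of `window_size` sentences over one dialogue by advancing an index `i`; `window` is a hypothetical helper name, and it expects the sentences of a single dialogue, already separated from the other fields:

```python
def window(sentences, window_size=2):
    """Return every run of `window_size` consecutive sentences, using a while-loop."""
    windows = []
    i = 0
    # the last valid start position is len(sentences) - window_size
    while i < len(sentences) - window_size + 1:
        windows.append(sentences[i:i + window_size])
        i += 1
    return windows


print(window(["hi there", "how are you?", "very well", "I'm so busy today"]))
# [['hi there', 'how are you?'], ['how are you?', 'very well'], ['very well', "I'm so busy today"]]
```

Unlike the attempt in the question, `i` is advanced once per window rather than once per line inside a nested `for` loop, which is what made the original loop stop after a single pass.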
Answered By: Flow