Pass elements of a list in a function

Question:

I have a function that is able to create triples and relationships from text. However, when I create a list of a column that contains text and pass it through the function, it only processes the first row, or item of the list. Therefore, I am wondering how the whole list can be processed within this function. Maybe a for loop would work?

The following line contains the list

rez_dictionary = {'Decent Little Reader, Poor Tablet',
 'Ok For What It Is',
 'Too Heavy and Poor weld quality,',
 'difficult mount',
 'just got it installed'}
from transformers import pipeline

triplet_extractor = pipeline('text2text-generation', model='Babelscape/rebel-large', tokenizer='Babelscape/rebel-large')

# We need to use the tokenizer manually since we need special tokens.
extracted_text = triplet_extractor.tokenizer.batch_decode([triplet_extractor(rez_dictionary, return_tensors=True, return_text=False)[0]["generated_token_ids"]])

print(extracted_text[0])

If anyone has a suggestion, I am looking forward for it.

Would it also be possible to get the output adjusted to the following format:

# Function to parse the generated text and extract the triplets
def extract_triplets(text):
    triplets = []
    relation, subject, relation, object_ = '', '', '', ''
    text = text.strip()
    current = 'x'
    for token in text.replace("<s>", "").replace("<pad>", "").replace("</s>", "").split():
        if token == "<triplet>":
            current = 't'
            if relation != '':
                triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()})
                relation = ''
            subject = ''
        elif token == "<subj>":
            current = 's'
            if relation != '':
                triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()})
            object_ = ''
        elif token == "<obj>":
            current = 'o'
            relation = ''
        else:
            if current == 't':
                subject += ' ' + token
            elif current == 's':
                object_ += ' ' + token
            elif current == 'o':
                relation += ' ' + token
    if subject != '' and relation != '' and object_ != '':
        triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()})
    return triplets
extracted_triplets = extract_triplets(extracted_text[0])
print(extracted_triplets)
Asked By: Timskouten

||

Answers:

You are removing the other entries of rez_dictionary inside the batch_decode:

triplet_extractor(rez_dictionary, return_tensors=True, return_text=False)[0]["generated_token_ids"]

Use a list comprehension instead:

from transformers import pipeline

rez = ['Decent Little Reader, Poor Tablet',
 'Ok For What It Is',
 'Too Heavy and Poor weld quality,',
 'difficult mount',
 'just got it installed']

triplet_extractor = pipeline('text2text-generation', model='Babelscape/rebel-large', tokenizer='Babelscape/rebel-large')

model_output = triplet_extractor(rez, return_tensors=True, return_text=False)

extracted_text = triplet_extractor.tokenizer.batch_decode([x["generated_token_ids"] for x in model_output])
print("n".join(extracted_text))

Output:

<s><triplet> Decent Little Reader <subj> Poor Tablet <obj> different from <triplet> Poor Tablet <subj> Decent Little Reader <obj> different from</s>
<s><triplet> Ok For What It Is <subj> film <obj> instance of</s>
<s><triplet> Too Heavy and Poor <subj> weld quality <obj> subclass of</s>
<s><triplet> difficult mount <subj> mount <obj> subclass of</s>
<s><triplet> 2008 Summer Olympics <subj> 2008 <obj> point in time</s>

Regarding the extension of the OP’s question, OP wanted to know how to run the function extract_triplets. OP can simply do that via a for-loop:

for text in extracted_text:
  print(extract_triplets(text))

Output:

[{'head': 'Decent Little Reader', 'type': 'different from', 'tail': 'Poor Tablet'}, {'head': 'Poor Tablet', 'type': 'different from', 'tail': 'Decent Little Reader'}]
[{'head': 'Ok For What It Is', 'type': 'instance of', 'tail': 'film'}]
[{'head': 'Too Heavy and Poor', 'type': 'subclass of', 'tail': 'weld quality'}]
[{'head': 'difficult mount', 'type': 'subclass of', 'tail': 'mount'}]
[{'head': '2008 Summer Olympics', 'type': 'point in time', 'tail': '2008'}]
Answered By: cronoik