Bleu_score in NLTK library

Question:

I am new to the nltk library. I want to find the two most similar strings out of a set. To do so, I used ‘bleu_score’ as follows:

import nltk
from nltk.translate import bleu
from nltk.translate.bleu_score import SmoothingFunction
smoothie = SmoothingFunction().method4


C1 = 'FISSEN Ltds'
C2 = 'FISSEN Ltds Maschinen- und Werkzeugbau'
C3 = 'V.R.P. Baumaschinen Ltds'
print('BLEUscore1:',bleu([C1], C2, smoothing_function=smoothie, auto_reweigh=False))
print('BLEUscore2:',bleu([C2], C3, smoothing_function=smoothie, auto_reweigh=False))
print('BLEUscore3:',bleu([C1], C3, smoothing_function=smoothie, auto_reweigh=False))

The output is like this:

BLEUscore1: 0.2585784506653774
BLEUscore2: 0.26042143846335913
BLEUscore3: 0.1472821272412462

I wonder why the results show the highest similarity between C2 and C3, when C1 and C2 are clearly the most similar pair. What is the best way to measure similarity between strings so that C1 and C2 come out as the closest match?

I appreciate any help you can provide 🙂

Asked By: Hadi


Answers:

You can try SequenceMatcher from the standard library's difflib module:

from difflib import SequenceMatcher

C1 = 'FISSEN Ltds'
C2 = 'FISSEN Ltds Maschinen- und Werkzeugbau'
C3 = 'V.R.P. Baumaschinen Ltds'

print(SequenceMatcher(None, C1, C2).ratio())
print(SequenceMatcher(None, C2, C3).ratio())
print(SequenceMatcher(None, C1, C3).ratio())

# Output ->
# 0.4489795918367347
# 0.3548387096774194
# 0.2857142857142857
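To pick the closest pair automatically instead of reading the ratios by eye, the same comparison can be run over every pair. This is only a minimal sketch: the helper name most_similar_pair and the list name names are hypothetical and not part of difflib.

```python
from difflib import SequenceMatcher
from itertools import combinations


def most_similar_pair(strings):
    """Return (ratio, a, b) for the pair with the highest SequenceMatcher ratio."""
    scored = [
        (SequenceMatcher(None, a, b).ratio(), a, b)
        for a, b in combinations(strings, 2)  # every unordered pair once
    ]
    return max(scored, key=lambda item: item[0])


names = [
    'FISSEN Ltds',
    'FISSEN Ltds Maschinen- und Werkzeugbau',
    'V.R.P. Baumaschinen Ltds',
]

ratio, a, b = most_similar_pair(names)
print(a, '|', b, '|', round(ratio, 3))
```

For these three strings the winning pair is C1 and C2. Note that every pair is compared, so this only suits short lists; for many strings a faster pre-filter would probably be needed.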

Hope this Helps…
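As for why BLEU ranked C2 and C3 highest, a likely explanation is twofold. First, bleu expects lists of tokens; when it gets raw strings, the n-grams are built from single characters. Second, BLEU is precision-based and asymmetric: it asks how many n-grams of the hypothesis (the second argument) also occur in the reference, so the long C2 scores poorly against the short C1. The sketch below imitates only that one ingredient (clipped n-gram precision) in plain Python; the helper names ngram_counts and clipped_precision are hypothetical, and smoothing and the brevity penalty are deliberately left out, so the numbers will not match NLTK's scores.

```python
from collections import Counter


def ngram_counts(seq, n):
    """Count the n-grams of any sequence (a string or a list of tokens)."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def clipped_precision(reference, hypothesis, n):
    """Share of the hypothesis n-grams that also occur in the reference."""
    hyp = ngram_counts(hypothesis, n)
    ref = ngram_counts(reference, n)
    matched = sum((hyp & ref).values())  # Counter '&' keeps the minimum count
    total = sum(hyp.values())
    return matched / total if total else 0.0


C1 = 'FISSEN Ltds'
C2 = 'FISSEN Ltds Maschinen- und Werkzeugbau'
C3 = 'V.R.P. Baumaschinen Ltds'

# Raw strings: the 4-grams are runs of four characters.
print('chars C1/C2:', clipped_precision(C1, C2, 4))
print('chars C2/C3:', clipped_precision(C2, C3, 4))

# Whitespace tokens: the unigrams are whole words.
print('words C1/C2:', clipped_precision(C1.split(), C2.split(), 1))
print('words C2/C3:', clipped_precision(C2.split(), C3.split(), 1))
```

At the character level C2 and C3 share pieces such as "aschinen" and " Ltds", which is enough to push their precision above that of C1 and C2; at the word level the order flips in favour of C1 and C2. If you want to stay with BLEU, passing C1.split() and so on instead of the raw strings is probably the first thing to try.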

Answered By: Sachin Kohli