Generating the plural form of a noun

Question:

Given a word, which may or may not be a singular-form noun, how would you generate its plural form?

Based on this NLTK tutorial and this informal list on pluralization rules, I wrote this simple function:

def plural(word):
    """
    Converts a word to its plural form.
    """
    if word in c.PLURALE_TANTUMS:
        # defective nouns, fish, deer, etc
        return word
    elif word in c.IRREGULAR_NOUNS:
        # foot->feet, person->people, etc
        return c.IRREGULAR_NOUNS[word]
    elif word.endswith('fe'):
        # knife -> knives
        return word[:-2] + 'ves'
    elif word.endswith('f'):
        # wolf -> wolves
        return word[:-1] + 'ves'
    elif word.endswith('o'):
        # potato -> potatoes
        return word + 'es'
    elif word.endswith('us'):
        # cactus -> cacti
        return word[:-2] + 'i'
    elif word.endswith('on'):
        # criterion -> criteria
        return word[:-2] + 'a'
    elif word.endswith('y'):
        # community -> communities
        return word[:-1] + 'ies'
    elif word[-1] in 'sx' or word[-2:] in ['sh', 'ch']:
        return word + 'es'
    elif word.endswith('an'):
        return word[:-2] + 'en'
    else:
        return word + 's'

But I think this is incomplete. Is there a better way to do this?

Asked By: Cerin


Answers:

First, it’s worth noting that, as the FAQ explains, WordNet cannot generate plural forms.

If you want to use it anyway, you can. With Morphy, WordNet might be able to generate plurals for many nouns… but it still won’t help with most irregular nouns, like “children”.


Anyway, the easy way to use WordNet from Python is via NLTK. One of the NLTK HOWTO docs explains the WordNet Interface. (Of course it’s even easier to just use NLTK without specifying a corpus, but that’s not what you asked for.)

There is a lower-level API to WordNet called pywordnet, but I believe it’s no longer maintained (it became the foundation for the NLTK integration), and only works with older versions of Python (maybe 2.7, but not 3.x) and of WordNet (only 2.x).

Alternatively, you can always access the C API by using ctypes or cffi or building custom bindings, or access the Java API by using Jython instead of CPython.

Or, of course, you can call the command-line interface via subprocess.


Anyway, at least on some installations, if you give the simple Morphy interface a singular noun, it will return its plural, while if you give it a plural noun, it will return its singular. So:

from nltk.corpus import wordnet as wn
assert wn.morphy('dogs') == 'dog'
assert wn.morphy('dog') == 'dog'

This isn’t actually documented, or even implied, to be true, and in fact it’s clearly not true for the OP, so I’m not sure I’d want to rely on it (even if it happens to work on your computer).

The other way around is documented to work, so you could write some rules that apply all possible English plural rules, call morphy on each one, and the first one that returns the starting string is the right plural.

However, the way it’s documented to work is effectively by blindly applying the same kind of rules. So, for example, it will properly tell you that doges is not the plural of dog—but not because it knows dogs is the right answer; only because it knows doge is a different word, and it likes the “+s” rule more than the “+es” rule. So, this isn’t going to be helpful.
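
As a rough sketch of the candidate-and-check idea described above: generate a few candidate plurals with suffix rules and keep the first one that a lemmatizer maps back to the original word. The names `candidate_plurals` and `guess_plural` are hypothetical, and the lemmatizer is passed in as a function so that something like `lambda w: wn.morphy(w, wn.NOUN)` can be plugged in. The rule list is deliberately tiny.

```python
def candidate_plurals(word):
    """Yield plural candidates for `word`, most specific rule first."""
    if word.endswith('y') and len(word) > 1 and word[-2] not in 'aeiou':
        yield word[:-1] + 'ies'          # community -> communities
    if word.endswith(('s', 'x', 'z', 'sh', 'ch')):
        yield word + 'es'                # box -> boxes
    yield word + 's'                     # dog -> dogs
    yield word + 'es'                    # last resort


def guess_plural(word, lemmatize):
    """Return the first candidate that `lemmatize` maps back to `word`.

    `lemmatize` is any callable that takes a string and returns its base
    form (or None); it is an assumption of this sketch, not a fixed API.
    """
    for candidate in candidate_plurals(word):
        if candidate != word and lemmatize(candidate) == word:
            return candidate
    return None
```

The result is only as good as the rule list and the lemmatizer. A rule-based lemmatizer may well accept a candidate such as 'childs', so this does not solve irregular plurals.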

Also, as explained above, it has no rules for any irregular plurals—WordNet has no idea that children and child are related in any way.

Also, wn.morphy('reckless') will return 'reckless' rather than None. If you want that, you’ll have to test whether it’s a noun first. You can do this just sticking with the same interface, although it’s a bit hacky:

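def plural(word):
    # Lemma of the word with no part-of-speech restriction.
    result = wn.morphy(word)
    # Lemma when the word is treated strictly as a noun
    # (None if WordNet finds no noun lemma).
    noun = wn.morphy(word, wn.NOUN)
    if noun in (word, result):
        return result
    return None  # not recognized as a noun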

To do this properly, you will actually need to add a plurals database instead of trying to trick WordNet into doing something it can’t do.

Also, a word can have multiple meanings, and they can have different plurals, and sometimes there are even multiple plurals for the same meaning. So you probably want to start with something like (lemma for s in wn.synsets(word, wn.NOUN) for lemma in s.lemmas() if lemma.name() == word) and then get all appropriate plurals, instead of just returning “the” plural. (In NLTK 3, lemmas and name are methods, so they need the parentheses.)
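
A minimal sketch of what such a plurals database could look like, assuming a small hand-maintained table. The `PLURALS` dict and the `all_plurals` function are hypothetical and not part of WordNet or NLTK. Each entry maps a singular to a list, so words with more than one accepted plural can return all of them:

```python
# Hypothetical, hand-maintained table: singular -> list of accepted plurals.
PLURALS = {
    'child': ['children'],
    'person': ['people', 'persons'],
    'octopus': ['octopuses', 'octopi', 'octopodes'],
    'fish': ['fish', 'fishes'],
}


def all_plurals(word):
    """Return every known plural of `word`, or a naive '+s' guess."""
    if word in PLURALS:
        return list(PLURALS[word])   # copy so callers cannot mutate the table
    return [word + 's']              # fallback for words missing from the table


print(all_plurals('octopus'))  # ['octopuses', 'octopi', 'octopodes']
print(all_plurals('dog'))      # ['dogs']
```

In practice the table would be loaded from a real word list rather than typed in by hand, and the fallback would be a fuller rule set than a bare '+s'.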

Answered By: abarnert

The pattern-en package offers pluralization:

>>> import pattern.en
>>> pattern.en.pluralize("dog")
'dogs'
Answered By: arturomp

Another option that supports Python 3 is inflect.

import inflect
engine = inflect.engine()
plural = engine.plural(your_string)
Answered By: alanc10n

Most current pluralization libraries return only one plural, even for irregular words that have several. Some also do not check that the argument is a noun and simply pluralize it by general rules. So I decided to build a Python library, Plurals and Countable, which is open source on GitHub. Its main purpose is to return plurals (yes, multiple plurals for some words), with an option to return only dictionary-approved plurals. It can also report whether a noun is countable, uncountable, or either.

import plurals_counterable as pluc
pluc.pluc_lookup_plurals('octopus', strict_level='dictionary')

This returns the following dictionary:

{
    'query': 'octopus', 
    'base': 'octopus', 
    'plural': ['octopuses', 'octopi', 'octopodes'], 
    'countable': 'countable'
}

If you query by a noun’s plural, the result also tells you which word is its base (the singular, or the word itself if it is plural-only).

The library actually looks up the words in dictionaries, so it takes some time to request, parse, and return. Alternatively, you might use the REST API provided by Dictionary.video. You’ll need to contact [email protected] to get an API key. The call looks like this:

import json
import logging

import requests


def lookup_plurals():  # wrapper added so the return statements are valid
    url = 'https://dictionary.video/api/noun/plurals/octopus?key=YOUR_API_KEY'
    response = requests.get(url)
    if response.status_code == 200:
        # Parse the JSON body into a Python dict.
        return json.loads(response.text)
    else:
        logging.error(url + ' response: status_code[%d]' % response.status_code)
        return None
Answered By: wholehope