Creating Dictionary and Integer Key for List of Strings in Python

Question:

I have a list of lists of Unicode strings.

Each string list represents a different document with the strings representing the authors’ names. Some documents have only one author while other documents can have multiple co-authors.

For example, a sample of authorship of three documents looks like this:

authors = [[u'Smith, J.', u'Williams, K.', u'Daniels, W.'], [u'Smith, J.'], [u'Williams, K.', u'Daniels, W.']]

I want to convert my list into a dictionary and a list.

First, a dictionary that provides an integer key for each name:

author_name = {0: u'Smith, J.', 1: u'Williams, K.', 2: u'Daniels, W.'}

Second, a list that identifies the authors for each document by the integer key:

doc_author = [[0, 1, 2], [0], [1, 2]]

What is the most efficient way to create these?

FYI: I need my author data in this format to run a pre-built author-topic LDA algorithm written in Python.

Asked By: Rhymenoceros

Answers:

You need to invert your author_name dictionary; after that the conversion of your list is trivial, using a nested list comprehension:

author_to_id = {name: id for id, name in author_name.items()}

doc_author = [[author_to_id[name] for name in doc] for doc in authors]

Demo:

>>> authors = [[u'Smith, J.', u'Williams, K.', u'Daniels, W.'], [u'Smith, J.'], [u'Williams, K.', u'Daniels, W.']]
>>> author_name = {0: u'Smith, J.', 1: u'Williams, K.', 2: u'Daniels, W.'}
>>> author_to_id = {name: id for id, name in author_name.items()}
>>> [[author_to_id[name] for name in doc] for doc in authors]
[[0, 1, 2], [0], [1, 2]]
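
The snippet above assumes author_name already exists. If it has to be built from authors as well, one possible approach (a minimal sketch, not part of the original answer) is a single pass that hands out IDs in first-seen order. The loop structure and the use of setdefault here are illustrative choices, not the only way to do it:

```python
# Sketch: build both structures in one pass over the documents.
authors = [
    [u'Smith, J.', u'Williams, K.', u'Daniels, W.'],
    [u'Smith, J.'],
    [u'Williams, K.', u'Daniels, W.'],
]

author_to_id = {}  # name -> integer ID, assigned in first-seen order
doc_author = []    # one list of integer IDs per document

for doc in authors:
    # setdefault returns the existing ID for a known name,
    # or stores and returns the next free ID for a new one
    doc_author.append(
        [author_to_id.setdefault(name, len(author_to_id)) for name in doc]
    )

# Invert the mapping to get the ID -> name dictionary the question asks for
author_name = {i: name for name, i in author_to_id.items()}

print(author_name)
print(doc_author)
```

Each name is looked up once in a dictionary, so the work grows linearly with the total number of author entries across all documents.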
Answered By: Martijn Pieters

lst = ['person', 'bicycle', 'car', 'motorbike', 'bus', 'truck']
dct = {}

# enumerate() yields (index, item) pairs; use the index as the key
for key, val in enumerate(lst):
    dct[key] = val

print(dct)

Output:

{0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorbike', 4: 'bus', 5: 'truck'}
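
For what it is worth, the explicit loop can probably be collapsed into one call, since dict() accepts any iterable of (key, value) pairs. A short sketch using the same sample list (the name dct_from_one is made up for illustration):

```python
lst = ['person', 'bicycle', 'car', 'motorbike', 'bus', 'truck']

# dict() consumes the (index, item) pairs that enumerate() yields
dct = dict(enumerate(lst))

# enumerate() takes an optional start value if the IDs should not begin at 0
dct_from_one = dict(enumerate(lst, start=1))

print(dct)
print(dct_from_one)
```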
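Answered By: Bharath Kumar

# List of lists: one inner list of author names per document
authors = [[u'Smith, J.', u'Williams, K.', u'Daniels, W.'], [u'Smith, J.'], [u'Williams, K.', u'Daniels, W.']]

# Flatten into a single list of names
flat_list = [x for xs in authors for x in xs]
# print(flat_list)

# Remove duplicates while keeping first-seen order.
# A plain set() has no guaranteed order, so the IDs could differ between runs.
res = list(dict.fromkeys(flat_list))
# print(res)

# Create the dict, using each name's position as its integer key
dct = {}
for key, val in enumerate(res):
    dct[key] = val

print(dct)

Output:

{0: 'Smith, J.', 1: 'Williams, K.', 2: 'Daniels, W.'}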
Answered By: Bharath Kumar