Python – Swap rows and columns, while creating a dictionary with it

Question:

I'm fairly new to Python, and I am trying to convert a dictionary comprehension (from Hands-On Data Analysis with Pandas by S. Molin) into a "normal" for loop, purely for practice.

Initially, the data comes from a CSV file and is loaded using NumPy. The result holds each CSV row as a single element (void type), as follows:

array([('2018-10-13 11:10:23.560', '262km NW of Ozernovskiy, Russia', 'mww', 6.7, 'green', 1),
       ('2018-10-13 04:34:15.580', '25km E of Bitung, Indonesia', 'mww', 5.2, 'green', 0),
       ('2018-10-13 00:13:46.220', '42km WNW of Sola, Vanuatu', 'mww', 5.7, 'green', 0),
       ('2018-10-12 21:09:49.240', '13km E of Nueva Concepcion, Guatemala', 'mww', 5.7, 'green', 0),
       ('2018-10-12 02:52:03.620', '128km SE of Kimbe, Papua New Guinea', 'mww', 5.6, 'green', 1)],
      dtype=[('time', '<U23'), ('place', '<U37'), ('magType', '<U3'), ('mag', '<f8'), ('alert', '<U5'), ('tsunami', '<i4')])

What I am trying to do is transform it so that I get each column as an array of values, keyed by the column name:

{'time': array(['2018-10-13 11:10:23.560', '2018-10-13 04:34:15.580',
        '2018-10-13 00:13:46.220', '2018-10-12 21:09:49.240',
        '2018-10-12 02:52:03.620'], dtype='<U23'),
 'place': array(['262km NW of Ozernovskiy, Russia', '25km E of Bitung, Indonesia',
        '42km WNW of Sola, Vanuatu',
        '13km E of Nueva Concepcion, Guatemala',
        '128km SE of Kimbe, Papua New Guinea'], dtype='<U37'),
 'magType': array(['mww', 'mww', 'mww', 'mww', 'mww'], dtype='<U3'),
 'mag': array([6.7, 5.2, 5.7, 5.7, 5.6]),
 'alert': array(['green', 'green', 'green', 'green', 'green'], dtype='<U5'),
 'tsunami': array([1, 0, 0, 0, 1])}

The dictionary comprehension used for this purpose is:

array_dict = {col: np.array([row[i] for row in data]) for i, col in enumerate(data.dtype.names)}

The solution I have so far is:

d ={}
for i,col in enumerate(data.dtype.names):
    for row in data:
        d[col].append(row[i])

I get the following error:

---------
KeyError                                  Traceback (most recent call last)
Input In [51], in <cell line: 2>()
      2 for i,col in enumerate(data.dtype.names):
      3     for row in data:
----> 4         d[col].append(row[i])
KeyError: 'time'

I have researched a bit online, and it could be related to the data type of the "time" column. My guess, though I am fairly sure I am wrong, is that in the comprehension each column is created as a NumPy array directly, whereas here I am not setting it up as such beforehand (hence the problem with the data type).

Any help would be highly appreciated. Many thanks!

Asked By: JaviCV

Answers:

To produce the same result as the dictionary comprehension that you’ve provided:

d = {}
for i, col in enumerate(data.dtype.names):
    values = []
    for row in data:
        values.append(row[i])
    d[col] = np.array(values)

The error that you're getting is due to the fact that your dictionary d is empty (you created it with d = {}), so it does not contain the key 'time'. You could create the key like this: d['time'] = some_value, but you can't access a key that doesn't exist yet, and d[col].append(...) has to look the key up before it can append.
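
As a minimal sketch of the point above (the keys and values here are made up for illustration, not taken from the real data), this shows the lookup failing on an empty dict and two ways of creating the key first:

```python
d = {}

# Looking up a missing key raises KeyError; the key is not created implicitly.
try:
    d["time"].append("2018-10-13 11:10:23.560")
    raised = False
except KeyError:
    raised = True

# Option 1: assign an empty list to the key, then append to it.
d["time"] = []
d["time"].append("2018-10-13 11:10:23.560")

# Option 2: dict.setdefault inserts the default only if the key is missing.
d.setdefault("mag", []).append(6.7)
d.setdefault("mag", []).append(5.2)

print(raised)  # True
print(d)       # {'time': ['2018-10-13 11:10:23.560'], 'mag': [6.7, 5.2]}
```

So the KeyError has nothing to do with the dtype of the "time" column; it is simply the first key the loop tries to read.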

If you want, you can use collections.defaultdict. With it, you don't have to create the keys yourself. If you access a non-existent key, it is created with the default value (here, an empty list), and that value is returned.

With your original code it would look like this:

from collections import defaultdict

d = defaultdict(list)

for i, col in enumerate(data.dtype.names):
    for row in data:
        d[col].append(row[i])

dict(d)

However, the values of your dictionary are then plain lists, not np.ndarray objects.
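
If arrays are needed, one possible follow-up is to convert each list once the loops have finished. This is only a sketch: the small two-field array below is a stand-in for the real data, and array_dict is just an illustrative name.

```python
from collections import defaultdict

import numpy as np

# Small stand-in for the question's structured array (two fields only).
data = np.array(
    [("mww", 6.7), ("mww", 5.2), ("mww", 5.7)],
    dtype=[("magType", "<U3"), ("mag", "<f8")],
)

d = defaultdict(list)
for i, col in enumerate(data.dtype.names):
    for row in data:
        d[col].append(row[i])

# Convert each accumulated list into an ndarray in one pass.
array_dict = {col: np.array(values) for col, values in d.items()}

print(type(array_dict["mag"]).__name__)  # ndarray
print(array_dict["mag"].tolist())        # [6.7, 5.2, 5.7]
```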

Answered By: Vladimir Fokow

A better way of displaying your array is (in ipython):

In [29]: x
Out[29]: 
array([('2018-10-13 11:10:23.560', '262km NW of Ozernovskiy, Russia', 'mww', 6.7, 'green', 1),
       ('2018-10-13 04:34:15.580', '25km E of Bitung, Indonesia', 'mww', 5.2, 'green', 0),
       ('2018-10-13 00:13:46.220', '42km WNW of Sola, Vanuatu', 'mww', 5.7, 'green', 0),
       ('2018-10-12 21:09:49.240', '13km E of Nueva Concepcion, Guatemala', 'mww', 5.7, 'green', 0),
       ('2018-10-12 02:52:03.620', '128km SE of Kimbe, Papua New Guinea', 'mww', 5.6, 'green', 1)],
      dtype=[('time', '<U23'), ('place', '<U37'), ('magType', '<U3'), ('mag', '<f8'), ('alert', '<U5'), ('tsunami', '<i4')])
In [30]: x.dtype
Out[30]: dtype([('time', '<U23'), ('place', '<U37'), ('magType', '<U3'), ('mag', '<f8'), ('alert', '<U5'), ('tsunami', '<i4')])
In [31]: x.shape
Out[31]: (5,)

This highlights the fact that you have a structured array with 5 elements and 6 fields (a compound dtype).

You can create a dict of all fields with:

In [32]: adict = {}
    ...: for i in x.dtype.names:
    ...:     adict[i] = x[i]
    ...: 
In [33]: adict
Out[33]: 
{'time': array(['2018-10-13 11:10:23.560', '2018-10-13 04:34:15.580',
        '2018-10-13 00:13:46.220', '2018-10-12 21:09:49.240',
        '2018-10-12 02:52:03.620'], dtype='<U23'),
 'place': array(['262km NW of Ozernovskiy, Russia', '25km E of Bitung, Indonesia',
        '42km WNW of Sola, Vanuatu',
        '13km E of Nueva Concepcion, Guatemala',
        '128km SE of Kimbe, Papua New Guinea'], dtype='<U37'),
 'magType': array(['mww', 'mww', 'mww', 'mww', 'mww'], dtype='<U3'),
 'mag': array([6.7, 5.2, 5.7, 5.7, 5.6]),
 'alert': array(['green', 'green', 'green', 'green', 'green'], dtype='<U5'),
 'tsunami': array([1, 0, 0, 0, 1], dtype=int32)}

x['time'] is the time field for all records. No need to iterate on records.
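
The same idea can be written as a dictionary comprehension. The sketch below uses a cut-down, made-up version of the array (three fields, two records). One thing to be aware of: indexing by a single field name returns a view of the original array, so the dict values share memory with x; call .copy() on them if independent arrays are needed.

```python
import numpy as np

# Cut-down stand-in for the structured array in the question.
x = np.array(
    [("2018-10-13 11:10:23.560", 6.7, 1),
     ("2018-10-13 04:34:15.580", 5.2, 0)],
    dtype=[("time", "<U23"), ("mag", "<f8"), ("tsunami", "<i4")],
)

# One entry per field; no iteration over the records is needed.
adict = {name: x[name] for name in x.dtype.names}

# Field access gives a view, so writing through the dict changes x as well.
adict["mag"][0] = 7.0
print(x[0]["mag"])  # 7.0

# Copies are independent of the original array.
independent = {name: x[name].copy() for name in x.dtype.names}
independent["mag"][1] = 9.9
print(x[1]["mag"])  # 5.2
```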

This structured array is not 2d; it is 1d, with records and fields. ("Rows and columns" is a common way of describing a 2d numeric array, though the terms are not technically part of the numpy description.)

In [37]: x['mag']
Out[37]: array([6.7, 5.2, 5.7, 5.7, 5.6])
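
To make the 1d point concrete, here is a short sketch with a made-up two-record array: the shape has a single axis, the fields live in the dtype, and indexing by position gives one record whose fields can be read by name.

```python
import numpy as np

x = np.array(
    [("mww", 6.7, 1), ("mww", 5.2, 0)],
    dtype=[("magType", "<U3"), ("mag", "<f8"), ("tsunami", "<i4")],
)

print(x.shape)             # (2,)
print(x.ndim)              # 1
print(len(x.dtype.names))  # 3

# Integer indexing selects a record; a field name then selects one value.
record = x[0]
print(record["mag"])       # 6.7
```
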
Answered By: hpaulj