Read CSV file to numpy array, first row as strings, rest as float

Question:

I have data stored in a CSV where the first row is strings (column names) and the remaining rows are numbers. How do I store this to a numpy array? All I can find is how to set data type for columns but not for rows.

Right now I’m just skipping the headers to do the calculations, but I need the headers in the final version. If I leave the headers in, the whole array is read as strings and the calculations fail.

This is what I have:

 data = np.genfromtxt(path_to_csv, dtype=None, delimiter=',', skip_header=1) 
Asked By: postelrich


Answers:

The whole idea of a numpy array is that all elements are the same type. Read the headers into a Python list and manage them separately from the numbers. You can also create a structured array (an array of records) and in this case you can use the headers to name the fields in the records. Storing them in the array would be redundant in that case.
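
As a minimal sketch of the "manage the headers separately" approach: the snippet below reads the first line into a Python list and parses the rest as floats. The in-memory CSV text and the variable names (csv_text, header, data) are made up for illustration; with a real file you would open path_to_csv instead of using io.StringIO.

```python
import io

import numpy as np

# Hypothetical CSV content standing in for the real file.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"

with io.StringIO(csv_text) as f:
    # Consume the first line and keep the column names as a plain list.
    header = f.readline().strip().split(",")
    # The remaining lines are purely numeric, so they load as a float array.
    data = np.loadtxt(f, delimiter=",")

# Column lookup by name goes through the separate header list.
col_b = data[:, header.index("b")]
```

The numbers stay in an ordinary 2-D float array, so calculations work as before, and the header list can be written back out alongside the results at the end.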

Answered By: kindall

I’m not sure what you mean when you say you need the headers in the final version, but you can generate a structured array where the columns are accessed by strings like this:

data = np.genfromtxt(path_to_csv, dtype=None, delimiter=',', names=True)

and then access columns with data['col1_name'], data['col2_name'], etc.
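
For example, here is a small sketch using an in-memory CSV instead of a file on disk. The column names x and y and the sample values are invented for illustration.

```python
import io

import numpy as np

# Hypothetical CSV content; the first row holds the column names.
csv_text = "x,y\n1.5,2.5\n3.0,4.0\n"

data = np.genfromtxt(io.StringIO(csv_text), dtype=None, delimiter=",", names=True)

# The field names come from the header row.
names = data.dtype.names

# Each column is a regular numeric array, so calculations work directly.
x_mean = data["x"].mean()
```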

Answered By: user545424

You can keep the column names if you pass the names=True argument to np.genfromtxt:

 data = np.genfromtxt(path_to_csv, dtype=float, delimiter=',', names=True) 

Note the dtype=float, which converts all of your data to float. This is more efficient than dtype=None, which asks np.genfromtxt to guess the data type of each column for you.

The output will be a structured array, where you can access individual columns by name. The names are taken from your first row. Some modifications may occur: spaces in a column name are changed to _, for example. The documentation should cover most questions you could have.
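
A short sketch of the renaming behaviour, with made-up column names that contain spaces. The last step is an addition not mentioned above: it assumes NumPy 1.16 or later, where numpy.lib.recfunctions.structured_to_unstructured can turn the structured array back into a plain 2-D float array if that is more convenient for calculations.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical CSV content; note the spaces in the header names.
csv_text = "col one,col two\n1,2\n3,4\n"

data = np.genfromtxt(io.StringIO(csv_text), dtype=float, delimiter=",", names=True)

# Spaces in the header are replaced by underscores in the field names.
names = data.dtype.names

# Optional: a plain 2-D float array with one column per field.
plain = rfn.structured_to_unstructured(data)
```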

Answered By: Pierre GM