Converting a text file with both numbers and characters into numpy arrays and list of strings

Question:

This seems like it should be easy, but I cannot solve it, and none of the questions or answers I found here apply to this case.
I have a text file that has values like this:

  -4.040     -75.444 8156.648 1.00 ABC2512
  -4.036     -75.444 8161.305 2.00 ABC2512
  -4.032     -75.444 8174.597 3.00 ABC2512
  -4.029     -75.444 8196.432 4.00 ABC2512
  -4.026     -75.444 8212.521 5.00 ABC1240
  -4.012     -75.443 8268.073 11.00 ABC1240
  -4.009     -75.443 8280.411 12.00 ABC1240

Eventually, I want to have 4 different numpy arrays from the first four columns, and a list of strings from the last column. It would also be nice to have a header as the first row.
So far, I tried to convert it to a dataframe like this:

f = open(irh_file)
g = pd.DataFrame(list(f))

but then I cannot split the columns, because the delimiters are both tabs and spaces.

Asked By: Travis_Dudeson

||

Answers:

You can pass column names and let pandas split on any whitespace:

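import pandas as pd

colnames = ['a', 'b', 'c', 'd', 'e']
# sep=r"\s+" splits on any run of spaces or tabs; delim_whitespace=True does the same but is deprecated since pandas 2.2
data = pd.read_csv('irh_file.txt', sep=r"\s+", names=colnames)
print(data)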
       a       b         c     d        e
0 -4.040 -75.444  8156.648   1.0  ABC2512
1 -4.036 -75.444  8161.305   2.0  ABC2512
2 -4.032 -75.444  8174.597   3.0  ABC2512
3 -4.029 -75.444  8196.432   4.0  ABC2512
4 -4.026 -75.444  8212.521   5.0  ABC1240
5 -4.012 -75.443  8268.073  11.0  ABC1240
6 -4.009 -75.443  8280.411  12.0  ABC1240
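
To get from this DataFrame to the four arrays and the list of strings the question asks for, something like the following sketch should work. It assumes the same column names 'a' to 'e' as above, and it uses io.StringIO with two rows copied from the question as a stand-in for the real file. The names passed to read_csv then play the role of the header row.

```python
import io

import pandas as pd

# Two sample rows from the question; StringIO stands in for the real file.
# The second row mixes a tab with spaces to mimic the mixed delimiters.
sample = io.StringIO(
    "  -4.040     -75.444 8156.648 1.00 ABC2512\n"
    "  -4.026 \t -75.444 8212.521 5.00 ABC1240\n"
)

colnames = ["a", "b", "c", "d", "e"]
data = pd.read_csv(sample, sep=r"\s+", names=colnames)

# One numpy array per numeric column, in file order.
a, b, c, d = (data[name].to_numpy() for name in colnames[:4])

# The last column as a plain Python list of strings.
labels = data["e"].tolist()

print(a)
print(labels)
```

With the real file, replacing sample by the file path should be the only change needed.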
Answered By: Syed Hasnain

You can use pd.read_csv to load the data into a DataFrame:

import pandas as pd

df = pd.read_csv("out1.txt", sep=r"\s+", header=None)
print(df)

Prints:

       0       1         2     3        4
0 -4.040 -75.444  8156.648   1.0  ABC2512
1 -4.036 -75.444  8161.305   2.0  ABC2512
2 -4.032 -75.444  8174.597   3.0  ABC2512
3 -4.029 -75.444  8196.432   4.0  ABC2512
4 -4.026 -75.444  8212.521   5.0  ABC1240
5 -4.012 -75.443  8268.073  11.0  ABC1240
6 -4.009 -75.443  8280.411  12.0  ABC1240

Then:

arr_1 = df[0].values
print(arr_1)

arr_5 = df[4].tolist()
print(arr_5)

Prints:

[-4.04  -4.036 -4.032 -4.029 -4.026 -4.012 -4.009]
['ABC2512', 'ABC2512', 'ABC2512', 'ABC2512', 'ABC1240', 'ABC1240', 'ABC1240']
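
If a DataFrame is not needed at all, a numpy-only route is also possible. This is a different technique from the pandas approach above, shown only as a sketch: np.loadtxt splits on any whitespace by default, usecols selects columns, and unpack=True returns one array per column. The variable names x1 to x4 and the in-memory sample text are placeholders for illustration, not part of the original answer.

```python
import io

import numpy as np

# Three sample rows from the question, held in memory instead of a file.
sample_text = (
    "  -4.040     -75.444 8156.648 1.00 ABC2512\n"
    "  -4.026 \t -75.444 8212.521 5.00 ABC1240\n"
    "  -4.012     -75.443 8268.073 11.00 ABC1240\n"
)

# First pass: the four numeric columns, one array per column.
x1, x2, x3, x4 = np.loadtxt(
    io.StringIO(sample_text), usecols=(0, 1, 2, 3), unpack=True
)

# Second pass: the last column read as strings, converted to a list.
names = np.loadtxt(io.StringIO(sample_text), usecols=4, dtype=str).tolist()

print(x1)
print(names)
```

This reads the input twice, which is usually fine for small files; for large files the single pandas read is likely the better choice.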
Answered By: Andrej Kesely