Converting a text file with both numbers and characters into NumPy arrays and a list of strings
Question:
This seems like it should be easy, but I cannot solve it, and none of the questions or answers I found here work for this case.
I have a text file that has values like this:
-4.040 -75.444 8156.648 1.00 ABC2512
-4.036 -75.444 8161.305 2.00 ABC2512
-4.032 -75.444 8174.597 3.00 ABC2512
-4.029 -75.444 8196.432 4.00 ABC2512
-4.026 -75.444 8212.521 5.00 ABC1240
-4.012 -75.443 8268.073 11.00 ABC1240
-4.009 -75.443 8280.411 12.00 ABC1240
Eventually, I want four separate NumPy arrays from the first four columns, and a list of strings from the last column. It would be nice if I could also have a header as the first row.
So far, I tried to convert it to a DataFrame like this:
f = open(irh_file)
g = pd.DataFrame(list(f))
but then I cannot split the columns, because the delimiters are both tabs and spaces.
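The attempt above puts each whole line into a single cell, which is why the columns cannot be split afterwards. As a minimal sketch of doing the split line by line (the helper name split_columns and the sample lines are illustrative, not from the question), Python's str.split() with no argument already treats any run of tabs and spaces as one delimiter:

```python
import numpy as np


def split_columns(lines):
    """Split whitespace-separated lines into four float arrays and a list of labels.

    split_columns is a hypothetical helper, assuming every non-empty line has
    exactly four numeric fields followed by one text field.
    """
    # str.split() with no argument splits on any run of spaces and/or tabs
    # and drops empty strings, so mixed delimiters need no special handling.
    rows = [line.split() for line in lines if line.strip()]
    columns = list(zip(*rows))  # transpose: list of rows -> list of columns
    arrays = [np.array([float(v) for v in col]) for col in columns[:4]]
    labels = list(columns[4])
    return arrays, labels


# Sample lines mixing tabs and spaces (illustrative only).
sample = [
    "-4.040 -75.444\t8156.648 1.00\tABC2512\n",
    "-4.036\t-75.444  8161.305 2.00 ABC2512\n",
    "-4.026 -75.444 8212.521\t\t5.00 ABC1240\n",
]
arrays, labels = split_columns(sample)
print(arrays[0])
print(labels)
```

With the real file, the lines can come from open(irh_file) inside a with block instead of the sample list.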
Answers:
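You can do it like this:
import pandas as pd
colnames = ['a', 'b', 'c', 'd', 'e']
# sep=r"\s+" splits on any run of spaces/tabs (delim_whitespace=True is deprecated since pandas 2.2)
data = pd.read_csv('irh_file.txt', sep=r"\s+", names=colnames)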
>>> data
       a       b         c     d        e
0 -4.040 -75.444  8156.648   1.0  ABC2512
1 -4.036 -75.444  8161.305   2.0  ABC2512
2 -4.032 -75.444  8174.597   3.0  ABC2512
3 -4.029 -75.444  8196.432   4.0  ABC2512
4 -4.026 -75.444  8212.521   5.0  ABC1240
5 -4.012 -75.443  8268.073  11.0  ABC1240
6 -4.009 -75.443  8280.411  12.0  ABC1240
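This answer stops at the DataFrame, while the question asks for four separate arrays and a list. A minimal sketch of that last step, assuming the column names a to e chosen above (io.StringIO stands in for the file so the snippet runs on its own):

```python
import io

import pandas as pd

# The question's sample rows; with the real file, pass 'irh_file.txt' instead.
text = """\
-4.040 -75.444 8156.648 1.00 ABC2512
-4.036 -75.444 8161.305 2.00 ABC2512
-4.032 -75.444 8174.597 3.00 ABC2512
-4.029 -75.444 8196.432 4.00 ABC2512
-4.026 -75.444 8212.521 5.00 ABC1240
-4.012 -75.443 8268.073 11.00 ABC1240
-4.009 -75.443 8280.411 12.00 ABC1240
"""

colnames = ['a', 'b', 'c', 'd', 'e']
data = pd.read_csv(io.StringIO(text), sep=r"\s+", names=colnames)

# One NumPy array per numeric column, selected by the names given above.
a, b, c, d = (data[name].to_numpy() for name in colnames[:4])
# The last column as a plain Python list of strings.
e = data['e'].tolist()
```

Because names= is passed, pandas treats the first line as data rather than as a header, and the names serve as the header row the question asked for.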
You can use pd.read_csv to load the data into a DataFrame:
import pandas as pd
df = pd.read_csv("out1.txt", sep=r"\s+", header=None)
print(df)
Prints:
       0       1         2     3        4
0 -4.040 -75.444  8156.648   1.0  ABC2512
1 -4.036 -75.444  8161.305   2.0  ABC2512
2 -4.032 -75.444  8174.597   3.0  ABC2512
3 -4.029 -75.444  8196.432   4.0  ABC2512
4 -4.026 -75.444  8212.521   5.0  ABC1240
5 -4.012 -75.443  8268.073  11.0  ABC1240
6 -4.009 -75.443  8280.411  12.0  ABC1240
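The separator r"\s+" is what copes with the mix of tabs and spaces mentioned in the question: \s+ is a regular expression for one or more whitespace characters, so any run of spaces and/or tabs counts as a single delimiter. A small self-contained check, using made-up input and the separate name demo so it does not clash with df:

```python
import io

import pandas as pd

# Illustrative input: fields separated by single spaces, double spaces and tabs.
text = "-4.040  -75.444\t8156.648 1.00\tABC2512\n-4.036\t-75.444   8161.305 2.00 ABC2512\n"

# r"\s+" matches one or more whitespace characters, so every run of
# spaces and/or tabs is treated as one delimiter.
demo = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)
print(demo.shape)
print(demo[4].tolist())
```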
Then:
arr_1 = df[0].to_numpy()  # first column as a NumPy float array
print(arr_1)
arr_5 = df[4].tolist()    # last column as a list of Python strings
print(arr_5)
Prints:
[-4.04  -4.036 -4.032 -4.029 -4.026 -4.012 -4.009]
['ABC2512', 'ABC2512', 'ABC2512', 'ABC2512', 'ABC1240', 'ABC1240', 'ABC1240']
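If pandas is not otherwise needed, NumPy alone can produce the same result. The following is a sketch using numpy.loadtxt, an alternative that neither answer uses. Here io.StringIO again stands in for the file:

```python
import io

import numpy as np

text = """\
-4.040 -75.444 8156.648 1.00 ABC2512
-4.036 -75.444 8161.305 2.00 ABC2512
-4.032 -75.444 8174.597 3.00 ABC2512
-4.029 -75.444 8196.432 4.00 ABC2512
-4.026 -75.444 8212.521 5.00 ABC1240
-4.012 -75.443 8268.073 11.00 ABC1240
-4.009 -75.443 8280.411 12.00 ABC1240
"""

# delimiter=None (the default) splits on any whitespace, tabs included.
# unpack=True transposes the result so each selected column is its own array.
x, y, z, n = np.loadtxt(io.StringIO(text), usecols=(0, 1, 2, 3), unpack=True)

# Read the text column in a second pass as strings, then convert to a list.
labels = np.loadtxt(io.StringIO(text), usecols=4, dtype=str).tolist()
```

For the real file, pass the path (for example 'irh_file.txt') in place of io.StringIO(text). The file is then read twice, which is fine for small inputs like this one.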