How to extract a specific part of a line from a text in Python
Question:
I have a huge file that I split into a series of lines with text.splitlines(). From these lines I need to extract the information corresponding to a keyword: "ref-p". What I did is:
for index, line in enumerate(tpr_linee):
    ref = "ref-p"
    a = []
    if ref in line:
        a.append(line)
        print(a)
What I obtained is:
[' ref-p (3x3):']
[' ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}']
[' ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}']
[' ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}']
Now I need to move the three series of numbers into a dictionary of the form:
{ref-p: [[number, number, number], [number, number, number], etc]}
Also, in the larger dataset the 3×3 array may have a different shape in different files.
So my main goal is to find a way to extract all the numbers corresponding to ref-p, taking only the numbers and ignoring the first appearance of the ref-p key.
Answers:
I have edited the first part of your code so that the list a is created once, before the loop, and ends up holding all the matching strings to be analysed.
Then I split each string on the "=" sign and strip the curly braces "{" and "}" to extract only the string of numbers.
After conversion to float, the numbers are just 0.0 and 1.0. Try this:
a = []  # create the list once, outside the loop
for index, line in enumerate(tpr_linee):
    if 'ref-p' in line:
        a.append(line)
print(a)

# For the sample data, a now contains:
a = [' ref-p (3x3):',
     ' ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}',
     ' ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}',
     ' ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}']

result = {'ref-p': []}
for strg in a:
    if '=' in strg:  # skips the header line, which has no '='
        # keep the text after '=', drop the braces, split on commas
        num_list = strg.split('=')[-1].strip('{').strip('}').split(',')
        print(num_list)
        result['ref-p'].append([float(e.strip()) for e in num_list])
print(result)
Output
[' 1.00000e+00', ' 0.00000e+00', ' 0.00000e+00']
[' 0.00000e+00', ' 1.00000e+00', ' 0.00000e+00']
[' 0.00000e+00', ' 0.00000e+00', ' 1.00000e+00']
{'ref-p': [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
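The question also mentions that the array may have a different shape in other files. The split-and-strip idea above does not depend on the shape, and it could be wrapped in a small function along these lines. This is only a sketch: the function name extract_matrix and the regular expression are illustrative assumptions, not part of the original answer, and the pattern assumes every data row looks like key[ i]={...} as in the sample.

```python
import re

def extract_matrix(lines, key="ref-p"):
    """Collect the numeric rows of `key[ i]={...}` lines into a dict.

    Hypothetical helper: assumes each data row has the form shown in
    the question, e.g. ' ref-p[ 0]={ 1.0e+00, 0.0e+00, 0.0e+00}'.
    """
    # The header line ' ref-p (3x3):' has no '[i]={...}' part, so it never matches.
    row_pattern = re.compile(
        r"^\s*" + re.escape(key) + r"\[\s*\d+\]\s*=\s*\{(.*)\}\s*$"
    )
    rows = []
    for line in lines:
        match = row_pattern.match(line)
        if match:
            # float() tolerates the spaces around each number
            rows.append([float(x) for x in match.group(1).split(",")])
    return {key: rows}

sample = [' ref-p (3x3):',
          ' ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}',
          ' ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}',
          ' ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}']
print(extract_matrix(sample))
# {'ref-p': [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
```

Because the pattern is anchored on the key followed by an index in square brackets, it ignores the header line without a separate check, and the number of rows and columns is whatever the file contains.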
Try this:
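import ast

out = []
for index, line in enumerate(tpr_linee):
    ref = "ref-p"
    if ref in line:
        try:
            # turn "{a, b, c}" into "(a, b, c)" so it parses as a tuple
            line1 = line.split('=')[1].replace('{', '(').replace('}', ')')
            line1 = ast.literal_eval(line1)
            out.append(line1)
        except (IndexError, ValueError, SyntaxError):
            # header lines such as ' ref-p (3x3):' have no '=' part
            continue
print(out)
Output
[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]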
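This second approach yields a list of tuples, while the question asks for a dictionary of lists. One way to bridge the two, shown as a small sketch that starts from the out value printed above (the name result is only illustrative):

```python
# `out` as produced for the sample data in the question
out = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# Convert each tuple to a list and store everything under the keyword
result = {"ref-p": [list(row) for row in out]}
print(result)
# {'ref-p': [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
```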