How to extract a specific part of a line of text in Python

Question:

I have a huge file that I split into a series of lines with text.splitlines(). From these lines I need to extract the information corresponding to one keyword: "ref-p". What I did is:

for index, line in enumerate(tpr_linee):
    ref = "ref-p"
    a = []
    if ref in line:

        a.append(line)

        print(a)

What I obtained is:

['   ref-p (3x3):']
['      ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}']
['      ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}']
['      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}']

Now I need to move the three series of numbers into a dictionary of the form:

{ref-p: [[number, number, number], [number, number, number], etc]}.

Also, in the larger dataset the array is not always 3×3; its shape may differ between files.

So my main goal is to find a way to extract all the numbers corresponding to ref-p, taking only the numbers and ignoring the first appearance of the ref-p key.

Asked By: Disobey1991

||

Answers:

I have edited the first part of your code so that the list a is created once, before the loop, and ends up holding every matching line as a string to be analysed.

Then I split each string on the "=" sign and strip the curly braces "{" and "}" to leave only the comma-separated string of numbers.

After conversion to float, the values are just 0.0 and 1.0. Try this:

a = []
for index, line in enumerate(tpr_linee):
    if 'ref-p' in line:
        a.append(line)
print(a)

a = ['   ref-p (3x3):', 
     '      ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}', 
     '      ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}', 
     '      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}' ]

result = {'ref-p': []}
for strg in a:
    if '=' in strg:
        num_list = strg.split('=')[-1].strip('{').strip('}').split(',')
        print(num_list)
        result['ref-p'].append([float(e.strip()) for e in num_list])
print(result)

Output

[' 1.00000e+00', '  0.00000e+00', '  0.00000e+00']
[' 0.00000e+00', '  1.00000e+00', '  0.00000e+00']
[' 0.00000e+00', '  0.00000e+00', '  1.00000e+00']
{'ref-p': [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
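
Since the question mentions that the array shape can differ between files, the same idea can be wrapped in a small function that does not assume three rows or three columns. This is only a sketch: the name extract_rows, its key parameter and the NUMBER pattern are made up for illustration, and it assumes every data line looks like key[ i]={...} as in the sample above.

```python
import re

# Hypothetical pattern for plain or scientific-notation numbers (an assumption,
# based on the sample lines in the question).
NUMBER = re.compile(r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?')

def extract_rows(lines, key='ref-p'):
    """Return {key: [[float, ...], ...]} for lines shaped like 'key[ i]={...}'."""
    rows = []
    for line in lines:
        name, sep, values = line.partition('=')
        # The header line ('ref-p (3x3):') has no '=' and is skipped here.
        if sep and name.strip().startswith(key + '['):
            rows.append([float(x) for x in NUMBER.findall(values)])
    return {key: rows}

sample_lines = [
    '   ref-p (3x3):',
    '      ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}',
    '      ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}',
    '      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}',
]
extracted = extract_rows(sample_lines)
print(extracted)
```

Because each row is parsed independently, a 2x2 or 3x4 block should come out as a list of rows of the matching length without further changes.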
Answered By: perpetualstudent

Try this:

import ast 

out = []
for index, line in enumerate(tpr_linee):
    ref = "ref-p"
    if ref in line:
        try:
            line1 = line.split('=')[1].replace('{', '(').replace('}', ')')
            line1 = ast.literal_eval(line1)
            out.append(line1)
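        except (IndexError, ValueError, SyntaxError):
            # Skip lines without '=' (the header) or with unparsable values
            continue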
print(out)

[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
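
To get the dictionary of lists the question asks for, one possible variation replaces the braces with square brackets instead of parentheses. A row with a single value then still parses as a list, whereas "(1.0)" would be evaluated as a plain float. The sample lines below are copied from the question, and the variable names are only illustrative.

```python
import ast

tpr_sample = [
    '   ref-p (3x3):',
    '      ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}',
    '      ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}',
    '      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}',
]

ref_dict = {'ref-p': []}
for line in tpr_sample:
    # Only the data lines contain '='; the header line is skipped.
    if 'ref-p' in line and '=' in line:
        row_text = line.split('=', 1)[1].strip()
        # Braces become brackets so literal_eval returns a list for each row.
        row_text = row_text.replace('{', '[').replace('}', ']')
        ref_dict['ref-p'].append(ast.literal_eval(row_text))
print(ref_dict)
```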
Answered By: threadfin