Python: reading a CSV file and skipping the non-fixed-length header part

Question:

I am reading a number of files that include headers of varying length, and I don’t know how to skip the “header part” until the data of interest appears. The file content looks like the sample below. I am always interested in the content after the line "Measurement values:". Can I somehow use pandas’ read_csv skiprows argument, combined with a search string or similar, to weed out the header part?

Any inputs are welcome 🙂

Data of the Experiment
Test started: Wed Mar 07 08:10:32 CET 2018
Time     Revolutions     Axial Force     Radial Force
0        0        0        0
10        3000        0        4000
172800        3000        0        4000
172800        2000        0        4000
180000        2000        0        4000
237600        3000        0        22000
237600        2000        0        22000
244800        2000        0        22000
244800        1000        0        22000
252000        1000        0        22000
252000        3000        0        4000
259200        3000        0        4000
Critical Temperature 1: 110
Critical Temperature 2: 120
Critical Temperature 3: 120
Critical Temperature 4: 110
Critical Vibration level: 3500
Critical Torque: 7000
Measurement values:
Time:   Seconds elapsed [s] Torque [Nm] Speed [1/s] 
20180307081032: 210025.02   5.25    0.00    
20180307081033: 210025.98   17.50   3000.00 
20180307081034: 210026.97   1688.75 3000.00 
.
.
Asked By: opprud


Answers:

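I have used the code below to skip the leading lines while reading an Excel file; you can do the same for a CSV file with read_csv. Note that header=2 makes the third row (index 2) the header, so the first two rows are skipped.
import pandas
df = pandas.read_excel(excelFile, header=2)  # excelFile is the path to your workbook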
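A minimal sketch of what header=2 does, using made-up in-memory sample data (io.StringIO stands in for a real file): the row at index 2 becomes the column header, which here gives the same result as skiprows=2. Both require the header length to be known in advance, so on their own they don't handle the variable-length header from the question.

```python
import io

import pandas as pd

# Made-up sample: a fixed two-line preamble, then a header row and data.
text = "preamble line 1\npreamble line 2\na,b\n1,2\n3,4\n"

# header=2: the row at 0-based index 2 ("a,b") becomes the column header,
# so the two preamble lines above it are discarded.
df_header = pd.read_csv(io.StringIO(text), header=2)

# skiprows=2: drop the first two lines; the next line is used as the header.
df_skip = pd.read_csv(io.StringIO(text), skiprows=2)

print(df_header)
print(df_header.equals(df_skip))
```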

Answered By: avinashse

I am not sure if this is the correct approach.

import pandas as pd
df = pd.read_csv(r"filename.csv")
lineNumber = 0
for i, v in enumerate(df.to_string(index=False).split("\n"), 1):
    if "Measurement values" in v:
        lineNumber = i                          #Find line number of "Measurement values"
        break

df = pd.read_csv(r"filename.csv", skiprows=lineNumber)    #Read file again with lineNumber 
print(df)

Output:

  Time:   Seconds elapsed [s] Torque [Nm] Speed [1/s] 
0       20180307081032: 210025.02   5.25    0.00      
1       20180307081033: 210025.98   17.50   3000.00   
2       20180307081034: 210026.97   1688.75 3000.00  

There should be a solution that doesn't require reading the file twice.
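One way to avoid the second read, sketched under the assumption that the file fits comfortably in memory: read all lines once, locate the marker, and hand only the remaining lines to pandas through io.StringIO. The function name read_after_marker is hypothetical; any extra keyword arguments (for example a sep value that matches your files) are passed through to pd.read_csv unchanged.

```python
import io

import pandas as pd


def read_after_marker(path, marker="Measurement values:", **read_csv_kwargs):
    """Read `path` once and parse only the lines that follow `marker`.

    Extra keyword arguments are passed straight to pandas.read_csv.
    """
    with open(path, "r") as in_file:
        lines = in_file.readlines()  # the only read of the file
    for i, line in enumerate(lines):
        if line.startswith(marker):
            # Join everything after the marker line into an in-memory
            # text buffer that pandas can read like a file.
            remaining = "".join(lines[i + 1:])
            return pd.read_csv(io.StringIO(remaining), **read_csv_kwargs)
    raise ValueError(f"{marker!r} not found in {path}")
```

The trade-off is memory: the whole file is held as a list of strings, which is usually fine for measurement logs of this size.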

Answered By: Rakesh

Very similar to Rakesh’s answer, but without reading the whole file just to find the line with “Measurement values:”.

import pandas as pd

file_name = r"filename.csv"

line_number = -1

with open(file_name, "r") as in_file:
    for i, line in enumerate(in_file, 1):
        if line.startswith("Measurement values:"):
            line_number = i
            break

if line_number == -1:
    raise RuntimeError("Could not find end of header")

df = pd.read_csv(file_name, skiprows=line_number)
print(df)
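A variation that reads the header part only once: advance the open file handle past the marker line and pass the same handle to pd.read_csv, which continues from the handle's current position. This is a sketch rather than a drop-in replacement. The function read_measurements and the names in COLUMNS are hypothetical, and sep=r"\s+" assumes the measurement fields are separated by runs of whitespace (tabs or spaces), as the sample in the question suggests.

```python
import pandas as pd

# Hypothetical column names: the original column-title line mixes spaces
# inside names ("Seconds elapsed [s]") with spaces between them, so it is
# skipped and replaced with these.
COLUMNS = ["time", "seconds_elapsed", "torque_nm", "speed_per_s"]


def read_measurements(path, marker="Measurement values:"):
    with open(path, "r") as in_file:
        # Advance the handle line by line until the marker line is consumed.
        while True:
            line = in_file.readline()
            if not line:  # readline() returns "" at end of file
                raise RuntimeError("Could not find end of header")
            if line.startswith(marker):
                break
        # pandas reads from the handle's current position, so the header
        # part is not read a second time. skiprows=1 drops the column-title
        # line that follows the marker.
        return pd.read_csv(in_file, sep=r"\s+", header=None,
                           skiprows=1, names=COLUMNS)
```

On the sample from the question this should give four columns, with the time stamps (including their trailing colon) kept as strings.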
Answered By: James Elderfield

I’m not too familiar with pandas, but based on my own experience something like this should work with standard file I/O, and I hope the general strategy is transferable:

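import numpy as np

with open("filename.csv", "r") as data_file:
    data_file_line = data_file.readline()
    # readline() returns "" at end of file, so stop there instead of looping forever
    while data_file_line and not data_file_line.startswith("Measurement values:"):
        data_file_line = data_file.readline()
    # the result starts with the "Measurement values:" line, followed by the rest of the file
    data_file_lines_minus_header = np.append(data_file_line, data_file.readlines())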

I hope this proves helpful to someone!
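The same strategy can be written with the standard library alone: itertools.dropwhile skips lines until the marker appears. Unlike the loop above, this sketch also drops the "Measurement values:" line itself and returns an empty list if the marker never appears. The function name lines_after_marker is hypothetical.

```python
import itertools

MARKER = "Measurement values:"


def lines_after_marker(path, marker=MARKER):
    """Return the lines that follow the first line starting with `marker`."""
    with open(path, "r") as data_file:
        # dropwhile skips lines while they do not start with the marker,
        # then yields the marker line itself followed by the rest of the file.
        rest = itertools.dropwhile(lambda line: not line.startswith(marker), data_file)
        next(rest, None)  # discard the marker line (no-op if it was never found)
        return list(rest)
```

The returned lines still end with "\n", so they can be split or joined and handed to a parser as needed.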

Answered By: Zack Carter