Python3: how to count columns in an external file

Question:

I am trying to count the number of columns in external files. Here is an example of a file, data.dat. Please note that it is not a CSV file. The whitespace is made up of spaces. Each file may have a different number of spaces between the columns.

Data                    Z-2              C+2
m_[a/b]                -155555.0        -133333.0
n_[a/b]                -188800.0        -133333.0
o_[a/b*Y]              -13.5            -17.95
p1_[cal/(a*c)]         -0.01947          0.27
p2_[a/b]               -700.2           -200.44
p3_(a*Y)/(b*c)          5.2966           6.0000
p4_[(a*Y)/b]           -22222.0         -99999.0
q1_[b/(b*Y)]            9.0             -6.3206
q2_[c]                 -25220.0         -171917.0
r_[a/b]                 1760.0           559140
s                       4.0             -4.0

I experimented with split(" ") but could not figure out how to get it to treat a run of spaces as a single separator; it counted each space as a separate column.

This seems promising but my attempt only counts the first column. It may seem silly to attempt a CSV method to deal with a non-CSV file. Maybe this is where my problems are coming from. However, I have used CSV methods before to deal with text files.

For example, I import my data:

 with open(data) as csvfile:
     reader = csv.DictReader(csvfile)
     n_cols = len(reader.fieldnames)

When I use this, only the first column is recognized. The code is too long to post, but I know this is happening because when I manually enter n_cols = 3, I do get the results I expect.

It does work if I use commas to delimit the columns, but I can’t do that (I need to use whitespace).

Does anyone know an alternative method that deals with arbitrary whitespace and non-CSV files? Thank you for any advice.

Asked By: Ant


Answers:

Yes, there are alternative methods:

Pandas

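import pandas as pd
# sep=r'\s+' splits on runs of whitespace (delim_whitespace=True is deprecated in recent pandas)
df = pd.read_csv('data.dat', sep=r'\s+')
n_cols = df.shape[1]  # number of columns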

NumPy

import numpy as np
arr = np.loadtxt('data.dat', dtype='str')
# or
arr = np.genfromtxt('data.dat', dtype='str')
n_cols = arr.shape[1]  # number of columns

Python’s csv

If you want to use Python’s csv library, you can normalize the whitespace before reading the file, e.g.:

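import csv
import re
with open('data.dat') as csvfile:
    content = csvfile.read().strip()
    # collapse each run of spaces into a single space
    normalized_content = re.sub(r' +', ' ', content)
    reader = csv.reader(normalized_content.split('\n'), delimiter=' ')
    n_cols = len(next(reader))  # number of fields in the header row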
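
Plain Python

If only the column count is needed, a library may not be necessary. Calling str.split() with no argument splits on any run of whitespace, whereas split(" ") splits on every single space and yields empty strings in between. The following is a minimal sketch; the helper name count_columns is hypothetical, and it assumes that the first non-blank line of the file has the same number of fields as the rest and that no field contains spaces.

```python
def count_columns(path):
    """Return the number of whitespace-separated fields on the first non-blank line."""
    with open(path) as f:
        for line in f:
            # split() with no argument treats any run of whitespace as one separator
            fields = line.split()
            if fields:
                return len(fields)
    return 0  # empty file, or only blank lines
```

With the data.dat shown in the question, count_columns('data.dat') should count the three header fields Data, Z-2 and C+2.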
    
Answered By: ahmelq