Pandas read csv "usecols=" wont allow filtering by string and index at same time

Question:

I am trying to include the index column in a dataframe pandas reads from a .csv file, but when filtering with "usecols=" I am getting the following error message if I try to include the index column with the data:

‘usecols’ must either be list-like of all strings, all unicode, all
integers or a callable.

Here is my .csv file data, and though the column has a name "Date" it is unable to see it since it’s in the 3rd row:

csv file

and this is my code:

df = pd.read_csv('mydata.csv', index_col=0, nrows=0) 
use_col = [col_name for col_name in list(df.columns) if col_name.startswith('M6A=F')] #filters columns
dti = [0]
df = pd.read_csv('mydata.csv', index_col=[0], usecols=dti+use_col)    

CAVEAT: I dont want to read the entire csv file into a dataframe, as this in a function being parsed to json via FastAPI and gets really slow that way, so needs to be filtered before reading the csv file, hence "usecols=" has to be used.

What I have tried so far:
I looked at this stackoverflow solution but I think that works because its using integers in both cases, while mine is mixing integer and string, unless I have misunderstood the error.

I also tried pandas concat method reading the csv into two dataframes, once to get the index column and then the data, but couldnt seem to get them to join in a way that the index lined up with the data, and it felt like I was creating additional room for error in the output.

There is a suggestion here which I will try if I cannot get a better solution, but it feels like a dubious method and the OP was posted 8 years ago.

Asked By: mdkb

||

Answers:

For me working callable by function:

f = lambda x: 'M6A=F' in x or 'Unnamed: 0' in x
df = pd.read_csv('hist1day.csv', usecols=f, index_col=[0], skiprows=[2])
print(df)
                         M6A=F             M6A=F.1             M6A=F.2  
NaN                       Open                High                 Low   
2009-03-20  0.6830999851226807  0.6830999851226807  0.6830999851226807   
2009-03-23  0.6995000243186951  0.6995000243186951  0.6995000243186951   
2009-03-24  0.6998000144958496  0.7056999802589417  0.6912999749183655   
2009-03-25  0.6941999793052673  0.7014999985694885  0.6894999742507935   
                       ...                 ...                 ...   
2022-08-04  0.6949999928474426   0.699400007724762  0.6938999891281128   
2022-08-05  0.6972000002861023  0.6978999972343445  0.6873999834060669   
2022-08-08  0.6899999976158142  0.7013000249862671  0.6899999976158142   
2022-08-09  0.6987000107765198  0.6998000144958496  0.6956999897956848   
2022-08-10  0.7081999778747559  0.7091000080108643  0.7080000042915344   

                       M6A=F.3             M6A=F.4 M6A=F.5  
NaN                      Close           Adj Close  Volume  
2009-03-20  0.6830999851226807  0.6830999851226807     0.0  
2009-03-23  0.6995000243186951  0.6995000243186951   392.0  
2009-03-24  0.6963000297546387  0.6963000297546387   588.0  
2009-03-25  0.6901999711990356  0.6901999711990356   616.0  
                       ...                 ...     ...  
2022-08-04  0.6985999941825867  0.6985999941825867  6604.0  
2022-08-05   0.691100001335144   0.691100001335144  9935.0  
2022-08-08  0.6984000205993652  0.6984000205993652  8102.0  
2022-08-09   0.695900022983551   0.695900022983551  8102.0  
2022-08-10  0.7085000276565552  0.7085000276565552   305.0  

[3373 rows x 6 columns]
Answered By: jezrael
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.