usecols keyword argument of pd.read_csv says it expects list[str] but the documentation says otherwise
Question:
I’m using PyCharm 2022.3.3 (Professional Edition) and I get a type warning on the usecols argument saying that it expects list[str]:
MRE:
import pandas as pd
path_to_csv = "mycsv.csv"
df_db = pd.read_csv(path_to_csv, usecols=[0])
The documentation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) says:
usecols : list-like or callable, optional
Return a subset of the columns. If list-like, all elements must either
be positional (i.e. integer indices into the document columns) or
strings that correspond to column names provided either by the user in
names or inferred from the document header row(s). If names are given,
the document header row(s) are not taken into account. For example, a
valid list-like usecols parameter would be [0, 1, 2] or
['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is
the same as [1, 0]. To instantiate a DataFrame from data with element
order preserved use
pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns
in ['foo', 'bar'] order or
pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for
['bar', 'foo'] order.
If callable, the callable function will be evaluated against the
column names, returning names where the callable function evaluates to
True. An example of a valid callable argument would be
lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter
results in much faster parsing time and lower memory usage.
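For reference, here is a minimal sketch of the three forms the quoted documentation describes. The in-memory CSV and its column names are made up for illustration and stand in for a real file:

```python
import io

import pandas as pd

# Hypothetical CSV content, used instead of a file on disk.
csv_text = "foo,bar,baz\n1,2,3\n4,5,6\n"

# Positional indices: keep the first and third columns.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 2])

# Column names: the order given in usecols is ignored,
# so the result follows the order in the file.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["baz", "foo"])

# Index the result afterwards to get a specific column order.
reordered = by_name[["baz", "foo"]]

# Callable: keep columns for which the function returns True.
by_callable = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: name.upper() in ["FOO", "BAR"],
)

print(list(by_position.columns))  # ['foo', 'baz']
print(list(by_name.columns))      # ['foo', 'baz']
print(list(reordered.columns))    # ['baz', 'foo']
print(list(by_callable.columns))  # ['foo', 'bar']
```

All four calls run without error, so a list of integers is valid at runtime, as the documentation says.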
I’m lost as to what is happening.
I’m using pandas 1.5.3 and Python 3.11.1.
Answers:
This was a problem with pandas-stubs. An issue has been filed and a PR has just been merged.
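Until a pandas-stubs release containing that fix is installed, the call is still valid at runtime, because the warning comes only from the type stubs. Below is a minimal sketch of a stopgap. It assumes a made-up in-memory CSV in place of mycsv.csv, and it assumes that PyCharm's `# noinspection PyTypeChecker` comment is an acceptable way to silence the inspection for that one statement:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for "mycsv.csv".
csv_text = "a,b\n1,2\n3,4\n"

# The stubs flagged usecols=[0], but pandas accepts integer positions.
# The comment below asks PyCharm to skip its type check for this statement.
# noinspection PyTypeChecker
df_db = pd.read_csv(io.StringIO(csv_text), usecols=[0])

print(df_db.columns.tolist())  # ['a']
```

Once the fixed stubs are installed, the suppression comment can be removed.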