pandas – AttributeError: 'DataFrame' object has no attribute 'str'
Question:
I am trying to filter a dataframe down to the rows that contain a given product. However, I get the error AttributeError: 'DataFrame' object has no attribute 'str' whenever I run the code.
Here is the line of code:
include_clique = log_df.loc[log_df['Product'].str.contains("Product A")]
The Product column has the object dtype. Here is the full code:
import pandas as pd
import numpy as np
data = pd.read_csv("FILE.csv", header = None)
headerName = ["DRID", "Product", "M24", "M23", "M22", "M21"]
data.columns = [headerName]
log_df = np.log(1 + data[["M24", "M23", "M22", "M21"]])
copy = data[["DRID", "Product"]].copy()
log_df = copy.join(log_df)
include_clique = log_df.loc[log_df['Product'].str.contains("Product A")]
Here is the head:
ID PRODUCT M24 M23 M22 M21
0 123421 A 0.000000 0.000000 1.098612 0.0
1 141840 A 0.693147 1.098612 0.000000 0.0
2 212006 A 0.693147 0.000000 0.000000 0.0
3 216097 A 1.098612 0.000000 0.000000 0.0
4 219517 A 1.098612 0.693147 1.098612 0.0
Answers:
Short answer: change data.columns = [headerName] to data.columns = headerName.
Explanation: when you set data.columns = [headerName], the columns become a MultiIndex object. Therefore, log_df['Product'] is a DataFrame, and a DataFrame has no str attribute.
When you set data.columns = headerName, log_df['Product'] is a single column (a Series) and you can use the str attribute.
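The difference between the two assignments can be checked directly. This is a minimal sketch, assuming a small in-memory frame as a stand-in for FILE.csv (the rows and the shortened header list are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for FILE.csv: three columns, no header row
header_name = ["DRID", "Product", "M24"]
data = pd.DataFrame([[123421, "Product A", 0], [141840, "Product B", 1]])

# Wrapping the list in another list produces a one-level MultiIndex
nested = data.copy()
nested.columns = [header_name]
nested_is_multi = isinstance(nested.columns, pd.MultiIndex)
nested_selection = type(nested["Product"]).__name__  # a DataFrame, so no .str

# Passing the plain list produces an ordinary Index
flat = data.copy()
flat.columns = header_name
flat_selection = type(flat["Product"]).__name__  # a Series, so .str works

# The original filter now runs as intended
include_clique = flat.loc[flat["Product"].str.contains("Product A")]
print(nested_is_multi, nested_selection, flat_selection)
print(include_clique)
```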
If for any reason you need to keep the columns as a MultiIndex, there is another solution: first convert log_df['Product'] into a Series. After that, the str attribute is available.
# Flatten the one-column DataFrame into a Series, keeping the row index
products = pd.Series(log_df['Product'].values.flatten(), index=log_df.index)
# Use the resulting boolean mask to filter the rows of log_df
include_clique = log_df.loc[products.str.contains("Product A")]
However, I guess the first solution is what you’re looking for.
You get AttributeError: 'DataFrame' object has no attribute ... when you try to access an attribute your dataframe doesn’t have.
A common case is when you try to select a column using . instead of [] when the column name contains white space (e.g. 'col1 ').
df.col1 # <--- error
df['col1 '] # <--- no error
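The white-space case can be reproduced with a throwaway frame. This is a small sketch; the frame and the final clean-up step are assumptions added for illustration (stripping the names is only appropriate if the trailing space is unintentional):

```python
import pandas as pd

# Hypothetical frame whose only column name has a trailing space
df = pd.DataFrame({"col1 ": [1, 2, 3]})

# Attribute access looks for a column named exactly "col1" and fails
try:
    df.col1
    attribute_access_failed = False
except AttributeError:
    attribute_access_failed = True

# Bracket access with the exact name (including the space) works
values = df["col1 "].tolist()

# Assumption: the space is an accident, so strip it from every column name
cleaned = df.rename(columns=str.strip)
cleaned_values = cleaned.col1.tolist()
print(attribute_access_failed, values, list(cleaned.columns))
```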
Another common case is when you try to call a Series method on a DataFrame. For example, tolist() is a Series method, so it must be called on a column; the same was true of map() before pandas 2.1, which added DataFrame.map. If you call them on a DataFrame, you’ll get
AttributeError: 'DataFrame' object has no attribute 'tolist'
AttributeError: 'DataFrame' object has no attribute 'map'
As hoang tran explains, this is what is happening in the OP’s case as well: .str is a Series accessor and is not implemented for DataFrames.
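A quick way to see which object carries the method is to probe with hasattr and then call the method on a single column. This is an illustrative sketch with made-up data:

```python
import pandas as pd

# Hypothetical two-row frame for illustration
df = pd.DataFrame({"Product": ["Product A", "Product B"], "M24": [0.0, 1.0]})

# Neither tolist() nor the .str accessor exists on the DataFrame itself
frame_has_tolist = hasattr(df, "tolist")
frame_has_str = hasattr(df, "str")

# Both are available once a single column (a Series) is selected
product_list = df["Product"].tolist()
matches = df["Product"].str.contains("Product A").tolist()
print(frame_has_tolist, frame_has_str, product_list, matches)
```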
Yet another case is a typo: you try to call or access an attribute that is simply not defined. For example, if you try to call rows() instead of iterrows(), you’ll get
AttributeError: 'DataFrame' object has no attribute 'rows'
You can list all public attributes of a DataFrame using the following comprehension.
[x for x in dir(pd.DataFrame) if not x.startswith('_')]
When you assign column names as df.columns = [['col1', 'col2']], df now has MultiIndex columns, so to access a single column you’ll need to pass a tuple:
df['col1'].str.contains('Product A') # <---- error
df['col1',].str.contains('Product A') # <---- no error; note the trailing comma
In fact, you can pass a tuple to select a column of any MultiIndex dataframe, e.g.
df['level_1_colname', 'level_2_colname'].str.contains('Product A')
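As a sketch of tuple selection on a two-level MultiIndex, the example below uses invented level names ("info", "logs") and data. A partial key returns a DataFrame, while a full tuple returns a Series that supports .str:

```python
import pandas as pd

# Hypothetical two-level column index; the level names are made up
columns = pd.MultiIndex.from_tuples(
    [("info", "DRID"), ("info", "Product"), ("logs", "M24")]
)
df = pd.DataFrame(
    [[1, "Product A", 0.0], [2, "Product B", 0.7]], columns=columns
)

# A partial key selects every column under "info": still a DataFrame
whole_group = df["info"]

# A full tuple selects exactly one column: a Series with a .str accessor
single_column = df["info", "Product"]
mask = single_column.str.contains("Product A")
filtered = df.loc[mask]
print(type(whole_group).__name__, type(single_column).__name__)
print(filtered)
```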
You can also flatten MultiIndex column names by mapping a "flattener" function over them. A common one is '_'.join:
df.columns = df.columns.map('_'.join)
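To show what the flattening produces, here is a short sketch on an invented two-level frame. Each column tuple is joined into a single string, after which ordinary single-key selection returns a Series:

```python
import pandas as pd

# Hypothetical two-level column index; the level names are made up
columns = pd.MultiIndex.from_tuples(
    [("info", "DRID"), ("info", "Product"), ("logs", "M24")]
)
df = pd.DataFrame(
    [[1, "Product A", 0.0], [2, "Product B", 0.7]], columns=columns
)

# Join each (level_1, level_2) tuple into one string such as "info_Product"
df.columns = df.columns.map("_".join)
flat_names = list(df.columns)

# With flat names, a plain string key selects a Series again
mask = df["info_Product"].str.contains("Product A")
print(flat_names)
print(df.loc[mask])
```

Note that "_".join only works when every level value is a string; columns with numeric level values would need to be converted first.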