PySpark: TypeError: 'str' object is not callable in dataframe operations

Question:

I am reading files from a folder in a loop and creating a dataframe from each one.
However, I am getting the error TypeError: 'str' object is not callable.
Here is the code:

for yr in range (2014,2018):
  cat_bank_yr = sqlCtx.read.csv(cat_bank_path+str(yr)+'_'+h1+'bank.csv000',sep='|',schema=schema)
  cat_bank_yr=cat_bank_yr.withColumn("cat_ledger",trim(lower(col("cat_ledger"))))
  cat_bank_yr=cat_bank_yr.withColumn("category",trim(lower(col("category"))))

The code runs for one iteration and then stops at the line

cat_bank_yr=cat_bank_yr.withColumn("cat_ledger",trim(lower(col("cat_ledger")))) 

with the above error.

Can anyone help out?

Asked By: pnv


Answers:

Your code looks fine. If the error really occurs on the line you mention, you have probably overwritten one of the PySpark functions (col, trim or lower) with a string by accident.

To check this, put the following line directly above your for loop and see whether the code runs without an error now:

from pyspark.sql.functions import col, trim, lower

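Alternatively, double-check whether the code really stops at the line you mentioned, or check whether col, trim and lower are what you expect them to be by evaluating each name on its own (without calling it), like this:

col

which should return something like

<function pyspark.sql.functions._create_function.<locals>._(col)>

If it shows a string instead, the name has been overwritten somewhere in your script.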
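
The same check can also be done programmatically. The following is only a minimal sketch in plain Python: find_shadowed is a hypothetical helper (not part of PySpark), and the sample dictionary is a stand-in for your session's global namespace so that the example runs without Spark.

```python
def find_shadowed(namespace, names):
    """Return the names that are missing from `namespace` or bound to
    something that is not callable (for example a string)."""
    return [name for name in names if not callable(namespace.get(name))]


# Stand-in namespace for illustration only: `col` has been overwritten.
sample_namespace = {
    "col": "cat_ledger",   # accidentally rebound to a str
    "trim": str.strip,     # still callable
    "lower": str.lower,    # still callable
}

shadowed = find_shadowed(sample_namespace, ["col", "trim", "lower"])
print(shadowed)
```

In a real session, running find_shadowed(globals(), ["col", "trim", "lower"]) just before the loop would list whichever of the three names no longer refers to a function.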

Answered By: Thomas

In the import section use:

from pyspark.sql import functions as F

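Then call the functions through that namespace (F.col, F.trim, F.lower), so that a variable named col, trim or lower elsewhere in your script cannot shadow them. Your code would be:

# on top/header part of code
from pyspark.sql import functions as F

for yr in range(2014, 2018):
    cat_bank_yr = sqlCtx.read.csv(cat_bank_path + str(yr) + '_' + h1 + 'bank.csv000', sep='|', schema=schema)
    # F.trim and F.lower need the F. prefix too, since only the module is imported
    cat_bank_yr = cat_bank_yr.withColumn("cat_ledger", F.trim(F.lower(F.col("cat_ledger"))))
    cat_bank_yr = cat_bank_yr.withColumn("category", F.trim(F.lower(F.col("category"))))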

Hope this will work. Good luck.

Answered By: imtheone

There is another possible cause: somewhere in your script you may be using col as a variable name, for example as a loop variable. That rebinds the name to a string, so the next call to col(...) raises this error.
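
A minimal sketch of how this can happen is shown below. It assumes a plain-Python stand-in for col (so the example runs without Spark) and a hypothetical inner loop over column names; neither is taken from the question's script.

```python
def col(name):
    """Stand-in for pyspark.sql.functions.col, used only for illustration."""
    return f"Column<{name}>"


results = []
error_message = None

for yr in range(2014, 2016):
    try:
        # Works as long as `col` still refers to the function.
        results.append(col("cat_ledger"))
    except TypeError as exc:
        error_message = str(exc)
        break
    # Hypothetical later code: reusing `col` as a loop variable rebinds
    # the name to a string, shadowing the function.
    for col in ["cat_ledger", "category"]:
        pass

print(results)
print(error_message)
```

The first iteration succeeds and the second one fails, which matches the behaviour described in the question. Renaming the loop variable (for example to col_name) avoids the problem.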

Answered By: Leon