PySpark in Databricks error with table conversion to pandas

Question

I’m using Databricks and want to convert my PySpark DataFrame to a pandas DataFrame with df.toPandas().

However, I keep getting this error:

/databricks/spark/python/pyspark/sql/pandas/conversion.py:145: UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true, but has reached the error below and can not continue. Note that 'spark.sql.execution.arrow.pyspark.fallback.enabled' does not have an effect on failures in the middle of computation.
  'DataFrame' object has no attribute 'dtype'
  warnings.warn(msg)
AttributeError: 'DataFrame' object has no attribute 'dtype'

I tried different things, including:

spark.conf.set("spark.sql.execution.arrow.enabled", "false")

Nothing has worked so far. I also checked other posts about this error, but none of them helped.

UPDATE: result of df.printSchema():

root
 |-- flight_id: string (nullable = true)
 |-- flight_direction: string (nullable = true)
 |-- service_type: string (nullable = true)
 |-- flight_designator: string (nullable = true)
 |-- flight_number: string (nullable = true)
 |-- callsign: string (nullable = true)
 |-- scheduled_datetime: timestamp (nullable = true)
 |-- connecting_flight_designator: string (nullable = true)
 |-- airport_iata_codes: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- airline_name: string (nullable = true)
 |-- airport_names: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- country_number: long (nullable = true)
 |-- eu_category: string (nullable = true)
 |-- safe_town_indicator: boolean (nullable = true)
 |-- sibt: timestamp (nullable = true)
 |-- aibt: timestamp (nullable = true)
 |-- sobt: timestamp (nullable = true)
 |-- aibt: timestamp (nullable = true)
 |-- tsat: timestamp (nullable = true)
 |-- aircraft_name: string (nullable = true)
 |-- aircraft_registration: string (nullable = true)
 |-- ramp: string (nullable = true)
 |-- ramp_previous: string (nullable = true)
 |-- seats: long (nullable = true)
 |-- actual_total_pax: integer (nullable = true)
 |-- handler_apron: string (nullable = true)
 |-- occupancy_rate: double (nullable = false)

Asked By: Hans.nl

Answer 1

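There was a problem in my data filtering: the DataFrame ended up with duplicate column names (note that aibt appears twice in the schema above). When a pandas DataFrame has duplicate column names, selecting one of them by name returns a DataFrame rather than a Series, and a DataFrame has no dtype attribute, which would explain the error. If anyone runs into a similar issue in the future, check for duplicate columns first.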
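
A minimal sketch of how such a check could look, assuming the column names come from df.columns. The helper names find_duplicate_columns and make_unique are hypothetical and not part of PySpark; the sample list below just mirrors a few of the timestamp columns from the schema in the question.

```python
from collections import Counter


def find_duplicate_columns(columns):
    """Return the column names that occur more than once, in first-seen order."""
    counts = Counter(columns)
    return [name for name in counts if counts[name] > 1]


def make_unique(columns):
    """Return a copy of the names where repeats get a numeric suffix (_2, _3, ...)."""
    taken = set()
    unique = []
    for name in columns:
        candidate, n = name, 1
        # Keep bumping the suffix until the candidate name is not in use yet.
        while candidate in taken:
            n += 1
            candidate = f"{name}_{n}"
        taken.add(candidate)
        unique.append(candidate)
    return unique


# Sample names taken from the schema in the question (aibt is listed twice).
columns = ["sibt", "aibt", "sobt", "aibt", "tsat"]

duplicates = find_duplicate_columns(columns)   # ["aibt"]
renamed = make_unique(columns)                 # ["sibt", "aibt", "sobt", "aibt_2", "tsat"]

# With a real Spark DataFrame (assumed here, not created in this sketch), this could be:
#   print(find_duplicate_columns(df.columns))
#   pdf = df.toDF(*make_unique(df.columns)).toPandas()
```

Renaming is only one option. In my case the right fix was to correct the filtering step so the duplicate column was never produced, so whether to rename or drop the extra column depends on where the duplicate comes from.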

Answered By: Hans.nl
