Pyspark: Select all columns except particular columns
Question:
I have a large number of columns in a PySpark dataframe, say 200. I want to select all of the columns except, say, 3-4 of them. How do I select these columns without having to manually type the names of all the columns I want to keep?
Answers:
df.drop(*cols_to_drop)

where `cols_to_drop` is a list of column names. The `*` unpacks the list into separate arguments, so there is no need for a list comprehension. This is useful when the list of columns to drop is large, or when it can be derived programmatically.