How to load all csv files in a folder with pyspark

Question:

I have a folder which has

Sales_December.csv
Sales_January.csv
Sales_February.csv
etc.

How can I make PySpark read all of them into one DataFrame?

Asked By: Sahand Pourjavad


Answers:

  • create an empty list
  • read your CSV files one by one and append each resulting DataFrame to the list
  • use reduce(DataFrame.unionAll, <list>) to combine them
    into one single DataFrame (see the sketch after this list)
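
A minimal sketch of that approach, assuming the files live in a folder named sales/ and all share the same columns (the folder path, file names, and header/schema options are my assumptions, not from the question):

    from functools import reduce

    from pyspark.sql import DataFrame, SparkSession

    spark = SparkSession.builder.appName("load_sales_csvs").getOrCreate()

    # Hypothetical paths; list whatever CSV files sit in your folder
    files = [
        "sales/Sales_December.csv",
        "sales/Sales_January.csv",
        "sales/Sales_February.csv",
    ]

    # Read each file into its own DataFrame
    dfs = [spark.read.csv(f, header=True, inferSchema=True) for f in files]

    # Combine them into a single DataFrame; this requires all files
    # to have the same columns in the same order
    combined = reduce(DataFrame.unionAll, dfs)
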
Answered By: Gprj