hdfs

PySpark 3.3.0 DataFrame shows data but writing CSV creates empty file

Question: I am facing a very unusual issue. The DataFrame shows data when I run df.show(); however, when I try to write it as CSV, the operation completes without error but writes an empty 0-byte file. Is this a bug? Is something missing? PySpark version: …

Total answers: 1

Write multiple Avro files from pyspark to the same directory

Question: I'm trying to write a PySpark DataFrame out as Avro files to the HDFS path /my/path/, partitioned by the column 'partition', so that /my/path/ contains the following subdirectory structure: partition=20230101, partition=20230102, …. Under these sub …

Total answers: 1

How to Create HDFS file

Question: I know it is possible to create a directory on HDFS with Python using snakebite, but I am looking to create a file in an HDFS directory. Asked by: Zak_Stack. Answers: You can use touchz to create an empty file on HDFS… I see a rename command in the docs, which …

Total answers: 2
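
The touchz approach mentioned in the answer excerpt can be sketched from Python by shelling out to the Hadoop CLI. This is a minimal sketch, not the answerer's code: it assumes the `hdfs` command is on PATH and a cluster is reachable, and the helper names `touchz_command` and `create_empty_hdfs_file` are hypothetical, introduced only for illustration.

```python
import subprocess


def touchz_command(hdfs_path):
    """Build the argument list for `hdfs dfs -touchz <path>`.

    -touchz creates a zero-length file at the given HDFS path.
    """
    return ["hdfs", "dfs", "-touchz", hdfs_path]


def create_empty_hdfs_file(hdfs_path):
    """Run the command; raises CalledProcessError if the CLI reports failure.

    Assumption: the `hdfs` CLI is installed and configured for the cluster.
    """
    subprocess.run(touchz_command(hdfs_path), check=True)
```

Calling `create_empty_hdfs_file("/some/dir/file.txt")` with a path of your own (the one shown is a placeholder) would then create the empty file. Keeping the command construction separate from execution makes it possible to inspect the command without a cluster.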

Moving files from one directory to another directory in HDFS using Pyspark

Question: I am trying to read all the JSON files from one directory and store them in a Spark DataFrame using the code below (it works fine): spark = SparkSession.builder.getOrCreate() df = spark.read.json("hdfs:///user/temp/backup_data/st_in_*/*/*.json", multiLine=True) but when I try to save the DataFrame with multiple …

Total answers: 1

Python read file as stream from HDFS

Question: Here is my problem: I have a file in HDFS which can potentially be huge (too large to fit entirely in memory). What I would like to do is avoid having to cache this file in memory, and only process it line by line like I would …

Total answers: 4
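
One possible way to process such a file line by line without loading it into memory is to pipe the output of `hdfs dfs -cat` through a subprocess and iterate over its stdout. This is a sketch under assumptions, not one of the listed answers: it assumes the `hdfs` CLI is on PATH, and the helper names `stream_lines` and `hdfs_cat_command` are hypothetical.

```python
import subprocess


def hdfs_cat_command(hdfs_path):
    """Build the argument list for `hdfs dfs -cat <path>`."""
    return ["hdfs", "dfs", "-cat", hdfs_path]


def stream_lines(command):
    """Yield decoded lines from a command's stdout, one at a time.

    Only the pipe buffer and the current line are held in memory,
    so the size of the underlying file does not matter.
    """
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc:
        for raw in proc.stdout:
            yield raw.decode("utf-8").rstrip("\r\n")
    # Leaving the `with` block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)
```

A loop such as `for line in stream_lines(hdfs_cat_command(path)): ...` would then handle one line per iteration, where `path` is your own HDFS path. The generator accepts any command, which makes it easy to try locally without a cluster.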