Problem renaming a file in Azure Databricks from a Data Lake
Question:
I am trying to rename a file with Python in Azure Databricks using the "os" library's "rename()" function. It is something very simple really, but when doing it in Databricks I can't reach the path in the Data Lake where my file is. However, running "%fs ls path_file" does show it, and I can even read it and process it with PySpark without problems.
I leave an example of my code:
import os
old_name = r"/mnt/datalake/path/part-00000-tid-1761178-3f1b0942-223-1-c000.csv"
new_name = r"/mnt/datalake/path/example.csv"
os.rename(old_name, new_name)
The above returns an error saying the path or file cannot be found, but an "ls" command lists that same path without problem.
On the other hand, I have tried to rename the file with PySpark, but that requires a Hadoop library (org.apache.hadoop.conf.Configuration) that I do not have installed and cannot install in the production environment …
What would I be missing?
Answers:
If you're using os.rename, you need to refer to files as /dbfs/mnt/..., because you're using the local file API to access DBFS.
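For example, the original code should work once the paths are prefixed with /dbfs. A small sketch (the to_local_path helper is just an illustrative convenience, not a Databricks API; os.rename itself will only succeed on a cluster where the /dbfs FUSE mount exists):

```python
import os

def to_local_path(dbfs_path: str) -> str:
    """Translate a DBFS path (as seen by %fs / Spark) into the
    /dbfs FUSE path that local file APIs such as os.rename expect."""
    if dbfs_path.startswith("dbfs:"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    return "/dbfs" + dbfs_path

old_name = to_local_path("/mnt/datalake/path/part-00000-tid-1761178-3f1b0942-223-1-c000.csv")
new_name = to_local_path("/mnt/datalake/path/example.csv")
# old_name is now "/dbfs/mnt/datalake/path/part-00000-tid-1761178-3f1b0942-223-1-c000.csv"
# os.rename(old_name, new_name)  # run this on the Databricks cluster
```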
But really, it would be better to use dbutils.fs.mv to do the renaming:
old_name = r"/mnt/datalake/path/part-00000-tid-1761178-3f1b0942-223-1-c000.csv"
new_name = r"/mnt/datalake/path/example.csv"
dbutils.fs.mv(old_name, new_name)
Or use this, replacing the paths with your own; the third argument (recurse=True) makes the move work on directories as well:
old_name = r"dbfs:/FileStore/tables/PM/TC/ROBERTA"
new_name = r"dbfs:/FileStore/tables/PM/TC/BERT"
dbutils.fs.mv(old_name, new_name, True)