azure-databricks

Problem when renaming a file in Azure Databricks from a data lake

Question: I am trying to rename a file with Python in Azure Databricks through the "import os" library, using the "rename()" function. It is something very simple really, but when doing it in Databricks I can’t get to the path where my file …

Total answers: 2
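A minimal sketch of the usual fix for the question above, assuming the lake is mounted at a hypothetical /mnt/datalake with hypothetical file names: plain os.rename() only sees the driver's local filesystem, so DBFS paths need the /dbfs FUSE prefix, or dbutils.fs.mv can be used directly.

```python
# Option 1: Databricks utilities understand DBFS paths directly.
dbutils.fs.mv("dbfs:/mnt/datalake/old_name.csv", "dbfs:/mnt/datalake/new_name.csv")

# Option 2: os.rename() works on local paths only, so go through the
# /dbfs FUSE mount that exposes DBFS on the driver's filesystem.
import os
os.rename("/dbfs/mnt/datalake/old_name.csv", "/dbfs/mnt/datalake/new_name.csv")
```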

Log pickle files as part of an MLflow run

Question: I am running an MLflow experiment, and as part of it I would like to log a few artifacts as Python pickles. For example: trying out different categorical encoders, so I wanted to log the encoder objects as pickle files. Is there a way to …

Total answers: 2
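A hedged sketch of one common answer: pickle the object to a local temporary file and log that file with mlflow.log_artifact. The encoder object below is a stand-in for whatever fitted encoder the question has in mind.

```python
import os
import pickle
import tempfile

import mlflow

encoder = {"a": 1, "b": 2}  # stand-in for a fitted categorical encoder

with mlflow.start_run():
    # Pickle to a local temp file, then log that file as a run artifact.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "encoder.pkl")
        with open(path, "wb") as f:
            pickle.dump(encoder, f)
        mlflow.log_artifact(path, artifact_path="encoders")
```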

How to pass a parameter to a Python script from a pipeline

Question: I am building an Azure Data Factory pipeline and I would like to know how to get this parameter into the Python script. The Python script is located in Databricks (DBFS) and is run from Azure Data Factory. So, in my ADF pipeline, I have …

Total answers: 1
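A minimal sketch, assuming the script is invoked through ADF's Databricks Python activity, which passes the activity's parameters to the script as ordinary command-line arguments:

```python
# Inside the Python script stored in DBFS.
import sys

# sys.argv[0] is the script path; subsequent entries are the parameters
# listed on the ADF Databricks Python activity, in order.
my_param = sys.argv[1] if len(sys.argv) > 1 else "default"
print(f"received parameter: {my_param}")
```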

Python Pandas: read CSV from a Data Lake

Question: I’m trying to read a CSV file that is stored on an Azure Data Lake Gen 2; Python runs in Databricks. Here are two lines of code: the first one works, the second one fails. Do I really have to mount the ADLS to have Pandas being able …

Total answers: 2
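A sketch of two common approaches, with placeholder account, container, and path names. The first avoids mounting entirely but assumes the adlfs package (the fsspec backend for ADLS) is installed on the cluster; the second assumes an existing mount.

```python
import pandas as pd

# Option 1: no mount; pandas reads abfss:// URLs through adlfs/fsspec.
df = pd.read_csv(
    "abfss://mycontainer@myaccount.dfs.core.windows.net/data/file.csv",
    storage_options={"account_key": "<storage-account-key>"},
)

# Option 2: if the lake is mounted, read through the /dbfs FUSE path.
df = pd.read_csv("/dbfs/mnt/datalake/data/file.csv")
```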

How to calculate a directory size in ADLS using PySpark?

Question: I want to calculate the size of a directory (e.g. XYZ) which contains sub-folders and sub-files. I want the total size of all the files and everything inside XYZ. I could find out all the folders inside a particular path, but I want the size of all …

Total answers: 5
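A minimal recursive sketch using dbutils.fs.ls, which is available in Databricks notebooks; the XYZ mount path is hypothetical.

```python
def dir_size(path: str) -> int:
    """Recursively sum file sizes (in bytes) under `path`."""
    total = 0
    for item in dbutils.fs.ls(path):
        if item.isDir():
            total += dir_size(item.path)  # descend into sub-folders
        else:
            total += item.size
    return total

size_bytes = dir_size("/mnt/datalake/XYZ")
print(f"{size_bytes / 1024**3:.2f} GiB")
```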

Removing non-ASCII and special characters in a PySpark DataFrame column

Question: I am reading data from CSV files which have about 50 columns; a few of the columns (4 to 5) contain text data with non-ASCII and special characters. df = spark.read.csv(path, header=True, schema=availSchema) I am trying to remove all the non-ASCII and special characters and keep …

Total answers: 3
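One common answer is regexp_replace with a character class describing what should survive. A sketch, where text_col is a hypothetical column name and the class keeps only ASCII letters, digits, and spaces:

```python
from pyspark.sql import functions as F

# Replace every character that is NOT an ASCII letter, digit, or space.
# Widen the character class if other characters should be kept.
df = df.withColumn(
    "text_col",
    F.regexp_replace(F.col("text_col"), r"[^a-zA-Z0-9 ]", ""),
)
```

Applied in a loop over the 4 to 5 affected column names, this cleans all of them without touching the other columns.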

Load file from Azure Files to Azure Databricks

Question: I am looking for a way, using the Azure Files SDK, to upload files to my Azure Databricks blob storage. I tried many things using functions from this page, but nothing worked and I don’t understand why. Example: file_service = FileService(account_name='MYSECRETNAME', account_key='mySECRETkey') generator = file_service.list_directories_and_files('MYSECRETNAME/test') # listing files in folder /test, …

Total answers: 2
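A hedged sketch of how this is usually answered: the legacy azure-storage-file SDK (the one providing FileService) can only download to the driver's local disk, after which dbutils.fs.cp copies the file into DBFS/blob storage. The account and share names follow the question; the file name and target mount are hypothetical.

```python
from azure.storage.file import FileService  # legacy azure-storage-file SDK

file_service = FileService(account_name="MYSECRETNAME", account_key="mySECRETkey")

# Download from the file share to the driver's local disk first...
file_service.get_file_to_path(
    share_name="MYSECRETNAME",
    directory_name="test",
    file_name="myfile.csv",      # hypothetical file name
    file_path="/tmp/myfile.csv",
)

# ...then copy from the local filesystem into DBFS-backed storage.
dbutils.fs.cp("file:/tmp/myfile.csv", "dbfs:/mnt/blob/myfile.csv")
```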

List All Files in a Folder Sitting in a Data Lake

Question: I’m trying to get an inventory of all files in a folder, which has a few sub-folders, all of which sit in a data lake. Here is the code that I’m testing: import sys, os import pandas as pd mylist = [] root …

Total answers: 3
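A minimal sketch completing the pattern the question starts, assuming the lake is mounted so os.walk can traverse it through the /dbfs FUSE path (the mount path is hypothetical):

```python
import os

import pandas as pd

mylist = []
root = "/dbfs/mnt/datalake/myfolder"  # hypothetical mount path into the lake

# os.walk descends into every sub-folder below root.
for path, subdirs, files in os.walk(root):
    for name in files:
        mylist.append(os.path.join(path, name))

inventory = pd.DataFrame(mylist, columns=["file_path"])
print(inventory.head())
```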