databricks

Spark SQL – Pivot and concatenation

Question: I am working with Spark SQL and have a requirement to pivot and concatenate the data. My input data looks like:

ID  Quantity  Location
1   10        US
2   20        UK
2   5         CA
2   20        US
3   15        US
3   20        CA
4   25        US
4   10        CA

…

Total answers: 1
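Since the question is truncated, the intended output is an assumption; with input like the sample above, "pivot" usually means one column per Location and "concatenate" often means collapsing each ID's locations into a single string. Both can be sketched in PySpark (df and the column names are taken from the sample; this needs a live SparkSession):

```python
from pyspark.sql import functions as F

# Pivot: one column per Location, summing Quantity
pivoted = (df.groupBy("ID")
             .pivot("Location", ["US", "UK", "CA"])
             .sum("Quantity"))

# Concatenation: collapse each ID's locations into one comma-separated string
concatenated = df.groupBy("ID").agg(
    F.sum("Quantity").alias("total_quantity"),
    F.concat_ws(",", F.collect_list("Location")).alias("locations"),
)
```

Listing the pivot values explicitly (["US", "UK", "CA"]) avoids an extra pass over the data to discover them.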

Databricks: Issue while creating spark data frame from pandas

Question: I have a pandas data frame which I want to convert into a Spark data frame. Usually I use the code below to create a Spark data frame from pandas, but all of a sudden I started getting the error below. I am aware that pandas has …

Total answers: 2
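The error text is cut off, so this is an assumption, but a frequent cause of spark.createDataFrame(pdf) suddenly breaking is pandas 2.x removing DataFrame.iteritems, which Spark versions before 3.4 still call internally. A minimal stopgap is to alias the old name back until pandas is pinned below 2.0 or the runtime is upgraded:

```python
import pandas as pd

# pandas 2.x removed DataFrame.iteritems, which Spark < 3.4 still calls
# inside spark.createDataFrame; alias it back to items() as a stopgap.
if not hasattr(pd.DataFrame, "iteritems"):
    pd.DataFrame.iteritems = pd.DataFrame.items

pdf = pd.DataFrame({"id": [1, 2], "qty": [10, 20]})
# spark.createDataFrame(pdf) should now succeed on older Spark versions
```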

Getting Error: [Errno 95] Operation not supported while writing zip file in Databricks

Question: I am trying to zip files and write the archive to a folder (mount point) in Databricks using the code below.

# List all files which need to be compressed
import os
modelPath = '/dbfs/mnt/temp/zip/'
filenames = [os.path.join(root, name) for root, …

Total answers: 1
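Errno 95 typically means zipfile tried a random-access write (a seek), which FUSE-mounted paths like /dbfs do not support. A common workaround is to build the archive on the driver's local disk and then copy the finished file to the mount in one sequential write; a sketch with an illustrative helper name:

```python
import os
import shutil
import tempfile
import zipfile

def zip_to_mount(filenames, mount_dest):
    """Build the archive on local disk first: FUSE mounts such as /dbfs
    reject the random-access writes zipfile needs (hence Errno 95),
    but a plain sequential copy of the finished file works."""
    local_zip = os.path.join(tempfile.mkdtemp(), "archive.zip")
    with zipfile.ZipFile(local_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in filenames:
            zf.write(name, arcname=os.path.basename(name))
    shutil.copy(local_zip, mount_dest)
    return mount_dest
```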

How to query for the maximum / highest value in a field with PySpark

Question: The following dataframe will produce values 0 to 3.

df = DeltaTable.forPath(spark, '/mnt/lake/BASE/SQLClassification/cdcTest/dbo/cdcmergetest/1').history().select(col("version"))

Can someone show me how to modify the dataframe so that it only returns the maximum value, i.e. 3? I have tried

df.select("*").max("version")

and

df.max("version")

but no …

Total answers: 1
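df.max("version") fails because max() is an aggregation method on grouped data, not on DataFrame itself. The usual fix is to aggregate and then pull the scalar out of the single-row result; a sketch against the df defined above (needs a live SparkSession):

```python
from pyspark.sql import functions as F

# max() only exists on grouped data, so df.max("version") raises an error;
# aggregate instead and pull the scalar out of the single-row result
max_version = df.agg(F.max("version")).collect()[0][0]

# equivalent shorthand using a dict-style aggregate
max_version = df.agg({"version": "max"}).collect()[0][0]
```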

Azure DataBricks ImportError: cannot import name dataclass_transform

Question: I have a Python notebook running the following imports on a Databricks cluster:

%pip install presidio_analyzer
%pip install presidio_anonymizer
import spacy.cli
spacy.cli.download("en_core_web_lg")
nlp = spacy.load("en_core_web_lg")
import csv
import pprint
import collections
from typing import List, Iterable, Optional, Union, Dict
import pandas as pd
from presidio_analyzer import AnalyzerEngine, …

Total answers: 2
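dataclass_transform is imported from typing_extensions by recent pydantic (a spacy/presidio dependency), so this ImportError usually indicates that the cluster's preinstalled typing_extensions is too old for the freshly installed packages. One hedged fix, assuming a runtime recent enough to provide dbutils.library.restartPython(), is to upgrade it in the notebook and restart the Python process:

```
%pip install --upgrade typing_extensions
# then, in a recent Databricks runtime, restart the Python process
# so the upgraded package is actually picked up:
dbutils.library.restartPython()
```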

PySpark in Databricks error with table conversion to pandas

Question: I'm using Databricks and want to convert my PySpark DataFrame to a pandas one using the df.toPandas() command. However, I keep getting this error:

/databricks/spark/python/pyspark/sql/pandas/conversion.py:145: UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true, but has reached the error below and can not …

Total answers: 1
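The text shown is only the Arrow fallback warning; the real failure is the truncated error below it, often an unsupported column type. A common first step is to disable the Arrow path so toPandas() either succeeds or surfaces the underlying error directly; a sketch assuming a live SparkSession:

```python
# Disable the Arrow conversion path so toPandas() either succeeds or
# surfaces the underlying error instead of the fallback warning
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")
pdf = df.toPandas()
```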

Databricks DLT pipeline with for..loop reports error "AnalysisException: Cannot redefine dataset"

Question: I have the following code, which works fine for a single table. But when I try to use a for loop to process all the tables in my database, I get the error "AnalysisException: Cannot redefine dataset 'source_ds',Map(),Map(),List(),List(),Map())". I need to pass the …

Total answers: 1
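This error usually means the decorated function is registered under the same dataset name on every loop iteration. The common fix is a factory function that binds the loop variable and gives each table a distinct name. Since dlt is only importable inside a DLT pipeline, this sketch uses a stand-in decorator (registry, table, define_table are illustrative names) to show the pattern:

```python
# Stand-in for dlt.table, which only exists inside a DLT pipeline; it mimics
# the relevant behaviour: registering the same dataset name twice fails.
registry = {}

def table(name):
    def decorator(fn):
        if name in registry:
            raise ValueError(f"Cannot redefine dataset '{name}'")
        registry[name] = fn
        return fn
    return decorator

def define_table(source):
    # The factory call binds `source` for this iteration, and the f-string
    # gives every dataset a distinct name; both are needed inside the loop.
    @table(name=f"clean_{source}")
    def build():
        return f"rows from {source}"
    return build

for source in ["orders", "customers", "payments"]:
    define_table(source)
```

With real dlt, the same shape applies: replace the stand-in with @dlt.table(name=...) inside the factory.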

How to make sure values are mapped to the right delta table column?

Question: I'm writing a PySpark job to read the Values column from table1. Table1 has two columns: ID and Values. Sample data in the Values column:

+----+-----------------------------------+
| ID | values                            |
+----+-----------------------------------+
| 1  | a=10&b=2&c=13&e=55&d=78&j=98&l=99 |
| 2  | l=22&e=67&j=34&a=7&c=9&d=77&b=66  |
…

Total answers: 2
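Because the key=value pairs appear in a different order on each row, mapping by position would misalign them; parsing into a key-to-value map makes the mapping order-independent. A minimal pure-Python sketch (parse_values is an illustrative name):

```python
def parse_values(values: str) -> dict:
    """Turn 'a=10&b=2&...' into {'a': '10', 'b': '2', ...} so each value is
    matched to its column by key, regardless of its position in the string."""
    return dict(pair.split("=", 1) for pair in values.split("&"))

row = parse_values("l=22&e=67&j=34&a=7&c=9&d=77&b=66")
# row["a"] is '7' even though 'a' appears mid-string
```

If the parsing should stay inside Spark, the built-in str_to_map(Values, '&', '=') produces the same key-to-value map server-side.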

Unable to successfully divide the JSON file using Python in Databricks

Question: I am writing Python code in Databricks that takes a huge JSON file and divides it into two parts: from index 0 ("reporting_entity_name") through index 3 ("version") in one file, and from index 4 to the end in the other. …

Total answers: 1
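Assuming "index" refers to the position of the top-level keys (as the named keys suggest), a straightforward split loads the object and partitions its keys, relying on dicts preserving the file's key order. The helper name and the in-memory load are assumptions; a file too large for memory would need a streaming parser such as ijson instead:

```python
import json

def split_json_file(path, out1, out2, n_first=4):
    """Write the first n_first top-level keys to out1 and the rest to out2.
    Loads the whole object, so a streaming parser would be needed for a
    file that genuinely cannot fit in memory."""
    with open(path) as f:
        obj = json.load(f)           # dict preserves the file's key order
    keys = list(obj)
    with open(out1, "w") as f:
        json.dump({k: obj[k] for k in keys[:n_first]}, f)
    with open(out2, "w") as f:
        json.dump({k: obj[k] for k in keys[n_first:]}, f)
```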

dbx execute: install from Azure Artifacts / private PyPI

Question: I would like to use dbx execute to run a task/job on an Azure Databricks cluster. However, I cannot make it install my code. More details on the situation: Project A, which has a setup.py, depends on Project B; Project B is also Python-based …

Total answers: 1
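One common approach (an assumption, since dbx setups vary) is to point pip on the cluster at the Azure Artifacts feed through an extra index URL, for example via a pip.conf delivered by a cluster init script. The <organization>, <feed>, <user> and <PAT> values are placeholders for the feed coordinates and a personal access token:

```
[global]
extra-index-url=https://<user>:<PAT>@pkgs.dev.azure.com/<organization>/_packaging/<feed>/pypi/simple/
```

With pip able to resolve the private feed, Project B can be declared as a normal dependency in Project A's setup.py.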