scala

spark dataframe convert a few flattened columns to one array of struct column

Question: I’d like some guidance on which Spark DataFrame functions, together with Scala/Python code, can achieve this transformation. Given a DataFrame with the columns columnA, columnB, columnA1, ColumnB1, ColumnA2, ColumnB2 … ColumnA10, ColumnB10, e.g. Fat Value, Fat Measure, Salt …

Total answers: 1

Default lib jars folder for Apache Toree kernel

Question: Say I want a default relative lib folder in a Jupyter notebook project directory where I can download custom jars, so that I can import them later without the %addjar magic. I was under the impression I could do something like "__TOREE_OPTS__": "--jar-dir=./lib/" in ~/.local/share/jupyter/kernels/apache_toree_scala/kernel.json, but this doesn’t work. What …

Total answers: 2

How to trigger scala/python code when value in BQ table changes

Question: We want to continuously run (poll) Scala/Python code on a GCP VM that runs an ETL program only when a value in a BQ table changes, i.e. we’ll add which ETL to run to the BQ table, and based on that the ETL program will …

Total answers: 1

the output in python base64.b64decode doesn't match java's decode Base64

Question: I’m trying to refactor some Scala code to Python 3 and am currently stuck at decoding a Base64 string. The output from Python’s base64.b64decode does not match the Scala output. Scala: import org.apache.commons.codec.binary.Base64.decodeBase64 val coded_str = "UgKgDwhoEAAANAEA1tYAADABABoBABMAAAAAAQAAAAEAAQACAAAAAAD6sT4AO0YAAA==" decodeBase64(coded_str) //Output 1 : res1: Array[Byte] = Array(82, 2, …

Total answers: 2
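
A frequent source of this kind of mismatch is that a JVM `Byte` is signed (-128 to 127), while Python's `bytes` yields unsigned integers (0 to 255), so identical data prints differently. The excerpt is truncated and does not confirm that this is the asker's problem, so treat the following as a sketch under that assumption. It decodes only the first eight characters of the string in the question, and `to_signed` is a hypothetical helper name.

```python
import base64

# First eight characters of the Base64 string from the question.
PREFIX = "UgKgDwho"

def to_signed(data):
    """Map each unsigned byte (0..255) to the signed range (-128..127) used by a JVM Byte."""
    return [b - 256 if b > 127 else b for b in data]

unsigned = list(base64.b64decode(PREFIX))  # how Python shows the bytes
signed = to_signed(unsigned)               # how a Scala Array[Byte] would show them
```

Here the third byte is 160 in Python's view and -96 in the signed view, while the first two bytes (82, 2) match the Scala output quoted in the question.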

Scala code returns false for 1012 > 977 and a few other values

Question: I have Scala code and Python code that attempt the same task (Advent of Code 2021, day 1: https://adventofcode.com/2021/day/1). The Python returns the correct solution; the Scala does not. I ran diff on both outputs and have determined …

Total answers: 1
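
A comparison such as 1012 > 977 commonly comes out false when the values are still strings, because strings compare lexicographically in both Scala and Python. The excerpt is truncated, so this is an assumed cause rather than a confirmed diagnosis. The sketch below reproduces the effect in Python; `count_increases` and the sample input are hypothetical.

```python
def count_increases(depths):
    """Count how many entries are larger than the entry before them."""
    return sum(1 for prev, cur in zip(depths, depths[1:]) if cur > prev)

# Hypothetical input lines, as they would arrive when read from a text file.
lines = ["977", "1012", "1010"]

as_strings = count_increases(lines)                 # lexicographic: "1012" > "977" is False
as_ints = count_increases([int(x) for x in lines])  # numeric: 1012 > 977 is True
```

Converting each line to an integer before comparing is the usual fix if this is indeed the cause.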

scala map get keys from Map as Sequence sorting by both keys and values

Question: In Python I can do: in_dd = {"aaa": 1, "bbb": 7, "zzz": 3, "hhh": 9, "ggg": 10, "ccc": 3} out_ll = ['ggg', 'hhh', 'bbb', 'aaa', 'ccc', 'zzz'] So I want to get the keys sorted by value in descending order while …

Total answers: 1

Discover relationship between the entities

Question: I have a dataset like the one below: List((X,Set(" 1", " 7")), (Z,Set(" 5")), (D,Set(" 2")), (E,Set(" 8")), ("F ",Set(" 5", " 9", " 108")), (G,Set(" 2", " 11")), (A,Set(" 7", " 5")), (M,Set(108))) Here X is related to A, as 7 is common between them. Z is related to …

Total answers: 4
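
The relationship rule quoted in the question (two entities are related when their sets share a value) can be sketched as a pairwise set intersection. This assumes only direct relationships are wanted, since the excerpt is truncated. The data is a cleaned version of the question's list (whitespace stripped, all values as integers), and `related_pairs` is a hypothetical helper name.

```python
from itertools import combinations

# Cleaned version of the dataset in the question (an assumption about the intended values).
data = [
    ("X", {1, 7}), ("Z", {5}), ("D", {2}), ("E", {8}),
    ("F", {5, 9, 108}), ("G", {2, 11}), ("A", {7, 5}), ("M", {108}),
]

def related_pairs(entries):
    """Return {(name1, name2): shared_values} for every pair whose sets intersect."""
    return {
        (a, b): va & vb
        for (a, va), (b, vb) in combinations(entries, 2)
        if va & vb
    }

pairs = related_pairs(data)
```

Comparing every pair is quadratic in the number of entities; for larger inputs, an index from value to entity names avoids most of the comparisons.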

How to use Scala UDF accepting Map[String, String] in PySpark

Question: Based on the discussion in "How to use Scala UDF in PySpark?", I am able to execute a Scala UDF for primitive types, but I want to call a Scala UDF from PySpark that accepts a Map[String, String]. package com.test object ScalaPySparkUDFs …

Total answers: 1

List All Files in a Folder Sitting in a Data Lake

Question: I’m trying to get an inventory of all files in a folder, which has a few sub-folders, all of which sit in a data lake. Here is the code that I’m testing. import sys, os import pandas as pd mylist = [] root …

Total answers: 3
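
A recursive inventory of this kind can be sketched with `os.walk` from the standard library. This assumes the data lake folder is reachable as an ordinary filesystem path (for example through a mount), which the truncated excerpt does not confirm. If the lake is only reachable through a storage SDK, that SDK's listing call would be needed instead. `list_files` is a hypothetical helper name.

```python
import os

def list_files(root):
    """Walk root recursively and return the sorted full paths of all files found."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)
```

The resulting list can be passed to `pandas.DataFrame` if a tabular inventory is wanted, as the question's imports suggest.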

Stream stdout from scala Process

Question: I am using Scala’s Process to kick off a Python program and ProcessLogger to capture the Python program’s stdout. I see that the output of the Python program’s print statements appears only after the program completes. Is there a way to stream the python print …

Total answers: 1
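
Output that arrives only at exit is typical of Python block-buffering its stdout when it writes to a pipe rather than a terminal. Starting the interpreter with `-u` (or setting `PYTHONUNBUFFERED=1`, or calling `print(..., flush=True)`) disables that buffering. The sketch below shows the effect from a Python parent process rather than the asker's Scala `Process`, and assumes buffering is the cause. `stream_lines` and `child_code` are hypothetical names.

```python
import subprocess
import sys

def stream_lines(cmd):
    """Yield each stdout line of cmd as soon as the child process writes it."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True, bufsize=1) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")

# Hypothetical child program that prints one line at a time with a short pause.
child_code = "import time\nfor i in range(3):\n    print(i)\n    time.sleep(0.1)"

# "-u" makes the child's stdout unbuffered, so lines arrive while it is still running.
lines = list(stream_lines([sys.executable, "-u", "-c", child_code]))
```

Iterating over `stream_lines(...)` directly, instead of collecting into a list, processes each line as it arrives. The same `-u` flag can be added to the command passed to a Scala `Process`.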