How to implement a Python UDF in dbt

Question:

I need some help applying a Python UDF to my dbt models.
I successfully created a Python function in Snowflake (DWH) and ran it against a table. It works as expected there, but I am struggling to implement it in dbt. Any advice or direction would make my day.

Here is my Python UDF, created in Snowflake:

create or replace function "077"."Unity".sha3_512(str varchar)
returns varchar
language python
runtime_version = '3.8'
handler = 'hash'
as

$$
import hashlib
 
def hash(str):
    # create a sha3 hash object
    hash_sha3_512 = hashlib.new("sha3_512", str.encode())

    return hash_sha3_512.hexdigest()
$$
;

The objective is to create the Python function in dbt and apply it to the model below:

{{ config(materialized = 'view') }}

WITH SEC AS(
    SELECT 
         A."AccountID" AS AccountID,
         A."AccountName" AS AccountName , 
         A."Password" AS Passwords,
         apply function here (A."Password") As SHash
    FROM {{ ref('Green', 'Account') }} A
   )

----------------VIEW RECORD------------------------------ 

SELECT * 
FROM SEC

Is there a way to do this? Thank you.

Asked By: lee

||

Answers:

Assuming that the UDF already exists in Snowflake, call it from the model with a schema-qualified name:

{{ config(materialized = 'view') }}

WITH SEC AS(
    SELECT 
         A."AccountID" AS AccountID,
         A."AccountName" AS AccountName , 
         A."Password" AS Passwords,
         {{target.schema}}.sha3_512(A."Password") As SHash
    FROM {{ ref('Green', 'Account') }} A
   )
SELECT * 
FROM SEC

The function can be created with an on-run-start hook in dbt_project.yml:

on-run-start:
  - '{{ creating_udf()}}'

and a macro (in the project's macros directory) that issues the CREATE FUNCTION statement:

{% macro creating_udf() %}

create function if not exists {{target.schema}}.sha3_512(str varchar)
returns varchar
language python
runtime_version = '3.8'
handler = 'hash'
as

$$
import hashlib
 
def hash(str):
    # create a sha3 hash object
    hash_sha3_512 = hashlib.new("sha3_512", str.encode())

    return hash_sha3_512.hexdigest()
$$
;

{% endmacro %}

Answered By: Lukasz Szozda

The answer above works, but if the macro only needs to run for a specific model, it is better to call it from a pre_hook in that model's config instead of on-run-start.
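A minimal sketch of what that could look like, assuming the creating_udf() macro from the previous answer is defined in the project and that the model builds into target.schema (the file name is hypothetical):

```sql
{# models/sec.sql: hypothetical file name #}
{# pre_hook runs creating_udf() right before this model is built, #}
{# so the UDF exists by the time the view is created. #}
{{ config(
    materialized = 'view',
    pre_hook = "{{ creating_udf() }}"
) }}

WITH SEC AS (
    SELECT
        A."AccountID" AS AccountID,
        A."AccountName" AS AccountName,
        A."Password" AS Passwords,
        {{ target.schema }}.sha3_512(A."Password") AS SHash
    FROM {{ ref('Green', 'Account') }} A
)

SELECT *
FROM SEC
```

With this config the on-run-start entry is no longer needed for this model. The macro still runs on every build of the model, so the "if not exists" clause in the CREATE FUNCTION statement keeps repeated runs harmless.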

Answered By: analystRUD