Regex only replace if wrapped in certain function

Question:

I am looking to create a python function that will take a long SQL script that I have to create a table and place the session variables into the script so it can be used as a view within Snowflake. For example,

SET TABLE_NAME = MY_TABLE_NAME;
CREATE OR REPLACE VIEW MY_VIEW AS (
SELECT * FROM IDENTIFIER($TABLE_NAME)
)

With the python script, the previous block becomes

CREATE OR REPLACE VIEW MY_VIEW AS(
SELECT * FROM MY_TABLE
)

However, during testing, I realized that the if the variable name is within another function, the last parenthesis is captured and is removed. Is there a way that I can replace the string with the variable value only if it is wrapped in the identifier function?

I would like this code:

IDENTIFIER($VAR_NAME)
identifier($VAR_NAME)
SELECT * FROM $VAR_NAME
DATEADD('DAY',+1,$VAR_NAME)

To become:

VAR_NAME
VAR_NAME
SELECT * FROM VAR_NAME
DATEADD('DAY',+1,VAR_NAME)

This is what I have tried so far. https://regex101.com/r/2SriK9/2
Thanks.

P.S. In the last example, if var_name were a function, it would need to have the function and then close with a closing parenthesis: DATEADD(‘DAY’,+1,MY_FUNC()) [Currently, my output makes it DATEADD(‘DAY’,+1,MY_FUNC()] with no closing parenthesis on the dateadd function.

Asked By: RuSXSnIpERXx

||

Answers:

There are two patterns you’re looking for here: one enclosed in the identifier function, and the other with just a preceding $ character, so you can use an alternation pattern to search for both of them, capture the variable names of each, if any, and replace the match with what’s captured.

Find (with the case-insensitive flag):

identifier($(w+))|$(w+)

Replace with:

12

Demo: https://regex101.com/r/2SriK9/3

Answered By: blhsing

Instead of using regular expressions to perform complex replacements, you can use a library such as sqlglot to parse the query into an AST, which you can then update to produce your desired query:

import sqlglot
def run_updates(ast, scope):
   if isinstance(ast, list):
      #remove any `SET` queries:
      return [run_updates(i, scope) for i in ast if 
       not isinstance(i, sqlglot.expressions.Command) or str(i.this).lower() != 'set']
   if isinstance(ast, sqlglot.expressions.Create):
      if isinstance(ast.this, (sqlglot.expressions.Func, sqlglot.expressions.UserDefinedFunction)):
         #node is a function, save the function name
         scope['functions'].append(ast.this.this.this)
   if isinstance(ast, sqlglot.expressions.Anonymous):
      if ast.this.lower() == 'identifier':
         #remove `IDENTIFER($VALUE)` expressions
         return sqlglot.expressions.Identifier(this = ast.args['expressions'][0].this.this[1:], quoted=False)
   if isinstance(ast, sqlglot.expressions.Identifier):
      if isinstance(ast.this, str) and ast.this[0] == '$':
         #replace general `$session_variable` expressions
         if ast.this[1:] in scope['functions']:
            #if the session variable is function, update accordingly
            return sqlglot.expressions.Anonymous(this = ast.this[1:])
         return sqlglot.expressions.Identifier(this = ast.this[1:], quoted=False)
   if hasattr(ast, 'args'):
      ast.args = {a:run_updates(b, scope) for a, b in ast.args.items()}
   return ast

def replace_identifiers(query):
  ast, scope = sqlglot.parse(query), {'variables':{}, 'functions':[]}
  return ';n'.join(i.sql() for i in run_updates(ast, scope))

Tests:

#replacing IDENFITIER
s = '''
SET TABLE_NAME = MY_TABLE_NAME;
CREATE OR REPLACE VIEW MY_VIEW AS (
SELECT * FROM IDENTIFIER($TABLE_NAME)
)
'''
print(replace_identifiers(s))

#replacing IDENTIFIER and session variables
s1 = '''
IDENTIFIER($VAR_NAME);
identifier($VAR_NAME);
SELECT * FROM $VAR_NAME;
DATEADD('DAY',1,$VAR_NAME);
'''
print(replace_identifiers(s1))

#replacing session variable as a function call, since the test name is declared as a function in a prior line
s2 = '''
CREATE FUNCTION VAR_NAME() RETURNS TIMESTAMP AS 'SELECT now()';
DATEADD('DAY',1,$VAR_NAME);
'''
print(replace_identifiers(s2))

Output:

CREATE OR REPLACE VIEW MY_VIEW AS (SELECT * FROM TABLE_NAME)
VAR_NAME;
VAR_NAME;
SELECT * FROM VAR_NAME;
DATEADD('DAY', 1, VAR_NAME)
CREATE FUNCTION VAR_NAME() RETURNS TIMESTAMP AS 'SELECT now()';
DATEADD('DAY', 1, VAR_NAME())
Answered By: Ajax1234
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.