Back-ticks in DataFrame.colRegex?

Question:

For PySpark, I find back-ticks enclosing regular expressions for DataFrame.colRegex() here, here, and in this SO question. Here is the example from the DataFrame.colRegex doc string:

df = spark.createDataFrame([("a", 1), ("b", 2), ("c",  3)], ["Col1", "Col2"])
df.select(df.colRegex("`(Col1)?+.+`")).show()
+----+
|Col2|
+----+
|   1|
|   2|
|   3|
+----+

The answer to the SO question doesn’t show back-ticks for Scala. It refers to the Java documentation for the Pattern class, but that doesn’t explain back-ticks.

This page indicates the use of back-ticks in Python to represent the string representation of the adorned variable, but that doesn’t apply to a regular expression.

What is the explanation for the back-ticks?

Asked By: user2153235


Answers:

The back-ticks are used to delimit the column name in case it includes special characters. For example, if you have a column called column-1 and you try

SELECT column-1 FROM mytable

You will probably get a

non-existent column ‘column’

error, as the interpreter will treat that as SELECT (column) - 1 FROM mytable. Instead, you can delimit the column name with back-ticks to get around that issue:

SELECT `column-1` FROM mytable
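
The same behaviour can be reproduced from PySpark. Below is a minimal sketch, assuming a SparkSession named spark is available; the column name Col-1 and the view name mytable are made up for illustration:

from pyspark.sql import SparkSession

# Assumed setup: create (or reuse) a local SparkSession.
spark = SparkSession.builder.appName("backtick-demo").getOrCreate()

# A DataFrame with a hyphenated column name, registered as a temp view.
df = spark.createDataFrame([("a", 1), ("b", 2)], ["Col-1", "Col2"])
df.createOrReplaceTempView("mytable")

# Without back-ticks the parser reads Col-1 as the expression (Col) - (1)
# and fails because there is no column named Col:
# spark.sql("SELECT Col-1 FROM mytable").show()   # AnalysisException

# With back-ticks the whole token is treated as a single column identifier:
spark.sql("SELECT `Col-1` FROM mytable").show()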
Answered By: Nick