Back-ticks in DataFrame.colRegex?

Question

For PySpark, I find back-ticks enclosing regular expressions for DataFrame.colRegex() here, here, and in this SO question. Here is the example from the DataFrame.colRegex docstring:

df = spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["Col1", "Col2"])
df.select(df.colRegex("`(Col1)?+.+`")).show()
+----+
|Col2|
+----+
|   1|
|   2|
|   3|
+----+

The answer to the SO question doesn’t show back-ticks for Scala. It refers to the Java documentation for the Pattern class, but that doesn’t explain back-ticks.

This page indicates that back-ticks in Python yield the string representation of the enclosed variable, but that doesn’t apply to a regular expression.

What is the explanation for the back-ticks?

Asked By: user2153235

Answer 1

The back-ticks delimit the column name in case it includes special characters. For example, if you had a column called column-1 and you tried

SELECT column-1 FROM mytable

You will probably get a

non-existent column ‘column’

error as the interpreter will treat that as SELECT (column) - 1 FROM mytable. Instead, you can delimit the column name with back-ticks to get around that issue:

SELECT `column-1` FROM mytable
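
The same effect can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the built-in sqlite3 module rather than Spark (SQLite also accepts back-tick quoted identifiers), and the table mytable and the column name col-1 are hypothetical stand-ins for the names above.

```python
import sqlite3

# Hypothetical in-memory table whose column name contains a hyphen.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (`col-1` INTEGER)")
conn.execute("INSERT INTO mytable VALUES (10)")

# Unquoted: parsed as the expression (col) - 1, and no column "col" exists,
# so SQLite raises an OperationalError.
try:
    conn.execute("SELECT col-1 FROM mytable").fetchall()
    unquoted_error = None
except sqlite3.OperationalError as exc:
    unquoted_error = str(exc)

# Back-tick quoted: parsed as a single identifier, so the query succeeds.
quoted_rows = conn.execute("SELECT `col-1` FROM mytable").fetchall()

print("unquoted:", unquoted_error)
print("quoted:  ", quoted_rows)
conn.close()
```

The unquoted query fails because the hyphen is read as a minus sign, while the quoted query returns the stored row. Spark SQL's parser may report the error with different wording, so treat the message text here as SQLite-specific.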

Answered By: Nick
