How to pass variables to spark.sql query in pyspark?
Question:
How do I pass variables to a spark.sql query in PySpark? When I query a table, it fails with an AnalysisException. Why?
>>> spark.sql("select * from student").show()
+-------+--------+
|roll_no| name|
+-------+--------+
| 1|ravindra|
+-------+--------+
>>> spark.sql("select * from student where roll_no={0} and name={1}".format(id,name)).show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/spark-2.3.0-bin-hadoop2.6/python/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/local/spark-2.3.0-bin-hadoop2.6/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/local/spark-2.3.0-bin-hadoop2.6/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"cannot resolve '`ravindra`' given input columns: [default.student.id, default.student.roll_no, default.student.name]; line 1 pos 47;\n'Project [*]\n+- 'Filter ((roll_no#21 = 0) && (name#22 = 'ravindra))\n +- SubqueryAlias `default`.`student`\n +- HiveTableRelation `default`.`student`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#20, roll_no#21, name#22]\n"
Answers:
I generally use the %s string formatter inside SQL strings:
spark.sql('select * from student where roll_no=%s and name="%s"' % ('1', 'ravindra')).show()
Looking at your traceback, the quotes around the name= value are missing. When ravindra is passed into the SQL string unquoted, the SQL engine treats it as a column reference rather than a string literal, and no column with that name exists.
Your SQL query then becomes
select * from student where roll_no=1 and name=ravindra -- no quotes
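A quick way to confirm this is to build the string separately and print it before handing it to spark.sql. This is a minimal sketch; the variable names roll_no and name and their values are assumptions standing in for the variables from the question.

```python
# Hypothetical values standing in for the variables used in the question.
roll_no = 1
name = "ravindra"

# Same template as in the question: the name placeholder is not quoted.
unquoted = "select * from student where roll_no={0} and name={1}".format(roll_no, name)

# Same template with the name placeholder wrapped in single quotes.
quoted = "select * from student where roll_no={0} and name='{1}'".format(roll_no, name)

print(unquoted)  # select * from student where roll_no=1 and name=ravindra
print(quoted)    # select * from student where roll_no=1 and name='ravindra'
```

In the first string, ravindra reads as an identifier, which is what the "cannot resolve" message is complaining about.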
You can adjust your SQL string to
spark.sql("select * from student where roll_no={0} and name='{1}'".format(id,name)).show()
Quoting the {1} placeholder gives you the desired result.
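Hand-quoting each placeholder breaks again as soon as a value itself contains a single quote. One possible way around that is a small helper that renders Python values as SQL literals. This is only a sketch: sql_literal and build_student_query are hypothetical names, not part of PySpark, and the backslash escaping assumes Spark's default string-literal parsing.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (hypothetical helper, not a PySpark API)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    # Assumption: default Spark SQL parsing, where a backslash escapes a quote.
    # Escape backslashes first so the escapes added for quotes are not doubled.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'{0}'".format(escaped)


def build_student_query(roll_no, name):
    """Build the query from the question with both values rendered as literals."""
    return "select * from student where roll_no={0} and name={1}".format(
        sql_literal(roll_no), sql_literal(name)
    )


query = build_student_query(1, "ravindra")
print(query)  # select * from student where roll_no=1 and name='ravindra'

# With a live SparkSession named `spark`, the query would then be run as:
# spark.sql(query).show()
```

On Spark 3.4 and later, spark.sql also accepts an args mapping for named parameter markers such as :name, which avoids manual quoting altogether. The traceback above comes from Spark 2.3.0, where that option is not available.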