Error in defining pyspark datastructure variables with a for loop
Question:
I would like to define a set of pyspark schema fields at run time, built from a list of column names (features).
I tried the code below, but it throws an error. Could you please help with this?
colNames = ['colA', 'colB', 'colC', 'colD', 'colE']
tsfresh_feature_set = StructType(
[
StructField('field1', StringType(), True),
StructField('field2', StringType(), True),
StructField(item, DoubleType(), False) for item in colNames
]
)
Error that I get:
SyntaxError: invalid syntax
File "<command-621368>", line 9
StructField(item, DoubleType(), False) for item in colNames
^
SyntaxError: invalid syntax
Answers:
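You are trying to use a list comprehension to build the schema fields from the list of column names for your DataFrame:
StructField(item, DoubleType(), False) for item in colNames
The problem is the syntax: a bare comprehension clause cannot appear as one element among others inside a list literal. Two changes fix it:
- Wrap the comprehension in [] so that it becomes a list of its own:
[StructField(item, DoubleType(), False) for item in colNames]
- Unpack the elements of that list into the outer list using *: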
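from pyspark.sql.types import StructType, StructField, StringType, DoubleType

colNames = ['colA', 'colB', 'colC', 'colD', 'colE']
tsfresh_feature_set = StructType(
    [
        StructField('field1', StringType(), True),
        StructField('field2', StringType(), True),
        # * unpacks the generated fields into the surrounding list
        *[StructField(item, DoubleType(), False) for item in colNames]
    ]
)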
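To see why the original form fails and the fix works, here is a minimal sketch in plain Python that runs without Spark. The `field` helper is a hypothetical stand-in for `StructField` (it just returns a tuple), and the `'string'`/`'double'` labels are placeholders, not pyspark types. It also shows list concatenation as an equivalent alternative to `*` unpacking.

```python
col_names = ['colA', 'colB', 'colC']

def field(name, type_name, nullable):
    """Hypothetical stand-in for StructField(name, dataType, nullable)."""
    return (name, type_name, nullable)

# The two fixed fields that always come first.
fixed = [field('field1', 'string', True), field('field2', 'string', True)]

# Option 1: unpack a list comprehension inside the list literal.
unpacked = [*fixed, *[field(c, 'double', False) for c in col_names]]

# Option 2: build the generated fields separately and concatenate the lists.
concatenated = fixed + [field(c, 'double', False) for c in col_names]

# The original form does not parse: a comprehension clause cannot follow
# other comma-separated elements inside a list literal.
bad_source = "[1, 2, x for x in range(3)]"
try:
    compile(bad_source, "<example>", "eval")
    parses = True
except SyntaxError:
    parses = False
```

Both options produce the same list, so with real `StructField` objects either one can be passed to `StructType`. Concatenation may read more clearly when the generated part is long.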