apache-spark-mllib

AttributeError: 'DataFrame' object has no attribute 'map'

Question: I wanted to convert the Spark data frame to add using the code below:
from pyspark.mllib.clustering import KMeans
spark_df = sqlContext.createDataFrame(pandas_df)
rdd = spark_df.map(lambda data: Vectors.dense([float(c) for c in data]))
model = KMeans.train(rdd, 2, maxIterations=10, runs=30, initializationMode="random")
The detailed error message is:
---------------------------------------------------------------------------
AttributeError Traceback (most recent …

Total answers: 2

Calling Java/Scala function from a task

Question: Background: My original question here was "Why using DecisionTreeModel.predict inside map function raises an exception?" and is related to "How to generate tuples of (original lable, predicted label) on Spark with MLlib?" When we use the Scala API, a recommended way of getting predictions for RDD[LabeledPoint] using DecisionTreeModel is …

Total answers: 1