Convert a simple one line string to RDD in Spark
Question:
I have a simple line:
line = "Hello, world"
I would like to convert it to an RDD with only one element.
I have tried
sc.parallelize(line)
But I get:
sc.parallelize(line).collect()
['H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd']
Any ideas?
Answers:
Try passing a List as the parameter (Scala):
sc.parallelize(List(line)).collect()
It returns:
res1: Array[String] = Array(Hello, world)
The code below works fine in Python:
sc.parallelize([line]).collect()
['Hello, world']
Here we pass "line" wrapped in a list, so the RDD gets one element (the whole string) instead of one element per character.
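To see why the original call split the string, here is a small plain-Python sketch that needs no Spark. It relies on the fact that sc.parallelize distributes the elements of whatever collection it receives, and a Python str is itself an iterable of single characters. The variable names chars and wrapped are only illustrative.

```python
line = "Hello, world"

# A str is an iterable of one-character strings, so anything that walks
# its argument element by element sees each character as a separate item.
chars = list(line)

# Wrapping the string in a list yields an iterable with exactly one item.
wrapped = list([line])

print(len(chars))  # 12
print(wrapped)     # ['Hello, world']
```

The same reasoning should apply to sc.parallelize(line) versus sc.parallelize([line]): the first sees twelve elements, the second sees one.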
In Scala, you can also wrap the string in a Seq:
sc.parallelize(Seq(line))