Simple, hassle-free, zero-boilerplate serialization in Scala/Java similar to Python's Pickle?
Question:
Is there a simple, hassle-free approach to serialization in Scala/Java that’s similar to Python’s pickle? Pickle is a dead-simple solution that’s reasonably efficient in space and time (i.e. not abysmal) but doesn’t care about cross-language accessibility, versioning, etc. and allows for optional customization.
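To make the target behavior concrete, here is a minimal Python 3 sketch of what "pickle-like" means in this question: no marker interface, no zero-argument constructor, cyclic graphs handled, and customization only if you want it. The `Node` class and its `cache` field are hypothetical names invented for this illustration.

```python
import pickle


class Node:
    """Plain class: no marker interface and no zero-argument constructor."""

    def __init__(self, name):
        self.name = name
        self.next = None
        self.cache = {}  # transient state we do not want to persist

    # Optional customization: leave the cache out of the pickled state.
    def __getstate__(self):
        state = self.__dict__.copy()
        del state["cache"]
        return state

    # Counterpart hook: restore fields and rebuild the transient state.
    def __setstate__(self, state):
        self.__dict__.update(state)
        self.cache = {}


a, b = Node("a"), Node("b")
a.next, b.next = b, a  # cyclic object graph
a.cache["expensive"] = 42

data = pickle.dumps(a, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(data)  # __init__ is not called during loading
```

Removing the two hook methods still works; they are only there to show the opt-in customization.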
What I’m aware of:
- Java’s built-in serialization is infamously slow ([1], [2]), bloated, and fragile. Also have to mark classes as Serializable—annoying when there are things that are clearly serializable but which don’t have that annotation (e.g. not many Point2D authors mark these Serializable).
- Scala’s BytePickle requires a bunch of boilerplate for every type you want to pickle, and even then it doesn’t work with (cyclic) object graphs.
- jserial: Unmaintained and doesn’t seem to be that much faster/smaller than the default Java serialization.
- kryo: Cannot (de-)serialize objects with no 0-arg ctors, which is a severe limitation. (Also you have to register every class you plan to serialize, or else you get significant slowdowns/bloat, but even so it’s still faster than pickle.)
- protostuff: AFAICT, you have to register every class you intend to serialize in advance in a “schema.”
Kryo and protostuff are the closest solutions I’ve found, but I’m wondering if there’s anything else out there (or if there’s some way to use these that I should be aware of). Please include usage examples! Ideally also include benchmarks.
Answers:
I would recommend SBinary. It uses implicits which are resolved at compile time, so it’s very effective and typesafe. It comes with built-in support for many common Scala datatypes. You have to manually write the serialization code for your (case) classes, but it’s easy to do.
I actually think you’d be best off with kryo (I’m not aware of alternatives that offer less schema defining other than non-binary protocols). You mention that pickle is not susceptible to the slowdowns and bloat that kryo gets without registering classes, but kryo is still faster and less bloated than pickle even without registering classes. See the following micro-benchmark (obviously take it with a grain of salt, but this is what I could do easily):
Python pickle
    import pickle
    import time

    class Person:
        def __init__(self, name, age):
            self.name = name
            self.age = age

    people = [Person("Alex", 20), Person("Barbara", 25), Person("Charles", 30), Person("David", 35), Person("Emily", 40)]

    # Warm-up pass; print the serialized size once.
    for i in xrange(10000):
        output = pickle.dumps(people, -1)
        if i == 0: print len(output)

    # Timed pass; prints elapsed seconds.
    start_time = time.time()
    for i in xrange(10000):
        output = pickle.dumps(people, -1)
    print time.time() - start_time
Outputs 174 bytes and 1.18-1.23 seconds for me (Python 2.7.1 on 64-bit Linux)
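The script above is Python 2 only (`xrange`, `print` statement). For anyone rerunning it today, a Python 3 port might look like the sketch below. It uses `timeit` instead of manual `time.time()` bookkeeping, which is a substitution made here and not part of the original measurement. Sizes and timings will differ from the figures quoted above, since newer interpreters default to newer pickle protocols.

```python
import pickle
import timeit


class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age


people = [Person("Alex", 20), Person("Barbara", 25), Person("Charles", 30),
          Person("David", 35), Person("Emily", 40)]

# -1 selects the highest protocol available, as in the original benchmark.
payload = pickle.dumps(people, -1)
print(len(payload))

# Warm-up pass, then the timed pass, 10000 iterations each as in the original.
timeit.timeit(lambda: pickle.dumps(people, -1), number=10000)
elapsed = timeit.timeit(lambda: pickle.dumps(people, -1), number=10000)
print(elapsed)  # seconds for 10000 serializations
```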
Scala kryo
    import com.esotericsoftware.kryo._
    import java.io._

    class Person(val name: String, val age: Int)

    object MyApp extends App {
      val people = Array(new Person("Alex", 20), new Person("Barbara", 25), new Person("Charles", 30), new Person("David", 35), new Person("Emily", 40))

      val kryo = new Kryo
      // Allow serializing classes that were not registered up front.
      kryo.setRegistrationOptional(true)
      val buffer = new ObjectBuffer(kryo)

      // Warm-up pass; print the serialized size once.
      for (i <- 0 until 10000) {
        val output = new ByteArrayOutputStream
        buffer.writeObject(output, people)
        if (i == 0) println(output.size)
      }

      // Timed pass; prints elapsed seconds.
      val startTime = System.nanoTime
      for (i <- 0 until 10000) {
        val output = new ByteArrayOutputStream
        buffer.writeObject(output, people)
      }
      println((System.nanoTime - startTime) / 1e9)
    }
Outputs 68 bytes for me and 30-40ms (Kryo 1.04, Scala 2.9.1, Java 1.6.0.26 hotspot JVM on 64-bit Linux). For comparison, it outputs 51 bytes and 18-25ms if I register the classes.
Comparison
Kryo uses about 40% of the space and 3% of the time as Python pickle when not registering classes, and about 30% of the space and 2% of the time when registering classes. And you can always write a custom serializer when you want more control.
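The percentages can be re-derived from the numbers reported above. A small Python 3 sketch, assuming the midpoint of each quoted timing range is representative (that choice is an assumption made here, not part of the measurements):

```python
# Sizes in bytes and timings in seconds, as reported in the two runs above.
# Midpoints of the quoted timing ranges are an assumption of this sketch.
pickle_bytes, pickle_secs = 174, (1.18 + 1.23) / 2
kryo_unreg_bytes, kryo_unreg_secs = 68, (0.030 + 0.040) / 2
kryo_reg_bytes, kryo_reg_secs = 51, (0.018 + 0.025) / 2


def percent(part, whole):
    """Return part as a percentage of whole, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# (space %, time %) of Kryo relative to pickle
ratios = {
    "unregistered": (percent(kryo_unreg_bytes, pickle_bytes),
                     percent(kryo_unreg_secs, pickle_secs)),
    "registered": (percent(kryo_reg_bytes, pickle_bytes),
                   percent(kryo_reg_secs, pickle_secs)),
}
print(ratios)
```

This gives roughly 39% of the space and 2.9% of the time without registration, and 29% and 1.8% with it, in line with the rounded figures above.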
Twitter’s chill library is just awesome. It uses Kryo for serialization but is ultra simple to use. Also nice: provides a MeatLocker[X] type which makes any X a Serializable.
Edit 2020-02-19: please note, as mentioned by @federico below, this answer is no longer valid as the repository has been archived by the owner.
Scala now has Scala-pickling, which performs as well as or better than Kryo depending on the scenario – see slides 34-39 in this presentation.
Another good option is the recent (2016) **netvl/picopickle**:
- Small and almost dependency-less (the core library depends only on shapeless).
- Extensibility: you can define your own serializers for your types and you can create custom backends, that is, you can use the same library for different serialization formats (collections, JSON, BSON, etc.); other parts of the serialization behavior, like null handling, can also be customized.
- Flexibility and convenience: the default serialization format is fine for most uses, but it can be customized almost arbitrarily with support from a convenient converters DSL.
- Static serialization without reflection: shapeless Generic macros are used to provide serializers for arbitrary types, which means that no reflection is used.
For example:
Jawn-based pickler also provides additional functions, readString()/writeString() and readAst()/writeAst(), which [de]serialize objects to strings and JSON AST to strings, respectively:
    import io.github.netvl.picopickle.backends.jawn.JsonPickler._

    case class A(x: Int, y: String)

    writeString(A(10, "hi")) shouldEqual """{"x":10,"y":"hi"}"""
    readString[A]("""{"x":10,"y":"hi"}""") shouldEqual A(10, "hi")