Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
None
Description
The SerDe (serializer/deserializer) for RasterUDT is barely usable. This is not a big problem for simple queries like RS_Envelope(RS_FromGeoTiff(content)), because serde-aware expressions eliminate all serialization there. However, we run into problems with queries that involve raster serialization, such as a join:
df_geotiff.alias("a").join(df_geotiff2.alias("b"), col("a.id") == col("b.id")).show()
Or simply collect a raster dataset:
dfGeoTiff.collect()
Each time we run such a query, the executors spawn several new threads, and when processing large raster datasets the job may hang or raise unexpected exceptions. This is a thread dump captured from the Spark UI after running several such queries:
These threads were created by SerializableRenderedImage. When a SerializableRenderedImage object is serialized, it launches a TCP server in a newly spawned thread, and the deserialized copy connects to that server to fetch the raster data. This avoids copying the raster data when serializing the GridCoverage2D object, but it is the worst possible way to implement raster serialization when a large number of rasters must be processed in batches.
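For contrast, a deep-copy serialization is easy to sketch: write the raster's dimensions and pixel values into one self-contained byte string, so deserializing it needs no TCP connection and spawns no thread. This is a minimal Python sketch under stated assumptions: `Raster`, `serialize`, `deserialize`, and the layout (three little-endian uint32 header fields followed by float64 pixels, band by band) are hypothetical and are not Sedona's actual RasterUDT format.

```python
import struct
from dataclasses import dataclass

# Hypothetical header layout: width, height, band count as little-endian uint32.
_HEADER = struct.Struct("<III")


@dataclass
class Raster:
    """Toy stand-in for GridCoverage2D: band-sequential float64 pixels."""
    width: int
    height: int
    bands: list  # one list of width * height floats per band


def serialize(r: Raster) -> bytes:
    # Deep copy: every pixel value is written into the byte string itself,
    # so the receiver never has to connect back to the sender.
    n = r.width * r.height
    out = bytearray(_HEADER.pack(r.width, r.height, len(r.bands)))
    for band in r.bands:
        if len(band) != n:
            raise ValueError("band size does not match width * height")
        out += struct.pack(f"<{n}d", *band)
    return bytes(out)


def deserialize(data: bytes) -> Raster:
    # Read the header, then each band's pixels from consecutive offsets.
    width, height, num_bands = _HEADER.unpack_from(data, 0)
    n = width * height
    offset = _HEADER.size
    bands = []
    for _ in range(num_bands):
        bands.append(list(struct.unpack_from(f"<{n}d", data, offset)))
        offset += 8 * n  # 8 bytes per float64
    return Raster(width, height, bands)
```

The trade-off is that pixel bytes are copied on every serialization, which is the normal cost Spark already pays for shuffled or collected rows, in exchange for no background threads and no lifetime tied to a remote server.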
SerializableRenderedImage is also buggy. It tracks the reference count of serialized objects in remoteReferenceCount, but the reference counting is not implemented correctly, so it leaks memory.
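The issue does not spell out exactly where JAI's counting goes wrong, so the following is only a minimal sketch of what correct remote reference counting has to guarantee: the owner and every serialized copy each hold one reference, each copy gives its reference back exactly once, and the data is dropped when the count reaches zero. `RefCountedStore`, `Handle`, and their methods are hypothetical names, not part of JAI or Sedona.

```python
import threading


class Handle:
    """One reference to a shared image; hypothetical stand-in for a serialized copy."""

    def __init__(self, store, key):
        self.store = store
        self.key = key
        self.released = False

    def dispose(self):
        self.store.release(self)


class RefCountedStore:
    """Hypothetical server-side table that keeps image data alive while references exist."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # key -> [data, reference count]

    def register(self, key, data):
        # The owner of the original image holds the first reference.
        with self._lock:
            self._entries[key] = [data, 1]
        return Handle(self, key)

    def acquire(self, key):
        # Each serialized copy takes one additional reference.
        with self._lock:
            self._entries[key][1] += 1
        return Handle(self, key)

    def release(self, handle):
        with self._lock:
            # A handle gives back its reference at most once, so a repeated
            # dispose() cannot decrement the count twice.
            if handle.released:
                return
            handle.released = True
            entry = self._entries[handle.key]
            entry[1] -= 1
            if entry[1] == 0:
                # Last reference is gone: drop the data so it can be freed.
                del self._entries[handle.key]

    def live_keys(self):
        with self._lock:
            return set(self._entries)
```

If any copy skips its release, the entry stays in the table forever, which is exactly the kind of leak described above.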
We could create SerializableRenderedImage objects with useDeepCopy = true to avoid these problems, but that introduces a new one: the finalizer of SerializableRenderedImage always connects to the server to decrement the remote reference count, even though there is no "server" in deep-copy mode. The finalizer then raises tons of exceptions like the following, which is quite annoying:
INFO: IOException occurs when open the streams of the socket.
javax.media.jai.util.ImagingException: IOException occurs when open the streams of the socket.
	at javax.media.jai.remote.SerializableRenderedImage.closeClient(SerializableRenderedImage.java:1117)
	at javax.media.jai.remote.SerializableRenderedImage.dispose(SerializableRenderedImage.java:1314)
	at javax.media.jai.remote.SerializableRenderedImage.finalize(SerializableRenderedImage.java:1259)
	at java.base/java.lang.System$2.invokeFinalize(System.java:2125)
	at java.base/java.lang.ref.Finalizer.runFinalizer(Finalizer.java:87)
	at java.base/java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:171)
Caused by: java.net.SocketException: Connection reset
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:186)
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
	at java.base/java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2893)
	at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2909)
	at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3406)
	at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:932)
	at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:375)
	at javax.media.jai.remote.SerializableRenderedImage.closeClient(SerializableRenderedImage.java:1115)
	... 5 more
Caused by: java.net.SocketException: Connection reset
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:186)
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
	at java.base/java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2893)
	at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2909)
	at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3406)
	at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:932)
	at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:375)
	at javax.media.jai.remote.SerializableRenderedImage.closeClient(SerializableRenderedImage.java:1115)
	at javax.media.jai.remote.SerializableRenderedImage.dispose(SerializableRenderedImage.java:1314)
	at javax.media.jai.remote.SerializableRenderedImage.finalize(SerializableRenderedImage.java:1259)
	at java.base/java.lang.System$2.invokeFinalize(System.java:2125)
	at java.base/java.lang.ref.Finalizer.runFinalizer(Finalizer.java:87)
	at java.base/java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:171)
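One way to avoid the finalizer noise described above is to register remote cleanup only when a server actually exists. This minimal Python sketch uses `weakref.finalize` in place of Java's `finalize()`; `SketchImage` and the `notify_server` callback (standing in for the socket call that decrements the remote reference count) are hypothetical.

```python
import weakref


class SketchImage:
    """Hypothetical deserialized image; notify_server stands in for the socket call."""

    def __init__(self, data, deep_copy, notify_server):
        self.data = data
        if deep_copy:
            # Pixels were copied in full, so there is no server to contact
            # and no cleanup callback is registered at all.
            self._finalizer = None
        else:
            # Remote mode: notify the server once, either when dispose() is
            # called or when this object is garbage collected.
            self._finalizer = weakref.finalize(self, notify_server)

    def dispose(self):
        if self._finalizer is not None:
            self._finalizer()  # runs the callback at most once; later calls are no-ops
```

Deep-copy objects never touch the network during cleanup, and in remote mode an explicit `dispose()` followed by garbage collection still notifies the server only once.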