Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.1.2
Description
If the target Java bean class contains a circular reference, Spark fails fast when creating the Dataset or the Encoder.
For example, a protobuf-generated class holds a reference to its Descriptor, so there is currently no way to build a Dataset from the protobuf class.
This line:
Encoders.bean(ProtoBuffOuterClass.ProtoBuff.class);
immediately throws:
Exception in thread "main" java.lang.UnsupportedOperationException: Cannot have circular references in bean class, but got the circular reference of class class com.google.protobuf.Descriptors$Descriptor
Can we add a parameter, for example,
Encoders.bean(Class<T> clas, List<Fields> fieldsToIgnore);
or
Encoders.bean(Class<T> clas, boolean skipCircularRefField);
With such a flag, instead of throwing an exception at https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L556, Spark would skip the field:
if (seenTypeSet.contains(t)) {
  if (skipCircularRefField) {
    println("field skipped") // just skip this field
  } else {
    throw new UnsupportedOperationException(
      s"cannot have circular references in class, but got the circular reference of class $t")
  }
}
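To make the proposed behaviour concrete, here is a minimal standalone sketch in plain Java. It does not use Spark and is not Spark's implementation. It only imitates the idea of walking bean properties while tracking the types already seen on the current path. The names `CircularBeanSketch`, `Node`, `inferFields`, and `isLeaf` are hypothetical, and `skipCircularRefField` mirrors the flag suggested above.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CircularBeanSketch {

    // Hypothetical bean with a self reference: Node -> next -> Node.
    public static class Node {
        private int value;
        private Node next;
        public int getValue() { return value; }
        public void setValue(int value) { this.value = value; }
        public Node getNext() { return next; }
        public void setNext(Node next) { this.next = next; }
    }

    // Simplified notion of a "leaf" type for this sketch (an assumption,
    // far narrower than the set of types Spark actually supports).
    static boolean isLeaf(Class<?> type) {
        return type.isPrimitive() || type == String.class
                || type == Boolean.class || Number.class.isAssignableFrom(type);
    }

    // Returns the dotted paths of all leaf properties reachable from beanClass.
    static List<String> inferFields(Class<?> beanClass, boolean skipCircularRefField) {
        List<String> out = new ArrayList<>();
        Set<Class<?>> seen = new HashSet<>();
        seen.add(beanClass);
        walk(beanClass, "", seen, skipCircularRefField, out);
        return out;
    }

    private static void walk(Class<?> type, String prefix, Set<Class<?>> seen,
                             boolean skipCircularRefField, List<String> out) {
        BeanInfo info;
        try {
            // Stop at Object so the synthetic "class" property is excluded.
            info = Introspector.getBeanInfo(type, Object.class);
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            Class<?> fieldType = pd.getPropertyType();
            if (pd.getReadMethod() == null || fieldType == null) {
                continue; // not a readable plain property
            }
            String path = prefix + pd.getName();
            if (isLeaf(fieldType)) {
                out.add(path);
            } else if (seen.contains(fieldType)) {
                if (skipCircularRefField) {
                    continue; // proposed behaviour: drop the field
                }
                throw new UnsupportedOperationException(
                        "Cannot have circular references in bean class, "
                        + "but got the circular reference of class " + fieldType);
            } else {
                seen.add(fieldType);    // track types on the current path only
                walk(fieldType, path + ".", seen, skipCircularRefField, out);
                seen.remove(fieldType); // backtrack for sibling properties
            }
        }
    }

    public static void main(String[] args) {
        // With the flag set, the circular "next" property is skipped.
        System.out.println(inferFields(Node.class, true));
    }
}
```

Calling `inferFields(Node.class, false)` throws `UnsupportedOperationException`, matching today's fail-fast behaviour, while passing `true` keeps only the non-circular `value` property. The `fieldsToIgnore` variant could be sketched the same way by checking the property name against a set instead of a boolean flag.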