- Type: Improvement
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 2.4.7
- Fix Version/s: None
- Component/s: Java API
- Labels: None
If the target Java data class has a circular reference, Spark fails fast when creating the Dataset or running Encoders.
For example, a protobuf-generated class carries a reference to its Descriptor, so there is no way to build a Dataset from a protobuf class.
This line
Encoders.bean(ProtoBuffOuterClass.ProtoBuff.class);
throws immediately:
Exception in thread "main" java.lang.UnsupportedOperationException: Cannot have circular references in bean class, but got the circular reference of class class com.google.protobuf.Descriptors$Descriptor
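For illustration only, the same fail-fast behavior can be reproduced without protobuf; this is a minimal sketch (not from the ticket) using a hypothetical self-referential bean Node:

import org.apache.spark.sql.Encoders;

public class CircularBeanRepro {

    // Hypothetical bean whose "parent" property refers back to Node itself,
    // forming the circular reference that the bean encoder rejects.
    public static class Node implements java.io.Serializable {
        private String name;
        private Node parent;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public Node getParent() { return parent; }
        public void setParent(Node parent) { this.parent = parent; }
    }

    public static void main(String[] args) {
        // Throws UnsupportedOperationException ("Cannot have circular references
        // in bean class ...") immediately, before any Dataset is created.
        Encoders.bean(Node.class);
    }
}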
Can we add a parameter? For example,
Encoders.bean(Class<T> clazz, List<Fields> fieldsToIgnore);
or
Encoders.bean(Class<T> clazz, boolean skipCircularRefField);
so that, instead of throwing an exception at https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L556, it skips the field:
if (seenTypeSet.contains(t)) {
  if (skipCircularRefField)
    println("field skipped") // just skip this field
  else
    throw new UnsupportedOperationException(
      s"cannot have circular references in class, but got the circular reference of class $t")
}
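From the caller's side, usage of the proposed boolean overload could look like the sketch below; the two-argument Encoders.bean signature does not exist in Spark today and is shown only as a hypothetical illustration of the request:

import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;

// Hypothetical call site: with skipCircularRefField = true, circular-reference
// fields such as the protobuf Descriptor would be dropped from the inferred
// schema instead of causing an immediate UnsupportedOperationException.
Encoder<ProtoBuffOuterClass.ProtoBuff> encoder =
    Encoders.bean(ProtoBuffOuterClass.ProtoBuff.class, true);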