  Spark / SPARK-33598

Support Java Class with circular references

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2
    • Fix Version/s: None
    • Component/s: Java API
    • Labels: None

    Description

      If the target Java data class has a circular reference, Spark fails fast when creating the Dataset or building the Encoder.

      For example, a protobuf-generated class references com.google.protobuf.Descriptors$Descriptor, which itself contains a circular reference, so there is no way to build a Dataset from the protobuf class.

      Calling this line:

      Encoders.bean(ProtoBuffOuterClass.ProtoBuff.class);

      throws immediately:

      Exception in thread "main" java.lang.UnsupportedOperationException: Cannot have circular references in bean class, but got the circular reference of class class com.google.protobuf.Descriptors$Descriptor

      Could we add a parameter, for example:

      Encoders.bean(Class<T> beanClass, List<String> fieldsToIgnore);

      or

      Encoders.bean(Class<T> beanClass, boolean skipCircularRefField);

      so that, instead of throwing an exception at https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L556, Spark skips the field:

      if (seenTypeSet.contains(t)) {
        if (skipCircularRefField) {
          // Proposed: skip this field (a real change must also omit it from the generated schema)
          println("field skipped")
        } else {
          throw new UnsupportedOperationException(
            s"cannot have circular references in class, but got the circular reference of class $t")
        }
      }
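A minimal, self-contained Java sketch of how the two proposed options could behave, with no Spark dependency. The bean classes Parent and Child, the method fieldPaths, and its parameters fieldsToIgnore and skipCircularRefField are hypothetical illustrations, not Spark API; java.beans.Introspector stands in for Spark's own bean inspection. The sketch walks the bean's getters, collects the dotted paths of leaf properties (the would-be columns), and tracks a per-path seenTypeSet like the check quoted above. On a cycle it either throws, drops a field the caller listed, or skips the circular field.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the two proposed options; not Spark's implementation.
class CircularBeanSketch {

    // Hypothetical beans with a circular reference: Parent -> Child -> Parent.
    public static class Parent {
        private String name;
        private Child child;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public Child getChild() { return child; }
        public void setChild(Child child) { this.child = child; }
    }

    public static class Child {
        private int id;
        private Parent parent;
        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public Parent getParent() { return parent; }
        public void setParent(Parent parent) { this.parent = parent; }
    }

    // Returns the sorted dotted paths of leaf properties (the would-be columns).
    static List<String> fieldPaths(Class<?> cls, Set<String> fieldsToIgnore, boolean skipCircularRefField) {
        List<String> out = new ArrayList<>();
        Set<Class<?>> seen = new HashSet<>();
        seen.add(cls);
        walk(cls, "", seen, fieldsToIgnore, skipCircularRefField, out);
        Collections.sort(out);
        return out;
    }

    private static void walk(Class<?> cls, String prefix, Set<Class<?>> seenTypeSet,
                             Set<String> fieldsToIgnore, boolean skip, List<String> out) {
        PropertyDescriptor[] props;
        try {
            // Stop at Object so getClass() is not reported as a property.
            props = Introspector.getBeanInfo(cls, Object.class).getPropertyDescriptors();
        } catch (IntrospectionException e) {
            throw new IllegalArgumentException(e);
        }
        for (PropertyDescriptor pd : props) {
            String path = prefix + pd.getName();
            Class<?> t = pd.getPropertyType();
            if (fieldsToIgnore.contains(path)) {
                continue; // option 1: caller-supplied list of fields to drop
            }
            if (isLeaf(t)) {
                out.add(path);
                continue;
            }
            if (seenTypeSet.contains(t)) {
                if (skip) {
                    continue; // option 2: silently drop the circular field
                }
                throw new UnsupportedOperationException(
                    "cannot have circular references in class, but got the circular reference of class " + t);
            }
            // Types are tracked per path, so the same type may still appear in sibling fields.
            Set<Class<?>> next = new HashSet<>(seenTypeSet);
            next.add(t);
            walk(t, path + ".", next, fieldsToIgnore, skip, out);
        }
    }

    private static boolean isLeaf(Class<?> t) {
        return t.isPrimitive() || t == String.class || t == Boolean.class
            || Number.class.isAssignableFrom(t);
    }

    public static void main(String[] args) {
        System.out.println(fieldPaths(Parent.class, Set.of(), true));
        System.out.println(fieldPaths(Parent.class, Set.of("child.parent"), false));
        try {
            fieldPaths(Parent.class, Set.of(), false);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With skipCircularRefField set to true, or with "child.parent" listed in fieldsToIgnore, the sketch returns [child.id, name]; with neither, it throws an UnsupportedOperationException much like the one Encoders.bean raises today.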

          People

            Assignee: Unassigned
            Reporter: jacklzg
            Votes: 2
            Watchers: 4
