Apache Sedona / SEDONA-133

Allow user-defined schemas in Adapter.toDf()

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0

    Description

      Hello!

      I would like to propose a new overloaded method that supports user-defined schemas in Adapter.toDf() (for both SpatialRDD and JavaPairRDD). Currently every field is coerced to StringType, which does not work for all use cases (specifically, I have structs that lose all their nested columns when cast to StringType). I can work around this, but it would be nice to have it off the shelf. Here is the relevant code from Adapter.scala:

      cols = cols ++ fieldNames.map(f => StructField(f, StringType))
       
      ...
       
      cols = cols ++ leftFieldnames.map(fName => StructField(fName, StringType))
      cols = cols ++ rightFieldNames.map(fName => StructField(fName, StringType))
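
To illustrate the limitation, here is a minimal standalone sketch. It uses plain Spark only (no Sedona), and the column and field names (`attrs`, `name`, `lanes`) are hypothetical. It shows that once a struct column is cast to StringType, its nested fields can no longer be selected.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, struct}
import org.apache.spark.sql.types.StringType
import scala.util.Try

object StructCastDemo {
  // Hypothetical one-row DataFrame with a struct column named "attrs".
  def structDf(spark: SparkSession): DataFrame =
    spark.range(1).select(
      struct(lit("Main St").as("name"), lit(2).as("lanes")).as("attrs"))

  // Nested access works while the column is still a struct.
  def nameBeforeCast(spark: SparkSession): String =
    structDf(spark).select(col("attrs.name")).head().getString(0)

  // After the cast the column is a single flat string, so resolving a
  // nested field fails during analysis; Try captures that failure.
  def nestedSelectAfterCast(spark: SparkSession): Try[DataFrame] = {
    val flat = structDf(spark).select(col("attrs").cast(StringType).as("attrs"))
    Try(flat.select(col("attrs.name")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("struct-cast-demo")
      .getOrCreate()
    println(s"before cast: ${nameBeforeCast(spark)}")
    println(s"nested select after cast succeeded: ${nestedSelectAfterCast(spark).isSuccess}")
    spark.stop()
  }
}
```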
       
      My thinking is that the user could provide the schema directly as a StructType object. If they choose to provide a schema at all, they would be responsible for supplying field names and data types that match the data.
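
A rough sketch of the idea follows. It is not the actual Sedona implementation: the object name `SchemaAwareToDf`, the `toDf` signature, and the sample schema are all hypothetical, and the geometry column is left out so the example runs on plain Spark. The only real API it relies on is `SparkSession.createDataFrame(RDD[Row], StructType)`.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object SchemaAwareToDf {
  // Mirrors the existing behaviour: every attribute becomes a StringType column.
  def stringSchema(fieldNames: Seq[String]): StructType =
    StructType(fieldNames.map(f => StructField(f, StringType)))

  // Hypothetical overload: the caller supplies the full schema and is
  // responsible for making it match the rows.
  def toDf(rows: RDD[Row], schema: StructType, spark: SparkSession): DataFrame =
    spark.createDataFrame(rows, schema)

  // Hypothetical user-defined schema with a nested struct column.
  val userSchema: StructType = StructType(Seq(
    StructField("id", LongType),
    StructField("attrs", StructType(Seq(
      StructField("name", StringType),
      StructField("lanes", IntegerType))))))

  // Builds one row whose second value is itself a Row (the struct).
  def demo(spark: SparkSession): DataFrame = {
    val rows = spark.sparkContext.parallelize(Seq(Row(1L, Row("Main St", 2))))
    toDf(rows, userSchema, spark)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("schema-aware-todf")
      .getOrCreate()
    // What the all-strings schema would look like for the same fields.
    println(stringSchema(Seq("id", "attrs")).simpleString)
    val df = demo(spark)
    df.printSchema()
    // Nested fields stay addressable because the struct type is preserved.
    df.select("attrs.name", "attrs.lanes").show()
    spark.stop()
  }
}
```

With a signature along these lines, nothing changes for callers who omit the schema, while callers who pass one keep typed and nested columns.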
       
      I would be happy to work on a PR if it's deemed appropriate. What are your thoughts?

            People

              Assignee: Brian Rice (brianrice)
              Reporter: Brian Rice (brianrice)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 50m