  1. Ignite
  2. IGNITE-9108

Spark DataFrames With Cache Key and Value Objects


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 2.9
    • 3.0
    • spark
    • None
    • Docs Required

    Description

      Add support for _key and _val columns within Ignite-provided Spark DataFrames. These columns represent the cache key and value objects, similar to the current _key/_val column semantics in Ignite SQL.
       
      If the cache key or value objects are standard SQL types (e.g. String, Int), they will be represented as such in the DataFrame schema. Otherwise they are represented as Binary types, encoded in one of two ways: 1. As Ignite BinaryObjects, in which case we'd need to supply a Spark Encoder implementation for BinaryObjects, e.g.:
       

      // binaryObjectEncoder(...) is the proposed new API; it does not exist yet.
      IgniteSparkSession session = ...
      Dataset<Row> dataFrame = ...
      Dataset<MyValClass> valDataSet = dataFrame.select("_val").as(session.binaryObjectEncoder(MyValClass.class));
      

      Or 2. As Kryo-serialised versions of the objects, e.g.:
       

      // Encoders is org.apache.spark.sql.Encoders.
      Dataset<Row> dataFrame = ...
      Dataset<MyValClass> dataSet = dataFrame.select("_val").as(Encoders.kryo(MyValClass.class));
      

      Option 1 would probably be more efficient, but option 2 would be more idiomatic Spark.
       
      The rationale behind this is the same as the Ignite SQL _key and _val columns: to allow access to the full cache objects from a SQL context.
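
      The schema rule described above (standard SQL types keep their own column type, everything else becomes a Binary column) could be sketched roughly as follows. This is only an illustration under assumptions: the class KeyValColumnTypes, the method columnTypeFor and the particular set of mapped classes are hypothetical and are not part of Ignite or Spark; the strings merely mirror the names of Spark SQL data types.

```java
import java.util.Map;

public class KeyValColumnTypes {
    // Hypothetical, deliberately incomplete table of Java classes that would
    // surface as their natural Spark SQL type in the _key/_val columns.
    private static final Map<Class<?>, String> SQL_TYPES = Map.of(
        String.class, "StringType",
        Integer.class, "IntegerType",
        Long.class, "LongType",
        Double.class, "DoubleType",
        Boolean.class, "BooleanType");

    // Any class not listed above falls back to a Binary column, to be decoded
    // later by an encoder (BinaryObject-based or Kryo-based).
    static String columnTypeFor(Class<?> cls) {
        return SQL_TYPES.getOrDefault(cls, "BinaryType");
    }

    public static void main(String[] args) {
        System.out.println(columnTypeFor(String.class));
        System.out.println(columnTypeFor(Integer.class));
        // A user-defined value class would take the Binary fallback.
        System.out.println(columnTypeFor(Object.class));
    }
}
```

      Under this sketch, a cache of String keys and custom value objects would expose _key as a string column and _val as a binary column, and only the latter would need one of the two encoders discussed above.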

          People

            Assignee: Unassigned
            Reporter: Stuart Macdonald (stuartmacd)