PHOENIX-1071

Provide integration for exposing Phoenix tables as Spark RDDs


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4.0
    • Component/s: None
    • Labels: None

    Description

      A core concept of Apache Spark is the resilient distributed dataset (RDD), a "fault-tolerant collection of elements that can be operated on in parallel". One can create an RDD referencing a dataset in any external storage system offering a Hadoop InputFormat, such as Phoenix's PhoenixInputFormat (with PhoenixOutputFormat covering the write path). Beyond that baseline, there are opportunities for additional interesting and deep integration.
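      For illustration, a minimal sketch of reading a Phoenix table into an RDD through the existing MapReduce classes might look like the following. CoffeeWritable is a hypothetical DBWritable implementation, and the COFFEES table and query are illustrative:

      import org.apache.hadoop.io.NullWritable
      import org.apache.hadoop.mapreduce.Job
      import org.apache.phoenix.mapreduce.PhoenixInputFormat
      import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil
      import org.apache.spark.{SparkConf, SparkContext}

      val sc = new SparkContext(new SparkConf().setAppName("phoenix-rdd"))
      val job = Job.getInstance(sc.hadoopConfiguration)
      // CoffeeWritable (hypothetical) maps COFFEES columns to fields
      PhoenixMapReduceUtil.setInput(job, classOf[CoffeeWritable],
        "COFFEES", "SELECT SUPPLIER, ORIGIN, VARIETY FROM COFFEES")
      val coffees = sc
        .newAPIHadoopRDD(job.getConfiguration,
          classOf[PhoenixInputFormat[CoffeeWritable]],
          classOf[NullWritable], classOf[CoffeeWritable])
        .map(_._2) // keep only the row values; keys are NullWritable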

      Add the ability to save RDDs back to Phoenix with a saveAsPhoenixTable action, implicitly creating the necessary schema on demand.
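      Until such an action exists, a rough sketch of the write path using the existing PhoenixOutputFormat might look like the following, reusing the hypothetical CoffeeWritable and the coffees RDD from the sketch above. Note it upserts into an already-created table rather than creating schema on demand:

      import org.apache.hadoop.io.NullWritable
      import org.apache.hadoop.mapreduce.Job
      import org.apache.phoenix.mapreduce.PhoenixOutputFormat
      import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil

      val outJob = Job.getInstance(sc.hadoopConfiguration)
      outJob.setOutputFormatClass(classOf[PhoenixOutputFormat[CoffeeWritable]])
      // Unlike the proposed saveAsPhoenixTable, the target table must exist
      PhoenixMapReduceUtil.setOutput(outJob, "COFFEES", "SUPPLIER,ORIGIN,VARIETY")
      coffees.map(c => (NullWritable.get, c))
        .saveAsNewAPIHadoopDataset(outJob.getConfiguration)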

      Add support for filter transformations that push predicates to the server.
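      The existing MapReduce integration already offers a hook for this: one variant of PhoenixMapReduceUtil.setInput takes a conditions string that becomes the query's WHERE clause, evaluated by the Phoenix server rather than in the Spark workers. A sketch, replacing the setInput call in the read example above (the condition and column names are illustrative):

      // Pushes "ORIGIN = 'GT'" to the server as the scan's WHERE clause,
      // so only matching rows ever reach the Spark workers
      PhoenixMapReduceUtil.setInput(job, classOf[CoffeeWritable],
        "COFFEES", "ORIGIN = 'GT'", "SUPPLIER", "ORIGIN", "VARIETY")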

      Add a new select transformation supporting a LINQ-like DSL, for example:

      // Count the number of different coffee varieties offered by each
      // supplier from Guatemala
      phoenixTable("coffees")
          .select(c =>
              where(c.origin == "GT"))
          .countByKey()
          .foreach(r => println(r._1 + "=" + r._2))
      

      Support conversions between Scala and Java types and Phoenix table data.
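      As a sketch of what such conversions might look like (entirely hypothetical, layered over plain JDBC accessors rather than any existing Phoenix API), Scala implicits could provide typed, null-safe access to column values:

      import java.sql.ResultSet

      // Hypothetical enrichment: typed access to Phoenix columns, e.g.
      // VARCHAR as Option[String] and UNSIGNED_LONG as Option[Long]
      implicit class PhoenixRow(rs: ResultSet) {
        def string(col: String): Option[String] = Option(rs.getString(col))
        def long(col: String): Option[Long] = {
          val v = rs.getLong(col)
          if (rs.wasNull()) None else Some(v)
        }
      }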

            People

              Assignee: Josh Mahonin (jmahonin)
              Reporter: Andrew Kyle Purtell (apurtell)
              Votes: 0
              Watchers: 13
