Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-2369

Enable Spark SQL UDF to influence at runtime the decision to read a partition

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Won't Fix
    • 1.0.0
    • None
    • SQL

    Description

      Let's say I have a custom partitioner on my RDD - and that RDD is registered as a SQL table and want to do a "select myfield from mytable where myudf(myfield,"some condition") = somevalue - I do not want to perform a "full table" scan to get myfield.
      However, if the UDF API is extended to say at runtime "ask" where the current partition is "valid" - then it will scan it.
      I see the UDF API been modified with a method such as:
      readPartition(partitioner:Partitioner, partitionId:int):Boolean
      where I can cast partitioner to my own custom one and based on the given partition id and runtime arguments, the method will decide to read that partition

      Attachments

        Activity

          People

            Unassigned Unassigned
            mraad@esri.com Mansour Raad
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: