Apache Hudi / HUDI-5092

Querying Hudi table throws NoSuchMethodError in Databricks runtime

Details

    • Type: Bug
    • Status: Reopened
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 0.12.0
    • Fix Version/s: 1.1.0
    • Component/s: spark
    • Labels: None

    Description

      Originally reported by a user:
      https://github.com/apache/hudi/issues/6137

      The crux of the issue is that the Databricks Runtime (DBR) diverges from OSS Spark; in this case, the `FileStatusCache` API clearly differs between the two, so code compiled against the OSS signatures can fail with NoSuchMethodError when it runs on DBR.

      There are a few approaches we can take: 

      1. Avoid relying on Spark's `FileStatusCache` implementation altogether and use our own
      2. Take a staggered approach: first try to use Spark's `FileStatusCache` and, if it doesn't match the expected API, fall back to our own implementation

      Approach #1 would mean that we no longer share the cache implementation with Spark, so in some cases we might keep two instances of the same cache. Approach #2 avoids that: we fall back only when the API is incompatible.
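
      Approach #2 could be sketched roughly as below. This is a minimal illustration under stated assumptions, not Hudi's actual fix: every type name here (`ListingCaches`, `ListingCache`, `SharedListingCache`, `LocalListingCache`, and the two stand-in caches) is hypothetical, and paths and files are modelled as plain strings, whereas Spark's real `FileStatusCache` is assumed to work with Hadoop path and file-status types. The idea is to probe the engine's cache for the expected method signatures once, up front, and pick the fallback only if the probe fails.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: every type below is a stand-in, not Hudi's or Spark's real API.
final class ListingCaches {

  /** Narrow cache contract the caller codes against. */
  interface ListingCache {
    Optional<String[]> get(String path);
    void put(String path, String[] files);
  }

  /** Own implementation, used only when the engine's cache is incompatible. */
  static final class LocalListingCache implements ListingCache {
    private final Map<String, String[]> entries = new ConcurrentHashMap<>();

    @Override public Optional<String[]> get(String path) {
      return Optional.ofNullable(entries.get(path));
    }

    @Override public void put(String path, String[] files) {
      entries.put(path, files);
    }
  }

  /** Adapter that shares the engine's cache by calling it reflectively. */
  static final class SharedListingCache implements ListingCache {
    private final Object delegate;
    private final Method getter;
    private final Method putter;

    SharedListingCache(Object delegate) throws NoSuchMethodException {
      this.delegate = delegate;
      // Resolve the expected signatures up front so a mismatch surfaces here,
      // not as a NoSuchMethodError in the middle of a query.
      this.getter = delegate.getClass().getMethod("getLeafFiles", String.class);
      this.putter = delegate.getClass().getMethod("putLeafFiles", String.class, String[].class);
    }

    @Override public Optional<String[]> get(String path) {
      try {
        return Optional.ofNullable((String[]) getter.invoke(delegate, path));
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException(e);
      }
    }

    @Override public void put(String path, String[] files) {
      try {
        putter.invoke(delegate, path, files);
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException(e);
      }
    }
  }

  /** Prefer the shared cache; fall back to the local one if the API differs. */
  static ListingCache select(Object engineCache) {
    try {
      return new SharedListingCache(engineCache);
    } catch (NoSuchMethodException e) {
      return new LocalListingCache();
    }
  }

  /** Stand-in for a cache exposing the expected signatures. */
  public static final class ExpectedApiCache {
    private final Map<String, String[]> files = new ConcurrentHashMap<>();
    public String[] getLeafFiles(String path) { return files.get(path); }
    public void putLeafFiles(String path, String[] leafFiles) { files.put(path, leafFiles); }
  }

  /** Stand-in for a runtime whose put method takes an extra argument. */
  public static final class DivergentApiCache {
    public String[] getLeafFiles(String path) { return null; }
    public void putLeafFiles(String path, String[] leafFiles, long sizeHint) { }
  }
}
```

      Calling `ListingCaches.select(engineCache)` returns the shared adapter when the expected methods are present, so only one cache instance exists, and a separate local cache otherwise. Probing reflectively is one possible design; catching NoSuchMethodError at the first call site would be an alternative, at the cost of failing later and less predictably.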

      Attachments

        1. image (1).png
          87 kB
          Ethan Guo
        2. image.png
          83 kB
          Ethan Guo

            People

              Assignee: Ethan Guo (guoyihua)
              Reporter: Ethan Guo (guoyihua)
              Votes: 0
              Watchers: 2
