Apache Hudi / HUDI-637

Investigate slower Hudi queries in S3 vs HDFS

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.0
    • Component/s: performance
    • Labels: None

    Description

      Hudi queries on S3 take abnormally longer than the same queries on HDFS.

      The S3 listing itself does not take that long.

      PERFORMANCE BUG:

      The metadata listing performance is likely causing the performance issues with Hudi.

       

      scala> stopwatch({ sql("SELECT * FROM ap_invoices_all_compacted_s3").count })
      Elapsed time: 1m 55.078473113s
      res2: Long = xxxxxxxxxxxx

      scala> stopwatch({ sql("SELECT * FROM ap_invoices_all_compacted").count })  // this is the exact same table in HDFS
      Elapsed time: 6.581217052s
      res3: Long = xxxxxxxxxxx
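
      The issue does not show how the `stopwatch` helper used in the transcript is defined. A minimal sketch follows, assuming it evaluates the block once, prints the elapsed wall-clock time, and returns the block's result. The names `formatElapsed` and `stopwatch` and their signatures are assumptions made for illustration, not the reporter's actual code.

```scala
// Hypothetical helpers: the issue only shows calls to `stopwatch`, not its definition.

// Format a nanosecond duration as "Xm S.NNNNNNNNNs" (minutes omitted when zero),
// using integer arithmetic so the output does not depend on locale or rounding.
def formatElapsed(elapsedNs: Long): String = {
  val nsPerSecond = 1000000000L
  val minutes = elapsedNs / (60 * nsPerSecond)
  val rem     = elapsedNs % (60 * nsPerSecond)
  val secs    = rem / nsPerSecond
  val nanos   = rem % nsPerSecond
  val secPart = f"${secs}.${nanos}%09ds"
  if (minutes > 0) s"${minutes}m $secPart" else secPart
}

// Evaluate `block` once, print how long it took, and return its result.
// Assumption: wall-clock time measured with System.nanoTime is what was reported.
def stopwatch[T](block: => T): T = {
  val start  = System.nanoTime()
  val result = block
  println(s"Elapsed time: ${formatElapsed(System.nanoTime() - start)}")
  result
}
```

      Because the block is passed by name, a Spark action such as `.count` runs inside the timed region. A single measurement like this also includes planning and file-listing time, so it cannot by itself separate listing cost from scan cost.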

          People

            Assignee: Unassigned
            Reporter: Balaji Varadarajan (vbalaji)
            Votes: 0
            Watchers: 4
