Details
Type: Task
Status: Resolved
Priority: Major
Resolution: Not A Problem
0.9.0
None
Description
Hudi queries against S3 take abnormally longer than the same queries against HDFS.
The S3 listing itself does not take that long.
PERFORMANCE BUG:
The metadata listing performance is likely what causes the performance issues with Hudi.
scala> stopwatch({ sql("SELECT * FROM ap_invoices_all_compacted_s3").count})
{{Elapsed time: 1m 55.078473113s
res2: Long = xxxxxxxxxxxx}}
scala> stopwatch({ sql("SELECT * FROM ap_invoices_all_compacted").count}) // this is the exact same table in HDFS
{{Elapsed time: 6.581217052s
res3: Long = xxxxxxxxxxx}}
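Note that `stopwatch` is not part of Spark or the Scala standard library, and the report does not show its definition. A minimal sketch of such a helper follows, assuming it simply measures wall-clock time with `System.nanoTime()`. The name `formatElapsed` and the exact output format are assumptions chosen to resemble the output above.

```scala
// Hypothetical helper (not from the report): formats a nanosecond count
// as "<m>m <s>.<nanos>s", omitting the minutes part when it is zero.
def formatElapsed(nanos: Long): String = {
  val minutes  = nanos / 60000000000L
  val remNanos = nanos % 60000000000L
  val seconds  = remNanos / 1000000000L
  val frac     = remNanos % 1000000000L
  val secPart  = f"$seconds%d.$frac%09ds"
  if (minutes > 0) s"${minutes}m $secPart" else secPart
}

// Hypothetical stand-in for the `stopwatch` used above: evaluates the block
// once, prints the elapsed wall-clock time, and returns the block's result.
def stopwatch[T](block: => T): T = {
  val start  = System.nanoTime()
  val result = block
  println(s"Elapsed time: ${formatElapsed(System.nanoTime() - start)}")
  result
}
```

A helper like this times the whole action, so for the S3 table the figure covers file listing, metadata reads, and the scan together. Timing the listing on its own would be needed to tell those costs apart.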