Apache Hudi / HUDI-1608

MOR fetches all records for read optimized query w/ spark sql

Details

    Description

      Script to reproduce in a local Spark shell:

       

      https://gist.github.com/nsivabalan/7250b794788516f1aec35650c2632364

       

      ```
      scala> spark.sql("select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, id, __op from hudi_trips_snapshot order by _hoodie_record_key").show(false)
      +-------------------+------------------+----------------------+---+----+
      |_hoodie_commit_time|_hoodie_record_key|_hoodie_partition_path|id |__op|
      +-------------------+------------------+----------------------+---+----+
      |20210210070347     |1                 |1970-01-01            |   |null|
      |20210210070347     |2                 |1970-01-01            |   |null|
      |20210210070347     |3                 |2020-01-04            |   |D   |
      |20210210070347     |4                 |1998-04-13            |   |I   |
      |20210210070347     |5                 |2020-01-01            |   |I   |
      |20210210070445     |6                 |1998-04-13            |6  |I   |
      +-------------------+------------------+----------------------+---+----+
      ```

      After an upsert, the read-optimized query returns records from both commits, C1 and C2.
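One way to check this from the same spark-shell is to load the table through the read-optimized view and count rows per commit. The following is a minimal sketch, not the script from the gist: it assumes the Hudi Spark bundle is on the classpath, that the table lives at `/tmp/hudi_trips_cow` (taken from the listing in this report), and that partitions are one directory deep. `countPerCommit` is a hypothetical helper name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Assumed base path, taken from the directory listing in this report.
val basePath = "/tmp/hudi_trips_cow"

// Hudi datasource read option that selects the read-optimized view.
val readOptimizedOpts: Map[String, String] =
  Map("hoodie.datasource.query.type" -> "read_optimized")

// Hypothetical helper: loads the table through the read-optimized view
// and counts rows per commit time.
def countPerCommit(spark: SparkSession, path: String): DataFrame =
  spark.read
    .format("hudi")
    .options(readOptimizedOpts)
    .load(path + "/*/*") // assumes a single partition level, e.g. 1998-04-13/
    .groupBy(col("_hoodie_commit_time"))
    .count()
    .orderBy(col("_hoodie_commit_time"))
```

Calling `countPerCommit(spark, basePath).show(false)` would then list each commit time that contributes rows to the read-optimized result.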

      Also, I don't see any log files in the partitions; all of the files are parquet files.

       

      ls /tmp/hudi_trips_cow/1998-04-13/
      0d1e6a84-d036-42e9-806e-a3075b6bc677-0_1-23-12025_20210210065058.parquet
      0d1e6a84-d036-42e9-806e-a3075b6bc677-0_1-61-25595_20210210065127.parquet

      ls /tmp/hudi_trips_cow/1970-01-01/
      7b836833-a656-485d-967a-871bdc653dc3-0_2-61-25596_20210210065127.parquet
      7b836833-a656-485d-967a-871bdc653dc3-0_3-23-12027_20210210065058.parquet
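The file names in the listing can be picked apart to see which file group and commit each one belongs to. This is a rough sketch that assumes base files are named `<fileId>_<writeToken>_<instantTime>.parquet`. `BaseFileName` and `parseBaseFileName` are hypothetical helpers written for this illustration, not part of Hudi.

```scala
// Hypothetical helper types, not part of Hudi. Assumes base file names have
// the form <fileId>_<writeToken>_<instantTime>.parquet.
final case class BaseFileName(fileId: String, writeToken: String, instantTime: String)

def parseBaseFileName(name: String): Option[BaseFileName] = {
  val Pattern = """^([^_]+)_([^_]+)_(\d+)\.parquet$""".r
  name match {
    case Pattern(fileId, token, instant) => Some(BaseFileName(fileId, token, instant))
    case _                               => None // not a base file under this assumption
  }
}

// File names copied from the 1998-04-13 partition listing in this report.
val files = Seq(
  "0d1e6a84-d036-42e9-806e-a3075b6bc677-0_1-23-12025_20210210065058.parquet",
  "0d1e6a84-d036-42e9-806e-a3075b6bc677-0_1-61-25595_20210210065127.parquet"
)

// Collect the instant times seen for each file ID, in ascending order.
val instantsByFileId: Map[String, Seq[String]] =
  files.flatMap(parseBaseFileName)
    .groupBy(_.fileId)
    .map { case (id, parsed) => id -> parsed.map(_.instantTime).sorted }
```

Under that naming assumption, both files in the partition share one file ID and differ only in write token and instant time. That would suggest the second write produced a new base file version for the same file group instead of appending a log file.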

       

      Source of the issue: https://github.com/apache/hudi/issues/2255

       

       

            People

              shivnarayan sivabalan narayanan