Apache Hudi / HUDI-110

Better defaults for Partition extractor for Spark DataSource and DeltaStreamer

Details

    Description

      Currently, SlashEncodedDayPartitionValueExtractor is the default partition value extractor. It expects day-level partition paths encoded with slashes (yyyy/mm/dd), which is not a common format outside Uber.
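
To illustrate why a slash-encoded day default is restrictive, here is a minimal Python sketch of the behaviour such an extractor implies. It is an illustrative re-implementation, not Hudi's code: the function name `extract_slash_encoded_day` and the exact error handling are assumptions made for this example.

```python
from datetime import date
from typing import List


def extract_slash_encoded_day(partition_path: str) -> List[str]:
    """Map a 'yyyy/mm/dd' partition path to a single 'yyyy-mm-dd' value.

    Hypothetical helper for illustration only; it mimics the idea of a
    slash-encoded day extractor and is not part of any Hudi API.
    """
    parts = partition_path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected yyyy/mm/dd, got: {partition_path!r}")
    # int() raises ValueError for non-numeric segments such as 'region=us'.
    year, month, day = (int(p) for p in parts)
    # date() validates the calendar day and isoformat() yields yyyy-mm-dd.
    return [date(year, month, day).isoformat()]
```

A path such as `2019/05/01` maps cleanly to one partition value, while a Hive-style path such as `datestr=2019-05-01` is rejected, which is the kind of layout many tables outside this convention use.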

       

      Also, the Spark DataSource API provides a partitionBy clause, which has not been integrated with the Hudi DataSource. We need to investigate how partitionBy can be leveraged to define partitioning for Hudi writes.
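
One possible direction, sketched in Python under stated assumptions, is to translate the columns a user would pass to partitionBy into Hudi write options. The helper `hudi_options_from_partition_by` is hypothetical and does not exist in Hudi. The option keys are Hudi DataSource write configs, and whether a comma-separated list of partition fields is accepted depends on the key generator configured for the table.

```python
from typing import Dict, Sequence


def hudi_options_from_partition_by(
    table_name: str,
    record_key: str,
    partition_by: Sequence[str],
) -> Dict[str, str]:
    """Build Hudi write options from partitionBy-style column names.

    Hypothetical helper: it only assembles an options dict and does not
    call Spark or Hudi.
    """
    if not partition_by:
        raise ValueError("at least one partition column is required")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        # Assumption: multiple columns are joined with commas; support for
        # this depends on the key generator in use.
        "hoodie.datasource.write.partitionpath.field": ",".join(partition_by),
    }
```

The resulting dict could then be handed to a DataFrameWriter through its `options` method, so that the partition columns are declared once rather than in both a partitionBy call and a Hudi-specific option.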

            People

              Assignee: Unassigned
              Reporter: Balaji Varadarajan (vbalaji)