Apache Hudi / HUDI-1479

Replace FSUtils.getAllPartitionPaths() with HoodieTableMetadata#getAllPartitionPaths()


Details

    Description

      Change #1

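      public static List<String> getAllPartitionPaths(FileSystem fs, String basePathStr, boolean useFileListingFromMetadata,
                                                      boolean verifyListings, boolean assumeDatePartitioning) throws IOException {
        if (assumeDatePartitioning) {
          // Date-partitioned tables: partitions are assumed to sit exactly three levels below the base path (e.g. yyyy/mm/dd).
          return getAllPartitionFoldersThreeLevelsDown(fs, basePathStr);
        } else {
          // Goes through HoodieTableMetadata.create(), which builds a HoodieBackedTableMetadata
          // even when useFileListingFromMetadata == false.
          HoodieTableMetadata tableMetadata = HoodieTableMetadata.create(fs.getConf(), basePathStr, "/tmp/", useFileListingFromMetadata,
              verifyListings, false, false);
          return tableMetadata.getAllPartitionPaths();
        }
      }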
      

      This is the current implementation: `HoodieTableMetadata.create()` always creates a `HoodieBackedTableMetadata`, even when useFileListingFromMetadata == false. In that case the metadata table is never consulted anyway, so we should create a `FileSystemBackedTableMetadata` instead. This helps address https://github.com/apache/hudi/pull/2398/files#r550709687
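      The proposed dispatch can be sketched as below. This is a minimal, self-contained illustration only: `TableMetadataSketch`, `FileSystemBackedSketch`, `MetadataTableBackedSketch` and `TableMetadataFactorySketch` are hypothetical stand-ins for `HoodieTableMetadata`, `FileSystemBackedTableMetadata`, `HoodieBackedTableMetadata` and the factory, and they omit the real constructor arguments (Hadoop `Configuration`, spillable directory, and so on).

```java
import java.util.List;

// Hypothetical stand-in for HoodieTableMetadata, reduced to the one call this issue is about.
interface TableMetadataSketch {
    List<String> getAllPartitionPaths();
}

// Hypothetical stand-in for FileSystemBackedTableMetadata: would answer by listing storage directly.
final class FileSystemBackedSketch implements TableMetadataSketch {
    final String basePath;

    FileSystemBackedSketch(String basePath) {
        this.basePath = basePath;
    }

    @Override
    public List<String> getAllPartitionPaths() {
        return List.of(); // placeholder: a real version would walk basePath
    }
}

// Hypothetical stand-in for HoodieBackedTableMetadata: would read the internal metadata table.
final class MetadataTableBackedSketch implements TableMetadataSketch {
    final String basePath;
    final boolean verifyListings;

    MetadataTableBackedSketch(String basePath, boolean verifyListings) {
        this.basePath = basePath;
        this.verifyListings = verifyListings;
    }

    @Override
    public List<String> getAllPartitionPaths() {
        return List.of(); // placeholder: a real version would query the metadata table
    }
}

final class TableMetadataFactorySketch {
    // Proposed rule: only build the metadata-table reader when it will actually be consulted.
    static TableMetadataSketch create(String basePath, boolean useFileListingFromMetadata, boolean verifyListings) {
        if (useFileListingFromMetadata) {
            return new MetadataTableBackedSketch(basePath, verifyListings);
        }
        return new FileSystemBackedSketch(basePath);
    }
}
```

      With this shape, callers such as `FSUtils.getAllPartitionPaths()` no longer pay for setting up the metadata-table reader when the flag is off.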

      Change #2

      On master, we have the `HoodieEngineContext` abstraction, which allows for parallel execution. We should consider moving it to `hudi-common` (it's doable) and then reworking `FileSystemBackedTableMetadata` so that it performs parallelized listings using the engine context passed in, either a `HoodieSparkEngineContext` or a `HoodieJavaEngineContext`. `HoodieBackedTableMetadata#getPartitionsToFilesMapping` also contains some parallelized code; we should take one pass over it and see whether it can be reworked as well. Food for thought: https://github.com/apache/hudi/pull/2398#discussion_r550711216
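      A level-by-level partition listing driven by a pluggable engine could look roughly like this. It is a sketch under stated assumptions, not Hudi code: `EngineContextSketch` is a hypothetical interface loosely modelled on the idea of `HoodieEngineContext` (map a function over a list with some parallelism), `LocalEngineContextSketch` uses a plain thread pool where a Spark-backed context would distribute the same call, and a partition is assumed to be any directory containing a `.hoodie_partition_metadata` marker file, with the `.hoodie` folder skipped.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical engine abstraction: apply func to every element with the given parallelism.
interface EngineContextSketch {
    <I, O> List<O> map(List<I> data, Function<I, O> func, int parallelism);
}

// Local thread-pool implementation; results come back in input order.
final class LocalEngineContextSketch implements EngineContextSketch {
    @Override
    public <I, O> List<O> map(List<I> data, Function<I, O> func, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, parallelism));
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (I item : data) {
                futures.add(pool.submit(() -> func.apply(item)));
            }
            List<O> results = new ArrayList<>();
            for (Future<O> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}

final class ParallelPartitionLister {
    // Assumption: a directory is a partition if it holds this marker file.
    static final String PARTITION_MARKER = ".hoodie_partition_metadata";
    static final String META_FOLDER = ".hoodie";

    static List<String> getAllPartitionPaths(EngineContextSketch engine, Path basePath, int parallelism) {
        List<String> partitions = new ArrayList<>();
        List<Path> level = List.of(basePath);
        while (!level.isEmpty()) {
            // List all directories of the current level in parallel through the engine.
            List<List<Path>> children = engine.map(level, ParallelPartitionLister::listSubdirs, parallelism);
            List<Path> next = new ArrayList<>();
            for (List<Path> dirs : children) {
                for (Path dir : dirs) {
                    if (Files.exists(dir.resolve(PARTITION_MARKER))) {
                        partitions.add(basePath.relativize(dir).toString().replace(File.separatorChar, '/'));
                    } else {
                        next.add(dir); // not a partition yet: descend one more level
                    }
                }
            }
            level = next;
        }
        Collections.sort(partitions);
        return partitions;
    }

    private static List<Path> listSubdirs(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isDirectory)
                    .filter(p -> !p.getFileName().toString().equals(META_FOLDER))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      Listing one directory level per `map` call keeps each task small, and the same traversal works unchanged whichever engine context is injected.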

       

      Change #3

      There are places where we call `fs.listStatus()` directly. We should make them go through the `HoodieTable.getMetadata()...` route as well. Essentially, all listing should be concentrated in `FileSystemBackedTableMetadata`.
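      The intent of routing every listing through one component can be illustrated with the sketch below. `FileListingSketch`, `LocalFileListingSketch` and `FileCounterSketch` are hypothetical names for illustration only; the point is that the caller depends on a listing interface rather than on `fs.listStatus()`, so a metadata-table-backed implementation or a test fake can be swapped in without touching it. Hidden files (such as partition marker files) are filtered out here by assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical single listing interface that callers use instead of fs.listStatus().
interface FileListingSketch {
    List<String> listFiles(String partitionPath);
}

// The one place that actually touches storage.
final class LocalFileListingSketch implements FileListingSketch {
    private final Path basePath;

    LocalFileListingSketch(Path basePath) {
        this.basePath = basePath;
    }

    @Override
    public List<String> listFiles(String partitionPath) {
        try (Stream<Path> entries = Files.list(basePath.resolve(partitionPath))) {
            return entries.filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(name -> !name.startsWith(".")) // skip hidden files such as partition markers
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Example caller: knows nothing about where listings come from.
final class FileCounterSketch {
    static int countFiles(FileListingSketch listing, List<String> partitions) {
        int total = 0;
        for (String partition : partitions) {
            total += listing.listFiles(partition).size();
        }
        return total;
    }
}
```

      Because `FileListingSketch` has a single abstract method, a test can pass a lambda such as `p -> List.of("a.parquet")` in place of the storage-backed implementation.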

      Attachments

        image-2021-01-05-10-00-35-187.png (181 kB, Vinoth Chandar)


            People

              Udit Mehrotra (uditme)
              Vinoth Chandar (vinoth)
              Votes: 0
              Watchers: 1
