ACCUMULO-4165

Create a user level API for RFile

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: None
    • Labels: None

    Description

      Users can bulk import RFiles. Currently, the only way to create RFiles through Accumulo's public API is AccumuloFileOutputFormat, and the public API offers no way to read them. The internal APIs for reading and writing RFiles are also cumbersome to use.

      I am experimenting with a simple RFile API like the following. Below is an example of writing data.

          // Proposed API sketch; genKey and genVal are helper methods not shown here.
          LocalFileSystem localFs = FileSystem.getLocal(new Configuration());
          RFileWriter writer = RFileFactory.newWriter()
                                             .withFileName("/tmp/test100M.rf")
                                             .withFileSystem(localFs).build();

          writer.startDefaultLocalityGroup();
          // 10,000,000 rows x 10 column qualifiers = 100M entries
          for (int r = 0; r < 10000000; r++) {
            for (int cq = 0; cq < 10; cq++) {
              writer.append(genKey(r, cq), genVal(r, cq));
            }
          }

          writer.close();
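
The issue does not show genKey and genVal. RFiles store keys in sorted order, so a loop like the one above only works if the generated keys increase as r and cq increase. The following is a minimal sketch of one way to achieve that, assuming string row IDs and using plain strings as a stand-in for Accumulo's Key. The names genRow, genQualifier, and isStrictlySorted are hypothetical and are not part of the proposed API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SortedKeyGen {
  // Hypothetical helper: zero-pad the row number so that lexicographic
  // order of the row strings matches numeric order.
  static String genRow(int r) {
    return String.format(Locale.ROOT, "row_%08d", r);
  }

  // Hypothetical helper: column qualifier for index cq (0..9 in the example).
  static String genQualifier(int cq) {
    return String.format(Locale.ROOT, "cq_%02d", cq);
  }

  // Returns true if each element is strictly greater than its predecessor.
  static boolean isStrictlySorted(List<String> keys) {
    for (int i = 1; i < keys.size(); i++) {
      if (keys.get(i - 1).compareTo(keys.get(i)) >= 0) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> padded = new ArrayList<>();
    List<String> unpadded = new ArrayList<>();
    // Rows 8..11 cross the one-digit/two-digit boundary, where unpadded
    // row strings stop sorting numerically ("row_9" > "row_10").
    for (int r = 8; r < 12; r++) {
      for (int cq = 0; cq < 10; cq++) {
        padded.add(genRow(r) + ":" + genQualifier(cq));
        unpadded.add("row_" + r + ":" + genQualifier(cq));
      }
    }
    System.out.println("padded sorted:   " + isStrictlySorted(padded));
    System.out.println("unpadded sorted: " + isStrictlySorted(unpadded));
  }
}
```

With fixed-width rows, the nested loop order (row first, then qualifier) already produces keys in increasing order, so no separate sort step is needed before appending.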
      

      Below is an example of reading data.

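          LocalFileSystem localFs = FileSystem.getLocal(new Configuration());
          Scanner scanner = RFileFactory.newScanner()
                                                .withFileName("/tmp/test100M.rf")
                                                .withFileSystem(localFs)
                                                .withDataCache(250000000)
                                                .withIndexCache(1000000).build();

          // Assuming this is Accumulo's existing Scanner, iterate the entries and then close it.
          for (Map.Entry<Key,Value> entry : scanner) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
          }
          scanner.close();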
      

          People

            Assignee: Keith Turner (kturner)
            Reporter: Keith Turner (kturner)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 6.5h