Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: None
    • Labels:
      None

      Description

      Users can bulk import RFiles. Currently the only way users can create RFiles using Accumulo's public API is via AccumuloFileOutputFormat. There is no way to read RFiles in the public API. Also, the internal APIs for reading and writing RFiles are cumbersome to use.

      I am experimenting with a simple RFile API like the following. Below is an example of writing data.

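          // Hadoop handle for the local filesystem.
          LocalFileSystem localFs = FileSystem.getLocal(new Configuration());
          // RFileFactory and RFileWriter are the API proposed in this issue.
          RFileWriter writer = RFileFactory.newWriter()
                                             .withFileName("/tmp/test100M.rf")
                                             .withFileSystem(localFs).build();
      
          writer.startDefaultLocalityGroup();
          // 10,000,000 rows x 10 column qualifiers = 100M entries.
          // genKey and genVal are helpers that are not shown here.
          for (int r = 0; r < 10000000; r++) {
            for (int cq = 0; cq < 10; cq++) {
              writer.append(genKey(r, cq), genVal(r, cq));
            }
          }
      
          writer.close();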
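
The loop above appends keys in increasing (r, cq) order. RFiles store entries sorted by key, so whatever genKey does (it is not shown in this issue), the row encoding presumably needs to sort the same way the loop counter increases. A minimal sketch of one way to get that, assuming rows are zero-padded decimal strings. The names `SortedRowIds`, `rowId` and `isSorted` are hypothetical, and plain strings stand in for Accumulo's Key type:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SortedRowIds {
  // Hypothetical helper: zero-pad the row number so that lexicographic
  // order of the strings matches numeric order of the ints.
  static String rowId(int r) {
    return String.format(Locale.ROOT, "%09d", r);
  }

  // Returns true when the list is in non-decreasing lexicographic order.
  static boolean isSorted(List<String> keys) {
    for (int i = 1; i < keys.size(); i++) {
      if (keys.get(i - 1).compareTo(keys.get(i)) > 0) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> padded = new ArrayList<>();
    List<String> unpadded = new ArrayList<>();
    for (int r = 8; r <= 11; r++) {
      padded.add(rowId(r));
      unpadded.add(Integer.toString(r));
    }
    // Padded rows stay sorted; unpadded rows do not, because "9" sorts after "10".
    System.out.println("padded sorted:   " + isSorted(padded));
    System.out.println("unpadded sorted: " + isSorted(unpadded));
  }
}
```

Without padding, a loop counter that passes from 9 to 10 would produce rows that are out of order as strings, which is why the sketch pads to a fixed width.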
      

      Below is an example of reading data.

          LocalFileSystem localFs = FileSystem.getLocal(new Configuration());
          // RFileFactory is the API proposed in this issue; cache sizes are given in bytes.
          Scanner scanner = RFileFactory.newScanner()
                                                .withFileName("/tmp/test100M.rf")
                                                .withFileSystem(localFs)
                                                .withDataCache(250000000)
                                                .withIndexCache(1000000).build();
      


          Activity

          ctubbsii Christopher Tubbs added a comment -

          It looks like the API you're envisioning will require some understanding of how locality groups are stored in RFiles. Have you considered omitting locality group support entirely, or just using a single default locality group if none are started before the first key is appended?

          For the factories, I'd assume the minimum information to provide is the filename? If so, should it default to "file://" if it begins with a "/"?

          For the parameters which take sizes, it'd be useful to be able to specify a string format, like "20M" instead of 1024*1024*20 bytes.

          Will this API be something that could be used internally to clean up some of our code which uses RFiles? (I hope so.)
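
The size-string suggestion above could look roughly like the following. This is only a sketch under stated assumptions, not part of the proposed API: the class and method names (`ByteSizes`, `parseBytes`) are hypothetical, and treating K, M and G as binary multiples (1024-based) is an assumption chosen so that "20M" equals 1024*1024*20.

```java
import java.util.Locale;

final class ByteSizes {
  // Parses strings such as "20M", "512K", "1G" or a plain byte count
  // such as "250000000" into a number of bytes.
  // Assumption: suffixes are binary multiples (K = 1024 bytes).
  static long parseBytes(String text) {
    String s = text.trim().toUpperCase(Locale.ROOT);
    if (s.isEmpty()) {
      throw new IllegalArgumentException("empty size");
    }
    long multiplier;
    char last = s.charAt(s.length() - 1);
    switch (last) {
      case 'K': multiplier = 1L << 10; break;
      case 'M': multiplier = 1L << 20; break;
      case 'G': multiplier = 1L << 30; break;
      default:  multiplier = 1L; // no recognized suffix: treat as plain bytes
    }
    String digits = (multiplier == 1L) ? s : s.substring(0, s.length() - 1);
    // Long.parseLong throws NumberFormatException (an IllegalArgumentException)
    // for anything that is not a whole number, e.g. "20X" or "M".
    long value = Long.parseLong(digits);
    if (value < 0) {
      throw new IllegalArgumentException("negative size: " + text);
    }
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    return Math.multiplyExact(value, multiplier);
  }

  public static void main(String[] args) {
    System.out.println(parseBytes("20M"));
    System.out.println(parseBytes("250000000"));
  }
}
```

A builder could then offer a String overload next to the long one, so callers who prefer a constant in bytes lose nothing.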

          elserj Josh Elser added a comment -

          For the factories, I'd assume the minimum information to provide is the filename? If so, should it default to "file://" if it begins with a "/"?

          If the default FileSystem is against the local filesystem, it seems reasonable to just always create a Path using that FileSystem (deferring to FileSystem to "localize" it, or tell us if it already has the wrong scheme?).
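
For comparison, the literal "default to file://" rule from the question can be sketched with java.net.URI alone. This is not the Hadoop Path/FileSystem approach described above, and nothing here is part of the proposed API: `DefaultScheme` and `qualify` are hypothetical names, and the HDFS host in the example is made up.

```java
import java.net.URI;

final class DefaultScheme {
  // If the name starts with "/", treat it as a local path and prepend "file://".
  // Anything else is parsed as given, so an explicit scheme is preserved.
  // Note: URI.create throws IllegalArgumentException for names containing
  // characters that are illegal in a URI, such as spaces.
  static URI qualify(String fileName) {
    if (fileName.startsWith("/")) {
      return URI.create("file://" + fileName);
    }
    return URI.create(fileName);
  }

  public static void main(String[] args) {
    URI local = qualify("/tmp/test100M.rf");
    URI remote = qualify("hdfs://namenode:8020/data/test.rf"); // made-up host
    System.out.println(local.getScheme() + " " + local.getPath());
    System.out.println(remote.getScheme() + " " + remote.getPath());
  }
}
```

The drawback of this rule is that it ignores whatever default FileSystem the caller has configured, which is the case the reply above is concerned with.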

          For the parameters which take sizes, it'd be useful to be able to specify a string format, like "20M" instead of 1024*1024*20 bytes.

          IMO, I think accepting a long representing bytes is fine, but :shrug:. I just think it's pretty easy for someone to make a constant in their code for certain numbers.

          elserj Josh Elser added a comment -

          Sean Busbey told me that marking "Patch Available" with a weblink to a pull request should trigger Yetus PreCommit. Trying that out now.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote  Subsystem   Runtime  Comment
          +1    @author     0m 1s    The patch does not contain any @author tags.
          +1    test4tests  0m 0s    The patch appears to include 1 new or modified test files.
          0     mvndep      1m 27s   Maven dependency ordering for branch
          +1    mvninstall  1m 40s   1.8 passed
          +1    compile     1m 12s   1.8 passed with JDK v1.8.0
          +1    compile     0m 54s   1.8 passed with JDK v1.7.0_79
          +1    checkstyle  1m 13s   1.8 passed
          +1    mvneclipse  0m 34s   1.8 passed
          +1    findbugs    2m 58s   1.8 passed
          +1    javadoc     0m 59s   1.8 passed with JDK v1.8.0
          +1    javadoc     0m 59s   1.8 passed with JDK v1.7.0_79
          0     mvndep      0m 15s   Maven dependency ordering for patch
          +1    mvninstall  0m 57s   the patch passed
          +1    compile     1m 4s    the patch passed with JDK v1.8.0
          +1    javac       1m 4s    the patch passed
          +1    compile     0m 50s   the patch passed with JDK v1.7.0_79
          +1    javac       0m 50s   the patch passed
          +1    checkstyle  1m 13s   the patch passed
          +1    mvneclipse  0m 25s   the patch passed
          +1    whitespace  0m 0s    The patch has no whitespace issues.
          +1    findbugs    3m 44s   the patch passed
          -1    javadoc     0m 35s   core-jdk1.8.0 with JDK v1.8.0 generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1)
          +1    javadoc     1m 1s    the patch passed with JDK v1.7.0_79
          +1    unit        13m 4s   root in the patch passed with JDK v1.8.0.
          +1    unit        12m 46s  root in the patch passed with JDK v1.7.0_79.
          +1    asflicense  0m 13s   The patch does not generate ASF License warnings.
                            49m 15s



          Subsystem                   Report/Notes
          JIRA Issue                  ACCUMULO-4165
          GITHUB PR                   https://github.com/apache/accumulo/pull/103
          Optional Tests              asflicense javac javadoc unit findbugs checkstyle compile
          uname                       Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool                  maven
          Personality                 /home/jenkins/jenkins-slave/workspace/PreCommit-ACCUMULO-Build/test_framework/yetus-0.3.0/lib/precommit/personality/accumulo.sh
          git revision                1.8 / 85ff374
          Default Java                1.7.0_79
          Multi-JDK versions          /home/jenkins/tools/java/jdk1.8.0:1.8.0 /usr/local/jenkins/java/jdk1.7.0_79:1.7.0_79
          findbugs                    v3.0.1
          javadoc                     https://builds.apache.org/job/PreCommit-ACCUMULO-Build/24/artifact/patchprocess/diff-javadoc-javadoc-core-jdk1.8.0.txt
          JDK v1.7.0_79 Test Results  https://builds.apache.org/job/PreCommit-ACCUMULO-Build/24/testReport/
          modules                     C: core server/tserver U: .
          Console output              https://builds.apache.org/job/PreCommit-ACCUMULO-Build/24/console
          Powered by                  Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          kturner Keith Turner added a comment -

          Have you considered omitting locality group support entirely, or just using a single default locality group if none are started before the first key is appended?

          Sorry I did not see these comments until now. Not going to drop support for locality groups. I think I will make it automatically start the default locality group. That should be a quick change.
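
The "automatically start the default locality group" behavior could work along these lines. This is a toy stand-in written for illustration, not Accumulo's writer: `AutoStartWriter` and its `events` log are hypothetical, strings replace Key and Value, and rejecting a second start call is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class AutoStartWriter {
  private boolean groupStarted = false;
  // Records what happened, in order, so the behavior is easy to inspect.
  private final List<String> events = new ArrayList<>();

  // Explicit start, as in the writing example in the description.
  void startDefaultLocalityGroup() {
    if (groupStarted) {
      // Assumption: starting a group twice is an error.
      throw new IllegalStateException("a locality group was already started");
    }
    groupStarted = true;
    events.add("startDefaultLocalityGroup");
  }

  // If no group has been started before the first key, start the default one.
  void append(String key, String value) {
    if (!groupStarted) {
      startDefaultLocalityGroup();
    }
    events.add("append " + key);
  }

  List<String> events() {
    return events;
  }

  public static void main(String[] args) {
    AutoStartWriter writer = new AutoStartWriter();
    writer.append("r1", "v1"); // no explicit start call needed
    writer.append("r2", "v2");
    System.out.println(writer.events());
  }
}
```

With this shape, callers who never think about locality groups can skip the start call, while callers who do use them keep the explicit method.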


            People

            • Assignee:
              kturner Keith Turner
              Reporter:
              kturner Keith Turner
            • Votes:
              0
              Watchers:
              4


                Time Tracking

                Estimated:
                Not Specified
                Remaining:
                0h
                Logged:
                6.5h
