Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: test
    • Labels:

      Description

      Saw this across three different RW clients. Obviously the key differs each time.

      java.lang.Exception: Error running node Bulk.xml
              at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:285)
              at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:65)
              at org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:125)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:606)
              at org.apache.accumulo.start.Main$1.run(Main.java:137)
              at java.lang.Thread.run(Thread.java:744)
      Caused by: java.lang.Exception: Error running node bulk.Verify
              at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:285)
              at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:254)
              ... 8 more
      Caused by: java.lang.Exception: Bad key at r08930 cf:000 [] 1388209514232 false 1
              at org.apache.accumulo.test.randomwalk.bulk.Verify.visit(Verify.java:65)
              at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:254)
              ... 9 more
      

      Some relevant logs from the RW client, but there's nothing in the server log that I've seen.

      27 21:18:34,660 [bulk.BulkPlusOne] DEBUG: preparing bulk files with start rows [r00000, r00483, r08930, r0a413, r0a603, r0b20e, r10a31, r1481e, r15e3d, r1853e] last row r1869f marker 0000379
      ...
      27 21:21:03,710 [bulk.BulkPlusOne] DEBUG: Finished bulk import, start rows [r00000, r00483, r08930, r0a413, r0a603, r0b20e, r10a31, r1481e, r15e3d, r1853e] last row r1869f marker 0000379
      27 21:21:03,710 [bulk.BulkPlusOne] INFO : Incrementing
      ...
      27 21:48:36,260 [impl.ThriftTransportPool] INFO : Thread "bulkImportPool 3" no longer stuck on IO  to master:9999 (0) sawError = false
      27 21:48:36,347 [bulk.Compact] INFO : Compaction (r0349f -> r08a2e] finished
      27 21:48:44,229 [impl.ThriftScanner] DEBUG: Scan failed, not serving tablet (lw;r0077a;r004a1,tserver1:9997,1432840570f049e)
      27 21:48:46,145 [impl.ThriftScanner] DEBUG: Scan failed, not serving tablet (lw;r007d5;r0077a,tserver1:9997,1432840570f049e) 
      27 21:48:48,738 [impl.ThriftScanner] DEBUG: Scan failed, not serving tablet (lw;r00f3b;r00e54,tserver3:9997,34328404fc00428)
      27 21:48:50,331 [impl.ThriftScanner] DEBUG: Scan failed, not serving tablet (lw;r014f1;r00f3b,tserver3:9997,34328404fc00428)
      27 21:49:00,030 [impl.ThriftScanner] DEBUG: Scan failed, not serving tablet (lw;r017a3;r014f1,tserver1:9997,1432840570f049e)
      27 21:49:07,122 [impl.ThriftScanner] DEBUG: Scan failed, not serving tablet (lw;r02f24;r02824,tserver4:9997,1432840570f045b)
      
      
      

          Activity

          Josh Elser added a comment -

          Not really sure about this one. It's not outwardly clear if this is actually a bug in Accumulo or if it's a bug in the test.

          Eric Newton added a comment -

          Josh Elser, it's probably a bug in Accumulo. This test has been stable.

          Josh Elser added a comment -

          Ok, that's good to know.

          Eric Newton added a comment -

          I replicated this problem on a 20-node AWS cluster.

          30 22:17:54,447 [randomwalk.Framework] ERROR: Error during random walk
          java.lang.Exception: Error running node Bulk.xml
                  at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:285)
                  at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:65)
                  at org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:125)
                  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
                  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                  at java.lang.reflect.Method.invoke(Method.java:606)
                  at org.apache.accumulo.start.Main$1.run(Main.java:137)
                  at java.lang.Thread.run(Thread.java:744)
          Caused by: java.lang.Exception: Error running node bulk.Verify
                  at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:285)
                  at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:254)
                  ... 8 more
          Caused by: java.lang.Exception: Bad key at r13684 cf:000 [] 1388441757199 false -1
                  at org.apache.accumulo.test.randomwalk.bulk.Verify.visit(Verify.java:65)
                  at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:254)
                  ... 9 more
          

          The Bulk RW test does a bunch of random operations concurrently, but at the end (during the Verify), it should have imported an equal number of "1" and "-1" values. The table has a Combiner to add the values up. Every row value should be zero. In this case, rows r13684 - r1442d are "-1". So, somewhere, data was lost, or imported more than once.

          The row r1442d shows up as a split added during the import of a file containing "1".
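
          The invariant described above can be modelled in a few lines. This is only a toy sketch, not Accumulo code: a TreeMap stands in for the table, `load` stands in for a bulk import under a summing Combiner, and the row r13683 is a made-up neighbour of the failing row. It shows one of the two possibilities named above (a "-1" file loaded twice); losing a "1" would leave the same -1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the Bulk randomwalk invariant. Not Accumulo code. */
class BulkInvariantSketch {

  /** Adds a value to a row and sums it with what is already there, as a summing Combiner would present it at scan time. */
  static void load(Map<String, Long> table, String row, long value) {
    table.merge(row, value, Long::sum);
  }

  /** Returns, in row order, every row whose combined value is not zero. */
  static List<String> badRows(Map<String, Long> table) {
    List<String> bad = new ArrayList<>();
    for (Map.Entry<String, Long> e : table.entrySet()) {
      if (e.getValue() != 0L) {
        bad.add(e.getKey());
      }
    }
    return bad;
  }

  /** Loads a balanced "+1"/"-1" pair per row, then loads one "-1" a second time. */
  static List<String> demo() {
    Map<String, Long> table = new TreeMap<>();
    for (String row : new String[] {"r13683", "r13684"}) {
      load(table, row, 1L);
      load(table, row, -1L);
    }
    load(table, "r13684", -1L); // simulated duplicate import
    return badRows(table);
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [r13684]
  }
}
```

          A check like `badRows` can say that a row is unbalanced, but not whether data was lost or loaded twice. Something else, such as a per-file counter, is needed to tell the two apart.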

          Eric Newton added a comment -

          In addition to importing "1" and "-1" values, a marker column is added to each row for each bulk file. No marker columns are missing, so it's probably bulk importing a file multiple times.

          Eric Newton added a comment -

          The combiner should be adding up the markers, but they are always "1". If a bulk file was imported multiple times, the count would be higher.

          Eric Newton added a comment -

          Oh, it looks like there is a multiple bulk import:

          ...
          r13686 marker:0000050 []    1
          r13686 marker:0000051 []    1
          r13686 marker:0000052 []    2
          r13686 marker:0000053 []    1
          r13686 marker:0000054 []    1
          ...
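
          Spotting such a marker by eye is tedious on a large table. A minimal sketch of an automated check follows, assuming the scan output has been saved as text lines in the four-field layout shown above (row, family:qualifier, visibility, value). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Flags marker columns whose summed count is not 1. Illustrative only. */
class MarkerCheckSketch {

  /** Returns "row family:qualifier" for every marker entry whose value differs from 1. */
  static List<String> suspectMarkers(List<String> scanLines) {
    List<String> suspects = new ArrayList<>();
    for (String line : scanLines) {
      String[] f = line.trim().split("\\s+");
      // Skip anything that is not "row marker:NNNNNNN [] value", e.g. the "..." elisions.
      if (f.length != 4 || !f[1].startsWith("marker:")) {
        continue;
      }
      if (Long.parseLong(f[3]) != 1L) {
        suspects.add(f[0] + " " + f[1]);
      }
    }
    return suspects;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList(
        "...",
        "r13686 marker:0000051 []    1",
        "r13686 marker:0000052 []    2",
        "r13686 marker:0000053 []    1",
        "...");
    System.out.println(suspectMarkers(lines)); // prints [r13686 marker:0000052]
  }
}
```

          A count above 1 means the file carrying that marker reached the row more than once. A count of 1 everywhere would point back at lost data instead.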
          
          Eric Newton added a comment -

          git seems to be read-only at the moment

          ASF subversion and git services added a comment -

          Commit 16ccbf5e16467729b558cf08a379770cba10f5ab in branch refs/heads/1.6.0-SNAPSHOT from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=16ccbf5 ]

          ACCUMULO-2110 cannot remove a FileRef with a Path
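
          The patch itself is not shown in this thread, so the following is only a guess at the class of mistake the commit title describes. In Java, `Collection.remove` takes an `Object`, so passing a value of the wrong type compiles and quietly removes nothing. `PathLike` and `FileRefLike` are hypothetical stand-ins, not the real Hadoop `Path` or Accumulo `FileRef` classes, and the file path is invented.

```java
import java.util.HashSet;
import java.util.Set;

/** Shows how removing by the wrong key type is a silent no-op. Stand-in classes only. */
class FileRefRemovalSketch {

  /** Hypothetical stand-in for a path type. */
  static final class PathLike {
    final String path;
    PathLike(String path) { this.path = path; }
    @Override public boolean equals(Object o) {
      return o instanceof PathLike && ((PathLike) o).path.equals(path);
    }
    @Override public int hashCode() { return path.hashCode(); }
  }

  /** Hypothetical stand-in for a file reference that wraps a path. */
  static final class FileRefLike {
    final PathLike path;
    FileRefLike(PathLike path) { this.path = path; }
    @Override public boolean equals(Object o) {
      return o instanceof FileRefLike && ((FileRefLike) o).path.equals(path);
    }
    @Override public int hashCode() { return path.hashCode(); }
  }

  /** Tries to remove an entry using the bare path. Compiles, but never matches. */
  static boolean removeByPath(Set<FileRefLike> files, PathLike p) {
    return files.remove(p);
  }

  /** Removes an entry using a key of the set's own element type. */
  static boolean removeByRef(Set<FileRefLike> files, PathLike p) {
    return files.remove(new FileRefLike(p));
  }

  public static void main(String[] args) {
    PathLike p = new PathLike("/tables/2/b-0001/I0001.rf"); // invented path
    Set<FileRefLike> files = new HashSet<>();
    files.add(new FileRefLike(p));
    System.out.println(removeByPath(files, p)); // prints false
    System.out.println(removeByRef(files, p));  // prints true
  }
}
```

          If bookkeeping for in-flight bulk files relied on a call like `removeByPath`, stale entries would linger without any error being raised.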

          ASF subversion and git services added a comment -

          Commit ec7724823628c240f6f31a3fbd5b4b59bb03053e in branch refs/heads/1.6.0-SNAPSHOT from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=ec77248 ]

          ACCUMULO-2110 cannot remove a FileRef with a Path

          ASF subversion and git services added a comment -

          Commit ec7724823628c240f6f31a3fbd5b4b59bb03053e in branch refs/heads/master from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=ec77248 ]

          ACCUMULO-2110 cannot remove a FileRef with a Path

          ASF subversion and git services added a comment -

          Commit 16ccbf5e16467729b558cf08a379770cba10f5ab in branch refs/heads/master from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=16ccbf5 ]

          ACCUMULO-2110 cannot remove a FileRef with a Path

          Keith Turner added a comment -

          I was looking at the patch. It may not go far enough to fix the issue. The comparison with the load marker is done using absolute paths. It's possible that different absolute paths could be constructed that refer to the same file.
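
          To illustrate the concern, here is a small sketch using plain `java.net.URI` from the JDK. The host name, port, and file path are invented, and this is not how Accumulo compares load markers. It only shows that two different absolute path strings can name one file.

```java
import java.net.URI;

/** Two absolute path strings that differ textually but name the same file. Illustrative only. */
class PathCompareSketch {

  /** Naive comparison: exact string equality. */
  static boolean sameAsStrings(String a, String b) {
    return a.equals(b);
  }

  /** Comparison after collapsing "." and ".." segments with URI.normalize(). */
  static boolean sameAfterNormalize(String a, String b) {
    return URI.create(a).normalize().equals(URI.create(b).normalize());
  }

  public static void main(String[] args) {
    // Invented paths; the second one takes a detour through "..".
    String a = "hdfs://nn1:8020/accumulo/tables/2/b-0001/I0001.rf";
    String b = "hdfs://nn1:8020/accumulo/tables/2/b-0001/../b-0001/I0001.rf";
    System.out.println(sameAsStrings(a, b));      // prints false
    System.out.println(sameAfterNormalize(a, b)); // prints true
  }
}
```

          Normalizing dot segments covers only one way the strings can diverge. Differences such as an omitted default port or an alternate host name for the same namenode would still compare as unequal here.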

          Keith Turner added a comment -

          I opened ACCUMULO-2173 to address the more general issue of comparing absolute paths.


            People

            • Assignee: Eric Newton
            • Reporter: Josh Elser
            • Votes: 0
            • Watchers: 3
