Details
Description
I just helped someone debug an issue. Their scans were getting stuck on a particular tserver (we identified the tserver by turning on debug in the shell). On that tserver, there was a constant stream of messages about a metadata table constraint violation because the bulk load transaction was no longer running.
The following code in Tablet.importMapFiles()
    synchronized (timeLock) {
      if (bulkTime > persistedTime)
        persistedTime = bulkTime;

      MetadataTableUtil.updateTabletDataFile(tid, extent, paths, tabletTime.getMetadataValue(persistedTime), creds, tabletServer.getLock());
    }
Ended up calling the following code in MetadataTableUtil.
    public static void update(Credentials credentials, ZooLock zooLock, Mutation m, KeyExtent extent) {
      Writer t = extent.isMeta() ? getRootTable(credentials) : getMetadataTable(credentials);
      if (zooLock != null)
        putLockID(zooLock, m);
      while (true) {
        try {
          t.update(m);
          return;
        } catch (AccumuloException e) {
          log.error(e, e);
        } catch (AccumuloSecurityException e) {
          log.error(e, e);
        } catch (ConstraintViolationException e) {
          log.error(e, e);
        } catch (TableNotFoundException e) {
          log.error(e, e);
        }
        UtilWaitThread.sleep(1000);
      }
    }
So when the constraint failed, it retried forever. It did this while holding timeLock, which in turn prevented compactions from completing, which eventually gummed up scans.
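One possible way to avoid the unbounded loop is sketched below. It is only an illustration under stated assumptions, not necessarily the fix that was applied: the class, the two exception types, and the MetadataWrite interface are hypothetical stand-ins rather than Accumulo APIs. The idea is to treat a constraint violation as permanent and let it propagate immediately, while retrying other failures only a bounded number of times, so that a caller holding a lock such as timeLock gets control back and can release it.

```java
public class BoundedRetrySketch {

    /** Hypothetical stand-in for a failure that will never succeed on retry (e.g. a constraint violation). */
    public static class PermanentFailureException extends Exception {
        public PermanentFailureException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for a failure that might succeed if retried. */
    public static class TransientFailureException extends Exception {
        public TransientFailureException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for the single metadata write attempt (t.update(m) in the report). */
    public interface MetadataWrite {
        void run() throws PermanentFailureException, TransientFailureException;
    }

    /**
     * Attempts the write up to maxAttempts times. Permanent failures propagate
     * on the first occurrence; transient failures are retried after sleepMillis,
     * and the last one is rethrown once the attempts are exhausted.
     */
    public static void updateWithRetry(MetadataWrite write, int maxAttempts, long sleepMillis)
            throws PermanentFailureException, TransientFailureException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TransientFailureException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                write.run();
                return; // success
            } catch (TransientFailureException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
            // PermanentFailureException is deliberately not caught here,
            // so the caller sees it at once instead of spinning forever.
        }
        throw last;
    }
}
```

With a helper like this, an exception thrown inside a synchronized block exits the block and releases the monitor, so compactions and scans waiting on the same lock would no longer be blocked indefinitely. How the caller should then handle the failed bulk import is a separate decision that this sketch does not address.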