Uploaded image for project: 'Apache Jena'
  1. Apache Jena
  2. JENA-2044

tdb2.tdbloader crashses loading wikidata

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Feedback Received
    • Jena 3.14.0, Jena 3.17.0
    • Jena 4.2.0
    • Cmd line tools, Jena, TDB2
    • None
    • Important

    Description

      Apache jena crashes when loading wikidata truthy 2020-11-11 (it is not available any more via wikidata, but a backup can be found here: https://hobbitdata.informatik.uni-leipzig.de/wikidata-20201111-truthy-BETA.nt.bz2

      command run was:

      cgmemtime /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/triplestores/jena/apache-jena-3.17.0/bin/tdb2.tdbloader --loader=sequential --loc /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/databases/fuseki/wikidata-2020-11-11/ /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/datasets/wikidata-2020-11-11/wikidata-20201111-truthy-BETA.nt 2>&1 | tee /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/logs/load/fuseki-wikidata-2020-11-11.log
      

       

      The end of the logfile is:

      06:56:52 INFO  loader          :: Add: 729,000,000 SPO->OSP (Batch: 237,642 / Avg: 300,050)
      06:56:57 INFO  loader          :: Add: 730,000,000 SPO->OSP (Batch: 234,962 / Avg: 299,936)
      06:56:57 INFO  loader          ::   Elapsed: 2,433.85 seconds [2021/02/12 06:56:57 CET]
      06:57:01 INFO  loader          :: Add: 731,000,000 SPO->OSP (Batch: 233,863 / Avg: 299,820)
      06:57:05 INFO  loader          :: Add: 732,000,000 SPO->OSP (Batch: 269,978 / Avg: 299,775)
      06:57:08 INFO  loader          :: Add: 733,000,000 SPO->OSP (Batch: 281,373 / Avg: 299,748)
      06:57:12 INFO  loader          :: Add: 734,000,000 SPO->OSP (Batch: 285,143 / Avg: 299,727)
      06:57:15 INFO  loader          :: Add: 735,000,000 SPO->OSP (Batch: 290,023 / Avg: 299,714)
      06:57:19 INFO  loader          :: Add: 736,000,000 SPO->OSP (Batch: 290,951 / Avg: 299,701)
      #
      # There is insufficient memory for the Java Runtime Environment to continue.
      # Native memory allocation (malloc) failed to allocate 2097152 bytes for AllocateHeap
      # An error report file with more information is saved as:
      # /home/d/dice-gr/profiles/unix/cs/triplestore-benchmark/hs_err_pid28709.log
      Child user: 70961.063 s
      Child sys : 5817.787 s
      Child wall: 76243.666 s
      Child high-water RSS                    :  534037652 KiB
      Recursive and acc. high-water RSS+CACHE :  585081976 KiB
      

       The machine has a AMD EPYC 7742 64-Core Processor, 1TB RAM and 2 TB free ssd storage on /home. So there should still be plenty of RAM have been available.
      The loc folder has at the time of crash 543GB.

      I also tried with -loader=parallel and -loader=phased . Same result.

      update 23/Feb/2021:

      ulimit -a

      core file size          (blocks, -c) 0
      data seg size           (kbytes, -d) unlimited
      scheduling priority             (-e) 0
      file size               (blocks, -f) unlimited
      pending signals                 (-i) 4110480
      max locked memory       (kbytes, -l) 65536
      max memory size         (kbytes, -m) unlimited
      open files                      (-n) 1024
      pipe size            (512 bytes, -p) 8
      POSIX message queues     (bytes, -q) 819200
      real-time priority              (-r) 0
      stack size              (kbytes, -s) 8192
      cpu time               (seconds, -t) unlimited
      max user processes              (-u) 4110480
      virtual memory          (kbytes, -v) unlimited
      file locks                      (-x) unlimited
      

      output from prlimit --pid=<PID of tdb.loader2>:

      RESOURCE   DESCRIPTION                             SOFT      HARD UNITS
      AS         address space limit                unlimited unlimited bytes
      CORE       max core file size                         0 unlimited bytes
      CPU        CPU time                           unlimited unlimited seconds
      DATA       max data size                      unlimited unlimited bytes
      FSIZE      max file size                      unlimited unlimited bytes
      LOCKS      max number of file locks held      unlimited unlimited locks
      MEMLOCK    max locked-in-memory address space  67108864  67108864 bytes
      MSGQUEUE   max bytes in POSIX mqueues            819200    819200 bytes
      NICE       max nice prio allowed to raise             0         0 
      NOFILE     max number of open files             1048576   1048576 files
      NPROC      max number of processes              4110480   4110480 processes
      RSS        max resident set size              unlimited unlimited bytes
      RTPRIO     max real-time priority                     0         0 
      RTTIME     timeout for real-time tasks        unlimited unlimited microsecs
      SIGPENDING max number of pending signals        4110480   4110480 signals
      STACK      max stack size                       8388608 unlimited bytes
      

      I tried several things. Nothing worked so far. Here is what I tried:

      • Setting several JVM_ARGS to avoid resource over-consumption:
        -Xmx16g -XX:MaxDirectMemorySize=16g -Djdk.nio.maxCachedBufferSize=17179869184 -XX:ActiveProcessorCount=4
        
      • Setting limits for Metaspace and CodeCache:
        -XX:MaxMetaspaceSize=4g -XX:ReservedCodeCacheSize=4g
        
      • limiting kernel shared memory to (set in /etc/sysctl.conf + reboot)
        kernel.shmmax=17179869185 # 16 GB
        kernel.shmall=4194304 # *4096 pagesize = 16 GB
        
      • putting jena into a dockerfile and run it with the commands below. The idea was to limit the resources and have a default environment. I first tried with 16g memory but it within a couple 100M triples dropping below 10k triples/s loading speed. So I increased the reseources because I didn't have that much time on the server. Dockerfile is provided as attachment as well as the output. Here the output might be interesting. Because tdb2.loader did not crash completely but simply logged an Exception and idled ever since.
        docker build -f FusekiWikidataLoaderDockerfile -t fuseki-wikidata_01 empty_folder/
        dockerun --user $(id -u):$(id -g) --kernel-memory=256g --memory=256g --cpuset-cpus=0-7 --name fuseki_wikidata_8c_J16g_R256g_K256g_phased -v "$(pwd)"/databases:/databases -v "$(pwd)"/datasets:/datasets -v "$(pwd)"/logs:/logs fuseki-wikidata_01:latest
        

        error:

        05:27:31 INFO  loader          :: Add: 3,156,000,000 Index (Batch: 57,074 / Avg: 159,787)
        05:27:35 ERROR Indexer         :: Interrupted
        org.apache.jena.dboe.base.file.FileException: BlockMgrMapped.segmentAllocate: Segment = 16313 : Offset = 136843362304
        	at org.apache.jena.dboe.base.file.BlockAccessMapped.allocSegment(BlockAccessMapped.java:218) ~[jena-dboe-base-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.base.file.BlockAccessMapped.getByteBuffer(BlockAccessMapped.java:150) ~[jena-dboe-base-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.base.file.BlockAccessMapped.allocate(BlockAccessMapped.java:80) ~[jena-dboe-base-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.base.block.BlockMgrFileAccess.allocate(BlockMgrFileAccess.java:51) ~[jena-dboe-base-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.base.block.BlockMgrBase.allocate(BlockMgrBase.java:39) ~[jena-dboe-base-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.base.block.BlockMgrWrapper.allocate(BlockMgrWrapper.java:41) ~[jena-dboe-base-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.base.block.BlockMgrFreeChain.allocate(BlockMgrFreeChain.java:44) ~[jena-dboe-base-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.base.page.PageBlockMgr.create(PageBlockMgr.java:53) ~[jena-dboe-base-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeRecordsMgr.create(BPTreeRecordsMgr.java:77) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.create(BPTreeRecords.java:236) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.insertNewPage(BPTreeRecords.java:230) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.split(BPTreeRecords.java:206) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.split(BPTreeNode.java:568) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:509) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.insert(BPTreeNode.java:203) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPlusTree.insertAndReturnOld(BPlusTree.java:278) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPlusTree.insert(BPlusTree.java:271) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.tdb2.store.tupletable.TupleIndexRecord.performAdd(TupleIndexRecord.java:94) ~[jena-tdb2-3.17.0.jar:3.17.0]
        	at org.apache.jena.tdb2.store.tupletable.TupleIndexBase.add(TupleIndexBase.java:66) ~[jena-tdb2-3.17.0.jar:3.17.0]
        	at org.apache.jena.tdb2.loader.main.Indexer.lambda$loadTuples$1(Indexer.java:134) ~[jena-tdb2-3.17.0.jar:3.17.0]
        	at org.apache.jena.tdb2.loader.main.Indexer.stageIndex(Indexer.java:116) ~[jena-tdb2-3.17.0.jar:3.17.0]
        	at org.apache.jena.tdb2.loader.main.Indexer.lambda$startBulk$0(Indexer.java:93) ~[jena-tdb2-3.17.0.jar:3.17.0]
        	at java.lang.Thread.run(Thread.java:834) [?:?]
        [53484.193s][warning][os,thread] Attempt to deallocate stack guard pages failed (0x00007f33244ff000-0x00007f3324503000).
        05:27:36 ERROR Indexer         :: Interrupted
        org.apache.jena.dboe.base.file.FileException: BlockMgrMapped.segmentAllocate: Segment = 16459 : Offset = 138068099072
        	at org.apache.jena.dboe.base.file.BlockAccessMapped.allocSegment(BlockAccessMapped.java:218) ~[jena-dboe-base-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.base.file.BlockAccessMapped.getByteBuffer(BlockAccessMapped.java:150) ~[jena-dboe-base-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.base.file.BlockAccessMapped.allocate(BlockAccessMapped.java:80) ~[jena-dboe-base-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.base.block.BlockMgrFileAccess.allocate(BlockMgrFileAccess.java:51) ~[jena-dboe-base-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.base.block.BlockMgrBase.allocate(BlockMgrBase.java:39) ~[jena-dboe-base-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.base.block.BlockMgrWrapper.allocate(BlockMgrWrapper.java:41) ~[jena-dboe-base-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.base.block.BlockMgrFreeChain.allocate(BlockMgrFreeChain.java:44) ~[jena-dboe-base-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.base.page.PageBlockMgr.create(PageBlockMgr.java:53) ~[jena-dboe-base-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeRecordsMgr.create(BPTreeRecordsMgr.java:77) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.create(BPTreeRecords.java:236) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.insertNewPage(BPTreeRecords.java:230) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.split(BPTreeRecords.java:206) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.split(BPTreeNode.java:568) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:509) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.insert(BPTreeNode.java:203) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPlusTree.insertAndReturnOld(BPlusTree.java:278) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.dboe.trans.bplustree.BPlusTree.insert(BPlusTree.java:271) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
        	at org.apache.jena.tdb2.store.tupletable.TupleIndexRecord.performAdd(TupleIndexRecord.java:94) ~[jena-tdb2-3.17.0.jar:3.17.0]
        	at org.apache.jena.tdb2.store.tupletable.TupleIndexBase.add(TupleIndexBase.java:66) ~[jena-tdb2-3.17.0.jar:3.17.0]
        	at org.apache.jena.tdb2.loader.main.Indexer.lambda$loadTuples$1(Indexer.java:134) ~[jena-tdb2-3.17.0.jar:3.17.0]
        	at org.apache.jena.tdb2.loader.main.Indexer.stageIndex(Indexer.java:116) ~[jena-tdb2-3.17.0.jar:3.17.0]
        	at org.apache.jena.tdb2.loader.main.Indexer.lambda$startBulk$0(Indexer.java:93) ~[jena-tdb2-3.17.0.jar:3.17.0]
        	at java.lang.Thread.run(Thread.java:834) [?:?]
        [53485.632s][warning][os,thread] Attempt to deallocate stack guard pages failed (0x00007f33242fd000-0x00007f3324301000).
        

      Side observation:

      • Fusekis loading performance seems to be influenced by linux' cache significantly. If there is 1TB available it had had a read rate of ~2.5GB/s (measured with htop). With only 16GB it dropped to at max 250MB/s.

      Attachments

        1. FusekiWikidataLoaderDockerfile
          0.9 kB
          Alexander Bigerl
        2. fuseki-wikidata-2020-11-11_(docker_8c_J16g_R256g_K256g_phased).log
          1.63 MB
          Alexander Bigerl
        3. hs_err_pid28709.log
          11.49 MB
          Alexander Bigerl

        Activity

          People

            andy Andy Seaborne
            bigerl Alexander Bigerl
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: