Nutch / NUTCH-893

DataStore.put() silently loses records when executed from multiple processes

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Invalid
    • Affects Version/s: nutchgora
    • Fix Version/s: nutchgora
    • Component/s: None
    • Labels: None
    • Environment: Gora HEAD, SqlStore, MySQL 5.1, Ubuntu 10.4 x64, Sun JDK 1.6

    Description

      To debug the issue described in NUTCH-879, I created a test (please see the patch) that simulates multiple clients appending to webtable, which is the situation we have in distributed map-reduce jobs.

      The patch contains two tests: one uses multiple threads within the same JVM, and the other uses a single thread in each of multiple JVMs. Each test first clears webtable (be careful!), then puts a batch of pages, and finally checks that all of them are present and that their values correspond to their keys. To make things more interesting, each execution context (thread or process) closes and reopens its instance of DataStore a few times.

      The multithreaded test passes just fine. However, the multi-process test fails with missing keys, with as many as 30% of them lost.
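
      For readers without the patch at hand, the shape of the multithreaded variant can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from NUTCH-893.patch: the class MultiWriterSketch, the methods valueFor and countMissing, and the key scheme are all hypothetical, and a ConcurrentHashMap stands in for webtable instead of a real Gora DataStore. The close/reopen step and the multi-JVM variant are omitted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MultiWriterSketch {
    // Hypothetical stand-in for the shared webtable; this is NOT the Gora DataStore API.
    static final Map<String, String> TABLE = new ConcurrentHashMap<>();

    // Hypothetical scheme: the value is derived from the key so it can be verified later.
    static String valueFor(String key) {
        return "value-of-" + key;
    }

    // Clears the table, lets several writer threads put their pages,
    // then returns how many expected records are missing or have a wrong value.
    static int countMissing(int writers, int pagesPerWriter) {
        TABLE.clear(); // the real test clears webtable here (be careful!)

        Thread[] threads = new Thread[writers];
        for (int w = 0; w < writers; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                for (int i = 0; i < pagesPerWriter; i++) {
                    String key = "writer" + id + "/page" + i;
                    TABLE.put(key, valueFor(key));
                }
            });
            threads[w].start();
        }

        // Wait for every writer to finish before verifying.
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }

        // Verify that every key is present and its value corresponds to the key.
        int missing = 0;
        for (int w = 0; w < writers; w++) {
            for (int i = 0; i < pagesPerWriter; i++) {
                String key = "writer" + w + "/page" + i;
                if (!valueFor(key).equals(TABLE.get(key))) {
                    missing++;
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("missing keys: " + countMissing(4, 1000));
    }
}
```

      With the in-memory stand-in no records are lost, so the sketch only shows the clear / put / verify structure. Reproducing the reported behaviour would require replacing TABLE with a real store and running the writers in separate JVMs, as the patch does.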

      Attachments

        1. NUTCH-893.patch
          7 kB
          Andrzej Bialecki
        2. NUTCH-893_v2.patch
          7 kB
          Enis Soztutar

          People

            Assignee: Unassigned
            Reporter: Andrzej Bialecki
