Apache Gora: GORA-401

Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce


Details

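    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.4, 0.5
    • Fix Version/s: 0.7
    • Component/s: gora-core
    • Environment: Tested on gora-0.4 with the HBase backend; by the same logic the issue should also apply to gora-0.5.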

    Description

      Since the _g_dirty field was removed in GORA-326, the dirty field is no longer serialized. In GORA-321, PersistentSerializer switched from PersistentDatumWriter (and PersistentDatumReader) to Avro's SpecificDatumWriter, delegating serialization of the dirty field to Avro (although having that field as a regular field of the entities is not really desirable).
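      To make the failure mode concrete, here is a minimal, self-contained Java sketch. It does not use Gora or Avro: `Employee` and `ValuesOnlySerializer` are hypothetical stand-ins, assuming an entity that records modified fields in a `BitSet` and a serializer that, like a schema-driven writer, only knows about field values. Because deserialization rebuilds the entity through its setters, every field comes back dirty.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Hypothetical stand-in for a Persistent entity (NOT the real Gora API):
// field values plus a BitSet recording which fields were modified.
class Employee {
    static final int NAME = 0, SSN = 1, FIELD_COUNT = 2;
    private final String[] values = new String[FIELD_COUNT];
    private final BitSet dirty = new BitSet(FIELD_COUNT);

    // Like a generated setter, every put() marks the field as dirty.
    void put(int field, String value) { values[field] = value; dirty.set(field); }
    String get(int field) { return values[field]; }
    boolean isDirty(int field) { return dirty.get(field); }
    void clearDirty() { dirty.clear(); }
}

// Hypothetical serializer that, like a schema-driven writer, writes field values only.
class ValuesOnlySerializer {
    static byte[] serialize(Employee e) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int f = 0; f < Employee.FIELD_COUNT; f++) {
                String v = e.get(f);
                out.writeBoolean(v != null);   // null marker
                if (v != null) out.writeUTF(v);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex); // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    // Rebuilds the entity through put(), so every field ends up dirty.
    static Employee deserialize(byte[] data) {
        Employee e = new Employee();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int f = 0; f < Employee.FIELD_COUNT; f++) {
                e.put(f, in.readBoolean() ? in.readUTF() : null);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return e;
    }
}

class DirtyStateLossDemo {
    public static void main(String[] args) {
        Employee original = new Employee();
        original.put(Employee.NAME, "Alice");
        original.clearDirty();                 // as if freshly loaded from the store
        original.put(Employee.SSN, "123");     // only SSN is modified in the Map
        Employee copy = ValuesOnlySerializer.deserialize(ValuesOnlySerializer.serialize(original));
        // Prints "NAME dirty before: false, after: true"
        System.out.println("NAME dirty before: " + original.isDirty(Employee.NAME)
                + ", after: " + copy.isDirty(Employee.NAME));
    }
}
```

The values survive the round trip, which is why an equality check alone does not reveal the problem.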

      The proposal is to reintroduce PersistentDatumWriter/PersistentDatumReader, which will also serialize the entities' internal fields (such as the dirty state).
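      The idea behind the proposal can be sketched in plain Java. This is not the actual PersistentDatumWriter/Reader code: `TrackedEmployee`, `DirtyAwareSerializer`, `dirtyBits()` and `restore()` are hypothetical names, and the wire format (a length-prefixed dirty bitmap followed by the field values) is an assumption chosen for illustration. The key point is that the reader restores the dirty bits verbatim instead of rebuilding the entity through its setters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Hypothetical entity (NOT the real Gora API) exposing its dirty bits to a serializer.
class TrackedEmployee {
    static final int NAME = 0, SSN = 1, FIELD_COUNT = 2;
    private final String[] values = new String[FIELD_COUNT];
    private final BitSet dirty = new BitSet(FIELD_COUNT);

    void put(int field, String value) { values[field] = value; dirty.set(field); }
    String get(int field) { return values[field]; }
    boolean isDirty(int field) { return dirty.get(field); }
    void clearDirty() { dirty.clear(); }

    // Internal-state accessors, intended for the serializer only.
    BitSet dirtyBits() { return (BitSet) dirty.clone(); }
    void restore(String[] newValues, BitSet newDirty) {
        System.arraycopy(newValues, 0, values, 0, FIELD_COUNT);
        dirty.clear();
        dirty.or(newDirty);
    }
}

// Sketch of a dirty-aware writer/reader pair: dirty bits first, then field values.
class DirtyAwareSerializer {
    static byte[] serialize(TrackedEmployee e) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            byte[] bits = e.dirtyBits().toByteArray();
            out.writeInt(bits.length);          // length-prefixed dirty bitmap
            out.write(bits);
            for (int f = 0; f < TrackedEmployee.FIELD_COUNT; f++) {
                String v = e.get(f);
                out.writeBoolean(v != null);
                if (v != null) out.writeUTF(v);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    static TrackedEmployee deserialize(byte[] data) {
        TrackedEmployee e = new TrackedEmployee();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] bits = new byte[in.readInt()];
            in.readFully(bits);
            String[] values = new String[TrackedEmployee.FIELD_COUNT];
            for (int f = 0; f < values.length; f++) {
                values[f] = in.readBoolean() ? in.readUTF() : null;
            }
            // Bypass put() so that only the originally dirty fields stay dirty.
            e.restore(values, BitSet.valueOf(bits));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return e;
    }
}

class DirtyStatePreservedDemo {
    public static void main(String[] args) {
        TrackedEmployee original = new TrackedEmployee();
        original.put(TrackedEmployee.NAME, "Alice");
        original.clearDirty();
        original.put(TrackedEmployee.SSN, "123");
        TrackedEmployee copy = DirtyAwareSerializer.deserialize(DirtyAwareSerializer.serialize(original));
        // Prints "NAME dirty: false, SSN dirty: true"
        System.out.println("NAME dirty: " + copy.isDirty(TrackedEmployee.NAME)
                + ", SSN dirty: " + copy.isDirty(TrackedEmployee.SSN));
    }
}
```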

      This bug affects Nutch, for example. Nutch loads only some fields in each of its phases and serializes entities from Map to Reduce. On deserialization, all fields are found to be "dirty" regardless of which fields were actually modified in the Map, so all data in the datastore is overwritten, deleting a lot of it (downloaded content, parsed content, etc.).
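      Why an all-dirty entity is destructive can be shown with a toy update routine. This is an assumption-laden sketch, not Gora's HBaseStore or Nutch code: `ColumnStore` is hypothetical, it writes only the fields flagged as dirty, and it is assumed to treat a dirty `null` field as a delete, which is how fields that were never loaded end up wiping stored data.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical column store (NOT Gora's HBaseStore): an update writes only dirty
// fields, and a dirty field whose value is null deletes the stored column.
class ColumnStore {
    final Map<String, String> columns = new HashMap<>();

    void update(String[] fieldNames, String[] values, BitSet dirty) {
        // Iterate over the set bits only, i.e. over the dirty fields.
        for (int f = dirty.nextSetBit(0); f >= 0; f = dirty.nextSetBit(f + 1)) {
            if (values[f] == null) {
                columns.remove(fieldNames[f]);
            } else {
                columns.put(fieldNames[f], values[f]);
            }
        }
    }

    ColumnStore copy() {
        ColumnStore c = new ColumnStore();
        c.columns.putAll(columns);
        return c;
    }
}

class PartialLoadDemo {
    public static void main(String[] args) {
        String[] fields = {"status", "content"};
        ColumnStore store = new ColumnStore();
        store.columns.put("status", "fetched");
        store.columns.put("content", "downloaded bytes");

        // A job loads only "status" (so "content" is null in memory) and modifies it.
        String[] values = {"parsed", null};

        BitSet preserved = new BitSet();
        preserved.set(0);                // dirty state survived Map -> Reduce
        BitSet lost = new BitSet();
        lost.set(0, fields.length);      // every field came back dirty

        ColumnStore a = store.copy();
        ColumnStore b = store.copy();
        a.update(fields, values, preserved);
        b.update(fields, values, lost);
        // Prints "content kept (preserved): true, content kept (lost): false"
        System.out.println("content kept (preserved): " + a.columns.containsKey("content")
                + ", content kept (lost): " + b.columns.containsKey("content"));
    }
}
```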

      This effect can be observed in TestPersistentSerialization#testSerderEmployeeTwoFields when debugging into TestIOUtils#testSerializeDeserialize. Setting breakpoints there and inspecting the entities shows that they are considered "equal" when their fields are equal. That is a reasonable definition of equality, but another test must be added to check that serialization and deserialization preserve the dirty state.
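      The missing check boils down to asserting two things after a round trip: the entities are equal, and their dirty bits match. A plain-Java sketch with hypothetical names (`EmployeeRecord`, `roundTripKeepsDirtyState`) and stand-in round-trip functions in place of TestIOUtils#testSerializeDeserialize; how the real Gora test suite exposes the dirty state is not shown here.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.function.UnaryOperator;

// Hypothetical entity whose equals() compares field values only, mirroring the
// behaviour described above: dirty state is not part of equality.
class EmployeeRecord {
    static final int FIELD_COUNT = 2;
    final String[] values = new String[FIELD_COUNT];
    final BitSet dirty = new BitSet(FIELD_COUNT);

    void put(int f, String v) { values[f] = v; dirty.set(f); }

    @Override public boolean equals(Object o) {
        return o instanceof EmployeeRecord && Arrays.equals(values, ((EmployeeRecord) o).values);
    }
    @Override public int hashCode() { return Arrays.hashCode(values); }
}

// The extra check: equality alone is not enough, the dirty bits must match too.
class DirtyStateCheck {
    static boolean roundTripKeepsDirtyState(EmployeeRecord original,
                                            UnaryOperator<EmployeeRecord> roundTrip) {
        EmployeeRecord copy = roundTrip.apply(original);
        return original.equals(copy) && original.dirty.equals(copy.dirty);
    }

    public static void main(String[] args) {
        EmployeeRecord e = new EmployeeRecord();
        e.put(0, "Alice");
        e.dirty.clear();          // loaded from the store, unmodified
        e.put(1, "123");          // only field 1 is modified

        // Lossy round trip: rebuilds through put(), so every field becomes dirty.
        UnaryOperator<EmployeeRecord> lossy = src -> {
            EmployeeRecord c = new EmployeeRecord();
            for (int f = 0; f < EmployeeRecord.FIELD_COUNT; f++) c.put(f, src.values[f]);
            return c;
        };
        // Lossless round trip: copies values and dirty bits verbatim.
        UnaryOperator<EmployeeRecord> lossless = src -> {
            EmployeeRecord c = new EmployeeRecord();
            System.arraycopy(src.values, 0, c.values, 0, EmployeeRecord.FIELD_COUNT);
            c.dirty.or(src.dirty);
            return c;
        };
        // Prints "equal after lossy: true, lossy keeps dirty: false, lossless keeps dirty: true"
        System.out.println("equal after lossy: " + e.equals(lossy.apply(e))
                + ", lossy keeps dirty: " + roundTripKeepsDirtyState(e, lossy)
                + ", lossless keeps dirty: " + roundTripKeepsDirtyState(e, lossless));
    }
}
```

The lossy case passes an equality-only test yet fails the dirty-state check, which is exactly the gap the new test is meant to close.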

      Attachments

        1. GORA-401-tests.patch (11 kB, alfonso.nishikawa)
        2. GORA-401v1.patch (67 kB, alfonso.nishikawa)
        3. GORA-401v2.patch (64 kB, alfonso.nishikawa)
        4. GORA-401v3.patch (64 kB, alfonso.nishikawa)
        5. GORA-401v4.patch (64 kB, alfonso.nishikawa)
        6. GORA-401v5.patch (60 kB, alfonso.nishikawa)


          People

            Assignee: Kevin Ratnasekera (djkevincr)
            Reporter: Alfonso Nishikawa Muñumer (alfonsonishikawa)
            Votes: 1
            Watchers: 7


              Time Tracking

                Estimated: 35h
                Remaining: 0h
                Logged: 21h
