Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      It would be useful to run a compaction that deletes all of the nodes written by a given ingest client (each ingest client writes a uuid that this filter could use). This would probably be best done after verification (or on a clone, in parallel with verification). For example, the following steps could be used in testing.

      1. run ingest for a time period
      2. stop ingest
      3. verify
      4. run compaction filter to delete data written by one or more ingest clients
      5. verify
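
As a rough illustration of step 4, the sketch below simulates the filter's decision in plain Python rather than with the real iterator API. It assumes each value starts with the ingest client's uuid followed by a colon; the names `written_by` and `compact_without`, and the sample data, are hypothetical.

```python
def written_by(value: str) -> str:
    # Assumed value layout: "<ingest client uuid>:<pointer to another row>".
    return value.split(":", 1)[0]


def compact_without(table: dict, deleted_uuids: set) -> dict:
    # Simulates a filtering compaction: an entry survives only if its writer
    # is not one of the clients being deleted.
    return {key: value for key, value in table.items()
            if written_by(value) not in deleted_uuids}


# Hypothetical data: keys are (row, cf, cq), values are "<uuid>:<pointer>".
table = {
    ("r1", "cf1", "cq1"): "uuid-a:",
    ("r2", "cf2", "cq2"): "uuid-a:r1",
    ("r3", "cf3", "cq3"): "uuid-b:",
    ("r4", "cf4", "cq4"): "uuid-b:r3",
}
remaining = compact_without(table, {"uuid-b"})
```

Here `remaining` holds only the two entries written by `uuid-a`, which is the state the second verify step would then check.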

      It's possible that ingest clients can overwrite each other's nodes, but it seems like this would not cause a problem. Below is one example where this does not cause a problem.

      1. ingest client A writes 2:A->3:A->5:A->6:A->7:A
      2. ingest client B writes 12:B->13:B->5:B->16:B->17:B
      3. everything written by B is deleted

      In the above case, 2:A->3:A and 6:A->7:A would be the only things left. There are no pointers to undefined nodes.

        Activity

        Mike Drob added a comment -

        At the risk of asking the obvious, what happens to the 3->5 link?

        Keith Turner added a comment -

        The example is completely wrong. I forgot about columns and was only thinking of random rows. I was thinking that row 5 would be deleted, but that's not the case, because B would most likely write different columns. The chance of a collision in the row and column space is 1/2^93; I think there is a greater chance of two ingest clients choosing the same seed for the PRNG.

        By default, continuous ingest writes the following

        • row = <63 bit random>
        • cf = <15 bit random>
        • cq = <15 bit random>
        • val = <ingest client uuid>:<pointer to another row> (there are some other fields in the value)
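
The 1/2^93 figure follows from adding up the random bits listed above. A small sketch of the arithmetic, assuming the three fields are independent and uniformly random; the function `approx_any_collision` is a hypothetical helper that applies the standard birthday approximation, which the comment itself does not discuss:

```python
ROW_BITS, CF_BITS, CQ_BITS = 63, 15, 15

key_bits = ROW_BITS + CF_BITS + CQ_BITS   # 93 random bits per key
key_space = 2 ** key_bits                 # number of distinct (row, cf, cq) keys

# Chance that two specific, independently generated keys are identical.
pair_collision = 1 / key_space


def approx_any_collision(n: int) -> float:
    # Birthday approximation: chance that any two of n random keys collide,
    # valid while the result is much smaller than 1.
    return n * (n - 1) / 2 / key_space
```

The per-pair figure is the one quoted in the comment; the chance of any collision grows with the number of entries written, but stays very small for realistic table sizes.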

        I was thinking row 5 would be deleted. If that were the case, it would also have left a dangling pointer from 3 to 5, but that is not what happens.
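
To make the difference concrete, here is a minimal simulation of the A/B example from the description, once with B writing its own columns and once with B landing on exactly the same key as A. It is a sketch in plain Python, not continuous ingest code; the column names, the simplified `<uuid>:<pointer>` value, and all function names are assumptions for illustration.

```python
A_CHAIN = [("2", "3"), ("3", "5"), ("5", "6"), ("6", "7"), ("7", "")]
B_CHAIN = [("12", "13"), ("13", "5"), ("5", "16"), ("16", "17"), ("17", "")]


def build_table(b_columns):
    # Keys are (row, cf, cq); a later write to the same key replaces the value.
    table = {}
    for row, pointer in A_CHAIN:
        table[(row, "cfA", "cqA")] = f"A:{pointer}"
    for row, pointer in B_CHAIN:
        table[(row, *b_columns)] = f"B:{pointer}"
    return table


def delete_client(table, uuid):
    # Drop every entry whose value starts with the given client's uuid.
    return {k: v for k, v in table.items() if not v.startswith(uuid + ":")}


def dangling_pointers(table):
    # Pointers that reference a row with no remaining entries.
    rows = {row for (row, _cf, _cq) in table}
    pointers = {value.split(":", 1)[1] for value in table.values()}
    return sorted(p for p in pointers if p and p not in rows)


# B writes different columns: row 5 keeps A's entry, so 3->5 still resolves.
distinct = dangling_pointers(delete_client(build_table(("cfB", "cqB")), "B"))

# B hits the very same key: A's entry at row 5 is overwritten, then deleted.
same_key = dangling_pointers(delete_client(build_table(("cfA", "cqA")), "B"))
```

With distinct columns `distinct` is empty, while the same-key case leaves `same_key` equal to `["5"]`, the dangling 3->5 pointer described above.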


          People

          • Assignee:
            Unassigned
            Reporter:
            Keith Turner
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:

              Development