HBase / HBASE-17125

Inconsistent result when using a filter to read data

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0, 3.0.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change
    • Release Note:
      Marked setMaxVersions() and setMaxVersions(int) of Scan and Get as deprecated. They are easily confused with the column family's max versions, so use readAllVersions() and readVersions(int) instead.

      Description

      Assume a column's max versions is 3, and we write 4 versions of this column. The oldest version is not removed immediately, but from the user's point of view it is gone. When a user queries with a filter and the filter skips a newer version, the oldest version becomes visible again. After the region is compacted, however, the oldest version is physically removed and can never be seen again. This is confusing for users: the same query returns inconsistent results before and after region compaction.

      The cause is the matchColumn method of UserScanQueryMatcher. It first checks the cell against the filter, and only then checks the number of versions. So if the filter skips a newer version, that version is not counted, and the oldest version becomes visible again as long as it has not yet been removed by compaction.
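      To make the ordering concrete, here is a minimal Java sketch of a filter-first matcher. It is a simplified model, not HBase's UserScanQueryMatcher: the class FilterFirstMatcher, its scan method, and the representation of a column's cells as a newest-first list of timestamps are all hypothetical. With max versions 3, four writes (timestamps 4 down to 1) and a filter that skips timestamp 4, it returns [3, 2, 1] while the oldest version is still stored, but only [3, 2] once compaction has dropped it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical, simplified model of the filter-first order described above.
// A cell is represented only by its timestamp; the list is ordered newest first,
// the order in which a scanner sees the versions of one column.
final class FilterFirstMatcher {
    static List<Long> scan(List<Long> cellsNewestFirst, int maxVersions, LongPredicate filter) {
        List<Long> result = new ArrayList<>();
        int versionsSeen = 0;
        for (long ts : cellsNewestFirst) {
            // Step 1: the filter runs first, so a skipped cell is never counted.
            if (!filter.test(ts)) {
                continue;
            }
            // Step 2: only cells that passed the filter count against max versions.
            versionsSeen++;
            if (versionsSeen > maxVersions) {
                break; // stands in for seeking to the next column
            }
            result.add(ts);
        }
        return result;
    }
}
```

      Because the skipped newest cell does not use up a version, the not-yet-compacted oldest cell slips into the result; after compaction it no longer exists, so the same scan returns one cell fewer.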

      After an offline discussion with Duo Zhang and Honghua Feng, we have two possible solutions for this problem. The first idea is to check the number of versions first, and then check the cell against the filter. This matches the javadoc of setFilter, which says the filter is called after all tests for ttl, column match, deletes and max versions have been run:

        /**
         * Apply the specified server-side filter when performing the Query.
         * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
         * for ttl, column match, deletes and max versions have been run.
         * @param filter filter to run on the server
         * @return this for invocation chaining
         */
        public Query setFilter(Filter filter) {
          this.filter = filter;
          return this;
        }
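      A sketch of the first idea under the same kind of simplified model (VersionsFirstMatcher is an illustrative name, not an HBase class): every cell counts against max versions before the filter runs, so a skipped cell still uses up a version. With max versions 3, cells at timestamps 4 down to 1 and a filter that skips timestamp 4, it returns [3, 2] whether or not the oldest cell has already been compacted away.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical model of "check versions first, then the filter".
// Cells are timestamps of one column, ordered newest first.
final class VersionsFirstMatcher {
    static List<Long> scan(List<Long> cellsNewestFirst, int maxVersions, LongPredicate filter) {
        List<Long> result = new ArrayList<>();
        int versionsSeen = 0;
        for (long ts : cellsNewestFirst) {
            // Step 1: every cell counts against max versions, accepted by the filter or not.
            versionsSeen++;
            if (versionsSeen > maxVersions) {
                break; // versions beyond max versions are never visible
            }
            // Step 2: the filter only decides whether a counted version is returned.
            if (filter.test(ts)) {
                result.add(ts);
            }
        }
        return result;
    }
}
```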
      

      But this idea has another problem. Suppose a column's max versions is 5 and the user's query only needs 3 versions. The matcher first checks the number of versions, then checks the cell against the filter. If the filter skips a cell, the result may contain fewer than 3 cells, even though 2 more versions remain within the column's max versions and are never read.
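      The shortfall can be reproduced with a hypothetical sketch that applies the versions-first order using the user's requested count. VersionsFirstShortfall and its methods are illustrative names only. With five stored versions (timestamps 5 down to 1), a request for 3 versions and a filter that skips timestamp 5, scan returns only [4, 3], and unread reports that timestamps 2 and 1 were never examined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical model: versions-first order, but the version budget is the
// number of versions the user asked for, not the column family's max versions.
final class VersionsFirstShortfall {
    static List<Long> scan(List<Long> cellsNewestFirst, int userVersions, LongPredicate filter) {
        List<Long> result = new ArrayList<>();
        int versionsSeen = 0;
        for (long ts : cellsNewestFirst) {
            versionsSeen++;
            if (versionsSeen > userVersions) {
                break; // budget used up, even if some counted cells were filtered out
            }
            if (filter.test(ts)) {
                result.add(ts);
            }
        }
        return result;
    }

    // Cells the scan stops before reaching because the budget ran out.
    static List<Long> unread(List<Long> cellsNewestFirst, int userVersions) {
        int from = Math.min(userVersions, cellsNewestFirst.size());
        return cellsNewestFirst.subList(from, cellsNewestFirst.size());
    }
}
```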

      So the second idea has three steps:
      1. Check the cell against the column's max versions.
      2. Check the cell against the filter.
      3. Check the number of versions the user requested.
      But this makes ScanQueryMatcher more complicated, and it contradicts the javadoc of Query.setFilter.
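      The three steps can be sketched in the same hypothetical model (ThreeStepMatcher is an illustrative name, not the code that was eventually committed). The column family's max versions bounds which cells are visible at all, the filter runs on those, and the user's requested count applies only to cells that are actually returned. In the 5-versions case it returns [4, 3, 2]; in the original 3-versions case it returns [3, 2] before and after compaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical model of the three-step order. Cells are timestamps of one
// column, ordered newest first.
final class ThreeStepMatcher {
    static List<Long> scan(List<Long> cellsNewestFirst, int familyMaxVersions,
                           int userVersions, LongPredicate filter) {
        List<Long> result = new ArrayList<>();
        int versionsSeen = 0;
        for (long ts : cellsNewestFirst) {
            // Step 1: enforce the family's max versions on every cell, so cells that
            // compaction would drop are never visible, whatever the filter does.
            versionsSeen++;
            if (versionsSeen > familyMaxVersions) {
                break;
            }
            // Step 2: the filter decides whether this visible version is a candidate.
            if (!filter.test(ts)) {
                continue;
            }
            // Step 3: only returned cells count against the versions the user asked for.
            if (result.size() >= userVersions) {
                break;
            }
            result.add(ts);
        }
        return result;
    }
}
```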

      We don't have a final solution for this problem yet. Suggestions are welcome.

        Attachments

        1. 17125-slack-13.txt
          45 kB
          Ted Yu
        2. example.diff
          4 kB
          Guanghao Zhang
        3. HBASE-17125.master.001.patch
          12 kB
          Guanghao Zhang
        4. HBASE-17125.master.002.patch
          13 kB
          Guanghao Zhang
        5. HBASE-17125.master.002.patch
          13 kB
          Guanghao Zhang
        6. HBASE-17125.master.003.patch
          28 kB
          Guanghao Zhang
        7. HBASE-17125.master.004.patch
          29 kB
          Guanghao Zhang
        8. HBASE-17125.master.005.patch
          29 kB
          Guanghao Zhang
        9. HBASE-17125.master.006.patch
          30 kB
          Guanghao Zhang
        10. HBASE-17125.master.007.patch
          62 kB
          Guanghao Zhang
        11. HBASE-17125.master.008.patch
          62 kB
          Guanghao Zhang
        12. HBASE-17125.master.009.patch
          66 kB
          Guanghao Zhang
        13. HBASE-17125.master.009.patch
          65 kB
          Guanghao Zhang
        14. HBASE-17125.master.010.patch
          65 kB
          Guanghao Zhang
        15. HBASE-17125.master.011.patch
          65 kB
          Guanghao Zhang
        16. HBASE-17125.master.011.patch
          65 kB
          Guanghao Zhang
        17. HBASE-17125.master.012.patch
          51 kB
          Guanghao Zhang
        18. HBASE-17125.master.013.patch
          40 kB
          Guanghao Zhang
        19. HBASE-17125.master.014.patch
          33 kB
          Guanghao Zhang
        20. HBASE-17125.master.015.patch
          33 kB
          Guanghao Zhang
        21. HBASE-17125.master.016.patch
          34 kB
          Guanghao Zhang
        22. HBASE-17125.master.017.patch
          34 kB
          Guanghao Zhang
        23. HBASE-17125.master.018.patch
          35 kB
          Guanghao Zhang
        24. HBASE-17125.master.019.patch
          32 kB
          Guanghao Zhang
        25. HBASE-17125.master.020.patch
          35 kB
          Guanghao Zhang
        26. HBASE-17125.master.020.patch
          35 kB
          Guanghao Zhang
        27. HBASE-17125.master.021.patch
          35 kB
          Guanghao Zhang
        28. HBASE-17125.master.022.patch
          35 kB
          Guanghao Zhang
        29. HBASE-17125.master.checkReturnedVersions.patch
          44 kB
          Guanghao Zhang
        30. HBASE-17125.master.no-specified-filter.patch
          39 kB
          Guanghao Zhang

              People

              • Assignee:
                Guanghao Zhang (zghaobac)
                Reporter:
                Guanghao Zhang (zghaobac)
              • Votes:
                0
                Watchers:
                19
