Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: io, regionserver
    • Labels:
      None

      Description

      Consider supporting:

      • 2GB store files
      • 1TB per node (500 store files)
      • Cell values up to ~100MB
      • Typical use case of RS running with 1GB of heap only
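      A rough back-of-the-envelope sketch of what these targets imply, written in Java. The class and method names are hypothetical, and decimal units are assumed so that 1TB / 2GB comes out at the 500 store files quoted above. The per-file heap figure is only an upper bound under the assumption that the entire heap is spent on store files, which no real region server could do.

      ```java
      /** Hypothetical helper for the sizing targets in this issue; not HBase code. */
      class CapacityTargets {
          // Assumption: decimal gigabytes, so that 1TB / 2GB = 500 as stated in the issue.
          static final long GB = 1000L * 1000L * 1000L;

          /** Number of store files of the given size needed to hold the node's data. */
          static long storeFilesPerNode(long bytesPerNode, long storeFileBytes) {
              return bytesPerNode / storeFileBytes;
          }

          /** Upper bound on heap per store file if the whole heap were spent on store files. */
          static long heapBudgetPerStoreFile(long heapBytes, long storeFiles) {
              return heapBytes / storeFiles;
          }

          public static void main(String[] args) {
              long files = storeFilesPerNode(1000 * GB, 2 * GB);
              long budget = heapBudgetPerStoreFile(1 * GB, files);
              System.out.println(files + " store files, at most " + budget + " heap bytes each");
          }
      }
      ```

      Under these assumptions, a 1GB heap leaves at most about 2MB of heap per store file, which is why the ideas below focus on keeping indexes and file handles out of the heap.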

      Some ideas:

      • Drop MapFile and make a custom store file format with (competing) design goals:
        • heap efficiency
        • fast lookups
        • minimize I/O operations
        • optimize for typical DFS blocksizes (8MB, 64MB)
      • MRU cache for filehandles and store file indexes
      • Memory mapped store file indexes – don't hold the indexes in heap; rely on the OS blockcache for performance
      • "Zero copy" I/O from IPC to store file and vice versa, like NIO buffers
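      The cache idea above could be sketched as follows. This is a minimal illustration, not the design adopted in HBase: `MruCache` is a hypothetical name, and the sketch assumes "MRU cache" means a bounded cache that keeps the most recently used entries and evicts the least recently used one. It relies on the access-order mode of `java.util.LinkedHashMap`.

      ```java
      import java.util.LinkedHashMap;
      import java.util.Map;

      /**
       * Hypothetical bounded cache for file handles or store file indexes.
       * Keeps the most recently used entries; evicts the least recently used.
       */
      class MruCache<K, V> extends LinkedHashMap<K, V> {
          private final int maxEntries;

          MruCache(int maxEntries) {
              // accessOrder = true: get() and put() move an entry to the "most recent" end.
              super(16, 0.75f, true);
              this.maxEntries = maxEntries;
          }

          @Override
          protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
              // Called after each insertion; returning true drops the least recently used entry.
              return size() > maxEntries;
          }

          public static void main(String[] args) {
              MruCache<String, String> cache = new MruCache<>(2);
              cache.put("storefile-a", "index-a");
              cache.put("storefile-b", "index-b");
              cache.get("storefile-a");             // touch a, so b is now the eldest entry
              cache.put("storefile-c", "index-c");  // exceeds the bound and evicts b
              System.out.println(cache.keySet());
          }
      }
      ```

      A cache of real file handles would also need to close the evicted handle, for example inside `removeEldestEntry` before returning true, and would need external synchronization because `LinkedHashMap` is not thread-safe.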

        Activity

        stack added a comment -

        Good stuff, Andrew. I concur. Let's take on these goals. Shall we put up a wiki page, pointing to this issue, that talks about rearchitecting the project? These goals may belong in a working document of their own. Perhaps a page up on the wiki, like http://wiki.apache.org/hadoop/Hbase/NewFileFormat, with a link under the roadmap; or maybe we need an architectural goals section where we state these targets (and X them out as we knock them off)?

        Andrew Purtell added a comment -

        Hi Stack. I was thinking of setting up subtasks under this issue in part. A page on the wiki would be good also. I'll set one up if someone doesn't get to it first.

        stack added a comment -

        A bunch of us discussed the zero-copy idea above this evening at HUG6. In particular: remove HStoreKey and just pass byte arrays all the way into the HFile, and then on the way out, carry the HFile key and value all the way out to IPC. Will make an issue once I have a better handle on it.

        Andrew Purtell added a comment -

        0.20.0 did enough of this.


          People

          • Assignee:
            Unassigned
          • Reporter:
            Andrew Purtell
          • Votes:
            1
          • Watchers:
            4
