Accumulo / ACCUMULO-387

Support map reduce directly over files

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: None
    • Labels:
      None

      Description

      Support map reduce jobs that directly read Accumulo files.

        Activity

        Keith Turner added a comment -

        I think this is much easier now that we have clone table. I think the following needs to be done:

        • Clone table (pass in options to disable writes and major compactions on clone)
        • Run map reduce over files referenced by clone
        • Delete clone

        Would need a special input format that instantiates the iterator stack in the mapper for each tablet. Doing this instead of reading the files directly is important for the following reasons.

        • The iterator stack will properly process updates and deletes that were made
        • The iterator stack will only read the data in a file that falls within a tablet. This is important because tablets can reference files that contain data outside of a tablet, data that could have been deleted in another tablet. Using the iterator stack will prevent this from happening.
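The two reasons above can be sketched in code. The following is a minimal, self-contained simulation (not Accumulo's actual iterator classes — `Entry`, `scanTablet`, and the field names are illustrative) of why replaying the iterator stack matters: newer versions and delete markers must mask older entries, and keys outside the tablet's range must be skipped even if they exist in a referenced file.

```java
import java.util.*;

// Hypothetical sketch: merge entries from several files the way a merging
// iterator stack would, rather than reading each file verbatim.
public class TabletScanSketch {
    record Entry(String key, long ts, String value, boolean delete) {}

    // Keep only the newest version per key, drop deleted keys, and
    // restrict results to this tablet's range [startRow, endRow).
    static Map<String, String> scanTablet(List<List<Entry>> files,
                                          String startRow, String endRow) {
        List<Entry> all = new ArrayList<>();
        files.forEach(all::addAll);
        // Sort by key, then newest timestamp first (like a merging iterator).
        all.sort(Comparator.comparing(Entry::key)
                .thenComparing(Comparator.comparingLong(Entry::ts).reversed()));
        Map<String, String> result = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>();
        for (Entry e : all) {
            if (e.key().compareTo(startRow) < 0 || e.key().compareTo(endRow) >= 0)
                continue; // data outside this tablet's range is ignored
            if (!seen.add(e.key()))
                continue; // older version masked by a newer one
            if (!e.delete())
                result.put(e.key(), e.value()); // delete markers suppress the key
        }
        return result;
    }
}
```

Reading the files directly would instead return the stale version of an updated key, resurrect deleted keys, and emit out-of-range data.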
        Keith Turner added a comment -

        For mapper locality we can use a tablet's last location; I think clone copies this. Alternatively, we could choose the node holding the majority of the blocks for a tablet's files.
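The second option is a simple majority count. A minimal sketch (the `preferredHost` method and its inputs are illustrative, not an Accumulo or HDFS API — in practice the block hosts would come from something like HDFS's `getFileBlockLocations`):

```java
import java.util.*;

// Hypothetical sketch: given the host locations of each HDFS block backing a
// tablet's files, pick the host holding the most blocks as the split's
// preferred location.
public class SplitLocality {
    static String preferredHost(List<List<String>> blockHosts) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> hosts : blockHosts)
            for (String h : hosts)
                counts.merge(h, 1, Integer::sum); // tally blocks per host
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }
}
```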

        Jason Rutherglen added a comment -

        +1 Hive currently runs slowly on HBase, a major drawback. However, I wonder if the tablet files should be 'force' compacted first, to ensure map reduce jobs perform sequential reads.

        Joey Echeverria added a comment -

        I would only force compact if you're going to run over the clone multiple times. Otherwise you're going to pay the same cost of reading the non-sequential data, plus writing it back out, plus reading the sequential copy. I don't see how that's a net win.
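The break-even point in this argument can be made concrete with a rough cost model. All numbers and the seek-penalty parameter `r` below are illustrative assumptions, not measurements: say one pass over non-sequential data costs `r` units per byte (`r > 1` for seek overhead) versus `1` for sequential, so compacting first pays `r + 1` up front (one non-sequential read plus one write) and saves `r - 1` on each subsequent pass.

```java
// Hypothetical cost model: how many MapReduce passes are needed before
// force-compacting the clone first becomes cheaper than reading the
// non-sequential data every time.
public class CompactionBreakEven {
    // r = relative cost of a non-sequential pass vs a sequential one (r > 1)
    static int breakEvenPasses(double r) {
        double compactOverhead = r + 1.0; // read non-seq once + write once
        double perPassSavings = r - 1.0;  // non-seq pass vs sequential pass
        return (int) Math.ceil(compactOverhead / perPassSavings);
    }
}
```

With `r = 2`, compaction only wins after three passes, which matches the comment above: for a single pass it is a net loss.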

        Keith Turner added a comment -

        I am thinking the clone operation will be left up to the user to give them the flexibility that you mentioned. The input format will just work against a table. So you could clone and compact if you plan to read the table multiple times. You could also compact and then clone; this will save disk space. We just need to properly document how to clone a table such that automatic compactions are disabled.

        Keith Turner added a comment -

        Something we may want to consider for 1.5 if this catches on is making RFile splittable. This would help in the case you mentioned where you compact tablets down to one file.

        Keith Turner added a comment - edited

        This input format could run against offline tables. It does not care whether you clone or not, but it will only start if the table is offline. This is easy to achieve: just clone the table and take it offline. This is simpler than trying to adjust settings to disable compactions and writes, settings that may change over time.

        One drawback with this approach is that the current code to take a table offline is async. It starts a table going offline, but does not wait for it to happen. The input format could probably get around this pretty easily. It could check that the table state is offline and then wait for there to be no locations in the metadata table. Once there are no locations, it could start computing input splits.
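The wait described above amounts to a polling loop. A minimal sketch, where `TableStatus` is a hypothetical stand-in for reading the table state and the metadata table (it is not an Accumulo API):

```java
import java.util.concurrent.TimeUnit;

// Illustrative sketch of waiting for an async offline to complete before
// computing input splits.
public class OfflineWait {
    interface TableStatus {
        boolean isOffline();       // table state, e.g. as recorded in ZooKeeper
        int assignedTabletCount(); // tablet locations still in the metadata table
    }

    // Poll until the table is offline and no tablets have locations;
    // only then is it safe to start computing input splits.
    static boolean waitForOffline(TableStatus status, int maxAttempts, long sleepMs) {
        for (int i = 0; i < maxAttempts; i++) {
            if (status.isOffline() && status.assignedTabletCount() == 0)
                return true;
            try {
                TimeUnit.MILLISECONDS.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```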

        Keith Turner added a comment -

        We could add an option to clone table to start the cloned table off in the offline state, so we never even bother loading the tablets for the clone. This would be a fairly simple change.

        One other nice thing about this approach is that tables in the offline state can be deleted, so the cloned table never has to come online. It basically ends up being a snapshot of the table's files that prevents garbage collection of those files.

        Keith Turner added a comment -

        I was looking at the current input format code to determine what's the best way to share code with this new direct file input format. I realized a good solution may be to create a new Scanner that reads directly from files. Then the current Accumulo input format could just use this new scanner, so a new input format would not be needed. However, specialization would be needed in the computation of input splits. One nice thing about doing it this way is that the change can be made in InputFormatBase, and then AccumuloRowInputFormat and AccumuloInputFormat both get the ability to read directly over files.

        This scanner would also be useful for any non-mapreduce application that wanted to read the files of a table directly. One possible way to achieve this is with an API like the following. Calls to create batch writers and batch scanners could throw UnsupportedOperationException or just pass through to the wrapped instance. Of course the batch scanner could be supported directly on files too, but I would not implement this until there is a need for it.

          DirectFileInstance instance = new DirectFileInstance(new ZooKeeperInstance(...));

          // the call below creates a scanner that will read directly from a table's files
          // this could check that the table (and tablets in the range) are offline
          Scanner scanner = instance.getConnector(...).createScanner(...);
        Billie Rinaldi added a comment -

        ACCUMULO-418 is related to this. A table could have RFiles of widely varying sizes, which could make map reduce jobs run poorly unless RFile is splittable.

        jv added a comment -

        We need to make sure the documentation regarding security is clear enough to let users know that they need read access to the underlying files in order to read them directly, both in map reduce and locally. Also we should look into integrating the authenticator/authorizer into this scanner if possible.


          People

          • Assignee: Keith Turner
          • Reporter: Keith Turner
          • Votes: 0
          • Watchers: 0
