Hive / HIVE-968

map join may lead to very large files

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Query Processor
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      If the table being loaded into memory for the map join is very large, it may produce very large files on the mappers.
      The job may never complete, and those files are never cleaned from the tmp directory.
      Ideally the table could be placed in the distributed cache, but at a minimum the following should be added:

      If the table (source) being joined produces a very big file, the task should simply throw an error.
      New configuration parameters can be added to limit the number of rows or the size of the table.
      The mapper should not be retried 4 times; it should fail immediately.

      I can't think of a better way for the mapper to communicate with the client than writing to some well-known
      HDFS file: the client can read the file periodically (while polling), and if it sees an error it can kill the job, with
      appropriate error messages indicating to the client why the job died.
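The fail-fast guard proposed above can be sketched as follows. This is an illustrative stand-in, not Hive's actual implementation: the class name and the way the limit is passed in are hypothetical, and the real fix wires the limit through Hive's configuration.

```java
// Hypothetical sketch of the proposed guard: abort the map-join table load as
// soon as a configured row limit is exceeded, instead of filling tmp space and
// letting the framework retry the mapper. Names here are illustrative.
import java.util.HashMap;
import java.util.Map;

public class MapJoinRowLimitGuard {
  private final long maxRows;
  private long rows = 0;
  private final Map<String, String> table = new HashMap<>();

  public MapJoinRowLimitGuard(long maxRows) {
    this.maxRows = maxRows;
  }

  public void put(String key, String value) {
    if (++rows > maxRows) {
      // Fail fast with a clear message rather than retrying the task 4 times.
      throw new RuntimeException(
          "map-join table exceeded " + maxRows + " rows; aborting job");
    }
    table.put(key, value);
  }
}
```

A caller that catches this exception can then report the reason to the client before killing the job, which is the error-propagation behavior the description asks for.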

      Attachments

      1. HIVE-968.patch
        13 kB
        Ning Zhang
      2. HIVE-968_4.patch
        45 kB
        Ning Zhang
      3. HIVE-968_3.patch
        39 kB
        Ning Zhang
      4. HIVE-968_2.patch
        22 kB
        Ning Zhang

        Activity

        Ning Zhang added a comment -

        HIVE-968.patch solves the issue of the tmp file created by the map join being left undeleted when the job is killed. It also introduces a HashMapWrapper class that uses a main-memory HashMap while the number of keys is within a certain threshold; new data is put into the persistent JDBM store only when the number of entries exceeds the threshold. This should improve map-join performance significantly.
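The two-tier idea behind HashMapWrapper can be sketched as below. This is a simplified stand-in, not the actual patch code: the persistent JDBM store is simulated with a second in-memory map, and the class and method names are illustrative.

```java
// Simplified sketch of the two-tier hash map idea: keep entries in an
// in-memory HashMap until a threshold is crossed, then spill new entries to a
// persistent store. JDBM is simulated here by a second map; the real
// HashMapWrapper persists spilled entries to disk.
import java.util.HashMap;
import java.util.Map;

public class TwoTierMap<K, V> {
  private final int threshold;
  private final Map<K, V> memory = new HashMap<>();
  private final Map<K, V> persistent = new HashMap<>(); // stand-in for JDBM

  public TwoTierMap(int threshold) {
    this.threshold = threshold;
  }

  public void put(K key, V value) {
    // Keep the two tiers consistent: if the key fits in (or already lives in)
    // memory, write it there and drop any stale spilled copy; otherwise spill.
    if (memory.containsKey(key) || memory.size() < threshold) {
      memory.put(key, value);
      persistent.remove(key);
    } else {
      persistent.put(key, value);
    }
  }

  public V get(K key) {
    V v = memory.get(key);
    return (v != null) ? v : persistent.get(key);
  }

  public int memorySize() {
    return memory.size();
  }
}
```

With a threshold of 2, the first two distinct keys stay in memory and a third spills to the persistent tier; lookups fall through from memory to the spilled store.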
        Namit Jain added a comment -

        Can you regenerate the patch?
        Ning Zhang added a comment -

        The new HIVE-968_2.patch solves the compilation issue.
        Namit Jain added a comment -

        Overall looks good, but I had a couple of concerns:

        1.

            // commit every 100 rows to prevent Out-of-memory exception
            if ((res.size() % 100 == 0) && recman != null) {
              recman.commit();
            }

        Instead of committing every 100 rows in HTree, now we are committing always - won't it have a performance issue?

        2. Although we have a new cache on top, we are losing the MRU property. It should be implemented in the wrapper.
        Ning Zhang added a comment -

        I tested the performance of committing always versus every 100 updates, and there is no difference I can notice. That's why I removed it.

        For MRU, I'm not sure whether it will lead to better performance. In the MapJoin use case the data is read sequentially, so every entry is read once for each tuple on the LHS; MRU therefore seems useless.
        Namit Jain added a comment -

        OK for 1.

        I don't think removing MRU is a good idea. Data is not read sequentially - that depends on the larger table being read by the mapper.
        It includes random reads.
        Ning Zhang added a comment -

        That's true. But even though the hash table is randomly accessed, I'm not sure MRU will help with performance here. This wrapper is meant to be a simple wrapper around the HashMap data structure, and MRU adds both memory and CPU cost. I'm assuming that in most cases the threshold will not be reached; in that case MRU is not useful and wastes resources.
        Namit Jain added a comment -

        I agree it will increase memory consumption (but it should be very small - probably one pointer per entry).
        Still, in general, MRU seems like a good idea.

        Of course, it depends on the data - so in the absence of statistics it is difficult to come up with the right answer.
        But I would still err in favor of MRU.
        Ning Zhang added a comment -

        Attaching a new version, HIVE-968_3.patch. Changes from the last version:
        1) two new files, MRU.java and DCLLItem.java, implementing an efficient MRU cache replacement policy.
        2) updated HashMapWrapper to use MRU.
        3) a unit test for HashMapWrapper.
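An MRU list backed by a doubly linked list, in the spirit of the MRU.java/DCLLItem.java files mentioned above, can be sketched roughly like this. The code below is an illustrative reimplementation under that assumption, not the contents of the patch.

```java
// Illustrative sketch of an MRU list over a doubly linked list (the structure
// the DCLLItem name suggests). Each access moves the item to the head in O(1),
// so the tail is always the least recently used entry and is the natural
// victim when spilling to the persistent store.
public class MruList {
  static class Node {
    final String key;
    Node prev, next;
    Node(String key) { this.key = key; }
  }

  private Node head, tail;

  // Insert a node at the head (the most-recently-used position).
  public Node addToHead(Node n) {
    n.prev = null;
    n.next = head;
    if (head != null) head.prev = n;
    head = n;
    if (tail == null) tail = n;
    return n;
  }

  // Unlink a node from wherever it sits in the list.
  public void remove(Node n) {
    if (n.prev != null) n.prev.next = n.next; else head = n.next;
    if (n.next != null) n.next.prev = n.prev; else tail = n.prev;
    n.prev = n.next = null;
  }

  // On access, move the node to the head; this is the whole MRU policy.
  public void touch(Node n) {
    if (n != head) {
      remove(n);
      addToHead(n);
    }
  }

  public String headKey() { return head == null ? null : head.key; }
  public String tailKey() { return tail == null ? null : tail.key; }
}
```

The per-entry overhead is exactly the two pointers Namit estimates above, and every operation is constant time, which is why a linked list (rather than, say, timestamps plus a scan) is the usual choice.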
        Namit Jain added a comment -

        1. Some System.out.println's are present in HashMapWrapper.
        2. MapJoinOperator.java: not sure if o.setObj() would work for the persistent case.
        3. HashMapWrapper.java: not sure about the logic in put.
        Ning Zhang added a comment -

        Discussed with Namit offline. Below are the updates:
        1) The System.out.println() calls print debugging info when an assertion fails. I'll change them to use LOG.error().
        2) o.setObj() works because whenever an object is returned from get, it is guaranteed to be in the main-memory cache, so setObj sets the object in the MRUItem, which changes the HashMap value. It is a performance optimization and I will add more comments in the code.
        3) There was an issue in the put() code path when the key is not in main memory but is in the persistent hash; fixed that and added a unit test for that case.
        4) Changed the JDBM TransactionManager to delete the log file if NO_TRANSACTION is set.

        Will upload the patch shortly after the unit tests finish.
        Ning Zhang added a comment -

        Uploading patch containing the above changes.
        Namit Jain added a comment -

        +1

        Looks good - will commit if the tests pass.
        Namit Jain added a comment -

        Committed. Thanks Ning

          People

          • Assignee: Ning Zhang
          • Reporter: Namit Jain
          • Votes: 0
          • Watchers: 0
