1. Hive
  2. HIVE-968

map join may lead to very large files


    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Query Processor
    • Labels:
    • Hadoop Flags:


      If the table under consideration is very large, the map join may lead to very large files on the mappers.
      The job may never complete, and the files will never be cleaned from the tmp directory.
      It would be great if the table could be placed in the distributed cache, but at a minimum the following should be added:

      If the table (source) being joined leads to a very big file, the mapper should just throw an error.
      New configuration parameters can be added to limit the number of rows or the size of the table.
      The mapper should not be tried 4 times; it should fail immediately.
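
      A minimal sketch of the row-limit check described above, in Java. Every name here (MapJoinRowLimit,
      LimitExceededException, maxRows) is hypothetical and is not taken from Hive or from the attached patches;
      the limit is assumed to come from a new configuration parameter. Suppressing the task retries is left to the
      execution framework and is not shown.

```java
// Hypothetical sketch: none of these names come from Hive or the HIVE-968 patches.
final class MapJoinRowLimit {

    /** Thrown when the joined table grows past the configured limit. */
    static final class LimitExceededException extends RuntimeException {
        LimitExceededException(String message) {
            super(message);
        }
    }

    private final long maxRows; // assumed configuration value; <= 0 disables the check
    private long rowsLoaded = 0;

    MapJoinRowLimit(long maxRows) {
        this.maxRows = maxRows;
    }

    /** Call once for every row loaded from the joined table. */
    void onRowLoaded() {
        rowsLoaded++;
        if (maxRows > 0 && rowsLoaded > maxRows) {
            throw new LimitExceededException(
                "map join table exceeded the limit of " + maxRows + " rows");
        }
    }

    long rowsLoaded() {
        return rowsLoaded;
    }
}
```

      The check throws as soon as the limit is crossed, so the mapper stops before it keeps writing rows to local files.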

      I can't think of any better way for the mapper to communicate with the client than for it to write to some well-known
      HDFS file. The client can read the file periodically (while polling), and if it sees an error it can just kill the job, with
      appropriate error messages indicating to the client why the job died.
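
      The error-file handshake could look roughly like the sketch below. It uses java.nio.file on a local path as a
      stand-in for HDFS so that it runs on its own; the class and method names (FatalErrorFile, report, poll) are
      assumptions for illustration, not part of Hive or Hadoop.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical sketch: a local file stands in for the well-known HDFS file.
final class FatalErrorFile {

    private FatalErrorFile() {
    }

    /** Mapper side: record why the task is giving up. */
    static void report(Path errorFile, String message) throws IOException {
        Files.write(errorFile, message.getBytes(StandardCharsets.UTF_8));
    }

    /** Client side: one polling step; empty while no mapper has reported an error. */
    static Optional<String> poll(Path errorFile) throws IOException {
        if (!Files.exists(errorFile)) {
            return Optional.empty();
        }
        byte[] bytes = Files.readAllBytes(errorFile);
        return Optional.of(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

      A client would call poll on each progress check and, once it returns a message, kill the job and show that
      message to the user.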

      1. HIVE-968_2.patch
        22 kB
        Ning Zhang
      2. HIVE-968_3.patch
        39 kB
        Ning Zhang
      3. HIVE-968_4.patch
        45 kB
        Ning Zhang
      4. HIVE-968.patch
        13 kB
        Ning Zhang


        Carl Steinbach made changes -
          Status: Resolved [ 5 ] -> Closed [ 6 ]
        Namit Jain made changes -
          Status: Open [ 1 ] -> Resolved [ 5 ]
          Hadoop Flags: [Reviewed]
          Fix Version/s: 0.5.0 [ 12314156 ]
          Resolution: Fixed [ 1 ]
        Ning Zhang made changes -
          Attachment: HIVE-968_4.patch [ 12427538 ]
        Ning Zhang made changes -
          Attachment: HIVE-968_3.patch [ 12427409 ]
        Ning Zhang made changes -
          Attachment: HIVE-968_2.patch [ 12427006 ]
        Ning Zhang made changes -
          Attachment: HIVE-968.patch [ 12426957 ]
        Ning Zhang made changes -
          Assignee: Ning Zhang [ nzhang ]
        Namit Jain created issue -


          • Assignee:
            Ning Zhang
          • Reporter:
            Namit Jain
          • Votes:
            0
          • Watchers:
            0


            • Created: