Hadoop Map/Reduce
MAPREDUCE-885

More efficient SQL queries for DBInputFormat

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      DBInputFormat generates InputSplits by counting the available rows in a table and selecting subsections of the table via the "LIMIT" and "OFFSET" SQL keywords. These keywords are only meaningful in an ordered context, so the query also includes an "ORDER BY" clause on an index column. The resulting queries are often inefficient and require full table scans. Running them from multiple mappers can lead to O(n^2) behavior in the database, where n is the number of splits, so attempting to parallelize these queries is counter-productive.
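
      To make the problem concrete, here is a minimal sketch of the kind of per-split query the description refers to. It is not the DBInputFormat source: the class name, the method name, and the "employees"/"id" table and column are hypothetical, and the row count is assumed to come from a prior COUNT(*) query. Many databases evaluate OFFSET by producing and discarding the skipped rows, so split i is likely to re-read everything that splits 0..i-1 already read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop.
public class OffsetSplitSketch {

    // Builds one LIMIT/OFFSET query per split over an ordered table.
    // Assumes numSplits > 0 and rowCount >= 0.
    static List<String> offsetQueries(String table, String orderCol,
                                      long rowCount, int numSplits) {
        List<String> queries = new ArrayList<>();
        long chunk = rowCount / numSplits;
        for (int i = 0; i < numSplits; i++) {
            long offset = i * chunk;
            // The last split absorbs any remainder rows.
            long length = (i == numSplits - 1) ? rowCount - offset : chunk;
            queries.add("SELECT * FROM " + table
                    + " ORDER BY " + orderCol
                    + " LIMIT " + length
                    + " OFFSET " + offset);
        }
        return queries;
    }

    public static void main(String[] args) {
        for (String q : offsetQueries("employees", "id", 1000, 4)) {
            System.out.println(q);
        }
    }
}
```

      With 1000 rows and 4 splits, the last query is "SELECT * FROM employees ORDER BY id LIMIT 250 OFFSET 750": the database may have to order and skip 750 rows just to return the final 250.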

      A better mechanism is to organize splits based on the data values themselves. Each split can then be selected with bounding conditions in the WHERE clause, which allows index range scans of the table and better exploits parallelism in the database.
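
      The value-based approach could be sketched as follows. This is an assumption-laden illustration, not the attached patch: the class and method names are hypothetical, the split column is assumed to be an indexed integer column, and its minimum and maximum are assumed to have been fetched beforehand (for example with a MIN/MAX query). Each split gets a half-open range, so no row is read twice and no ORDER BY, LIMIT, or OFFSET is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop.
public class RangeSplitSketch {

    // Builds one WHERE-bounded query per split over [min, max].
    // Assumes numSplits > 0, min <= max, and values far from Long overflow.
    static List<String> rangeQueries(String table, String splitCol,
                                     long min, long max, int numSplits) {
        List<String> queries = new ArrayList<>();
        long span = max - min + 1;
        // Ceiling division so that at most numSplits ranges cover the span.
        long step = Math.max(1, (span + numSplits - 1) / numSplits);
        for (long lo = min; lo <= max; lo += step) {
            long hi = Math.min(lo + step, max + 1); // exclusive upper bound
            queries.add("SELECT * FROM " + table
                    + " WHERE " + splitCol + " >= " + lo
                    + " AND " + splitCol + " < " + hi);
        }
        return queries;
    }

    public static void main(String[] args) {
        for (String q : rangeQueries("employees", "id", 1, 1000, 4)) {
            System.out.println(q);
        }
    }
}
```

      For ids 1 through 1000 and 4 splits, this yields ranges starting at 1, 251, 501, and 751. Ranges of equal width only give balanced splits when the values are roughly uniformly distributed; skewed data would need a different way of choosing boundaries.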

      1. MAPREDUCE-885.2.patch
        56 kB
        Aaron Kimball
      2. MAPREDUCE-885.3.patch
        55 kB
        Aaron Kimball
      3. MAPREDUCE-885.4.patch
        54 kB
        Aaron Kimball
      4. MAPREDUCE-885.5.patch
        63 kB
        Aaron Kimball
      5. MAPREDUCE-885.6.patch
        64 kB
        Aaron Kimball
      6. MAPREDUCE-885.patch
        67 kB
        Aaron Kimball

          Activity

          Aaron Kimball created issue -
          Aaron Kimball made changes -
          Field Original Value New Value
          Attachment MAPREDUCE-885.patch [ 12416936 ]
          Aaron Kimball made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Aaron Kimball made changes -
          Link This issue blocks MAPREDUCE-907 [ MAPREDUCE-907 ]
          Aaron Kimball made changes -
          Attachment MAPREDUCE-885.2.patch [ 12417692 ]
          Aaron Kimball made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Aaron Kimball made changes -
          Attachment MAPREDUCE-885.3.patch [ 12417849 ]
          Aaron Kimball made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Aaron Kimball made changes -
          Attachment MAPREDUCE-885.4.patch [ 12418033 ]
          Aaron Kimball made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Aaron Kimball made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Assignee Aaron Kimball [ kimballa ] zhuweimin [ chinashuimin ]
          Aaron Kimball made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Aaron Kimball made changes -
          Assignee zhuweimin [ chinashuimin ] Aaron Kimball [ kimballa ]
          Aaron Kimball made changes -
          Attachment MAPREDUCE-885.5.patch [ 12418034 ]
          Aaron Kimball made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Aaron Kimball made changes -
          Attachment MAPREDUCE-885.6.patch [ 12419012 ]
          Aaron Kimball made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Aaron Kimball made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Enis Soztutar made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Fix Version/s 0.21.0 [ 12314045 ]
          Resolution Fixed [ 1 ]
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Gavin made changes -
          Link This issue blocks MAPREDUCE-907 [ MAPREDUCE-907 ]
          Gavin made changes -
          Link This issue is depended upon by MAPREDUCE-907 [ MAPREDUCE-907 ]

            People

            • Assignee:
              Aaron Kimball
            • Reporter:
              Aaron Kimball
            • Votes:
              0
            • Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:
