HIVE-16663

String Caching For Rows


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: 3.0.0
    • Component/s: Beeline
    • Labels: None
    • Patch

    Description

      Query result sets very commonly contain many repeated values, especially when the query includes JOINs. As it currently stands, Beeline does not attempt to cache any of these values and therefore consumes a lot of memory.

      Adding a string cache may save a lot of memory. Some organizations use Beeline to perform ETL processing of result sets into CSV, and this change would better support them.
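To illustrate the idea of a string cache for row values, here is a minimal sketch in Java. It is not taken from the attached patches: the class name `StringCache`, the method `dedupe`, and the `maxEntries` bound are all hypothetical, and the actual change to Beeline may use a different mechanism. The sketch only shows how equal column values can be made to share one `String` instance, so that repeated values do not each occupy their own memory.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper for illustration only; not the HIVE-16663 implementation.
 * Maps each distinct string to a single canonical instance.
 */
final class StringCache {
    private final Map<String, String> cache = new HashMap<>();
    private final int maxEntries; // assumed bound so the cache itself cannot grow without limit

    StringCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /**
     * Returns a previously cached instance equal to value if one exists.
     * Otherwise returns value itself, caching it when there is room.
     */
    String dedupe(String value) {
        if (value == null) {
            return null; // SQL NULLs pass through unchanged
        }
        String cached = cache.get(value);
        if (cached != null) {
            return cached; // reuse the instance seen earlier
        }
        if (cache.size() < maxEntries) {
            cache.put(value, value);
        }
        return value;
    }

    /** Number of distinct strings currently held. */
    int size() {
        return cache.size();
    }
}
```

In such a design, each column value read from the result set would be passed through `dedupe` before being stored in the row buffer. Whether the saving is worthwhile depends on how many values actually repeat, which is why the bound on cache size is included as an assumption.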

      Attachments

        1. HIVE-16663.1.patch
          2 kB
          David Mollitor
        2. HIVE-16663.2.patch
          2 kB
          David Mollitor
        3. HIVE-16663.3.patch
          1 kB
          David Mollitor
        4. HIVE-16663.4.patch
          2 kB
          David Mollitor
        5. HIVE-16663.5.patch
          1 kB
          David Mollitor
        6. HIVE-16663.6.patch
          2 kB
          David Mollitor
        7. HIVE-16663.7.patch
          1 kB
          Naveen Gangam
        8. HIVE-16663.7.patch
          1 kB
          David Mollitor


          People

            Assignee: David Mollitor (belugabehr)
            Reporter: David Mollitor (belugabehr)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: