Pig / PIG-4059 Pig on Spark / PIG-5192

Remove schema tuple reference overhead for replicate join hashmap in POFRJoinSpark

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: spark-branch
    • Component/s: spark
    • Labels: None

    Description

      Currently, even when pig.schematuple is set to false (the default), the replicated join hashmap uses TupleToMapKey and TuplesToSchemaTupleList instead of a plain HashMap<Object, ArrayList<Tuple>>, which costs a lot of memory. The key is also converted to a tuple, which is unnecessary. For details, see PIG-4874.
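
      To illustrate the proposed shape, here is a minimal sketch of a replicated-join table built on a plain HashMap, with the key stored as the raw field value rather than wrapped in a tuple. It is not the POFRJoinSpark code: the class ReplicatedJoinSketch, the build and probe methods, and the use of List<Object> in place of Pig's Tuple are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a "tuple" here is a plain List<Object>, standing in for Pig's Tuple.
class ReplicatedJoinSketch {

    // Build the in-memory table from the small (replicated) input.
    // The key is the raw field value; it is not wrapped in a one-field tuple.
    static Map<Object, ArrayList<List<Object>>> build(List<List<Object>> small, int keyIndex) {
        Map<Object, ArrayList<List<Object>>> table = new HashMap<>();
        for (List<Object> row : small) {
            Object key = row.get(keyIndex);
            if (key == null) {
                continue; // assumption: rows with a null key never match, so skip them
            }
            table.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return table;
    }

    // Probe the table with one row of the large input and return the joined rows.
    static List<List<Object>> probe(Map<Object, ArrayList<List<Object>>> table,
                                    List<Object> row, int keyIndex) {
        List<List<Object>> out = new ArrayList<>();
        Object key = row.get(keyIndex);
        ArrayList<List<Object>> matches = (key == null) ? null : table.get(key);
        if (matches == null) {
            return out; // no match: an inner join emits nothing for this row
        }
        for (List<Object> match : matches) {
            List<Object> joined = new ArrayList<>(row); // copy the probe row
            joined.addAll(match);                       // append the replicated row
            out.add(joined);
        }
        return out;
    }
}
```

      The sketch only handles a single-column key and an inner join. A multi-column key would still need a composite key object, so the saving from skipping the tuple wrapper applies to the single-column case.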

          People

            Assignee: Unassigned
            Reporter: liyunzhang (kellyzly)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: