Pig / PIG-2348

Bloom should be able to take a relation or a file

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.11
    • Fix Version/s: None
    • Component/s: internal-udfs
    • Labels:
      None

      Description

      Currently, Bloom requires the user to store the result of a previous BuildBloom query in an HDFS file before the filter can be used. If the store and the filter that uses Bloom are in the same script, the user must therefore submit an "exec" between them.
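
      For illustration, the current workflow looks roughly like the script below. This is a sketch only: the relation names, file paths, and the BuildBloom arguments (hash type, expected number of elements, desired false positive rate) are assumed values, not taken from this issue.

      ```pig
      -- Step 1: build the filter from the small relation and store it on HDFS.
      -- The BuildBloom arguments are illustrative assumptions.
      define bb BuildBloom('jenkins', '1000', '0.01');
      small = load 'S' as (x, y, z);
      grpd = group small all;
      fltrd = foreach grpd generate bb(small.x);
      store fltrd into 'mybloom';

      -- The exec forces the store above to finish before Bloom reads the file.
      exec;

      -- Step 2: point Bloom at the stored file and filter the large relation.
      define bloom Bloom('mybloom');
      large = load 'L' as (a, b, c);
      flarge = filter large by bloom(a);
      joined = join small by x, flarge by a;
      store joined into 'results';
      ```

      The user has to choose and clean up the 'mybloom' location, and the exec splits the script into two separately executed batches.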

      If Bloom could take a relation as its first input (à la a relation cast to a scalar), users would not need to put an exec in their script or manage a storage location on HDFS.

      Sometimes storing the results in a file makes sense, so we don't want to remove the current behavior, just add another option.
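
      A script using the requested option might look something like the sketch below. The syntax is hypothetical: the no-argument Bloom() definition, the two-argument call, and the alias bf are assumptions made for illustration and are not an existing Pig API. The relation names, paths, and BuildBloom arguments are likewise invented.

      ```pig
      -- HYPOTHETICAL syntax: illustrates the request, not a released Pig feature.
      define bb BuildBloom('jenkins', '1000', '0.01');
      small = load 'S' as (x, y, z);
      grpd = group small all;
      fltrd = foreach grpd generate bb(small.x) as bf;

      -- Assumed: a Bloom() with no file name that takes the filter as its first input.
      define bloom Bloom();
      large = load 'L' as (a, b, c);
      -- fltrd has exactly one row, so fltrd.bf could be read as a scalar.
      flarge = filter large by bloom(fltrd.bf, a);
      joined = join small by x, flarge by a;
      store joined into 'results';
      ```

      In this form there is no intermediate store, no exec, and no HDFS path for the user to manage. The existing Bloom('file') form would remain available for cases where keeping the filter on disk is useful.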

          Activity

          Daniel Dai added a comment -

          PIG-2348-0.patch is a partial patch. We also need to address backward compatibility, so that Bloom works in both scalar mode and distributed cache file mode.

          Dmitriy V. Ryaboy added a comment -

          Daniel, can you give an example of a script that worked with PIG-2328 but breaks after this patch?

          Daniel Dai added a comment -

          In the patch, I got rid of the constructor Bloom(String filename), which breaks the old way of working (save the BuildBloom result into a file, then pass the file to Bloom). I think we can retain that constructor to maintain backward compatibility.

          Julien Le Dem added a comment -

          This will go in a future version


            People

            • Assignee: Alan Gates
            • Reporter: Alan Gates
            • Votes: 0
            • Watchers: 3
