ACCUMULO-1661

AccumuloInputFormat cannot fetch empty column family

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.3, 1.5.0
    • Fix Version/s: 1.5.1, 1.6.0
    • Component/s: client
    • Labels: None

      Description

      The following fails:

      Job job = new Job();
      HashSet<Pair<Text,Text>> cols = new HashSet<Pair<Text,Text>>();
      cols.add(new Pair<Text,Text>(new Text(""), null));
      AccumuloInputFormat.fetchColumns(job, cols);
      Set<Pair<Text,Text>> setCols = AccumuloInputFormat.getFetchedColumns(job);
      assertEquals(cols.size(), setCols.size());
      
      Attachments

      1. ACCUMULO-1661.1.5.1-SNAPSHOT.patch.txt
        3 kB
        Vikram Srivastava
      2. ACCUMULO-1661.v1.patch.txt
        3 kB
        Vikram Srivastava
      3. ACCUMULO-1661.v2.patch.txt
        3 kB
        Vikram Srivastava

        Activity

        medined David Medinets added a comment -

        Can you include the import statements? I am trying to run this test but the Job is not the right object for the fetchColumns call.

        vines John Vines added a comment -

        The issue is in Configuration.getStringCollection, which uses a tokenizer to split on commas. Unfortunately, an empty COLF gets Base64-encoded to an empty string, which the tokenizer interprets not as a value but as cruft. Configuration.getStrings has the same behavior, so the only way around it is to not rely on the Configuration helpers and do the String transformations ourselves. Or, we can go a slightly hacky route and just manually check if that property is an empty String (vs. null) to know it's there. However, cases where the empty string is among other options would still probably get lost.
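
The behavior described above can be reproduced without Hadoop. The following is a minimal sketch, assuming that java.util.StringTokenizer with a "," delimiter is a fair stand-in for the tokenizer the comment refers to, and that java.util.Base64 stands in for whatever Base64 codec Accumulo uses. The class and method names are hypothetical and exist only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.StringTokenizer;

class EmptyTokenDemo {
  // Hypothetical stand-in for a comma-tokenizing configuration helper.
  static List<String> tokenize(String joined) {
    List<String> tokens = new ArrayList<>();
    StringTokenizer st = new StringTokenizer(joined, ",");
    while (st.hasMoreTokens()) {
      tokens.add(st.nextToken());
    }
    return tokens;
  }

  // Base64-encodes the UTF-8 bytes of a string.
  static String encode(String s) {
    return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    // An empty column family encodes to an empty string.
    String emptyFamily = encode("");
    System.out.println(emptyFamily.isEmpty());        // true

    // On its own, the empty string yields no tokens at all.
    System.out.println(tokenize(emptyFamily).size()); // 0

    // Among other values, the empty entry is silently skipped.
    String mixed = String.join(",", encode("foo"), emptyFamily, encode("bar"));
    System.out.println(tokenize(mixed).size());       // 2, not 3
  }
}
```

StringTokenizer never returns zero-length tokens, which is why both the lone empty family and the empty family among other entries disappear.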

        vickyuec Vikram Srivastava added a comment -

        Newbie question: Do "cols" and "setCols" have to be identical? Javadoc for fetchColumns says "An empty set is the default and is equivalent to scanning the all columns." So can we put a condition that if ColumnFamily == new Text(""), it implies that all CFs are selected? Also, in that case we should add an argument checker that "cols" has a size of 1 only.

        billie.rinaldi Billie Rinaldi added a comment -

        No, fetching "" would mean fetching all key/value pairs that have an empty column family – there is one family in the set of families that is being fetched.
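
The distinction between an empty set of fetched families and a set containing the empty family can be sketched as a simple predicate. This is only an illustration of the semantics discussed above, not Accumulo code; the class and method names are hypothetical, and column families are modeled as plain Strings rather than Text.

```java
import java.util.Collections;
import java.util.Set;

class FetchSemanticsDemo {
  // Hypothetical predicate: would a key with this family be returned?
  // An empty fetched set means "no restriction"; otherwise the family must be listed.
  static boolean isFetched(Set<String> fetchedFamilies, String family) {
    return fetchedFamilies.isEmpty() || fetchedFamilies.contains(family);
  }

  public static void main(String[] args) {
    Set<String> nothingConfigured = Collections.emptySet();
    Set<String> onlyEmptyFamily = Collections.singleton("");

    System.out.println(isFetched(nothingConfigured, "foo")); // true: all columns
    System.out.println(isFetched(onlyEmptyFamily, ""));      // true: the empty family
    System.out.println(isFetched(onlyEmptyFamily, "foo"));   // false: not in the set
  }
}
```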

        vickyuec Vikram Srivastava added a comment -

        Attached patch. It splits the string explicitly inside InputConfigurator.getFetchedColumns so that empty strings are included.
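
One way to split so that empty entries survive is String.split with a negative limit. The sketch below is not the attached patch; it only illustrates the general approach, and the class and method names are hypothetical. It assumes the serialized value is a comma-joined String, with null meaning the property was never set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExplicitSplitDemo {
  // Splits on commas and keeps empty entries, including leading and trailing ones.
  static List<String> splitKeepingEmpties(String serialized) {
    if (serialized == null) {
      // Property not set: no columns were configured.
      return new ArrayList<>();
    }
    // A negative limit tells split() not to discard trailing empty strings.
    return new ArrayList<>(Arrays.asList(serialized.split(",", -1)));
  }

  public static void main(String[] args) {
    System.out.println(splitKeepingEmpties(null).size());         // 0
    System.out.println(splitKeepingEmpties("").size());           // 1 (one empty entry)
    System.out.println(splitKeepingEmpties("Zm9v,,YmFy").size()); // 3
  }
}
```

Note that an empty String and null now mean different things: the former is one empty entry, the latter is no entries. Any code that stores the value has to preserve that distinction.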

        kturner Keith Turner added a comment -

        Would be nice to add a few more test cases. Does the following cover all the cases of empty fam and qual?

            cols.add(new Pair<Text,Text>(new Text(""), null));
            cols.add(new Pair<Text,Text>(new Text("foo"), new Text("bar")));
            cols.add(new Pair<Text,Text>(new Text(""), new Text("bar")));
            cols.add(new Pair<Text,Text>(new Text(""), new Text("")));
            cols.add(new Pair<Text,Text>(new Text("foo"), new Text("")));
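
The five combinations above can be exercised with a standalone round-trip check. This is a sketch under stated assumptions, not the project's test: it assumes each column is serialized as Base64(family), optionally followed by ":" and Base64(qualifier), with entries joined by commas. Pairs are modeled with java.util.AbstractMap.SimpleImmutableEntry instead of Accumulo's Pair and Text, and all class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Base64;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ColumnRoundTripDemo {
  static String b64(String s) {
    return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
  }

  static String unb64(String s) {
    return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
  }

  // Assumed format: family[:qualifier], comma-joined. Returns null for an empty set
  // so that "nothing configured" stays distinct from "one empty family".
  static String serialize(Set<Map.Entry<String, String>> cols) {
    if (cols.isEmpty()) {
      return null;
    }
    StringBuilder sb = new StringBuilder();
    String sep = "";
    for (Map.Entry<String, String> col : cols) {
      sb.append(sep).append(b64(col.getKey()));
      if (col.getValue() != null) {
        sb.append(':').append(b64(col.getValue()));
      }
      sep = ",";
    }
    return sb.toString();
  }

  static Set<Map.Entry<String, String>> deserialize(String serialized) {
    Set<Map.Entry<String, String>> cols = new HashSet<>();
    if (serialized == null) {
      return cols;
    }
    // Negative limit keeps empty entries, i.e. an empty family with no qualifier.
    for (String col : serialized.split(",", -1)) {
      int idx = col.indexOf(':');
      String fam = unb64(idx < 0 ? col : col.substring(0, idx));
      // No ":" means a null qualifier; ":" followed by nothing means an empty one.
      String qual = idx < 0 ? null : unb64(col.substring(idx + 1));
      cols.add(new SimpleImmutableEntry<>(fam, qual));
    }
    return cols;
  }

  // The five family/qualifier combinations listed in the comment above.
  static Set<Map.Entry<String, String>> sampleColumns() {
    Set<Map.Entry<String, String>> cols = new HashSet<>();
    cols.add(new SimpleImmutableEntry<>("", (String) null));
    cols.add(new SimpleImmutableEntry<>("foo", "bar"));
    cols.add(new SimpleImmutableEntry<>("", "bar"));
    cols.add(new SimpleImmutableEntry<>("", ""));
    cols.add(new SimpleImmutableEntry<>("foo", ""));
    return cols;
  }

  public static void main(String[] args) {
    Set<Map.Entry<String, String>> cols = sampleColumns();
    System.out.println(cols.equals(deserialize(serialize(cols)))); // true
  }
}
```

The Base64 alphabet contains neither "," nor ":", so both can be used as separators without escaping.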
        
        vickyuec Vikram Srivastava added a comment -

        Added more cf:cq combinations and verified empty strings are handled correctly.

        ecn Eric Newton added a comment -

        Thanks Vikram. Can you provide the patch against 1.5.1-SNAPSHOT so I can patch forward from there?

        vickyuec Vikram Srivastava added a comment -

        Attached patch against 1.5.1-SNAPSHOT branch.

        jira-bot ASF subversion and git services added a comment -

        Commit 13eb19c2b92180bc4752d75dd74b76d036eb38e2 in branch refs/heads/1.5.1-SNAPSHOT from Vikram Srivastava
        [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=13eb19c ]

        ACCUMULO-1661 Handle empty column family correctly for AccumuloInputFormat

        Signed-off-by: Eric Newton <eric.newton@gmail.com>

        jira-bot ASF subversion and git services added a comment -

        Commit 13eb19c2b92180bc4752d75dd74b76d036eb38e2 in branch refs/heads/1.6.0-SNAPSHOT from Vikram Srivastava
        [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=13eb19c ]

        ACCUMULO-1661 Handle empty column family correctly for AccumuloInputFormat

        Signed-off-by: Eric Newton <eric.newton@gmail.com>

        jira-bot ASF subversion and git services added a comment -

        Commit 13eb19c2b92180bc4752d75dd74b76d036eb38e2 in branch refs/heads/master from Vikram Srivastava
        [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=13eb19c ]

        ACCUMULO-1661 Handle empty column family correctly for AccumuloInputFormat

        Signed-off-by: Eric Newton <eric.newton@gmail.com>


          People

          • Assignee: vickyuec Vikram Srivastava
          • Reporter: billie.rinaldi Billie Rinaldi
          • Votes: 0
          • Watchers: 7
