Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-4340

PigStorage fails parsing empty map.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • None
    • 0.15.0
    • impl
    • None
    • Reviewed

    Description

      I've found that PigStorage doesn't parse empty maps properly.
      I'm using pig-0.11.0-cdh4.4.0, but reading the source code, it would be reproduced in the later versions.

      An empty map in a field of a tuple is parsed as null.

      test.txt
      empty []
      nonempty [foo#bar]
      
      test.pig
      A = LOAD '/tmp/test.txt' USING PigStorage(' ') AS (a:chararray, b:map[chararray]);
      DUMP A;
      
      $ pig test.pig
      ...
      (empty,)
      (nonempty,[foo#bar])
      

      Moreover, if the empty map is nested in a parent field, the entire field is interpreted as null.

      test-nested.txt
      empty (f1,[])
      nonempty (f1,[foo#bar])
      
      test.pig
      A = LOAD '/tmp/test.txt' USING PigStorage(' ') AS (a:chararray, (b:chararray, b:map[chararray]));
      DUMP A;
      
      $ pig test.pig
      ...
      (empty,)
      (nonempty,(f1,[foo#bar]))
      

      Investigating this, I've found it is because Utf8StorageConverter#consumeMap throws IOException when it receives empty map as string '[]'. It seems like always assuming there should be a content of map, more specifically '#' character.

          private Map<String, Object> consumeMap(PushbackInputStream in, ResourceFieldSchema fieldSchema) throws IOException {
              int buf;
              
              while ((buf=in.read())!='[') {
                  if (buf==-1) {
                      throw new IOException("Unexpect end of map");
                  }
              }
              HashMap<String, Object> m = new HashMap<String, Object>();
              ByteArrayOutputStream mOut = new ByteArrayOutputStream(BUFFER_SIZE);
              while (true) {
                  // Read key (assume key can not contains special character such as #, (, [, {, }, ], )
                  while ((buf=in.read())!='#') {
                      if (buf==-1) {
                          throw new IOException("Unexpect end of map");
                      }
                      mOut.write(buf);
                  }
                  String key = bytesToCharArray(mOut.toByteArray());
                  if (key.length()==0)
                      throw new IOException("Map key can not be null");
      

      I would appreciate if you could fix this problem.
      Thanks.

      Attachments

        1. PIG-4340-1.patch
          2 kB
          Daniel Dai

        Activity

          People

            daijy Daniel Dai
            akirakiron Akira Murashita
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: