Hive / HIVE-15491

Failures are masked/swallowed in GenericUDTFJSONTuple::process().


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      I draw your attention to the following piece of code in GenericUDTFJSONTuple::process():

        @Override
        public void process(Object[] o) throws HiveException {
          ...
          try {
            ...
            for (int i = 0; i < numCols; ++i) {
              if (retCols[i] == null) {
                retCols[i] = cols[i]; // use the object pool rather than creating a new object
              }
              Object extractObject = ((Map<String, Object>)jsonObj).get(paths[i]);
              if (extractObject instanceof Map || extractObject instanceof List) {
                retCols[i].set(MAPPER.writeValueAsString(extractObject));
              } else if (extractObject != null) {
                retCols[i].set(extractObject.toString());
              } else {
                retCols[i] = null;
              }
            }
            forward(retCols);
            return;
          } catch (Throwable e) {  // <================= Yikes.
            LOG.error("JSON parsing/evaluation exception" + e);
            forward(nullCols);
          }
        }
      

      The error handling here seems suspect. Judging from the log message, the intention is apparently to catch JSON-specific errors arising from MAPPER.readValue() and MAPPER.writeValueAsString(). But by catching Throwable, this code also masks any error that arises from the call to forward(retCols).
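
      For illustration, here is a minimal sketch of one way to narrow the handling (just a sketch, not necessarily what the attached patch does): confine the catch to the JSON parsing/extraction work and move forward() out of it, so that a HiveException thrown while writing output propagates and fails the task instead of being logged and ignored. The local flag (jsonOk) is a name introduced here for the example.

        boolean jsonOk = true;
        try {
          // ... MAPPER.readValue(...) and the per-column extraction, as in the snippet above ...
        } catch (Exception e) {
          // Only JSON parsing/evaluation problems are treated as a bad record.
          LOG.error("JSON parsing/evaluation exception: " + e, e);
          jsonOk = false;
        }
        // forward() now sits outside the try/catch, so failures to write output
        // (e.g. RecordWriter creation failing on an exhausted quota) are no
        // longer swallowed; they surface as HiveException and fail the task.
        forward(jsonOk ? retCols : nullCols);

      Catching Exception rather than Throwable also avoids hiding genuine VM errors such as OutOfMemoryError.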

      I just ran into this in production. A user with a nearly exhausted HDFS quota attempted to use json_tuple to extract fields from JSON strings in his data. The data turned out to have large record counts, and the query used over 25K mappers. Every task failed to create a RecordWriter, thanks to the exhausted quota, but the thrown exception was swallowed in the code above. process() ignored the failure for the record and proceeded to the next one. Eventually, this resulted in DDoS-ing the NameNode.

      I'll have a patch for this shortly.

      Attachments

        1. HIVE-15491.patch
          1 kB
          Mithun Radhakrishnan


          People

            Assignee: mithun (Mithun Radhakrishnan)
            Reporter: mithun (Mithun Radhakrishnan)
            Votes: 0
            Watchers: 1
