Pig / PIG-3671

CONCAT operation bleeds into ToDate, causing an ERROR

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.12.0
    • Fix Version/s: None
    • Component/s: internal-udfs
    • Labels:
      None
    • Environment:

      Moonlight on the ocean last night was pretty.

      Description

      date_and_time = LOAD 'date_and_time.txt' AS (date:chararray, time:chararray);
      date_time_concat = FOREACH date_and_time GENERATE CONCAT(CONCAT(date, ' '), time) AS date_time;
      date_time_problem = FOREACH date_time_concat GENERATE ToDate(date_time) AS date_time:datetime;
      DUMP date_time_problem;

      org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.ToDateISO)[datetime] - scope-12 Operator Key: scope-12) children: null at []]: java.lang.IllegalArgumentException: Invalid format: "#date time"
      at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:338)
      at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
      at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
      Caused by: java.lang.IllegalArgumentException: Invalid format: "#date time"
      at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:683)
      at org.apache.pig.builtin.ToDate.extractDateTime(ToDate.java:124)
      at org.apache.pig.builtin.ToDateISO.exec(ToDateISO.java:38)
      at org.apache.pig.builtin.ToDateISO.exec(ToDateISO.java:31)
      at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
      at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDateTime(POUserFunc.java:422)
      at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:329)

      Attachments

      1. date_and_time.txt (2 kB, Russell Jurney)
      2. date_time_bug.pig (0.3 kB, Russell Jurney)

        Activity

        Russell Jurney added a comment -

        Attaching date and time file for reproducing bug.

        Russell Jurney added a comment -

        This script reproduces the bug using the attached data file.

        Russell Jurney added a comment -

        Note: a workaround is to STORE the entire relation after the CONCAT and then either LOAD it back as a datetime, or LOAD it as a chararray and apply ToDate(), which then runs without a problem.

        Thus the plan is borked.

        Russell Jurney added a comment -

        Also note: StringConcat, if done on the PREVIOUS line (not the current one), will not bleed through:

        bluecoat_datetime = FOREACH bluecoat GENERATE StringConcat(date, 'T', time, 'Z') AS date_time, *;
        blucoat_datetime = FOREACH bluecoat_datetime GENERATE ToDate(date_time) AS date_time, *;
        STORE bluecoat_datetime INTO '../../data/bluecoat_datetime.log';

        THIS WORKS

        bluecoat_datetime = FOREACH bluecoat GENERATE ToDate(StringConcat(date, 'T', time, 'Z')) AS date_time, *;

        THIS FAILS
        Russell Jurney added a comment -

        This is not a bug. There was a header line in my text file.

        HA
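
        For anyone hitting the same "Invalid format" error, the header row can be filtered out before the fields reach ToDate. What follows is only a minimal sketch, not a tested fix: it assumes the header's first field begins with '#', as the "#date time" in the error message suggests, and the alias names date_and_time_rows and date_time_parsed are made up for illustration.

        ```pig
        -- Load the raw file; both fields are chararrays, as in the original script.
        date_and_time = LOAD 'date_and_time.txt' AS (date:chararray, time:chararray);

        -- Drop the header row.
        -- Assumption: the header's first field starts with '#' (e.g. "#date").
        date_and_time_rows = FILTER date_and_time BY NOT (date MATCHES '#.*');

        -- Same CONCAT and ToDate steps as in the description, now applied to data rows only.
        date_time_concat = FOREACH date_and_time_rows GENERATE CONCAT(CONCAT(date, ' '), time) AS date_time;
        date_time_parsed = FOREACH date_time_concat GENERATE ToDate(date_time) AS date_time:datetime;
        DUMP date_time_parsed;
        ```

        MATCHES compares against the whole string, hence the trailing '.*' in the pattern. If the remaining values are not in a format the one-argument ToDate accepts, the two-argument form ToDate(date_time, format) may be needed instead.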


          People

          • Assignee: Daniel Dai
          • Reporter: Russell Jurney
          • Votes: 0
          • Watchers: 1