Falcon / FALCON-1149

The 'today' EL date expression resolves to yesterday's date for process instance input feed ranges

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.5, 0.6
    • Fix Version/s: 0.6.1
    • Component/s: None
    • Labels: None
    • Environment: HDP 2.1 sandbox, HDP 2.2 sandbox; server in UTC

    Description

      Steps to reproduce
      1. Submit a cluster named 'sandbox':

      <cluster colo="local" description="Sandbox Cluster" name="sandbox" xmlns="uri:falcon:cluster:0.1">
        <interfaces>
          <interface type="readonly" endpoint="hftp://sandbox.hortonworks.com:50070" version="2.2.0" />
          <interface type="write" endpoint="hdfs://sandbox.hortonworks.com:8020" version="2.2.0" />
          <interface type="execute" endpoint="sandbox.hortonworks.com:8050" version="2.2.0" />
          <interface type="workflow" endpoint="http://sandbox.hortonworks.com:11000/oozie/" version="4.0.0" />
          <interface type="messaging" endpoint="tcp://sandbox.hortonworks.com:61616?daemon=true" version="5.1.6" />
        </interfaces>
        <locations>
          <location name="staging" path="/apps/falcon/sandbox/staging" />
          <location name="temp" path="/tmp" />
          <location name="working" path="/apps/falcon/sandbox/working" />
        </locations>
      </cluster>
      

      2. Submit a feed f1:

      <feed name="f1" description="f1" xmlns="uri:falcon:feed:0.1">
        <frequency>days(1)</frequency>
        <timezone>UTC</timezone>
        <late-arrival cut-off="hours(48)" />
        <clusters>
          <cluster name="sandbox" type="source">
            <validity start="2013-01-01T13:00Z" end="2099-12-31T13:00Z" />
            <retention limit="months(9999)" action="delete" />
          </cluster>
        </clusters>
        <locations>
          <location type="data"
            path="/f1/${YEAR}/${MONTH}/${DAY}" />
        </locations>
        <ACL owner="ambari-qa" group="users" permission="0775" />
        <schema location="/none" provider="none" />
      </feed>
      

      3. Submit a process p1:

      <process name="p1" xmlns="uri:falcon:process:0.1">
        <clusters>
          <cluster name="sandbox">
            <validity start="<TODAY>T08:30Z" end="2099-12-31T00:00Z"/>
          </cluster>
        </clusters>
        <parallel>1</parallel>
        <order>FIFO</order>
        <frequency>days(1)</frequency>
        <outputs>
          <output name="output" feed="f1" instance="today(0,0)" />
        </outputs>
        <properties>
        </properties>
        <workflow name="p1-wf" engine="oozie" path="/apps/p1" />
        <retry policy="periodic" delay="minutes(60)" attempts="24" />
      </process>
      

      4. Submit a feed f2:

      <feed name="f2" description="f2" xmlns="uri:falcon:feed:0.1">
        <frequency>days(1)</frequency>
        <timezone>UTC</timezone>
        <late-arrival cut-off="hours(48)" />
        <clusters>
          <cluster name="sandbox" type="source">
            <validity start="2013-01-01T13:00Z" end="2099-12-31T13:00Z" />
            <retention limit="months(9999)" action="delete" />
          </cluster>
        </clusters>
        <locations>
          <location type="data"
            path="/f2/${YEAR}/${MONTH}/${DAY}" />
        </locations>
        <ACL owner="ambari-qa" group="users" permission="0775" />
        <schema location="/none" provider="none" />
      </feed>
      

      5. Submit a process p2:

      <process name="p2" xmlns="uri:falcon:process:0.1">
        <clusters>
          <cluster name="sandbox">
            <validity start="<TODAY>T08:30Z" end="2099-12-31T00:00Z"/>
          </cluster>
        </clusters>
        <parallel>1</parallel>
        <order>FIFO</order>
        <frequency>days(1)</frequency>
        <inputs>
          <input name="input" feed="f1" start="today(0,0)" end="today(0,0)" />
        </inputs>
        <outputs>
          <output name="output" feed="f2" instance="today(0,0)" />
        </outputs>
        <workflow name="p2-wf" engine="oozie" path="/apps/p2" />
        <retry policy="periodic" delay="minutes(60)" attempts="24" />
      </process>
      

      6. Note that:

      • Process p1 has no input feed (the data is fetched from some other location by p1).
      • Feed f1 is referenced in the output of p1, and also referenced in the input of p2.
      • All feeds are daily, and each process input feed range and output feed instance covers a single day, by way of the 'today(0,0)' EL expression.

      7. Finally, schedule all feeds and processes after 08:30Z on a given day, 'today'.

      Expected:
      1. The first scheduled instance for p1 proceeds to COMPLETED, and produces a partition in f1 for 'today'.
      2. The first scheduled instance for p2 proceeds to COMPLETED, and produces a partition in f2 for 'today', since it looks for and finds a corresponding partition for 'today' in f1.

      Actual:
      1. The first scheduled instance for p1 proceeds to COMPLETED, and produces a partition in f1 for 'today'.
      2. However, the first scheduled instance for p2 is left in WAITING state, since it is looking for a partition in f1 for 'yesterday', which does not exist (and will never exist).

      I am currently working around this unexpected behaviour by specifying the input feed range start and end for p2 as 'today(24,0)' instead of 'today(0,0)'.
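
      For reference, the workaround amounts to changing only the inputs element of p2, roughly as sketched below. The rest of the p2 definition from step 5 is assumed to stay unchanged, and the comment is added here for illustration only.

```xml
<inputs>
  <!-- Workaround: offset the input range by 24 hours so that it
       matches the f1 partition that p1 writes for 'today' -->
  <input name="input" feed="f1" start="today(24,0)" end="today(24,0)" />
</inputs>
```

      With this change the first p2 instance looks for the 'today' partition of f1 rather than the one for 'yesterday'.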

      Please advise if this is indeed a) a bug or b) a mistake in the configuration.

      Many thanks,

          People

            Assignee: Ajay Yadav (ajayyadava)
            Reporter: Alex C (alza)