Uploaded image for project: 'Bigtop'
  1. Bigtop
  2. BIGTOP-1586

BigPetStore-Spark only works on the East Coast .

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 1.0.0
    • Component/s: blueprints
    • Labels:
      None

      Description

      Yup, its true. i think

      When visiting my parents in oklahoma I found that the way bigpetstore-spark is set up, only people on the right coast can run the unit tests... something with the default time zone setup in java and the unit tests which test for set equivalence.

      s
      Failed to match [Lscala.Tuple5;@7ac2933e and [Lscala.Tuple5;@7c510a68
      missing (Store(5,11553),Location(11553,Uniondale,NY),Customer(999,Cesareo,Lamplough,20152),Location(20152,Chantilly,VA),TransactionProduct(999,32,5,java.util.GregorianCalendar[time=1446530891000,areFieldsSet=true,areAllFieldsSet=false,lenient=true,zone=sun.util.calendar.ZoneInfo[id="America/Chicago",offset=-21600000,dstSavings=3600000,useDaylight=true,transitions=235,lastRule=java.util.SimpleTimeZone[id=America/Chicago,offset=-21600000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2015,MONTH=10,WEEK_OF_YEAR=45,WEEK_OF_MONTH=1,DAY_OF_MONTH=3,DAY_OF_YEAR=307,DAY_OF_WEEK=3,DAY_OF_WEEK_IN_MONTH=1,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=8,SECOND=11,MILLISECOND=0,ZONE_OFFSET=-21600000,DST_OFFSET=0],category=dry dog food;brand=Happy Pup;flavor=Fish & Potato;size=30.0;per_unit_cost=2.67;))
      ... not found= (Store(5,11553),Location(11553,Uniondale,NY),Customer(999,Cesareo,Lamplough,20152),Location(20152,Chantilly,VA),TransactionProduct(999,32,5,java.util.GregorianCalendar[time=?,areFieldsSet=false,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="GMT",offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2015,MONTH=9,WEEK_OF_YEAR=2,WEEK_OF_MONTH=2,DAY_OF_MONTH=12,DAY_OF_YEAR=5,DAY_OF_WEEK=2,DAY_OF_WEEK_IN_MONTH=1,AM_PM=0,HOUR=6,HOUR_OF_DAY=4,MINUTE=29,SECOND=46,MILLISECOND=0,ZONE_OFFSET=0,DST_OFFSET=0],category=dry dog food;brand=Happy Pup;flavor=Fish & Potato;size=30.0;per_unit_cost=2.67;))
      ... not found= (Store(1,98110),Location(98110,Bainbridge Islan,WA),Customer(999,Cesareo,Lamplough,20152),Location(20152,Chantilly,VA),TransactionProduct(999,31,1,java.util.GregorianCalendar[time=?,areFieldsSet=false,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="GMT",offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2015,MONTH=10,WEEK_OF_YEAR=2,WEEK_OF_MONTH=2,DAY_OF_MONTH=3,DAY_OF_YEAR=5,DAY_OF_WEEK=2,DAY_OF_WEEK_IN_MONTH=1,AM_PM=0,HOUR=6,HOUR_OF_DAY=1,MINUTE=8,SECOND=11,MILLISECOND=0,ZONE_OFFSET=0,DST_OFFSET=0],category=poop bags;brand=Dog Days;color=Blue;size=60.0;per_unit_cost=0.21;))
      ... not found= (Store(6,66067),Location(66067,Ottawa,KS),Customer(999,Cesareo,Lamplough,20152),Location(20152,Chantilly,VA),TransactionProduct
      

      I've got a patch coming in which has some more general improvements to it, and in also some removal of the monads in the unit tests to make it easier to debug/run.

      I'll submit it shortly once i fully fix the time zone issue.

      Even though i love the east coast, I have a dream that anyone, regardless of the coast they are on, race, religion, or creed, can use spark to generate petabytes of fake data !

      1. BIGTOP-1586.patch
        6 kB
        jay vyas
      2. BIGTOP-1586.patch
        14 kB
        jay vyas
      3. diff.patch
        3 kB
        jay vyas
      4. dirty.patch
        18 kB
        jay vyas

        Activity

        Hide
        jayunit100 jay vyas added a comment - - edited

        I think the issue is that the case class needs a explicity tostring, like this...

        case class TransactionProduct(customerId: Long, transactionId: Long,
          storeId: Long, dateTime: Calendar, product: String){
          override def toString():String = {
            customerId + "," + transactionId+","+storeId+","+dateTime.getTime()+","+product;
          }
        }
        

        the root cause is that it appears that we cant rely on the gregorian time zone toString . Any thoughts RJ Nowling ?

        Show
        jayunit100 jay vyas added a comment - - edited I think the issue is that the case class needs a explicity tostring, like this... case class TransactionProduct(customerId: Long, transactionId: Long, storeId: Long, dateTime: Calendar, product: String){ override def toString():String = { customerId + "," + transactionId+","+storeId+","+dateTime.getTime()+","+product; } } the root cause is that it appears that we cant rely on the gregorian time zone toString . Any thoughts RJ Nowling ?
        Hide
        rnowling RJ Nowling added a comment -

        Hi jay vyas,

        The Calendar objects in Java are a PITA. They contain many fields, which are populated via the current system properties. There is no way to override them in the constructor but you can set them through the API. And not all of these fields are written out when you convert the Calendar to a String to be able to write it out. For example, in the unit tests, I set the millisecond to 0 because the Calendar object will use the milliseconds of the current system time and there are no milliseconds values for the time data in the BigPetStore output, which is generated by Calendar's own `toString` method.

        This is further complicated by how I wrote the unit test. I do a simple equality comparison on the case classes, which does an equality comparison on the Calendar objects.

        I think we need to clean up the unit tests first to make debugging easier than figure out the correct solution. For the unit tests, we should compare every field individually instead of the Transaction objects themselves. E.g., does `transaction1.store == transaction2.store`, etc. For the date/time, we should not directly compare the Calendar objects but just the day, month, year, hour, minute, second, and time zone fields. The transaction comparisons should be wrapped in a utility method so we can call it from other unit tests and standardize it across the test suite.

        Once that's done, we can figure out where the problem is with date/time parsing.

        What do you think?

        Show
        rnowling RJ Nowling added a comment - Hi jay vyas , The Calendar objects in Java are a PITA. They contain many fields, which are populated via the current system properties. There is no way to override them in the constructor but you can set them through the API. And not all of these fields are written out when you convert the Calendar to a String to be able to write it out. For example, in the unit tests, I set the millisecond to 0 because the Calendar object will use the milliseconds of the current system time and there are no milliseconds values for the time data in the BigPetStore output, which is generated by Calendar's own `toString` method. This is further complicated by how I wrote the unit test. I do a simple equality comparison on the case classes, which does an equality comparison on the Calendar objects. I think we need to clean up the unit tests first to make debugging easier than figure out the correct solution. For the unit tests, we should compare every field individually instead of the Transaction objects themselves. E.g., does `transaction1.store == transaction2.store`, etc. For the date/time, we should not directly compare the Calendar objects but just the day, month, year, hour, minute, second, and time zone fields. The transaction comparisons should be wrapped in a utility method so we can call it from other unit tests and standardize it across the test suite. Once that's done, we can figure out where the problem is with date/time parsing. What do you think?
        Hide
        jayunit100 jay vyas added a comment - - edited

        okay. so in summary In this next upgrade to bigpetstore (which will be general cleanup) I'll also fix this bug.

        • we will keep the text dates in the datagenerator, as its good for us to exemplify how to deal with textual dates, even though some places like twitter use epochs to make life easier.
        • cleanup unit tests to not use === . I'll get something simple working for this.
        • switch to Joda time. I think thats the standard in other projects which attempt to simplify date processing (like datafu and also bigpetstore-mapreduce ).

        and After this next modification; I promise to start adding new features RJ Nowling

        Show
        jayunit100 jay vyas added a comment - - edited okay. so in summary In this next upgrade to bigpetstore (which will be general cleanup) I'll also fix this bug. we will keep the text dates in the datagenerator, as its good for us to exemplify how to deal with textual dates, even though some places like twitter use epochs to make life easier. cleanup unit tests to not use === . I'll get something simple working for this. switch to Joda time. I think thats the standard in other projects which attempt to simplify date processing (like datafu and also bigpetstore-mapreduce ). and After this next modification; I promise to start adding new features RJ Nowling
        Hide
        rnowling RJ Nowling added a comment -

        Sounds like a good plan. I'd be happy to review.

        Show
        rnowling RJ Nowling added a comment - Sounds like a good plan. I'd be happy to review.
        Hide
        jayunit100 jay vyas added a comment - - edited

        heres a quick cut of a patch. any initial thoughts RJ Nowling (or anyone else)

        Show
        jayunit100 jay vyas added a comment - - edited heres a quick cut of a patch. any initial thoughts RJ Nowling (or anyone else)
        Hide
        rnowling RJ Nowling added a comment -

        Thanks jay vyas! I gave the patch a quick scan – I think this will definitely help us debug date / time issues more easily. I'll apply it and give it a deeper look tomorrow,

        Show
        rnowling RJ Nowling added a comment - Thanks jay vyas ! I gave the patch a quick scan – I think this will definitely help us debug date / time issues more easily. I'll apply it and give it a deeper look tomorrow,
        Hide
        jayunit100 jay vyas added a comment -

        sure !

        but FYI, in intereste of time... it might not even apply properly... might be better for you at this stage, to just take a good look and ill work out all the details of how to rebase it and make it apply/test perfectly..

        Show
        jayunit100 jay vyas added a comment - sure ! but FYI, in intereste of time... it might not even apply properly... might be better for you at this stage, to just take a good look and ill work out all the details of how to rebase it and make it apply/test perfectly..
        Hide
        jayunit100 jay vyas added a comment -

        for review (ignore the commit message) RJ Nowling

        Show
        jayunit100 jay vyas added a comment - for review (ignore the commit message) RJ Nowling
        Hide
        rnowling RJ Nowling added a comment -

        Thanks, jay vyas! I'll look at it tomorrow.

        Show
        rnowling RJ Nowling added a comment - Thanks, jay vyas ! I'll look at it tomorrow.
        Hide
        rnowling RJ Nowling added a comment -

        Comments on patch:

        1. Import of SparkETL in ETLSuite causes an import warning in the build. I'm not sure it's necessary since the test suite and SparkETL are in the same package.

        2. Customers, locations, products in ETLSuite were reordered, making it look like a lot of changes when there isn't. Hard to see what was actually changed. I also think you commented out a line which is a duplicate of a line a few lines above (customers).

        3. Not a fan of removing the Locations arrays – you end up with duplications which makes it harder to change things.

        4. Why were the additional calendars removed?

        5. We should add more tests for dateTime in the equality method in TransactionProduct. Please check timezone, year, month, day, hour, minutes, seconds. If any of those fail, then we can fix the date/time parsing code. I'm assuming the failures may not be in those fields but it will help us debug the code.

        6. I think the approach used in the ParseRawData test is hard to read. Can it be cleaned up? I think it would be better to define a series of methods like compare(Transaction, Transaction), compare(Store, Store), compare(Calendar, Calendar), etc. that call each other recursively. More verbose but easier to understand. Or define equals() methods for each case class.

        Show
        rnowling RJ Nowling added a comment - Comments on patch: 1. Import of SparkETL in ETLSuite causes an import warning in the build. I'm not sure it's necessary since the test suite and SparkETL are in the same package. 2. Customers, locations, products in ETLSuite were reordered, making it look like a lot of changes when there isn't. Hard to see what was actually changed. I also think you commented out a line which is a duplicate of a line a few lines above (customers). 3. Not a fan of removing the Locations arrays – you end up with duplications which makes it harder to change things. 4. Why were the additional calendars removed? 5. We should add more tests for dateTime in the equality method in TransactionProduct. Please check timezone, year, month, day, hour, minutes, seconds. If any of those fail, then we can fix the date/time parsing code. I'm assuming the failures may not be in those fields but it will help us debug the code. 6. I think the approach used in the ParseRawData test is hard to read. Can it be cleaned up? I think it would be better to define a series of methods like compare(Transaction, Transaction), compare(Store, Store), compare(Calendar, Calendar), etc. that call each other recursively. More verbose but easier to understand. Or define equals() methods for each case class.
        Hide
        rnowling RJ Nowling added a comment -

        Alternative minimal subset of a patch:

        Define a series of methods like compare(Transaction, Transaction), compare(Store, Store), compare(Calendar, Calendar), etc. that call each other recursively. equals() methods don't use === so ScalaTest won't show you which fields are different. Please check timezone, year, month, day, hour, minutes, seconds. If any of those fail, then we can fix the date/time parsing code. I'm assuming the failures may not be in those fields but it will help us debug the code.

        Show
        rnowling RJ Nowling added a comment - Alternative minimal subset of a patch: Define a series of methods like compare(Transaction, Transaction), compare(Store, Store), compare(Calendar, Calendar), etc. that call each other recursively. equals() methods don't use === so ScalaTest won't show you which fields are different. Please check timezone, year, month, day, hour, minutes, seconds. If any of those fail, then we can fix the date/time parsing code. I'm assuming the failures may not be in those fields but it will help us debug the code.
        Hide
        jayunit100 jay vyas added a comment - - edited

        okay,,, thanks RJ Nowling for looking at this guy — i submitted this patch without actually thinking too much about the review process. really its just a hack to get the tests working and clean up some things that were bugging me.

        let me make a modular, clean, precise patch that fixes nothing but the bare minimum requirements you've specified and resubmit.
        also im thinking w/ this i'll create a follow on jira for adding the functional stuff/cleaning up data structures if that seems necessary in places

        Show
        jayunit100 jay vyas added a comment - - edited okay,,, thanks RJ Nowling for looking at this guy — i submitted this patch without actually thinking too much about the review process. really its just a hack to get the tests working and clean up some things that were bugging me. let me make a modular, clean, precise patch that fixes nothing but the bare minimum requirements you've specified and resubmit. also im thinking w/ this i'll create a follow on jira for adding the functional stuff/cleaning up data structures if that seems necessary in places
        Hide
        rnowling RJ Nowling added a comment -

        jay vyas I think separating out tests for debugging Calendar details versus functional refactor is a great idea. I'd be happy to review the second JIRA as well.

        Show
        rnowling RJ Nowling added a comment - jay vyas I think separating out tests for debugging Calendar details versus functional refactor is a great idea. I'd be happy to review the second JIRA as well.
        Hide
        jayunit100 jay vyas added a comment -

        so, heres a quick an simple, direct patch for this...

        RJ Nowling i think this is all we really need to fix it.

        shall we go with this approach ( see the diff.patch file ) ?

        I can add other ideas changes in subsequent later patches.

        Show
        jayunit100 jay vyas added a comment - so, heres a quick an simple, direct patch for this... RJ Nowling i think this is all we really need to fix it. shall we go with this approach ( see the diff.patch file ) ? I can add other ideas changes in subsequent later patches.
        Hide
        rnowling RJ Nowling added a comment -

        jay vyas It's clean, simple, and improves the previous tests. Do you want to add comparisons for years, timezone, and seconds? And how did you find the bug? Set your machine's clock to a different timezone? I'd like to see where the error is occurring with this new test.

        Show
        rnowling RJ Nowling added a comment - jay vyas It's clean, simple, and improves the previous tests. Do you want to add comparisons for years, timezone, and seconds? And how did you find the bug? Set your machine's clock to a different timezone? I'd like to see where the error is occurring with this new test.
        Hide
        jayunit100 jay vyas added a comment -

        Just set your machine time zone to OKLAHOMA time .... that will break it. Attaching final patch shortly !

        Show
        jayunit100 jay vyas added a comment - Just set your machine time zone to OKLAHOMA time .... that will break it. Attaching final patch shortly !
        Hide
        jayunit100 jay vyas added a comment - - edited

        Here's a final patch for review.

        • added a couple of minor comments and formatting fixes, but otherwise there are no major changes... I can remove those if we really want to.
        • removed millisecond settings: They complicate the test unnecessary and are not something we care about.

        Otherwise the patch is minimal and allows our users in the cornfields and hipsters on the west coast to run bigpetstore-spark unit tests without fear of failure !

        RJ Nowling final thoughts ? then a commiter can come in and give a final official +1/-1 based on your feedback pretty easily....

        Show
        jayunit100 jay vyas added a comment - - edited Here's a final patch for review. added a couple of minor comments and formatting fixes, but otherwise there are no major changes... I can remove those if we really want to. removed millisecond settings: They complicate the test unnecessary and are not something we care about. Otherwise the patch is minimal and allows our users in the cornfields and hipsters on the west coast to run bigpetstore-spark unit tests without fear of failure ! RJ Nowling final thoughts ? then a commiter can come in and give a final official +1/-1 based on your feedback pretty easily....
        Hide
        evans_ye Evans Ye added a comment -

        Hi jay vyas
        It looks like 2 lines of TransactionProduct are dominated by space but there're no changes on them.
        The rest of part looks good.
        I don't have machine to test now.
        If you have tested it and it works, Here's my +1

        Show
        evans_ye Evans Ye added a comment - Hi jay vyas It looks like 2 lines of TransactionProduct are dominated by space but there're no changes on them. The rest of part looks good. I don't have machine to test now. If you have tested it and it works, Here's my +1
        Hide
        jayunit100 jay vyas added a comment -

        commited... rj just create a follow on JIRA if any other updates needed !

        next stop : SparkSQL !

        Show
        jayunit100 jay vyas added a comment - commited... rj just create a follow on JIRA if any other updates needed ! next stop : SparkSQL !
        Hide
        jayunit100 jay vyas added a comment -

        btw folks, i tested this by changing timezone on my computer and running tests, reproducing the failure, and then modifying the tests and they worked. anyone in a funny time zone would be great to see if it works correctly now

        Show
        jayunit100 jay vyas added a comment - btw folks, i tested this by changing timezone on my computer and running tests, reproducing the failure, and then modifying the tests and they worked. anyone in a funny time zone would be great to see if it works correctly now
        Hide
        jayunit100 jay vyas added a comment - - edited

        btw folks, i tested this by changing timezone on my computer and running tests, reproducing the failure, and then modifying the tests and they worked. anyone in a funny time zone, interested in this issue, would be great feedback to see if it works correctly now for you as well

        Show
        jayunit100 jay vyas added a comment - - edited btw folks, i tested this by changing timezone on my computer and running tests, reproducing the failure, and then modifying the tests and they worked. anyone in a funny time zone, interested in this issue, would be great feedback to see if it works correctly now for you as well

          People

          • Assignee:
            jayunit100 jay vyas
            Reporter:
            jayunit100 jay vyas
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development