  Pig / PIG-3877

Getting Geo Latitude/Longitude from Address Lines

Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.1
    • Fix Version/s: 0.10.1
    • Component/s: piggybank
    • Labels: patch

    Description

      In many dataset-mining use cases, latitude and longitude have to be derived from address lines alone because the IP fields are missing.
      The attached UDFs look up the geo latitude/longitude for address lines.
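The description does not show how the UDFs turn address lines into a geocoder request. A minimal sketch, assuming the lines arrive as a list of strings and the geocoder accepts one free-form query parameter; the parameter name `location`, the base URL, and `AddressQuery.buildQueryUrl` are hypothetical, not taken from the patch. It joins the non-blank lines and URL-encodes them so a single request covers the whole address.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper: turns address lines into one URL-encoded geocoder query. */
public class AddressQuery {

    /**
     * Joins non-blank address lines with ", " and URL-encodes the result.
     * baseUrl and the "location" parameter are placeholders for whatever geocoder is configured.
     * Returns null when there is nothing to geocode, so the caller can emit a null result.
     */
    public static String buildQueryUrl(String baseUrl, List<String> addressLines) {
        if (addressLines == null) return null;
        StringJoiner joined = new StringJoiner(", ");
        for (String line : addressLines) {
            if (line != null && !line.isBlank()) {
                joined.add(line.trim());
            }
        }
        if (joined.length() == 0) {
            return null; // every line was blank
        }
        String encoded = URLEncoder.encode(joined.toString(), StandardCharsets.UTF_8);
        return baseUrl + "?location=" + encoded;
    }

    public static void main(String[] args) {
        String url = buildQueryUrl("http://localhost:8080/geocode",
                List.of("1 Main St", "  ", "Springfield, IL"));
        System.out.println(url);
    }
}
```

Blank lines are dropped before joining, so stray empty address fields do not produce dangling separators in the query.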

      Attachments

        1. PIG-3877.1.patch (9 kB, Rekha Joshi)

        Activity

          rekhajoshm Rekha Joshi added a comment -

          Attached patch.

          rekhajoshm Rekha Joshi added a comment -

          Attached patch.

          daijy Daniel Dai added a comment -

          Thanks rekhajoshm, can you add tests and javadoc?

          mrflip Flip Kromer added a comment -
          • This makes separate HTTP calls for the latitude, then the longitude. Better to have one method that returns a tuple prepared from the fully-parsed response and let the caller project what they want.
          • What happens on a response that fails to geocode, or that for any other reason doesn't have a latLng element? The JSONObject latLng = (JSONObject) ((JSONObject)locations.get(0)).get("latLng"); geolongitude = (String) latLng.get("lng"); sequence feels like a recipe for an NPE.
          • Is the intuit backend ready for people who might use this in production? Or even for Apache's and the world's automated build systems to hit it without that counting as abuse?
          • I worry about having Pig make a network call on every record. There's no facility for throttling, backoff, or HTTP keep-alive.
          • Even with those, the only way I can imagine to make this workable at production scale using an over-the-network geocoder would be to deploy an instance on each machine. Pete Warden's Data Science Toolkit has a Standalone Geocoder; this should target that and refer to it (or acceptable alternative) in the docs.
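The first two review points can be sketched together. A minimal sketch, assuming the response has already been parsed by json-simple (the casts to JSONObject in the quoted line suggest that library, where a JSONArray is a List and a JSONObject is a Map) and that the latitude key is "lat" alongside the quoted "lng". `LatLngParser`, `LatLng`, and `extract` are hypothetical names, not the patch's code. It reads both coordinates from one response and returns null instead of throwing when locations is empty, latLng is missing, or a coordinate is absent.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: read lat and lng from one parsed geocoder response.
 * The input mirrors json-simple, where a JSONArray is a List and a JSONObject is a Map.
 */
public class LatLngParser {

    /** Both coordinates come from the same response, so one HTTP call suffices. */
    public record LatLng(double lat, double lng) {}

    /**
     * Looks up locations[0].latLng.{lat,lng}.
     * Returns null (rather than throwing) when the list is empty, latLng is absent,
     * or a coordinate is missing or non-numeric.
     */
    public static LatLng extract(List<?> locations) {
        if (locations == null || locations.isEmpty()) return null;
        if (!(locations.get(0) instanceof Map<?, ?> location)) return null;
        if (!(location.get("latLng") instanceof Map<?, ?> latLng)) return null;
        Double lat = toDouble(latLng.get("lat"));
        Double lng = toDouble(latLng.get("lng"));
        return (lat == null || lng == null) ? null : new LatLng(lat, lng);
    }

    /** Accepts a JSON number or a numeric string; anything else yields null. */
    private static Double toDouble(Object value) {
        if (value instanceof Number n) return n.doubleValue();
        if (value instanceof String s) {
            try {
                return Double.parseDouble(s.trim());
            } catch (NumberFormatException e) {
                return null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<?> locations = List.of(Map.of("latLng", Map.of("lat", 37.42, "lng", "-122.08")));
        System.out.println(extract(locations));         // a LatLng record
        System.out.println(extract(List.of(Map.of()))); // null: no latLng element
    }
}
```

Inside a UDF, the pair would become a two-field tuple, so a script makes one call per address and projects lat or lng as needed. toDouble accepts either a number or a numeric string because the quoted code casts lng to String while a JSON parser may hand back a numeric type.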
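For the throttling and backoff point, a minimal client-side sketch using only the JDK. `ThrottledCaller` and its parameters are hypothetical, and the request rate a real geocoder would tolerate is not stated in this issue. It spaces requests at least minIntervalMillis apart and retries a failed request after base, 2*base, 4*base, ... milliseconds.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical client-side guard for per-record geocoder calls:
 * enforces a minimum gap between requests and retries failures with exponential backoff.
 */
public class ThrottledCaller {
    private final long minIntervalMillis;
    private final int maxAttempts;
    private final long baseBackoffMillis;
    private long lastCallNanos;
    private boolean hasCalled = false;

    public ThrottledCaller(long minIntervalMillis, int maxAttempts, long baseBackoffMillis) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.minIntervalMillis = minIntervalMillis;
        this.maxAttempts = maxAttempts;
        this.baseBackoffMillis = baseBackoffMillis;
    }

    /**
     * Runs the request, sleeping base, 2*base, 4*base, ... ms between failed attempts.
     * Synchronized so that concurrent callers in one JVM share a single rate limit.
     */
    public synchronized <T> T call(Callable<T> request) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            waitForSlot();
            try {
                return request.call();
            } catch (Exception e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(baseBackoffMillis << attempt); // exponential backoff
                }
            }
        }
        throw last; // every attempt failed; surface the final error
    }

    /** Sleeps until at least minIntervalMillis have passed since the previous request. */
    private void waitForSlot() throws InterruptedException {
        if (hasCalled) {
            long elapsedMillis = (System.nanoTime() - lastCallNanos) / 1_000_000;
            long remaining = minIntervalMillis - elapsedMillis;
            if (remaining > 0) Thread.sleep(remaining);
        }
        lastCallNanos = System.nanoTime();
        hasCalled = true;
    }
}
```

This covers throttling and backoff only. It does not add HTTP keep-alive and still makes one network call per record, which is why the last review point favors running a geocoder instance on each machine.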

          People

            Assignee: Rekha Joshi (rekhajoshm)
            Reporter: Rekha Joshi (rekhajoshm)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated: