  1. Spark
  2. SPARK-36024

Switch the datasource example due to the deprecation of the dataset


    Details

    • Type: Documentation
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2
    • Fix Version/s: None
    • Component/s: Documentation
    • Labels:
      None

      Description

      The S3 bucket used as an example in the "Integration with Cloud Infrastructures" document will be deleted on Jul 1, 2021: https://registry.opendata.aws/landsat-8/

      The dataset will move to another bucket, but that bucket requires the `--request-payer requester` option, so users would have to pay the S3 costs: https://registry.opendata.aws/usgs-landsat/

       

      So I think it would be better to change the data source, as in this commit:

      https://github.com/yoda-mon/spark/commit/cdb24acdbb57a429e5bf1729502653b91a600022

       

      I chose [NYC Taxi data|https://registry.opendata.aws/nyc-tlc-trip-records-pds/] as the example here.
      Unlike the Landsat data, it is not compressed, but it is just an example, and there are several tutorials that use it with Spark (e.g. https://github.com/aws-samples/amazon-eks-apache-spark-etl-sample).

       

      Read test result

      scala> sc.textFile("s3a://nyc-tlc/misc/taxi _zone_lookup.csv").take(10).foreach(println)
      "LocationID","Borough","Zone","service_zone"
      1,"EWR","Newark Airport","EWR"
      2,"Queens","Jamaica Bay","Boro Zone"
      3,"Bronx","Allerton/Pelham Gardens","Boro Zone"
      4,"Manhattan","Alphabet City","Yellow Zone"
      5,"Staten Island","Arden Heights","Boro Zone"
      6,"Staten Island","Arrochar/Fort Wadsworth","Boro Zone"
      7,"Queens","Astoria","Boro Zone"
      8,"Queens","Astoria Park","Boro Zone"
      9,"Queens","Auburndale","Boro Zone"
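
      The same file could probably also be read through the DataFrame API, which picks up the header row shown above. This is only a sketch and was not part of the read test: it assumes a spark-shell session where `spark` is predefined, that hadoop-aws is on the classpath with S3A access set up as in the test above, and that the bucket is still readable. The name `zonesPath` is just an illustration.

```scala
// Paste into spark-shell, where `spark` (a SparkSession) is predefined.
// Assumption: hadoop-aws and S3A access are configured as in the read test above.
val zonesPath = "s3a://nyc-tlc/misc/taxi _zone_lookup.csv"

val zones = spark.read
  .option("header", "true")       // first line holds the column names
  .option("inferSchema", "true")  // let Spark infer column types from the data
  .csv(zonesPath)

zones.printSchema()
zones.show(10, truncate = false)
```

      Since the file is a small, uncompressed CSV with a header, either the RDD or the DataFrame form should be usable in the documentation example.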
      

              People

              • Assignee:
                Unassigned
                Reporter:
                yoda-mon Leona Yoda