IMPALA-5181

Make it possible to get Python package metadata from an HTML web page in pip_download.py

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 2.9.0
    • Component/s: Infrastructure
    • Labels:
      None

      Description

      Currently, pip_download.py can retrieve Python package metadata only from a JSON file, for example https://pypi.python.org/pypi/pyparsing/json. This is a problem because some PyPI mirrors might not implement this interface.

      The data in the JSON file should also be accessible through an HTML web page (for example, https://pypi.python.org/simple/pyparsing/).

      pip_download.py should be able to parse the web page and extract the information we need.
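
      A minimal sketch of what such parsing could look like, assuming the mirror serves a "simple" index page in which each release file is an anchor of the form <a href="URL#algo=digest">filename</a>. The function name parse_simple_index, the regex, and the returned dictionary keys are hypothetical and are not taken from pip_download.py; the sketch is written for Python 3.

```python
import re
from urllib.parse import urljoin

# Assumed anchor layout: <a href="URL#algo=digest">filename</a>.
# The "#algo=digest" fragment is treated as optional.
LINK_RE = re.compile(
    r'<a\s+[^>]*?href="(?P<url>[^"#]+)'
    r'(?:#(?P<algo>\w+)=(?P<digest>[^"]+))?"[^>]*>'
    r'(?P<name>[^<]+)</a>'
)


def parse_simple_index(html, base_url):
    """Return one dict per file link found in the index page (hypothetical helper)."""
    files = []
    for match in LINK_RE.finditer(html):
        files.append({
            "filename": match.group("name").strip(),
            # Links on the page may be relative, so resolve them against the page URL.
            "url": urljoin(base_url, match.group("url")),
            "hash_algo": match.group("algo"),   # None when no fragment is present
            "digest": match.group("digest"),    # None when no fragment is present
        })
    return files
```

      A regex is enough here only because the page layout is assumed to be this regular; a mirror that emits different markup would need a different pattern or a real HTML parser.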

        Activity

        tarasbob Taras Bobrovytsky added a comment -
        commit 4a79c9e7e3928f919b5fb60bab4145ba886d6252
        Author: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
        Date:   Thu Mar 30 13:08:21 2017 -0700
        
            IMPALA-5181: Extract PYPI metadata from a webpage
            
            There were some build failures due to a failure to download a JSON file
            containing package metadata from PYPI. We need to switch to downloading
            this from a PYPI mirror. In order to be able to download the metadata
            from a PYPI mirror, we need to be able to extract the data from a web page,
            because PYPI mirrors do not always have a JSON interface.
            
            We implement a regex-based HTML parser in this patch. Also, we increase
            the number of download attempts and randomly vary the amount of time
            between each attempt.
            
            Testing:
            - Tested locally against PYPI and a PYPI mirror.
            - Ran a private build that passed (which used a PYPI mirror).
            
            Change-Id: If3845a0d5f568d4352e3cc4883596736974fd7de
            Reviewed-on: http://gerrit.cloudera.org:8080/6579
            Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
            Tested-by: Impala Public Jenkins
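
        The retry behavior described in the commit message (more download attempts, with a randomly varied pause between them) can be sketched as follows. This is an illustration only: the function name retry_with_jitter, the default attempt count, and the wait bounds are assumptions, not values from the patch.

```python
import random
import time


def retry_with_jitter(fn, attempts=10, min_wait=1.0, max_wait=5.0, sleep=time.sleep):
    """Call fn() until it succeeds or the attempts run out (hypothetical helper).

    Between failed attempts, wait a random number of seconds drawn uniformly
    from [min_wait, max_wait], so that parallel builds do not retry in lockstep.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as error:
            last_error = error
            # Do not sleep after the final attempt; just report the failure.
            if attempt < attempts:
                sleep(random.uniform(min_wait, max_wait))
    raise last_error
```

        The sleep parameter exists only so that the pause can be replaced in tests; a caller would normally pass just the download function, for example retry_with_jitter(lambda: download(url)), where download is whatever function fetches the page.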
        

          People

          • Assignee:
            tarasbob Taras Bobrovytsky
            Reporter:
            tarasbob Taras Bobrovytsky
          • Votes:
            0
            Watchers:
            1
