SPARK-24623: Hadoop - Spark Cluster - Python XGBoost - Not working in distributed mode


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: Deploy
    • Labels:
    • Environment:

      Hadoop - Hortonworks Cluster

      Total Nodes - 18

      Worker Nodes - 13

      Description

      Hi

      We recently installed Python on the Hadoop cluster along with a number of data science modules, including xgboost, scipy, scikit-learn, and pandas.
      Using PySpark, the data scientists are able to test their scoring models in distributed mode on the Hadoop cluster. With Python xgboost, however, the PySpark job is not being distributed and runs on only one instance.
      We are trying to achieve distributed mode when using Python xgboost via PySpark.
      It would be a great help if you could direct me on how to achieve this.

      Thanks,
      Abhishek
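
      For context, the plain Python xgboost package is not Spark-aware: called from the driver, all of its work stays on a single node. To spread model scoring across the cluster, the trained booster has to be shipped to the executors and applied to each partition of the data. Below is a minimal sketch of that pattern, assuming xgboost and pandas are installed on every worker node; the model path, feature column names, and input/output locations are placeholders only. It uses mapPartitions rather than pandas UDFs because the reported version is Spark 2.1.1.

          # Minimal sketch: distributed scoring with a booster loaded on the driver.
          # All paths and column names below are hypothetical placeholders.
          import pandas as pd
          import xgboost as xgb
          from pyspark.sql import SparkSession

          spark = SparkSession.builder.appName("xgboost-distributed-scoring").getOrCreate()

          # Load (or train) the booster on the driver; training itself is still single-node here.
          booster = xgb.Booster()
          booster.load_model("/tmp/model.bin")  # hypothetical local path on the driver

          # Ship the model to every executor once instead of re-serializing it per task.
          bc_booster = spark.sparkContext.broadcast(booster)

          feature_cols = ["f0", "f1", "f2"]  # hypothetical feature column names

          def score_partition(rows):
              # Collect one partition into a pandas frame and score it with the broadcast model.
              pdf = pd.DataFrame(list(rows), columns=feature_cols)
              if pdf.empty:
                  return
              preds = bc_booster.value.predict(xgb.DMatrix(pdf.values))
              for p in preds:
                  yield (float(p),)

          df = spark.read.parquet("hdfs:///data/features.parquet")  # hypothetical input
          scores = df.select(*feature_cols).rdd.mapPartitions(score_partition).toDF(["score"])
          scores.write.parquet("hdfs:///data/scores.parquet")  # hypothetical output

      This only distributes scoring. Distributed training of xgboost is a separate problem; at the time of this report the usual route was the JVM-side xgboost4j-spark integration rather than the Python package, and it is outside the scope of this sketch.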


            People

            • Assignee:
              Unassigned
            • Reporter:
              abhishek.chamakura (Abhishek Reddy Chamakura)
            • Votes:
              0
            • Watchers:
              1
