Spark / SPARK-25462

Hive on Spark: unexpected output from count(*) in this script


Details

    • Type: Question
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 1.6.2
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Environment: Spark 1.6.2, Hive 1.2.2, Hadoop 2.7.1

    Description

      I used a HiveContext to execute the script below:

      with nt as (
        select label, score
        from (
          select *
          from (select label, score, row_number() over (order by score desc) as position from t1) t_1
          join (select count(*) as countall from t1) t_2
        ) ta
        where position <= countall * 0.4
      )
      select count(*) as c_positive from nt where label = 1

      and I got this result.

      Strangely, calling count() on the RDD and on the DataFrame returns different outputs, as the attached picture shows.

      Can someone help me out? Thanks a lot!

      PS: The Parquet file I used is 'test.gz.parquet' in Attachments.
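
      For reference, the script keeps the rows whose row_number (ordered by descending score) is at most 40% of the total row count, then counts how many of those have label = 1. Below is a minimal pure-Python sketch of that logic, useful for checking by hand what the count should be. The function name and the sample rows are hypothetical and are not the contents of test.gz.parquet. The sketch also shows that the result depends on how ties in score are ordered, which row_number() does not fix when the ORDER BY column has duplicates; whether that explains the output reported here is not established in this issue.

```python
def count_positive_in_top_fraction(rows, fraction=0.4):
    """rows: list of (label, score) tuples. Mirrors the `nt` CTE plus the final count."""
    countall = len(rows)  # select count(*) as countall from t1
    # row_number() over (order by score desc): 1-based position after sorting.
    # Python's sort is stable, so tied scores keep their input order;
    # a SQL engine gives no such guarantee for ties.
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)
    top = [r for position, r in enumerate(ranked, start=1)
           if position <= countall * fraction]  # where position <= countall * 0.4
    return sum(1 for label, _ in top if label == 1)  # count(*) ... where label = 1


# Hypothetical sample data (assumption for illustration only).
sample = [(1, 0.9), (0, 0.8), (1, 0.7), (0, 0.6), (1, 0.5),
          (0, 0.4), (1, 0.3), (0, 0.2), (1, 0.1), (0, 0.05)]
print(count_positive_in_top_fraction(sample))  # 2

# Two rows with the same score: the answer depends on which one is ranked first.
tied = [(1, 0.5), (0, 0.5)]
print(count_positive_in_top_fraction(tied, 0.5))        # 1
print(count_positive_in_top_fraction(tied[::-1], 0.5))  # 0
```

      If the real data contains duplicate scores, adding a tie-breaking column to the ORDER BY of the window makes the ranking, and therefore the count, repeatable.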

       

      Attachments

        1. jira.png
          871 kB
          Gu Yuchen
        2. test.gz.parquet
          1 kB
          Gu Yuchen


          People

            Unassigned Unassigned
            guyc Gu Yuchen
            Jeremy Jeremy
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: