Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-22113

Dataset shows in Hive is inconsistent with JDBC

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Not A Problem
    • 2.2.0
    • None
    • SQL
    • None
    • version 2.2.0

    Description

      I am trying to query data from Hive in spark. According spark-sql document, there're two ways to do this:
      The first way is Init session with enableHiveSupport

      SparkSession session = SparkSession.builder().enableHiveSupport().getOrCreate();
      session.sql("select dw_date from tfdw.dwd_dim_date limit 10").show();
      

      the dataset shows the correct result

      The second way is through JDBC

      Dataset<Row> ds = session.read()
                        .format("jdbc")
                        .option("driver", "org.apache.hive.jdbc.HiveDriver")
                        .option("url", "jdbc:hive2://iZ11syxr6afZ:21050/;auth=noSasl")
                        .option("dbtable", "tfdw.dwd_dim_date")
                        .load();
      ds.select("dw_date").limit(10).show();
      

      But the dataset only show the column name in the result rather than the data in the column

      The two pictures should be consistent I think. Any outstanding I missed ? Many thanks!

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            Michael Fu Michael Fu
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment