  1. Spark
  2. SPARK-3297

[Spark SQL][UI] SchemaRDD toString with many columns messes up Storage tab display


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.2
    • Fix Version/s: 1.1.1, 1.2.0
    • Component/s: SQL, Web UI

    Description

      When a SchemaRDD with many columns (57 in this example) is cached using sqlContext.cacheTable, the Storage tab of the driver Web UI renders badly: the long string describing the SchemaRDD makes the first column far wider than the others, and in fact wider than the browser window. It would be nice to restrict the first column to, say, 50% of the browser window's width, with some minimum.

      For example, this is the SchemaRDD text for my table:

      RDD Storage Info for ExistingRdd ActionGeo_ADM1Code#198,ActionGeo_CountryCode#199,ActionGeo_FeatureID#200,ActionGeo_FullName#201,ActionGeo_Lat#202,ActionGeo_Long#203,ActionGeo_Type#204,Actor1Code#205,Actor1CountryCode#206,Actor1EthnicCode#207,Actor1Geo_ADM1Code#208,Actor1Geo_CountryCode#209,Actor1Geo_FeatureID#210,Actor1Geo_FullName#211,Actor1Geo_Lat#212,Actor1Geo_Long#213,Actor1Geo_Type#214,Actor1KnownGroupCode#215,Actor1Name#216,Actor1Religion1Code#217,Actor1Religion2Code#218,Actor1Type1Code#219,Actor1Type2Code#220,Actor1Type3Code#221,Actor2Code#222,Actor2CountryCode#223,Actor2EthnicCode#224,Actor2Geo_ADM1Code#225,Actor2Geo_CountryCode#226,Actor2Geo_FeatureID#227,Actor2Geo_FullName#228,Actor2Geo_Lat#229,Actor2Geo_Long#230,Actor2Geo_Type#231,Actor2KnownGroupCode#232,Actor2Name#233,Actor2Religion1Code#234,Actor2Religion2Code#235,Actor2Type1Code#236,Actor2Type2Code#237,Actor2Type3Code#238,AvgTone#239,DATEADDED#240,Day#241,EventBaseCode#242,EventCode#243,EventId#244,EventRootCode#245,FractionDate#246,GoldsteinScale#247,IsRootEvent#248,MonthYear#249,NumArticles#250,NumMentions#251,NumSources#252,QuadClass#253,Year#254, MappedRDD[200]

      I would personally love to fix the toString method so that it does not necessarily print every column, but cuts the list off after a certain number. This would also make the printout in the Spark Shell more readable. For example:

      ActionGeo_ADM1Code#198,ActionGeo_CountryCode#199,ActionGeo_FeatureID#200,ActionGeo_FullName#201,ActionGeo_Lat#202 .... and 52 more columns
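The truncation proposed above can be sketched in a few lines. This is only an illustration of the idea in Python, not the change that was actually made to SchemaRDD's toString. The function name `truncated_columns`, the `max_shown` parameter, and its default of 5 are all hypothetical, chosen to match the example output above.

```python
def truncated_columns(columns, max_shown=5):
    """Join column names, listing at most `max_shown` and summarizing the rest.

    Hypothetical helper: the name, parameter, and default are assumptions
    made for illustration, not part of Spark's API.
    """
    if max_shown < 0:
        raise ValueError("max_shown must be non-negative")
    shown = ",".join(columns[:max_shown])
    hidden = len(columns) - max_shown
    if hidden <= 0:
        # Short enough already: print every column unchanged.
        return shown
    return f"{shown} .... and {hidden} more columns"


# Usage with 57 made-up column names (same count as the table in this report).
cols = [f"col{i}#{198 + i}" for i in range(57)]
print(truncated_columns(cols))
# col0#198,col1#199,col2#200,col3#201,col4#202 .... and 52 more columns
```

Capping the number of columns rather than the number of characters keeps each displayed name intact, at the cost of the output length still depending on how long the individual names are.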

            People

              Assignee: falaki Hossein Falaki
              Reporter: velvia Evan Chan
              Votes: 0
              Watchers: 2
