  1. Apache Drill
  2. DRILL-2806

Querying data from compressed csv file returns nulls and unreadable data

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.9.0
    • Fix Version/s: 1.0.0
    • Component/s: Storage - Text & CSV
    • Labels: None

    Description

      Projecting columns from a compressed CSV data file returns unreadable data and nulls in the query results. Querying the same CSV file in uncompressed form returns correct, readable results with no nulls. The test was performed on a 4-node cluster running CentOS.

      0: jdbc:drill:> select columns[0], columns[1], columns[2], columns[3], columns[4], columns[5], columns[6], columns[7] from `deletions-00000-of-00020.tgz` limit 10;
      +------------+------------+------------+------------+------------+------------+------------+------------+
      |   EXPR$0   |   EXPR$1   |   EXPR$2   |   EXPR$3   |   EXPR$4   |   EXPR$5   |   EXPR$6   |   EXPR$7   |
      +------------+------------+------------+------------+------------+------------+------------+------------+
      | 0U[ˮȑ|axaR)ﺫ=鲍i̊HDJ|?3̑$%Q$%
                                                      TdfD8'2i$E^/Y}C'>|/7
                                                                                        H1o0! | 0g TMUܸW`ʙ&T
                                                                                                                                      \uXپN|2I~Y 0RAX6UaXe+ow*]s | null       | null       | null       | null       | null       | null       |
      | oM.ڻU/ | ̼\
                                 )qwda7((
                                                     	y[) | 9>^0>WM[{r]iE$ze&!EküIfa | null       | null       | null       | null       | null       |
      | SRŒ      | null       | null       | null       | null       | null       | null       | null       |
      | 6imJ\f_dYڿ]%ln3IaE*BGA-a$j:M!Uc)ﶘD~wUx0ɼgme]ӘcQ*pk$%\2ER-)(ÈxTn?SϓxeҜݠºI|'(Cni	s | null       | null       | null       | null       | null       | null       | null       |
      | bxΜkr4ü_nIxl_s`vN	ó.$OL7Eބyڗia;Pu$M!AoCӦnlS-`ۢ+o~>%wzcgwtMge7"lMgZ=WྃgMRX1"a | X=Rd.fab{t{
                                                                                                                                                                                             A!t
                                                                                                                                                                                                     1$ڧw-0EXURg
                                                                                                                                                                                                                            p	#qzߤ΢gWMem{=z{
                                                                                                                                                                                                                                                          eiA]^ | null       | null       | null       | null       | null       | null       |
      | ֌        | null       | null       | null       | null       | null       | null       | null       |
      | !{1H*m71`˰]oZ | 𾳔] &f4Z)4SP7Rm4^5WWXȧ<p.́3L
                                                                                            q%|WL-p[ | null       | null       | null       | null       | null       | null       |
      | dqyd\K#"ԁ@ | null       | null       | null       | null       | null       | null       | null       |
      | [GԊKFlɢ(ZK8h#D/[(U=_8ΏE%
                                                                 [;
                                                                    w}Fr`#Xk
                                                                                    lT'15:y
                                                                                                     ņPz(-ȓ񆹞Cs)1v	 | null       | null       | null       | null       | null       | null       | null       |
      | LyPO|Ώ(+n+H]
                               Ņ2?糩s/_ l
                                                  +ӯb	 | null       | null       | null       | null       | null       | null       | null       |
      +------------+------------+------------+------------+------------+------------+------------+------------+
      10 rows selected (0.176 seconds)
      
      0: jdbc:drill:> select columns[0], columns[1], columns[2], columns[3], columns[4], columns[5], columns[6], columns[7] from `deletions/deletions-00000-of-00020.csv` limit 10;
      +------------+------------+------------+------------+------------+------------+------------+------------+
      |   EXPR$0   |   EXPR$1   |   EXPR$2   |   EXPR$3   |   EXPR$4   |   EXPR$5   |   EXPR$6   |   EXPR$7   |
      +------------+------------+------------+------------+------------+------------+------------+------------+
      | 1354980518007 | /user/mwcl_musicbrainz | 1356247116000 | /user/google_gardener | /m/0nj707g | /music/track_contribution/contributor | /m/09xmq3  | en         |
      | 1359609261000 | /user/ahsan2002us | 1359697206000 | /user/mjsigua | /m/0q47ym9 | /common/topic/description | Afrosheen CEO is the fictional character from the 2003 film The Watermelon Heist. | en         |
      | 1258294630005 | /user/book_bot | 1260214155000 | /user/book_bot | /m/08g19rh | /book/book_edition/book | /m/04sty07 | en         |
      | 1260232964000 | /user/book_bot | 1360880749000 | /user/turtlewax_bot | /m/0872_f2 | /book/book_edition/book | /m/069_gyc | en         |
      | 1320298552000 | /user/gardening_bot | 1358083965004 | /user/googlebot | /m/01dy3t2 | /type/object/type | /music/single | en         |
      | 1360430129006 | /user/mwcl_musicbrainz | 1362830875001 | /user/mwcl_musicbrainz | /m/0qm1x62 | /music/release_track/release | /m/0ql38vr | en         |
      | 1269251105000 | /user/mwcl_images | 1336539194001 | /user/gardening_bot | /m/06w7yw7 | /common/topic/image | /m/0bcncxt | en         |
      | 1225386250001 | /user/mwcl_images | 1336080683003 | /user/gardening_bot | /m/04sb526 | /common/licensed_object/license | /m/02x6b   | en         |
      | 1286991487000 | /user/mw_template_bot | 1362532733000 | /user/wikipedia_facts | /m/0dgs170 | /people/person/date_of_birth | 1975       | en         |
      | 1258986090000 | /user/book_bot | 1260138587000 | /user/book_bot | /m/08r_m33 | /book/book_edition/book | /m/04sty07 | en         |
      +------------+------------+------------+------------+------------+------------+------------+------------+
      10 rows selected (0.25 seconds)
      
      Details of the files (compressed and uncompressed)
      
      [root@centos-01 ~]# hadoop fs -ls /tmp/deletions-00000-of-00020.tgz
      -rwxr-xr-x   3 root root  111364147 2015-04-16 20:35 /tmp/deletions-00000-of-00020.tgz
      [root@centos-01 ~]# hadoop fs -ls /tmp/deletions/deletions-00000-of-00020.csv
      -rwxr-xr-x   3 root root  395624293 2015-04-14 18:10 /tmp/deletions/deletions-00000-of-00020.csv
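
      The report does not record why it was resolved as Invalid. One plausible reading, offered here only as an assumption, is that a `.tgz` file is conventionally a gzip-compressed tar archive rather than a plain gzip-compressed CSV, so a text reader would not see clean CSV rows even after decompression. The Python sketch below shows one way to check what a file actually contains and to read CSV rows from inside such an archive. The helper names `classify`, `first_rows`, and `make_tgz` are hypothetical and are not part of Drill.

      ```python
      import csv
      import gzip
      import io
      import tarfile

      GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


      def classify(data: bytes) -> str:
          """Return 'plain', 'gzip', or 'gzip+tar' for the given file contents."""
          if not data.startswith(GZIP_MAGIC):
              return "plain"
          try:
              # Mode "r:gz" only succeeds if the gunzipped payload is a tar archive.
              with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz"):
                  return "gzip+tar"
          except tarfile.ReadError:
              return "gzip"


      def first_rows(data: bytes, limit: int = 10):
          """Read up to `limit` CSV rows from the first regular file in a .tgz."""
          with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
              member = next(m for m in archive if m.isfile())
              text = io.TextIOWrapper(
                  archive.extractfile(member), encoding="utf-8", newline=""
              )
              return [row for _, row in zip(range(limit), csv.reader(text))]


      def make_tgz(name: str, payload: bytes) -> bytes:
          """Build a small in-memory .tgz holding one file (demo data only)."""
          buf = io.BytesIO()
          with tarfile.open(fileobj=buf, mode="w:gz") as archive:
              info = tarfile.TarInfo(name)
              info.size = len(payload)
              archive.addfile(info, io.BytesIO(payload))
          return buf.getvalue()


      if __name__ == "__main__":
          sample = b"1354980518007,/user/mwcl_musicbrainz\n"
          print(classify(sample))                          # uncompressed CSV
          print(classify(gzip.compress(sample)))           # gzip only
          print(classify(make_tgz("sample.csv", sample)))  # gzip-compressed tar
          print(first_rows(make_tgz("sample.csv", sample), limit=1))
      ```

      If a file turns out to be a gzip-compressed tar archive, extracting the CSV and, if compression is still wanted, recompressing it with plain gzip is one way to get a file that holds only the CSV data.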
      
      

          People

            Assignee: Steven Phillips (sphillips)
            Reporter: Khurram Faraaz (khfaraaz)