Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-2208

Error message must be updated when query contains operations on a flattened column

    XMLWordPrintableJSON

Details

    Description

      Currently i observe that if there is a flatten/kvgen operation applied on a column, no further operations can be performed on the said column unless it is wrapped inside a nested query.

      Consider a simple flatten/kvgen operation on a complex JSON file :

      > select flatten(kvgen(f.`people`)) as p from `factbook/world.json` f limit 1;
      ------------

      p

      ------------

      {"key":"languages","value":{"text":"Mandarin Chinese 12.44%, Spanish 4.85%, English 4.83%, Arabic 3.25%, Hindi 2.68%, Bengali 2.66%, Portuguese 2.62%, Russian 2.12%, Japanese 1.8%, Standard German 1.33%, Javanese 1.25% (2009 est.)","note_1":"percents are for \"first language\" speakers only; the six UN languages - Arabic, Chinese (Mandarin), English, French, Russian, and Spanish (Castilian) - are the mother tongue or second language of about half of the world's population, and are the official languages in more than half the states in the world; some 150 to 200 languages have more than a million speakers","note_2":"all told, there are an estimated 7,100 languages spoken in the world; aproximately 80% of these languages are spoken by less than 100,000 people; about 50 languages are spoken by only 1 person; communities that are isolated from each other in mountainous regions often develop multiple languages; Papua New Guinea, for example, boasts about 836 separate languages","note_3":"approximately 2,300 languages are spoken in Asia, 2,150, in Africa, 1,311 in the Pacific, 1,060 in the Americas, and 280 in Europe"}}
      {"key":"religions","value":{"text":"Christian 33.39% (of which Roman Catholic 16.85%, Protestant 6.15%, Orthodox 3.96%, Anglican 1.26%), Muslim 22.74%, Hindu 13.8%, Buddhist 6.77%, Sikh 0.35%, Jewish 0.22%, Baha'i 0.11%, other religions 10.95%, non-religious 9.66%, atheists 2.01% (2010 est.)"}}
      {"key":"population","value":{"text":"7,095,217,980 (July 2013 est.)","top_ten_most_populous_countries_in_millions":"China 1,349.59; India 1,220.80; United States 316.67; Indonesia 251.16; Brazil 201.01; Pakistan 193.24; Nigeria 174.51; Bangladesh 163.65; Russia 142.50; Japan 127.25"}}
      {"key":"age_structure","value":{"0_14_years":"26% (male 953,496,513/female 890,372,474)","15_24_years":"16.8% (male 614,574,389/female 579,810,490)","25_54_years":"40.6% (male 1,454,831,900/female 1,426,721,773)","55_64_years":"8.4% (male 291,435,881/female 305,185,398)","65_years_and_over":"8.2% (male 257,035,416/female 321,753,746) (2013 est.)"}}
      {"key":"dependency_ratios","value":{"total_dependency_ratio":"52 %","youth_dependency_ratio":"39.9 %","elderly_dependency_ratio":"12.1 %","potential_support_ratio":"8.3 (2013)"}}

      ------------

      Adding a WHERE clause with conditions on this column fails:

      > select flatten(kvgen(f.`people`)) as p from `factbook/world.json` f where f.p.`key` = 'languages';
      Query failed: RemoteRpcException: Failure while running fragment., languages [ 686bcd40-c23b-448c-93d8-b98a3b092657 on abhi5.qa.lab:31010 ]
      [ 686bcd40-c23b-448c-93d8-b98a3b092657 on abhi5.qa.lab:31010 ]
      Error: exception while executing query: Failure while executing query. (state=,code=0)

      Logs indicate a NumberFormat Exception in the above case.

      And query fails to parse in the below case

      > select flatten(kvgen(f.`people`)).`value` as p from `factbook/world.json` f limit 5;
      Query failed: ParseException: Encountered "." at line 1, column 34.
      Was expecting one of:
      "FROM" ...
      "," ...
      "AS" ...
      ....
      ....
      "OVER" ...
      Error: exception while executing query: Failure while executing query. (state=,code=0)

      Rewriting using an inner query succeeds:

      select g.p.`value`.`note_3` from (select flatten(kvgen(f.`people`)) as p from `factbook/world.json` f) g where g.p.`key`='languages';
      ------------

      EXPR$0

      ------------

      approximately 2,300 languages are spoken in Asia, 2,150, in Africa, 1,311 in the Pacific, 1,060 in the Americas, and 280 in Europe

      ------------

      *In both the failure cases the error message needs to be updated to indicate that the operation is not supported. The current error message and logs are not clear for an end user. *

      Attachments

        1. drillbit_flatten.log
          63 kB
          Abhishek Girish

        Issue Links

          Activity

            People

              jaltekruse Jason Altekruse
              agirish Abhishek Girish
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: