Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-14159

sorting of tuple array using multiple field[s]

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: UDF
    • Labels:
    • Flags:
      Patch, Important

      Description

      Problem Statement:

      When we are working with complex structure of data like avro.
      Most of the times we are encountering array contains multiple tuples and each tuple have struct schema.

      Suppose here struct schema is like below:

      {
      	"name": "employee",
      	"type": [{
      		"type": "record",
      		"name": "Employee",
      		"namespace": "com.company.Employee",
      		"fields": [{
      			"name": "empId",
      			"type": "int"
      		}, {
      			"name": "empName",
      			"type": "string"
      		}, {
      			"name": "age",
      			"type": "int"
      		}, {
      			"name": "salary",
      			"type": "double"
      		}]
      	}]
      }
      
      

      Then while running our hive query complex array looks like array of employee objects.

      Example: 
      	//(array<struct<empId,empName,age,salary>>)
      	Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
      
      

      When we are implementing business use cases day to day life we are encountering problems like sorting a tuple array by specific field[s] like empId,name,salary,etc by ASC or DESC order.

      Proposal:

      I have developed a udf 'sort_array_by' which will sort a tuple array by one or more fields in ASC or DESC order provided by user ,default is ascending order .

      Example:
      	1.Select sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC");
      	output: array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
      	
      	2.Select sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC");
      	output: array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
      
      	3.Select sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age,"ASC");
      	output: array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
      

        Attachments

        1. HIVE-14159.1.patch
          27 kB
          Simanchal Das
        2. HIVE-14159.2.patch
          29 kB
          Simanchal Das
        3. HIVE-14159.3.patch
          70 kB
          Simanchal Das
        4. HIVE-14159.4.patch
          62 kB
          Simanchal Das

          Issue Links

            Activity

              People

              • Assignee:
                simanchal Simanchal Das
                Reporter:
                simanchal Simanchal Das
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: