August 27, 2015

esProc Simplifies SQL-style Computations– Get Top N Rows of Each Group

The operation of getting the top N rows of data is common, such as the biggest salary raise of each employee, the three lowest scores in playing golf and the five days of each month when each product has its biggest sales. SQL resorts to advanced techniques, like window functions and keep/top/row number, to solve this kind of problem. Therefore the code is quite difficult. And as many databases (like MYSQL) doesn’t support these advanced techniques, they can only realize this operation using complicated JOIN statement and nested subquery. If multi-layered groups or multi-level relationships are involved, the computation will become even more complicated.

Yet esProc’s top function can select the top N rows of data of each group according to the row numbers as well as the maximum and minimum values, thus the computation will be made simpler. The following example will teach you how esProc works.  

golf, a database table, stores scores members have got in playing golf. Please select the three highest scores each member has had. Some of the data are as follows:

esProc Code: 

A1:Select data from the database. Click the cell and you can see the result in detail:

A2:=A1.group(User_ID). The code groups the selected data in A1. Result is as follows:

As shown in the above figure, the selected data have been divided into multiple groups by User_ID, with each row representing a group. Click the hyperlinks in blue and you can see members of each group: 
A3:=A2.(~.top(-Score;3)). The code gets the top three records of Score field of each group. Here “~” represents each group of data and ~.top() represents that top function is used to compute each group of data in order. top function can get the top N records of a data set. For example, top(Score;3) means sorting the records in ascending order by Score field and getting the top 3 records (i.e. the three smallest values); top(-Scroe;3) means sorting the records in descending order by Score field and getting the top 3 records (i.e. the three biggest values); and top(#;3) means getting the top three records according to the original order. The computed result of this step is as follows: 

A4:=A3.union(). It performs union operation on each group of data. Result is as follows: 
The above four steps can also be combined into one step for the convenience of maintenance and debugging. Thus the code will be db.query(“select * from golf”).group(User_ID). (~.top(-Score;3)).union().
In addition, esProc program can be called by the reporting tool or a Java program in a way similar to that in which a Java program calls an ordinary database. The JDBC provided by esProc can be used to return a computed result in the form of ResultSet to the Java main program. Please refer to related documents for more details.

No comments:

Post a Comment