July 7, 2015

esProc Simplifies SQL-style Computations – Interval Merging

There are many complicated SQL-style computations in database application development. Interval merging is one of them, such as the summation of non-overlapping time intervals and merging of overlapping time intervals, etc. Due to the lack of orderliness of an SQL set, SQL solves the problems by resorting to recursive method that is difficult to be used with a database that supports not enough recursive functions. Besides, we need to write a nested SQL statement containing multilayered subqueries that is too lengthy for realization and modification.

In contrast, esProc has an easier way for the computation. It merges intervals step by step through intuitive script. Let’s look at an example. 

Table Udetail stores user operation records in detail. Some of the original data are as follows:
ID     UID           ST                                 ET
1       1001         2014-9-1 10:00:00   2014-9-2 11:30:00
2       1001         2014-9-1 10:30:00   2014-9-2 11:00:00
3       1001         2014-9-3 11:00:00   2014-9-4 12:00:00
4       1001         2014-9-4 10:00:00   2014-9-5 13:00:00
5       1001         2014-9-4 15:00:00   2014-9-5 18:00:00
6       1002         2014-9-1 11:00:00   2014-9-2 11:30:00
7       1002         2014-9-1 10:30:00   2014-9-2 11:00:00
In this table, ST and ET represent respectively the starting time and ending time of the operations and each user may have overlapping operational time intervals. Based on a specified user, we are to:
1.       Merge the overlapping time intervals so as to create a new record of time intervals;
2.       Summarize the total time of non-overlapping intervals.

esProc does it as follows:

An explanation of the script

A1=db1.query("select * from udetail where UID=?",arg1)

Retrieve data of the specified user from the database. arg1 is an external parameter. Suppose the value of it is 1001, then the query result is as follows: 

A2= A1.sort(ST).select(ET>ET[-1]).run(max(ST,ET[-1]):ST)

This line of code first sorts A1’s table by the starting time (.sort(ST)) and selects records by the condition that each ET is greater than the previous one, i.e. to delete the intervals completely covered by others. For every two overlapping records, get the minimum ST value and the maximum ET value to create a new time interval. The result is as follows: 

It can be seen that sets are ordered in esProc, so the previous record can be referenced by its number. It is very different from SQL. If the continuous time intervals need to be combined into one interval, A2’s code can be like thisA2=A1.sort(ST).select(ET>ET[-1]).run(if(ST<ET[-1],ST[-1],ST):ST).group(ST;~.m(-1).ET:ET)

And the following result will be got: 


Summarize the total overlapping time. The result is as follows: 
If the total non-overlapping time is wanted (without details), the code can be
If the whole step of time is relatively small, the code can be modified as
A1.(periods@s(ST,ET)).union().len()-2, which gets the result by counting the number of time points.

Finally, an esProc script can be called by the reporting tool or the Java program much like they call a database. It returns a result in the form of ResultSet through JDBC provided esProc. See related documents for more details.

July 5, 2015

esProc Divides Letters into Segments

The RBP_VALUE field in the rbd_dtl table stores the string “all” and some letters. We want to convert values in the field to strings of segments. The desired result is A-F,H,J-K,N-O,Q-S,U-V,X-Z.

Below is the original data

Rules to be followed are

If RBD_VALUE= “ALL”, then discard the value;
According to the alphabetical order if certain RBD_VALUE values are consecutive letters , then make them a segment with both the first and last letter included and “-” between them, like “A-F”;

If two values of RBD_VALUE are inconsecutive, like K and N between which L and M are absent, then unite the former with the letter or letters before it and join the latter with the letter or letters after it. Separate the two segments with a comma. This is like “J-K”, “N-O”;

If a value in RBD_VALUE has not consecutive letters both before and after it, then take it as a separate segment, like the letter H. G and I are absent respectively before and after it.

esProc approach

A1: Execute the SQL statement.

A2: Align RBD_VALUE field in A1 with the alphabet. Below is a selection of the result:

A3: Group rows in A2. Group rows where RBD_VALUE is null together and put the other rows where RBD_VALUE is non-null into another group. @o means performing merge-style data grouping, which only compares the neighboring values. The result is as follows:

A4: Select the group in which RBD_VALUE is non-null.

A5: Compose strings according to the number of members in each group. ~.m(-1) means getting the last record from the current group. The result is as follows:
A6With commas being delimiters, concatenate strings to get the final result A-F,H,J-K,N-O,Q-S,U-V,X-Z.

Actually we simply need to retrieve data from RBD_VALUE field. Thus the lines of code can be combined into one:

myDB1.query("select RBD_VALUE from rbd_dtl").(RBD_VALUE).align(26.(char(64+~))).group@o(!~).select(~(1)).(~(1)+if(~.len()>1,"-"+~.m(-1))).string@d()

July 3, 2015

esProc Performs Random UPDATE with Priorities

Target:Among the records that satisfy the specified condition in the TMP_SURVEY_TRAN_BZ_3_WORKING table, select 20 ones at random and change values of their Quota_Include_Ind field into “Y”. But there is a list of priorities for the update: If the number of records satisfying customer_type=’r’ is greater than 20, then the 20 random records for updating will be chosen from them; if the number of records satisfying the same condition is less than 20 (say 15), then we’ll update all these 15 records, plus another 5 ones chosen randomly from records that satisfy customer_type<>’r’.

Below is a selection from the original data:

esProc approach: 

A1,A2:Both execute SQL statement to retrieve primary key values of the eligible records according to the parameters. A1’s records meet condition customer_type=’r’ and A2’s record meet condition customer_type<>’r’.

A3: Get primary key values of the records to be updated according to the number of records in A1. A1.sort(rand()) means sorting A1 randomly; to(20) equals [1,2…20], which means getting 20 records in order. The operator | is used to concatenate two sets.

Results of A1, A2 and A3 are listed separately as follows:

A4: Update the table based on A3. @u indicates that only UPDATE statement will be generated.

July 2, 2015

esProc Codes Column Shuffling

In database table REF_VALUES, ID column acts as the primary key and ORIGINAL_VALUE is used as the original column, whose values are to be shuffled and updated to the empty SHUFFLED_VALUE column. Below is a selection from the table: 

SQL approach:

Databases differ in their solution to the problem. Here Oracle is used as the example:

create or replace procedure shuffle as
CUR_1 My_CurType;
V_id NUMBER(10,0);
V_result varchar2(20);
v_sql varchar2(2000);
  OPEN CUR_1 FOR  select t1.id,t2.result from (
        select rownum rn,id,ORIGINAL_VALUE from REF_VALUES order by ORIGINAL_VALUE) t1
        select rownum rn,result from(
            select ORIGINAL_VALUE result from REF_VALUES order by dbms_random.value()))t2
    on t2.rn=t1.rn;
    fetch CUR_1 INTO V_id,V_result ; 
    v_sql:=’update REF_VALUES set SHUFFLED_VALUE=”’||V_result||”’ where id=’||V_id;
    EXECUTE immediate v_sql;  
end shuffle;

You need to write multilayered nested subqueries to get the shuffling result (as shown by the first half of the above code), then you need cursors (or temporary tables) and the stored procedure to insert it to the target table. The code will be rather tedious. 

The esProc approach allows us to dispense with the complex nested subqueries and applies to various types of databases. The code is as follows:

A1: Execute the SQL statement to get data from ID column and ORIGINAL_VALUE column.

A2: Shuffle the values in ORIGINAL_VALUE column.

A3: Join A1’s ID column with A2’s ORIGINAL_VALUE column to create a two-dimensional table, as shown below:

A4:Update REF_VALUES table with A3’s table. @u option means only generating the UPDATE statement. The updated REF_VALUES table is as follows:

July 1, 2015

Transforming Records with esProc

Table 1 (T1) stores single-value id field and name field. A selection of the source data is as follows:
id    name
1     a
2     b
3     c
4     d
5     e

Table 2 (T2) stores multi-value ids field as follows:

According to T1’s id field and name field, we want to transform T2’s ids field into a field with values being the corresponding names. The transformed T2 is expected to be like this:

esProc approach

A1A2Execute SQL statements to retrieve data from T1 and T2.

A3Convert ids values in A2’s records to sequences one by one, and match them with T1’s records to get sequences of names, which then will be converted to records separated by the comma. The result is like this: 

Of course, the above approach is based on the assumption that values of id are consecutive integers. If they are not, then the last step could be rephrased as follows:
A4=A2.(ids.array().(A1.find(~).name ).string@d())

The SQL approach uses subquery to generate distinct group numbers, and then solves the problem with relatively complex JOIN. The following shows how it is coded in MySQL:
  select group_concat(t1.name) names from(
    select @row:=@row+1 as gid,ids from t2,(SELECT @row :=0 ) r
) t2 left join t1 on find_in_set(t1.id,t2.ids) group by t2.gid

But note that this SQL approach can only apply in the situation where there are no duplicate values for ids in T2. Otherwise only one of the duplicates is allowed to be kept. For instance, the transformed result of “2,1,2,3” is “a,b,c” instead of “b,a,b,c”. By comparison, esProc syntax is more intuitive, easier to understand and can be used more widely.

esProc can be called by reporting tools as well as the JAVA program. The way is much like the one a database is called. Through JDBC interface provided by esProc, result can be returned to the JAVA main program in the form of ResultSet. See related documents for detailed method.