esProc, A Script Language for Data Analytics with Parallel Mechanism: esProc Codes Column Shuffling

In database table REF_VALUES, ID column acts as the primary key and ORIGINAL_VALUE is used as the original column, whose values are to be shuffled and updated to the empty SHUFFLED_VALUE column. Below is a selection from the table:

SQL approach:

Databases differ in their solution to the problem. Here Oracle is used as the example:

create or replace procedure shuffle as

TYPE My_CurType IS REF CURSOR;

CUR_1 My_CurType;

V_id NUMBER(10,0);

V_result varchar2(20);

v_sql varchar2(2000);

begin

OPEN CUR_1 FOR select t1.id,t2.result from (

select rownum rn,id,ORIGINAL_VALUE from REF_VALUES order by ORIGINAL_VALUE) t1

join(

select rownum rn,result from(

select ORIGINAL_VALUE result from REF_VALUES order by dbms_random.value()))t2

on t2.rn=t1.rn;

LOOP

fetch CUR_1 INTO V_id,V_result ;

EXIT WHEN CUR_1%NOTFOUND;

v_sql:=’update REF_VALUES set SHUFFLED_VALUE=”’||V_result||”’ where id=’||V_id;

EXECUTE immediate v_sql;

END LOOP;

CLOSE CUR_1;

end shuffle;

You need to write multilayered nested subqueries to get the shuffling result (as shown by the first half of the above code), then you need cursors (or temporary tables) and the stored procedure to insert it to the target table. The code will be rather tedious.

The esProc approach allows us to dispense with the complex nested subqueries and applies to various types of databases. The code is as follows: