January 30, 2015

esProc Helps Process Heterogeneous Data Sources in Java - JSON

Java’s JSON open source package can only parse JSON data and hasn’t the computational function. It is troublesome for programmers to develop a general program for performing computations, such as grouping, sorting, filtering and joining, by themselves. For example, during developing program for performing conditional filtering in JSON files using Java, the code has to be modified if the conditional expression is changed. If they want to make it as flexible as SQL in performing conditional filtering, they have to write code for analyzing and evaluating expressions dynamically. This requires quite a lot of programming work.

esProc supports dynamic expressions. It also can be embedded in Java to write the general program for computing JSON data. Let’s give an example to see how it works. There are to-be-processed JSON strings that contain employee information, including fields such as EID, NAME, SURNAME, GENDER, STATE, BIRTHDAY, HIREDATE and DEPT, etc. Parse the strings and select female employees who were born on and after January 1st, 1981. The content of the strings is shown as follows:

[{EID:1,NAME:"Rebecca",SURNAME:"Moore",GENDER:"F",STATE:"California",BIRTHDAY:1974-11-20,HIREDATE:2005-03-11,DEPT:"R&D",SALARY:7000},
{EID:2,NAME:"Ashley",SURNAME:"Wilson",GENDER:"F",STATE:"New York",BIRTHDAY:1980-07-19,HIREDATE:2008-03-16,DEPT:"Finance",SALARY:11000},
{EID:3,NAME:"Rachel",SURNAME:"Johnson",GENDER:"F",STATE:"New Mexico",BIRTHDAY:1970-12-17,HIREDATE:2010-12-01,DEPT:"Sales",SALARY:9000},…]

Implementation approach: Call esProc program using Java and input the JSON strings which will then be parsed by esProc, perform the conditional filtering and return the result in JSON format to Java. Because esProc supports parsing and evaluating expression dynamically, it enables Java to filter JSON data as flexibly as SQL does.


For example, it is required to query female employees who were born on and after January 1, 1981. esProc can input two parameters: “jsonstr” and “where”, as the conditions. This is shown as follows:
“where” is a string, its values is BIRTHDAY>=date(1981,1,1) && GENDER=="F".

The code written in esProc is as follows:
A1: Parse the JSON data into a table sequence. esProc’s IDE can display the imported data visually, as shown in the right part of the above figure.

A2: Perform the conditional filtering, using macro to realize parsing the expression dynamically. The “where” in this process is an input parameter. In executing, esProc will first compute the expression surrounded by ${…}, take the computed result as macro string value, replace ${…} with it and then interpret and execute the code. The final code to be executed in this example is =A1.select(BIRTHDAY>=date(1981,1,1) && GENDER=="F").

A3: Generate JSON strings using the filtered table sequence.

A4: Return the eligible result set to the external program.
When the filtering condition is changed, you just need to modify “where”– the parameter. For example, it is required to query female employees who were born on and after January 1, 1981, or employees whose NAME+SURNAME is equal to “RebeccaMoore”. The value of “where” can be written as BIRTHDAY>=date(1981,1,1) && GENDER=="F" || NAME+SURNAME=="RebeccaMoore". After the code is executed, the result set in A2 is as follows:
Since the esProc script is called in Java through the JDBC interface, the returned result is set - the object of ResultSet. Fetch the first field of string type in set, and this is the filtered JSON string. Detailed code is as follows (save the above program in esProc as test.dfx):
         // create a connection
         Class.forName("com.esproc.jdbc.InternalDriver");
         con= DriverManager.getConnection("jdbc:esproc:local://");
         // call the program in esProc (the stored procedure); test is the name of file dfx
         com.esproc.jdbc.InternalCStatementst;
         st =(com. esproc.jdbc.InternalCStatement)con.prepareCall("call json(?,?)");
         // set the parameters; as the JSON string is long, part of it is omitted.
// In practice, JSON strings may be generated by various ways; see below for the explanation.
          String jsonstr=
"[{EID:1,NAME:\"Rebecca\",SURNAME:\"Moore\",GENDER:\"F\",STATE:\"California\...}]";
         st.setObject(1,jsonstr);
         st.setObject(2,"BIRTHDAY>=date(1981,1,1) && GENDER==\"F\"");
// execute the esProcstored procedure
         ResultSet set=st.executeQuery();
         // get the filtered JSON string
         String jsonstrResult;
         if(set.next()) jsonstrResult = set.getString(1);

JSON format is common used by interactive data in internet application. In practice, JSON strings may be retrieved from local files or remote HTTP server. esProc can directly read JSON strings from the files or the HTTP server. Take the latter as an example. It is assumed that there is a testServlet which returns JSON strings of employee information. The code for performing the operation is as follows:
A1: Define an httpfile object, the URL is
  http://localhost:6080/myweb/servlet/testServlet?table=employee&type=json.
A2: Read the result returned by the httpfile object.
A3: Parse the JSON string and generate a table sequence.

A4: Filter data according to the conditions.
A5: Convert the filtered table sequence to JSON strings.

A6: Return the result in A4 to the Java code that calls this piece of esProc program. 

January 27, 2015

esProc Helps Process Heterogeneous Data Sources in Java - MongoDB

It is convenient to establish and close a connection to MongoDB in esProc , as well as to call the database to query and count the data, perform distinct and aggregate operations. 

1. Preparation for connecting to MongoDB

The MongoDB Java driver (like mongo-java-driver-2.12.2.jar), which esProc does’nt provide, should be put into [esProc installation directory]\common\jdbc beforehand when esProc designer is used to access the database. The MongoDB Java driver’s download link is https://github.com/mongodb/mongo-java-driver/releases.

To access MongoDB by calling the esProc script through esProc JDBC in a Java program, the above MongoDB Java driver needs to be put into the classpath of the Java program. 

2. Create and close a connection to MongoDB

In an esProc script, mongodb(con) function is used to establish a connection to the MongoDB database and mdb.close() function is used to close it, as shown by the following code:

A1: Connect to MongoDB. IP and port are localhost:27017; database name is test, and so are the user name and the password.
A2: Close the MongoDB connection. 

3. Common data query and selection

Let’s take the employee, a collection in the database whose name is test, of MongoDB as an example to illustrate the esProc script for accessing MongoDB. Some of the data of employee are as follows:
{ "_id" : ObjectId("546aea22ee62ea452bdef4fe"), "EID" : 1, "NAME" : "Rebecca", "
SURNAME" : "Moore", "GENDER" : "F", "STATE" : "California", "BIRTHDAY" : "1974-1
1-20", "HIREDATE" : "2005-03-11", "DEPT" : "R&D", "SALARY" : 7000 }
{ "_id" : ObjectId("546aea22ee62ea452bdef4ff"), "EID" : 2, "NAME" : "Ashley", "S
URNAME" : "Wilson", "GENDER" : "F", "STATE" : "New York", "BIRTHDAY" : "1980-07-
19", "HIREDATE" : "2008-03-16", "DEPT" : "Finance", "SALARY" : 11000 }
{ "_id" : ObjectId("546aea22ee62ea452bdef500"), "EID" : 3, "NAME" : "Rachel", "S
URNAME" : "Johnson", "GENDER" : "F", "STATE" : "New Mexico", "BIRTHDAY" : "1970-
12-17", "HIREDATE" : "2010-12-01", "DEPT" : "Sales", "SALARY" : 9000 }

The following is a sample of esProc script for querying data of the collection of MongoDB: 

A2: Filter and query the data of employee. find function’s first parameter is employee, the collection’s name; its second parameter is the filtering criterion: STATE=” California” and SALARY >4,000; the third one specifies which properties should be fetched (here GENDER won’t be fetched). Noticed that esProc uses the same parameter format as that used in MongoDB’s find statement.

The function returns a cursor, which can fetch data in batch for computing when the result set is big, so as to avoid the memory overflow. The usage of cursor has been explained in detail in esProc-related documents. Here suppose the data are not big and use fetch function to fetch all the results for observation: 

4.Query and filter nested arrays and documents

Nested arrays and docements are multilevel data structure specific to MongoDB. The Java application doesn’t support this multilevel data structure when it accesses MongoDB through JDBC. Because esProc supports referencing data by fields, like referencing a record sequence by a field, it can fetch the nested arrays and documents of MongoDB.

For a nested array, let’s look at a collection – fruitshop:
{ "_id" : ObjectId("548a58c19cfbc9dd702eb71e"), "name" : "big fruit", "fruits" :
 [ "apple", "pear", "orange" ] }
{ "_id" : ObjectId("548a58c19cfbc9dd702eb71f"), "name" : "good fruit", "fruits"
: [ "banana", "pear", "orange" ] }
{ "_id" : ObjectId("548a58c29cfbc9dd702eb720"), "name" : "my fruit", "fruits" :
[ "banana", "apple", "tomato" ] }

And for a nested docment, let’s look at this collection – blogs:
{ "_id" : ObjectId("548a59fa9cfbc9dd702eb721"), "content" : "It is too hot", "co
mment" : [ { "author" : "joe", "score" : 3, "comment" : "just so so!" }, { "auth
or" : "jimmy", "score" : 5, "comment" : "cool! good!" } ] }
{ "_id" : ObjectId("548a59fa9cfbc9dd702eb722"), "content" : "It is too cold", "c
omment" : [ { "author" : "james", "score" : 1, "comment" : "yes!" }, { "author"
: "jimmy", "score" : 5, "comment" : "cool!" } ] }
{ "_id" : ObjectId("548a59fb9cfbc9dd702eb723"), "content" : "It is windy day tod
ay", "comment" : [ { "author" : "tom", "score" : 3, "comment" : "I do not think
so!" }, { "author" : "jimmy", "score" : 5, "comment" : "cool!" } ] }

esProc script for querying and filtering data of the two collections: 

A2: The filtering criterion of find function includes the pear insead of being equal to pear, for the fruits is an array. The return result of find funciton is a cursor, from which the resulting table sequence is fetched using fetch function and you can see fruits field is a referencing field, which references a sequence consisiting of various fruits names, as shown below: 

Click the blue part of the first row to see the detailed information of the referenced sequence, as shown below: 

Based on this table sequence with a referencing field, esProc can further perform more complicated set operations.

A3: comment is a collection of documents, and the filtering criterion is the comment’s documents whose author is jimmy. find function returns a cursor, from which the table sequence is fetched using fetch function. comment field of the table sequence is a referencing field that references a table sequence, as shown below: 

Click the blue part of the second row to see the detailed information of the table sequence, as shown below: 

Based on this table sequence with a referencing field, esProc can further perform more complicated set operations. Here further discussion is omitted. 

5. Count the data, perfor distinct and aggregate operations

With more powerful ability in computing and flow control, usually esProc is used to perform the complicated multi-step operation and MongoDB, a highly-efficient storage solution, is responsible for providing data without participating in the computation. But considering the performance loss during the data transmission, you can use MongoDB alone to perform some simple aggregate operations.

esProc provides the way of calling MongoDB interface to count the data, perform distinct and aggregate operations. Different from the find function for querying and filtering data, the return results of the three operations are table sequences instead of cursors. A sample code is as follows: 

A2: Count the number of eligible files in employee, a collection, using count function. The function’s first parameter is employee, the collection’s name; its second parameter is the filtering criterion: STATE=” California”. The result is count value: 

A3: Get the distinct values from employee using distinct function. The function’s first parameter is employee, the collection’s name, and the second parameter is one of the properties, DEPT. The return result is a table sequence consisting of employee’s DEPT without duplicate members: 
A4: Group and summarize employee. The first parameter of aggregate function is employee, the collection’s name, and the second one includes the grouping property and aggregate property, which means the data of the collection will be grouped by DEPT and then count the number of files in each group. aggregate function returns a cursor as well, whose data will be fetched all at once using fetch function. 

A5: Group and summarize employee, which is the same as the above. But another aggregate property is added to the second parameter of aggregate function, which means grouping the data of the set by DEPT and computing the average value of SALARY in each group. 


January 21, 2015

esProc Helps Process Heterogeneous Data Sources in Java –Cross-Database Relating

JoinRowSet and FilteredRowSet provided by RowSet– Java’s class library for data computing – can perform cross-database related computing, but they have a lot of weaknesses. First, JoinRowSet only supports inner join, it doesn’t support outer join. Second, test shows that db2, mysql and hsql can work with JoinRowSet, yet the result set of join oracle11g to other databases is empty though no error reporting will appear. The fact is there were two users who perform cross-database join using oracle11g database even got the correct result. This suggests that JDBC produced by different database providers will probably affect the result obtained by using this method. Last, the code is complicated.

esProc has proved its ability in assisting Java to perform cross-database relating. It can work with various databases, such as oracle, db2, mysql, sqlserver, sybase and postgresql, to perform a variety of cross-database related computing, like inner join and outer join involving heterogeneous data. An example will teach you the way esProc works. Requirement: relate table sales in db2 to table employee in mysql through sale.sellerid and employee.eid, and then filter data in both sales and employee according to the criterion state=”California”. The way the code is written in this task applies to situations where other types of databases are involved.


The structure and data of table sales are as follows: 

The structure and data of table employee are as follows:

Implementation approach: Call esProc script using Java program, join the multiple databases together to realize the cross-database relating, perform filtering and return the result to Java in the form of ResultSet.

The code written in esProc is as follows:

A1: Connect to the data source db2 configured in advance.
A2: Connect to the data source mysql configured in advance. In fact oracle and other types of databases can be used too.
A3, A4: Retrieve table sequences: sales and employee, from db2 and mysql respectively. esProc’s Integration Development Environment (IDE) can display the retrieved data visually, as shown in the right part of the figure in the above.
A5: Relate sales to employee through sellerid=eid using esProc’s object reference mechanism.
A6: Filter the two table sequences according to state="California".
A7: Generate a new table sequence and get the desired fields.
A8,A9: Close the connection
A10: Return the result to the caller of esProc program.

This piece of program is called in Java using esProc JDBC to get the result. The code is as follows (save the above esProc program as test.dfx):
//create a connection using esProcjdbc
Class.forName("com.esproc.jdbc.InternalDriver");
con= DriverManager.getConnection("jdbc:esproc:local://");
// call esProc program (the stored procedure) in which test is the name of file dfx
com.esproc.jdbc.InternalCStatementst;
st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test()");
// execute esProc stored procedure
st.execute();
// get the result set
ResultSet set = st.getResultSet();

January 14, 2015

esProc Helps with Computation in MongoDB – Subquery

MongoDB doesn’t support the complex subquery which can only be realized by retrieving the data out first and then performing further computation. The operation is the same complex even if Java or other programming languages are used to write the program. In view of this, we can consider using esProc to help MongoDB with the computation, the method of which will be illustrated in detail through an example.

In MongoDB, there are two docments: orders, which stores the orders data, and employee, which stores the employee data. They are as follows:
> db.orders.find();
{ "_id" : ObjectId("5434f88dd00ab5276493e270"), "ORDERID" : 1, "CLIENT" : "UJRNP
", "SELLERID" : 17, "AMOUNT" : 392, "ORDERDATE" : "2008/11/2 15:28" }
{ "_id" : ObjectId("5434f88dd00ab5276493e271"), "ORDERID" : 2, "CLIENT" : "SJCH"
, "SELLERID" : 6, "AMOUNT" : 4802, "ORDERDATE" : "2008/11/9 15:28" }
{ "_id" : ObjectId("5434f88dd00ab5276493e272"), "ORDERID" : 3, "CLIENT" : "UJRNP
", "SELLERID" : 16, "AMOUNT" : 13500, "ORDERDATE" : "2008/11/5 15:28" }
{ "_id" : ObjectId("5434f88dd00ab5276493e273"), "ORDERID" : 4, "CLIENT" : "PWQ",
 "SELLERID" : 9, "AMOUNT" : 26100, "ORDERDATE" : "2008/11/8 15:28" }
> db.employee.find();
{ "_id" : ObjectId("5437413513bdf2a4048f3480"), "EID" : 1, "NAME" : "Rebecca", "
SURNAME" : "Moore", "GENDER" : "F", "STATE" : "California", "BIRTHDAY" : "1974-1
1-20", "HIREDATE" : "2005-03-11", "DEPT" : "R&D", "SALARY" : 7000 }
{ "_id" : ObjectId("5437413513bdf2a4048f3481"), "EID" : 2, "NAME" : "Ashley", "S
URNAME" : "Wilson", "GENDER" : "F", "STATE" : "New York", "BIRTHDAY" : "1980-07-
19", "HIREDATE" : "2008-03-16", "DEPT" : "Finance", "SALARY" : 11000 }
{ "_id" : ObjectId("5437413513bdf2a4048f3482"), "EID" : 3, "NAME" : "Rachel", "S
URNAME" : "Johnson", "GENDER" : "F", "STATE" : "New Mexico", "BIRTHDAY" : "1970-
12-17", "HIREDATE" : "2010-12-01", "DEPT" : "Sales", "SALARY" : 9000 }
It is required to select the orders data in which SELLERID is the EID of records of employees where STATE is California in employee. This can be expressed in SQL like this:
Select * from orders where orders.sellerid in (select eid from employee where employee.state=’ California’)

The data of orders cannot be loaded entirely because their size is big, while the data size of both employee and the final result are not big.


The esProc script used to help MongoDB realize this subquery is as follows:

A1: Connect to MongoDB. localhost:27017 is the IP and the port number. test is the database name, user name as well as the password.

A2: Use find function to fetch the data from MongoDB and create a cursor. orders is the collection, with a filtering condition being null and _id , the specified key, not being fetched. Here esProc uses the same parameter format in find function as that in find statement of MongoDB. As esProc’s cursor supports fetching and processing data in batches, the memory overflow caused by importing big data all at once can thus be avoided.

A3: Fetch the desired data from employee according to the condition that STATE is California.

A4: Sort the EID field of A3’s employee data.

A5: Select the desired data from orders according to the condition requiring that SELLERID must exist in the table sequence in A4. That is, the condition can be expressed as SELLERID in A4. The resulting data can be loaded all at once. But if the data are too big, they can be fetched in batches. To fetch the top 1,000 rows, for example, can be expressed as fetch(1000).

The result is as follows:

NoteesProc isn’t equipped with a Java driver included in MongoDB. So to access MongoDB using esProc, you must put MongoDB’s Java driver (a version of 2.12.2 or above is required for esProc, e.g. mongo-java-driver-2.12.2.jar) into [esProc installation directory]\common\jdbc beforehand.

The esProc script used to help MongoDB with the computation is easy to be integrated into the Java program. You just need to add another line of code - result A6 to output a result in the form of resultset to Java program. For the detailed code, please refer to esProc Tutorial. In the same way, MongoDB’s Java driver must be put into the classpath of a Java program before the latter accesses MongoDB by calling an esProc program.

January 7, 2015

esProc Helps with Computation in MongoDB– Query Indexes in an Array

MongoDB can find out elements of a built-in array according to their indexes, but cannot find the indexes through the values of the elements. For example, the elements of an array are names of people stored according to their rankings. In MongoDB, names can be found according to the rankings (indexes of the array), but the values of rankings cannot be determined through names. esProc can help MongoDB in realizing this operation. The following example will teach you how it works in detail.

b–a collection of MongoDB, stores a name and an array of friends. The names in the array of friends are stored in the order of rankings, as shown below:
>db.b.find({"name":"jim"})
{ "_id" : ObjectId("544f3bf8cdb02668db9ab229"), "name" : "jim", "friends" : [ "t
om", "jack", "luke", "rose", "james", "sam", "peter" ] }

In MongoDB, you can find names through specified rankings. For example, find out the name of the first ranking among Jim’s friends:
>db.b.find({"name":"jim"},{"friends":{"$slice":[0,1]}})
{ "_id" : ObjectId("544f3bf8cdb02668db9ab229"), "name" : "jim", "friends" : [ "t
om" ] }


But you cannot find outthe ranking of “luke”,one of Jim’s friends. To solve the problem, esProc has its own script: 


A1: Connect to the MongoDB database. The IP and port number is localhost:27017, the database name is test and both the user name and the password are test. If any other parameters are needed, write them in line with the format mongo://ip:port/db?arg=value&…

A2: Fetch data from the MongoDB database using find function to create a cursor.The collection isb. The filtering criterion is name=jim and the specified keys are name and friends. It can be seen that this find function is similar to the find function of MongoDB. By fetching and processing data in batches, the esProc cursor can avoid the memory overflow resulting from big data importing.

A3: Since the data are not big, fetch function will fetch them all at once.

A4: Find out the position of luke using pos function.

After the code is executed, the result is as follows:

Please note that esProc hasn’t the java driver of MongoDB. To access MongoDB with esProc, the latter (a driver of 2.12.2 version or above is required, i.e. mongo-java-driver-2.12.2.jar) should be put into the [esProc installation directory]\common\jdbc beforehand.

The script for computation in MongoDB with the assistance of esProc is easy to integrate with Java program. By adding another line of code – A5, which is result A4, the result in the form of resultset can be output to Java program. For detailed code, please refer to esProc Tutorial. In the same way, to access MongoDB by calling esProc code with Java program also requires putting the java driver of MongoDB into the classpath of Java program.