July 31, 2014

Constants in esProc

In esProc computing, we may frequently use constants or store the data in the cell directly. In this article, let’s learn the usage of constants in the esProc.

1.Directly using constant in the expression
Constants can be directly used in the esProc expression:

 

When using the constant in the expression, the integer and float can be used directly, while the string must be quoted with the double quotation mark “”. For the Boolean constant true and false, they can be used directly in the expression, but there is few cases of using it. .
In particular, the capitalized L can be added to the integer to indicate the long integer. Compared with the integer, the long integer has a wider range of value; The hexadecimal long integer can be represented with a string whose first two characters are 0x:
 
Because the value range of normal integers is -231~231-1, that is, -2147483648~2147483647, the result in A2 exceeds the value range of integer. In A1, a long integer is used, and the value range is increased to -263~263-1. By this way, the correct result can be obtained. As can be noticed, in the further computing, if one of the operands involved in the integer computing is the long integer, the result is the long integer.
In the string within the expression, if there are special characters like " and \, an escape character \ is required as indicator before each special character:
 
The time and date constants are not allowed in the expression. Instead, only the type conversion functions, such as date(), time(), and datetime(), can be used to convert string or long integer:
 

2. Constant cell 
When populating the data to the cell directly, if the cell can interpret the character string as a constant, then this cell can be regarded as the constant cell. Its cell value is the constant.
The default text in the constant cell is pink. Based on the data in the cell, the constant will be parsed into various data types. If the cell string in the cell is not understandable, they will be interpreted into string. For example:

As may be noticed, the double quotation marks are not necessary when defining a string in the constant cell.
Otherwise, in the constant cell, the constant can be represented as percentage, like 5%:
 
In the constant cell A1, 5% will be converted to the corresponding float value 0.05. Note: The writing style of 5% cannot be used in the expression. It is only valid in the constant cell.
In particular, the value in the constant cell can also be true, false or null. The constant cell value is the corresponding Boolean value or null value:
 
The constant string cell can be used to specify the value in the cell as a string. The cell string in constant string cell is started with the English single quotation mark '. The constant string cell is used to define the string containing special characters or  string composed of figures:
 
When using the constant string cell, the escape character \ is not necessary if any special character exists in the  string:
 
In the constant cell, we can also define the sequence constant with the [], and use , to separate the sequence members. For example,
 
when defining the sequence constants, the double quotation marks are also unnecessary for indicating the members of  string in the sequence.

In the expression, we can use the cell name to make references to the constant cell:

3.The sequence of constant cells
In esProc, once constants are saved in multiple cells, these cells are often used to form constant sequence for use in the expression. For example,
 
In the A2 expression, the [A1:F1] indicates a sequence composed of all cells in the A1~F1 area.
The cells in the sequence can also be used in a range of continuous areas:
 
In the A3 expression, [A1:C2] indicates a sequence composed of cell values in this area from A1 to C2, following the order of first-row-then-column to form the sequence.

In esProc, the range of constant cells are often populated to a TSeq:
 
In A5, a new TSeq is created. In A6, the record function is used to populate the values of constant cells in the range of A1~E4 area to the TSeq. With such method, the dfx file can be used to store the tabular data.

4. Program parameter
In esProc, some common constants are often set as the program parameter. Firstly, we need to define the program parameters in esProc. When defining the default for the  parameters, the similar constant cell will also parse the type automatically:
 
When defining the  parameters, we can also choose to set the parameter before each running, and modify the parameter value before cellset computing. The parameter can be used in a similar way as using constants:
 
Likewise, the sequence can also be used in the program parameter:
 
 

July 30, 2014

Loop Code for esProc

With esProc, the code for loop is mainly implemented with for statement. The for statement will repeat the code block with for as the main cell. There are different formats of the for statements, as listed below:

1.The for loop 
Unconditional loop. The values of the main cell are, in turn, the count for the current loop, which often needs to be exited by break command.
Unconditional loop or endless loop is the simplest structure, for example:


In the above example, A2 executes unconditional loop. According to the loop number in A2, B2 retrieves one employee record every time. The rest of codes save the employee information for Texas in B1. After the information for first 10 Texas employees were retrieved, the loop will jump out with break statement in C6. Alternatively, break C command can also be used to replace break statement to jump out from the loop body with cell C as the main cell.
After execution, B1 contains the following results:
 
Note that when using the unconditional loop, we need to make sure that the break statement can be executed properly, otherwise theprogram will never end at the time of execution.

2.for A
Loop the member of the sequence A, the main cell value will in turn be the current member of A.

In esProc, the most commonly used loop is to loop on every member in a sequence. For example:
 
In this example, all Texas employees are selected in A2. The loop is executed in A3 for each Texas employee to compute their age, and then store the oldest age in B1. When the loop is complete, we can get the maximum age for all Texas employees from B1.
For relatively simple loop statement, sometimes we can use the loop function for sequence to achieve the same result. For example:
 

3.for n
Loop n times;with the main cell value as the current loop count in turn.

In addition to loop on all members in a sequence, loop for specific times is also very common.
For example, suppose we have a piece of paper which is 1000mm * 800mm in size and 0.1mm in thickness. If we fold it ten times, what will be the final length, width and thickness?
 
In the above example, the folding is looped ten times. After each folding the length will be reduced by half, and the thickness will be doubled. If after the folding the width of the paper is greater than the length, the length and width will be switched. After execution, A1, B1 and C1 contains the length, width and thickness of the paper as following:
   
The for n statement can also be seen as a simple form of for to (1, n) . This means "loop from 1 to n".

4.for a, b, s
Loop from a to b, at the step of s. This means "to loop on each member of array to(a, b).step (s)". If s is absent the default value will be 1.
Sometimes the loop does not start from 1, and the step value is not 1. We can then use for a, b, s loop. For example:
 
For any n digits number, if the total ofthe n-th power of each digit equals its own, the number can be called daffodils number. The following example is used to identify all three digits daffodils number:
 
In the loop codesin A2, the loop is excecuted on all 3 digits numbers. After the execution of program, we can see the result in A1:
 

5.for x
Loop when x is true. The value in main cell is the computed value for x.

By specifying the loop condition, we can control when to end the loop. For example, the issue in section 1 can also be solved with for statementwith specific loop conditions:


In the main cell of for loop, the cell value is either true or false, which is the result we got after computing the loop condition every time. At this point we can not use the value from main cell as the count of loop. Therefore to retrieve employee records in B2, we use #C to get the current number of loops. In #C, C is the main cell of loop. After the code is executed, the result is the same as in section 1.

6.for cs,n:x
Loop cursor, and retrieve n records each time, or retrieve records until x has changed. Return table sequence. Close after the reading.
When retrieve the data with cursor, we can use for statement to loop the data in cursor.


7.Nested loop
In the body of the loop, we can also use loop statement. This is what we call nested loops, or multi-layer loops. For example:
 
In the example, A1 stores the prime number sequencewe found; A2 loops the integer from 2 to 1000. The prime number sequence is looped in B2. If the number in A2 is divisible by certain prime number, it will be replaced by the next number in A2 with the command of next A2. If the number in A2 is not divisible by all integer number in current prime number list, we then find a new prime number. It will be added to the sequence in A1. Among them, the next C command in D2 specifies to skip the rest part of the loop body and enter the next round of loop. After execution, the sequence for prime numbers within 1000 are as following in A1:


When using the next statement, we don't have to specify the main cell C, to skip the deepest level of the loop where the next statement is located.
As with the following one hundred chicken puzzle: each rooster is worth of 5 dollars, and each hen 3 dollars. Three chicks is worth of one dollar. If we can buy 100 checken with 100 dollars, how many roosters, hens and chicks do we buy actually?We can resolve the issue in the following way:
 
In the codes, A2 loops on the total number of rooster. We need to judge in B2: if the total price of the rooster is more than 100, there must be too many roosters. We can use break command in C2 to end the loop. In B3 we continue to loop with the possible number of hens. If the current total price of chicken is more than 100 in D4,we know that there are too many hens. We can use next A2 to add the number of roosters and try again. increasing the total number of cocks and try again. In D5, the current total price of chicken is still less than 100. We can continue to increase the total number of hens and try. The next statement here can skip from the deepest level of loop, which has B3 as the main cell.

July 29, 2014

Debugging Function Comparison RLanguage v.s. esProc

As is well known, in the development process of program, the time consumed to remove and correct the error is usually greater than time spent in coding. Therefore, a friendly debug environment can save a lot of time. In this respect, VB.NET and SQL are two extremes that the former provides almost a perfect Debug environment, while the latter nearly provides no error debugging tool.
R language and esProc as two development tools for computation and analysis are both capable to debug to some degree. We will study on their differences in this respect.

Let’s kick off by making ourselves familiar with the debugging environments of both R (take R Studio for example) and esProc from their respective interfaces:

R Studio Debugging Environment:
 
esProc's Debugging Environment

 

Let's compare the basic functions. 
Break point: For R, the break point is set by inserting browser () into the codes. Users have to remove these statements manually once debugged, which seems to be  back to the cherished old times of using BASIC to code when Windows was not invented, impressed us with a strong feeling of reminiscence. In those days, removing the stop break point statement is even an important job before releasing codes. By comparison, the break point style of esProc is similar to that of VB.NET and other alike modern programming languages. By clicking the button or pressing shortcut keys, the break point can be set to the cell in which the mouse cursor is located. This is nothing special.

Debug command: with the same style of break point, debug command of R is input from the console, including to resume running,n to go run the next statement, and Q to exit the debug mode. In addition, there are also functions like trace/set Break point/debug/undebug/stop.It is important to note that it would be best not to have any variable named after c, n, and Q in the codes. Otherwise, accidental conflicts will occur.

Regarding the procedure control, esProc is no different than VB.net and like programming languages for just requires click(s) on button or shortcut keys to implement, not requiring users to memorize any command, as we all know. 

Variable watch: The variable watch window of R language is on the right, in which all current variables will be listed. On clicking it, a new window will prompt to display the value of this variable. Alternatively, R language users can also enter the fix(variable name) at the command line window as shown below. In the right bottom corner of esProc user interface, there is a similar variable list. Seldom do esProc users use this list because esProc does not require users to specially define the variable name. The name of cell is taken as the variable name by default, and thus users can simply click the cell to review the variable value. 

One thing to notice is that R is friendly to display the variables of Frame format. However, it is comparatively not so friendly to support the irregularly-structured variables that we can say it is unreadable at all, as the below typical List for example:
 

esProc does a much better job in this respect. For the same data, in esProc, it is represented by drilling through the hyperlinks:
 

Then, let’s compare some more advanced functions, and start from checking the Immediate Running first.

As for esProc, a cell will be calculated immediately and automatically once codes are entered into this cell. Therefore, the developers can view the execution result immediately and adjust the code for re-run on conditions. This style can speed up the development speed and lower the probability of errors, allowing the green hand to become familiar with it quickly. R Studio provides the similar means that more resembles the "immediate window" of VB, that is,user's type in codes at a command line window and run immediately. If it is run correctly, then copy the codes to the formal code section. Judging on the whole, R is less convenient than esProc in this respect.

Finally, let us discuss the function to debug the functions separately.

R users can use the debug(Function Name) to debug the functions separately and directly so as to modularize in development and implement the large-scale test. esProc users,on the contrary, are not allowed to debug the function separately, which is a pity more or less. However, the debug function of R language has not implemented the true “separate” test. Its working principle is actually to add a browser () command prior to the function to be debugged, still requiring running all codes before entering the function to debug. 

From another perspective, such computational analysis software is rarely used for the large-scale development and test. There is not much significance and value for its ability to debug function separately. 

Through the above comparison, we can see that both R and esProc provide some debugging functions. In which,esProc is performing better in terms of convenience and usability.

Set Operators in esProc

Since sets are commonly used in esProc, the latter provides comprehensive set operations.

1. Binary Operation on Sets

The most basic set in esProc is sequence. Let’s look at some basic binary operations between two sequences A and B.

  • A|B

Concatenate: Concatenate the two sequence straightforwardly, of which the member of B is added behind the members of A. If A or B or both A and B are the single-value member instead of sequence, then it will be handled as a single member sequence.

  • A&B

Union: Join the two sets and remove the members of B that already exist in A. If one of A and B or both vales are the single-value member instead of sequence, then you can take it as single member sequence to process.

  • A^B

Intersection: Intersection set of A and B. Get a sequence composed of members that not only exist in A but also exist in B.

  • A\B

Complement: Members in A but not in B. If B is not a sequence, then treat it as the single member sequence.

  • k*A

A|A|…|A, copy A for k times, positions of k and A are interchangeable in the expression.



These basic operations - concatenate, complement, intersection and union - can also be used to deal with more complex set operations. Such as, to seek the "inverse intersection" of sets A and B, i.e., the set composed of all that is not the common members of A and B.
 

The "inverse intersection" of two sets is the complement of their union and intersection. With this kind of transformation, more complex binary operations will be completed.

2. Alignment Arithmetic Operation on Sets

Two sequences which have the same length will perform the alignment computation on members, and return a new sequence.

  • A++B

[A(1)+B(1),A(2)+B(2),…]

  • A--B

[A(1)-B(1),A(2)-B(2),…]

  • A**B

[A(1)*B(1),A(2)*B(2),…]

  • A//B

[A(1)/B(1),A(2)/B(2),…]

  • A%%B

[A(1)%B(1),A(2)%B(2),…], the % here is the Mod computation.

  • A\\B

[A(1)\B(1),A(2)\B(2),…], the slash here means integer division


3. Comparison of Sets

In esProc, the function cmp(A,B) can be used to compare the sequence A and B.

  • cmp(A,B)

Compare the member values of two sequences at the same location one by one, and return the 1 or -1 when encountering the first unequal member, if A is identical to B, then return 0. Specifically, cmp(A) or cmp(A,0) will compare A with the sequence with the same length and members are all 0, i.e. cmp(A,[0,0,…,0]).


Comparison of two sequences can be briefly expressed as A==B, A>B.

Note that a sequence in esProc is an ordered set, so order plays an important role in comparing the size of two sequences A and B. A.eq(B) is used to see if the two sequences have common members.



Because the member order in A1 and A2 is different, result in A3 is false, showing the two sequences are not equal. While result in A4 is true, showing the two sequences have the same members.

July 28, 2014

Implement Basic Functions of SQL through R Language & esProc

As we know, SQL users have to finish composing all codes and then run them all at one time, resulting in inconvenient interactive computing. However, the simple and easy-to-understand query syntax of SQL is always welcomed by programmers. R language and esProc as the computing and analysis tool are surely required to offer the similar query syntax. In this essay, we will discuss how they implement the basic functions of SQL through R language and esProc through some examples.

The example data is from 2 tables of the classical Northwind database:
Orders table with the main fields: OrderID, EmployeeID, OrderDate, Freight, CustomerID

Customer table with the main fields: CustomerID, CompanyName

Retrieve data of the entire table
SQL: select * from Orders

R: A1<-sqlQuery(conn,'select * from Orders')

esProc: $select * from Orders

Comments: The word count of esProc and SQL codes only differs by one word. The R language has a strong flavor of programming.

Where: Search for the order which has a freightage higher than 100 and is placed before the year of 1998
SQL: select * from Orders where Freight > 100 and OrderDate < '1998-01-01'

R : subset(A1,Freight > 100 & OrderDate < as.POSIXlt('1998-01-01'))

esProc: =A1.select(Freight > 100 && OrderDate < date("1998-01-01"))

Comments: Both SQL and R solutions are close to SQL to some extent. R is of the typical function style, and esProc is of the typical object style.

Order: Sort by employees correctly, and then sort by freightage in reverse order
SQL: select * from Orders order by EmployeeID, Freight desc

R: A1[order(A1$ EmployeeID,-A1$Freight),]

esProc: =A1.sort(EmployeeID,Freight:-1)

Comments: R solution is to retrieve 2 vectors from A1 at first, pass them to the function order to group them together for sorting, and then export the serial numbers. At last, data will be rearranged according to the serial number. The computing process of R language is rather winding, not as straightforward as SQL. It is because that R is good at handling vector, the access to structured data will take column as the basic unit, and the parameter usually takes up a whole column. By comparison, SQL takes the record (column) as the basic unit, with parameter as the column name.

esProc solution resembles that of SQL because esProc takes the record (row) as the basic unit .

Group & Sum: Summarize by employee, sum up the freightage, and count the orders:
SQL: select EmployeeID, count(OrderID), sum(Freight) from Orders group by EmployeeID

R :
A4<-aggregate (A1$Freight,list(A1$EmployeeID),sum)  
A4$count<-tapply(A1$Freight,A1$EmployeeID,length)

esProc : =A1.group(EmployeeID;EmployeeID,~.count(),~.sum(Freight))

Comments: In this case, it is obvious that R and SQL differ greatly. The algorithms available in R may be clearer for mathematicians, and means more learning efforts for users accustomed to SQL.

Join:  Perform left join on Orders table and Customers table by CustomerID.
SQL :select * from Orders leftjoin Customers on Orders.CustomerID=Customers.CustomerID

R :merge(A1,B1,by.x="CustomerID",by.y="CustomerID",all.x=TRUE)

esProc: =join@1(A1:CustomerID:Orders, B1:CustomerID:Customers)

Comments: The join of SQL equals to join of esProc or the merge of R. Similarly, the left join of SQL equals to join@1 of esProc, or merge(...all.x=TRUE) of R. Obviously, esProc is more alike SQL in the respects of both the syntax conventions and the literal meanings.

Distinct: Remove the duplicate CustomerID
SQL: selectdistinct CustomerID fromOrders

R: unique(B2$CustomerID)

esProc: =B2.id(CustomerID)
Comments: The keywords of the two solutions respectively differ to that of SQL. However, their usages are basically the same to that of SQL. In which, R is the typical function style, and esProc is the typical object style.

Like: Search for the record with Island in ShipName
SQL : select * from Orders where ShipName like '%Island%'

R: subset(A1,grepl("Island",ShipName,ignore.case = TRUE))

esProc : =A1.select(like@c(ShipName ,"*Island*"))

Comments: R supports several means to match, including the regular expressions, and is more powerful than esProc in this respect. The usages of esProc are more close to that of SQL, and fit for those who are familiar with SQL.

As can be seen from the above comparison, esProc has a coding style more close to that of SQL since esProc supports the data structure of “Record” by the infrastructure, making it more suitable for users who are familiar with SQL. In addition, compared with R language, esProc provides the representation style of "object + functions" that is much easier for those who are familiar with database to accept.
R is more resourceful in details and third party support, ideal for mathematicians.