Sunday, May 5, 2019

Powerful Script Data Sources for BIRT Report

1. Preface: JVM-based SQL functions and stored procedures

Some databases, such as MySQL before version 8.0, lack analytic (window) functions. Others, such as Vertica, don’t support stored procedures. Their users turn to external scripts in Python, R or other languages to handle complicated data computations. But these scripting languages integrate poorly with Java, the mainstream application language. And Java code written to replace a SQL function or stored procedure tends to be lengthy, serves a single computing goal, and is hard to reuse.

It’s not easy to implement complicated logic even with analytic functions. Here’s a common computing task: find the top N customers whose sales together account for half of the total, and sort them by sales amount in descending order. Oracle implements it this way:
with A as
  (select CUSTOM,SALESAMOUNT,row_number() over (order by SALESAMOUNT) RANKING
  from SALES)
  select CUSTOM,SALESAMOUNT
  from (select CUSTOM,SALESAMOUNT,sum(SALESAMOUNT) over (order by RANKING) AccumulativeAmount
  from A)
  where AccumulativeAmount>(select sum(SALESAMOUNT)/2 from SALES)
  order by SALESAMOUNT desc
The Oracle query sorts the records by sales amount in ascending order and accumulates the amounts in that order. The customers whose accumulated amount is greater than half of the total are exactly the top customers we want, found from the opposite direction. A window sum ordered directly by SALESAMOUNT would give rows with equal sales amounts the same accumulated value, so the first subquery computes a row_number() ranking and the accumulation is ordered by that ranking instead.
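
The effect of equal sales amounts on the accumulated value can be traced in a small sketch. It is written in Python purely for illustration. The sample amounts and the two function names are assumptions, not part of the original query. One function accumulates row by row, as ordering by a row_number() ranking does. The other lets rows with equal amounts share one accumulated value, which is how a window sum ordered directly by the amount behaves under its default frame.

```python
from itertools import accumulate


def running_sum_rows(amounts):
    """Accumulate row by row: each row adds only its own amount."""
    return list(accumulate(amounts))


def running_sum_range(amounts):
    """Accumulate so that rows with equal amounts share one value.

    The input must already be sorted in ascending order.
    """
    totals = {}
    running = 0
    for amount in amounts:
        running += amount
        totals[amount] = running  # the last write covers all rows with this amount
    return [totals[amount] for amount in amounts]


# Hypothetical sales amounts, sorted ascending as in the query.
amounts = sorted([30, 10, 30, 30])
half = sum(amounts) / 2

# Keep the rows whose accumulated amount is greater than half of the total.
by_rows = [a for a, c in zip(amounts, running_sum_rows(amounts)) if c > half]
by_range = [a for a, c in zip(amounts, running_sum_range(amounts)) if c > half]

print(by_rows)   # two customers are enough to reach half of the total
print(by_range)  # the tie pulls in a third customer
```

With these amounts the row-by-row accumulation keeps two customers, while the shared accumulation keeps three, one more than needed.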

esProc script:
A B
1 =connect("verticaLink") /Connect to Vertica database
2 =A1.query("select * from sales").sort(SALESAMOUNT:-1) /Get the sales records and sort them by sales amount in descending order
3 =A2.cumulate(SALESAMOUNT) /Calculate a sequence of accumulated values; the function is a replacement of database window function
4 =A3.m(-1)/2 /Calculate half of total sales amount
5 =A3.pselect(~>=A4) /Find the position in the accumulated value sequence where half of total sales amount falls
6 =A2(to(A5)) /Get the record where half of total sales amount falls and records before it
7 >A1.close() /Close database connection
8 return A6 /Return A6’s result
Instead of nested subqueries plus window functions, esProc expresses the computing logic step by step in concise syntax. Because the script relies on no database-specific feature, it can be applied to any database (data source).
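
For readers who don’t know esProc syntax, the steps in cells A2 to A6 can be mirrored in a short Python sketch. This is only an illustration of the logic, not how esProc runs it. The function name and the sample records are hypothetical, and the sketch assumes sales amounts are not negative.

```python
from itertools import accumulate


def top_half_customers(records):
    """Return the largest records whose amounts together reach half of the total."""
    # A2: sort by sales amount in descending order
    ranked = sorted(records, key=lambda r: r["SALESAMOUNT"], reverse=True)
    # A3: sequence of accumulated values
    running = list(accumulate(r["SALESAMOUNT"] for r in ranked))
    if not running:
        return []
    # A4: half of the total (the last accumulated value)
    half = running[-1] / 2
    # A5: first position where the accumulated value reaches half (0-based here)
    pos = next((i for i, c in enumerate(running) if c >= half), len(running) - 1)
    # A6: the record at that position and all records before it
    return ranked[: pos + 1]


# Hypothetical sample data
sales = [
    {"CUSTOM": "Dune", "SALESAMOUNT": 100},
    {"CUSTOM": "Acme", "SALESAMOUNT": 400},
    {"CUSTOM": "Elm", "SALESAMOUNT": 50},
    {"CUSTOM": "Cedar", "SALESAMOUNT": 200},
    {"CUSTOM": "Birch", "SALESAMOUNT": 250},
]

for record in top_half_customers(sales):
    print(record["CUSTOM"], record["SALESAMOUNT"])
```

Sorting in descending order first means the accumulated sequence can be scanned from the front, so no reverse-direction trick is needed.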

esProc is driven by a JVM-based scripting language designed for processing structured data. Like SQL functions and stored procedures, an esProc script can be integrated with a Java application, and the computing logic it implements is portable, general-purpose and database-independent. That computing logic runs in a middle layer, separate from the data logic that runs in the database (data source) layer. The separation makes the whole application more scalable, more flexible and easier to maintain.

2. Application scenario: Report data preparation


2.1 Reporting architecture

[Image: 001.png]
An esProc script embedded in the reporting layer works like a local logical database that needs no dedicated server. It sits between the reporting tool and the data source as a data preparation layer that performs the complicated computations.

2.2 Integration

Let’s look at how to integrate esProc as the data preparation layer, taking Vertica and BIRT as the example.

I. Integration of basic jars
esProc JDBC has three basic jars, located in [installation directory]\esProc\lib:
dm.jar            esProc computing engine and JDBC driver
jdom.jar          Parses configuration files
icu4j_3_4_5.jar   Handles internationalization
Other jars provide specific functionality. To use a database as a data source through esProc JDBC, its driver jar is also required. Vertica is the data source here, so the Vertica driver is needed (Vertica 9.1.0 in this example):
vertica-jdbc-9.1.0-0.jar   Download it from the Vertica website
Copy all of these jars to BIRT’s [installation directory]\plugins\org.eclipse.birt.report.data.oda.jdbc_4.6.0.v20160607212.

II. Deploy the configuration file
The configuration file, raqsoftConfig.xml, contains the license information, the script file path, the data source connection settings, and so on.
It is located in [esProc installation directory]\esProc\config and must be copied to the BIRT designer class path, [installation directory]\plugins\org.eclipse.birt.report.data.oda.jdbc_4.6.0.v20160607212.
The file name must not be changed.

2.3 BIRT development environment

1. Copy all the required jars to BIRT’s WEB-INF\lib.
2. Copy raqsoftConfig.xml to BIRT’s WEB-INF\classes.


2.3.1 Example 1: Normal call

1. Below is the Sales table in the Vertica database, queried via vsql. The table contains data for the years 2013, 2014 and 2015.
[Image: 002.png]
  
     
2. Create an esProc script
  
(1) Put the Vertica JDBC driver jar into the esProc designer path
Download the JDBC driver jar (vertica-jdbc-9.1.0-0.jar, for instance) from the Vertica website and put it under [esProc installation directory]\common\jdbc.
  
(2) Add Vertica data source
Open the esProc designer and click Tool -> Datasource to add the Vertica data source as a JDBC connection.
[Image: 003.png]
Click OK to save the configuration, then click Connect to connect to the data source.
[Image: 004.png]
The connection has succeeded once the data source name turns pink.
  
(3) Create an algorithm script through File -> New and save it as VerticaExternalProcedures.dfx.
A B
1 =connect("verticaLink") /Connect to Vertica database
2 =A1.query("select * from sales").sort(SALESAMOUNT:-1) /Get the sales records and sort them by sales amount in descending order
3 =A2.cumulate(SALESAMOUNT) /Calculate a sequence of accumulated values; the function is a replacement of database window function
4 =A3.m(-1)/2 /Calculate half of total sales amount
5 =A3.pselect(~>=A4) /Find the position in the accumulated value sequence where half of total sales amount falls
6 =A2(to(A5)) /Get the record where half of total sales amount falls and records before it
7 >A1.close() /Close database connection
8 return A6 /Return A6’s result to BIRT as the report source data set

3. Deploy the script
Put the script file in the main script directory configured in raqsoftConfig.xml.
[Image: 005.png]


4. Configure the data source connection verticaLink in raqsoftConfig.xml

<DB name="verticaLink">
    <property name="url" value="jdbc:vertica://192.168.10.10:5433/ForEsprocTestDB"/>
    <property name="driver" value="com.vertica.jdbc.Driver"/>
    <property name="type" value="0"/>
    <property name="user" value="dbadmin"/>
    <property name="password" value="runqian"/>
    <property name="batchSize" value="0"/>
    <property name="autoConnect" value="false"/>
    <property name="useSchema" value="false"/>
    <property name="addTilde" value="false"/>
    <property name="needTransContent" value="false"/>
    <property name="needTransSentence" value="false"/>
    <property name="caseSentence" value="false"/>
</DB>

5. Create a new report in the BIRT report designer and add an esProc data source named esProcConnection.
[Image: 006.png]
The driver class is com.esproc.jdbc.InternalDriver (v1.0), which requires dm.jar and the other jars described in section 2.2. The database URL is jdbc:esproc:local://

6. Call the esProc data set from BIRT (as Vertica’s external stored procedure)
Create a new data set, select the esProc data source (esProcConnection), and set the data set type to SQL Stored Procedure Query.
[Image: 007.png]
Next, enter {call VerticaExternalProcedures()} under Query Text. VerticaExternalProcedures is the name of the esProc script file.
[Image: 008.png]

Now we can preview the computing result with Preview Results.
[Image: 009.png]
That completes the process of using an esProc script as Vertica’s external stored procedure to prepare the data source for a report.
  
7. Web presentation
Take a grid report as an example. Below is the report design:
[Image: 010.png]
Preview after publishing:
[Image: 011.png]

2.3.2 Example 2: Parameter-based call

Let’s change the task a bit: for a given year, find the top N customers whose sales together account for half of that year’s total, and sort them by sales amount in descending order. The task requires filtering by a parameter.

1. Add a year parameter for filtering.
Open the esProc designer and click Program -> Parameter -> Add to add the parameter qyear (its name may differ from the report parameter’s name).
[Image: 012.png]

Modified script:
A B
1 =connect("verticaLink") /Connect to Vertica database
2 =A1.query("select * from sales where year(subscriptiondate)=?",qyear).sort(SALESAMOUNT:-1) /qyear is the parameter receiving a typed year to find the corresponding sales records and sort them by sales amount in descending order
3 =A2.cumulate(SALESAMOUNT) /Calculate a sequence of accumulated values; the function is a replacement of database window function
4 =A3.m(-1)/2 /Calculate half of total sales amount
5 =A3.pselect(~>=A4) /Find the position in the accumulated value sequence where half of total sales amount falls
6 =A2(to(A5)) /Get the record where half of total sales amount falls and records before it
7 >A1.close() /Close database connection
8 return A6 /Return A6’s result to BIRT as the report source data set
A2 filters the sales records by the year passed in through qyear.
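
The question mark in A2 is an ordinary SQL placeholder that is bound to qyear when the query runs. A rough Python sketch of the same idea follows, using an in-memory SQLite table as a stand-in for Vertica. The table contents and the function name are hypothetical. SQLite has no year() function, so the sketch extracts the year with strftime instead, and it lets the database do the descending sort.

```python
import sqlite3
from itertools import accumulate


def top_half_for_year(conn, qyear):
    """Top customers whose sales reach half of the given year's total."""
    rows = conn.execute(
        "select custom, salesamount from sales "
        "where cast(strftime('%Y', subscriptiondate) as integer) = ? "
        "order by salesamount desc",
        (qyear,),  # bound to the ? placeholder
    ).fetchall()
    running = list(accumulate(amount for _, amount in rows))
    if not running:
        return []
    half = running[-1] / 2
    pos = next((i for i, c in enumerate(running) if c >= half), len(running) - 1)
    return rows[: pos + 1]


# Hypothetical sample data in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("create table sales (custom text, salesamount integer, subscriptiondate text)")
conn.executemany(
    "insert into sales values (?, ?, ?)",
    [
        ("Acme", 400, "2013-03-01"),
        ("Birch", 100, "2013-07-15"),
        ("Cedar", 300, "2015-01-20"),
        ("Dune", 250, "2015-06-30"),
        ("Elm", 150, "2015-11-11"),
    ],
)

print(top_half_for_year(conn, 2013))
print(top_half_for_year(conn, 2015))
```

Binding the year through a placeholder rather than pasting it into the SQL text keeps the query fixed and lets the caller supply any year.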

2. Define a year parameter for the report
Define an input parameter named qyear for the report.
Open the report and click Data Explorer -> Report Parameter -> New parameter to add it.
[Image: 013.png]
The second red box in the screenshot holds the default value of the parameter qyear.

3. Add a data set parameter and link it with the report parameter
Create the data set VerticaExternalProcedures.
[Image: 014.png]

The Query Text differs slightly from Example 1: it is {call VerticaExternalProcedures(?)}. The question mark (?) is a placeholder for the input year parameter. Under Parameters, add a data set parameter qyear and link it to the report parameter qyear.
[Image: 015.png]
  
Preview Results shows the data for 2013, the default value of qyear.
[Image: 016.png]

After passing the value “2015” to the parameter:
[Image: 017.png]

4. Web presentation
[Image: 018.png]
Querying the data for 2015:
[Image: 019.png]

After modifying the URL, or otherwise passing “2013” to qyear:
[Image: 020.png]
