Upscaling your JDBC app using Oracle object type collections

Regarding Oracle JDBC bulk array inserts: Greg Rahn wrote this week about the performance gains to be had by batching up calls to the database using the array interface.

As an addendum to his excellent points, here is a comparison using an Oracle collection of an Oracle object type – forgive my Java. Using the StructDescriptor, STRUCT, ArrayDescriptor and ARRAY structures is unsightly and unintuitive, but they can deliver some further performance gains. If only we could wrap this approach up in some user-friendly layer, then I reckon we could kick some of these iterative row-by-row ORM tools into touch.

First up, for the baseline, based on Greg’s example, this is what my batch size performance was like when inserting 10000 rows into emp on my system:
[Figure: elapsed time by update batch size – jdbc-update-batching-performance.gif]

And, using an Oracle collection of Oracle object types, uploading the 10000 rows in a single INSERT … SELECT … FROM TABLE (CAST …) statement took 0.219 seconds (Java class below).

Which compared very favourably.

Inline scripts:

create type to_emp as object
(EMPNO NUMBER(4)
,ENAME VARCHAR2(10)
,JOB VARCHAR2(9)
,MGR NUMBER(4)
,HIREDATE DATE
,SAL NUMBER(7,2)
,COMM NUMBER(7,2)
,DEPTNO NUMBER(2));
/

create type tt_emp as table of to_emp;
/
import java.sql.*;
import java.util.*;
import oracle.jdbc.*;
import oracle.jdbc.pool.OracleDataSource;
import oracle.sql.STRUCT;
import oracle.sql.StructDescriptor;
import oracle.sql.ArrayDescriptor;
import oracle.sql.ARRAY;


public class bulkInsert {

    public static void main(String[] args) {
        try {

            OracleDataSource ods = new OracleDataSource();
            ods.setURL("jdbc:oracle:oci8:@ora.noldb507");
            ods.setUser("scott");
            ods.setPassword("tiger");
            OracleConnection conn = (OracleConnection) ods.getConnection();
            conn.setAutoCommit(false);

            short seqnum = 0;
            String[] metric = new
                    String[OracleConnection.END_TO_END_STATE_INDEX_MAX];

            metric[OracleConnection.END_TO_END_ACTION_INDEX] = "insertEmp";
            metric[OracleConnection.END_TO_END_MODULE_INDEX] = "bulkInsert";
            metric[OracleConnection.END_TO_END_CLIENTID_INDEX] = "myClientId";
            conn.setEndToEndMetrics(metric, seqnum);

            DatabaseMetaData meta = conn.getMetaData();

            System.out.println(
                    "JDBC driver version is " + meta.getDriverVersion());

            Statement stmt = conn.createStatement();

            stmt.execute("alter session set sql_trace=true");
            stmt.execute("truncate table emp");
            stmt.close();

            int numberOfEmployees = Integer.parseInt(args[0]);

            STRUCT[] structEmployee = new STRUCT[numberOfEmployees];

            oracle.sql.StructDescriptor descEmployee = oracle.sql.StructDescriptor.createDescriptor("SCOTT.TO_EMP",conn);

            java.sql.Timestamp now = new java.sql.Timestamp( (new java.util.Date() ).getTime() );

            for (int i = 0; i < numberOfEmployees; i++) {

                Object[] empValues = {
                  (i), // EMPNO
                  ("Name" + i), // ENAME
                  ("Job"), // JOB
                  (i), // MGR
                  now , //now
                  (10000 + i), // SAL
                  (100 + i), // COMM
                  (10) // DEPTNO
                };

                structEmployee[i] = new oracle.sql.STRUCT(descEmployee, conn, empValues);
            }

            oracle.sql.ArrayDescriptor empArrayDescriptor = oracle.sql.ArrayDescriptor.createDescriptor("SCOTT.TT_EMP",conn);

            ARRAY empArray = new ARRAY(empArrayDescriptor,conn,structEmployee);

            OraclePreparedStatement psEmp = (OraclePreparedStatement) conn.prepareStatement(
             "insert /* insEmpBulk */ into emp (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO) select * from table (cast (? as tt_emp))");

            psEmp.setObject(1,empArray);

            long start1 = System.currentTimeMillis();

            // Single execute: one INSERT ... SELECT over the whole collection
            psEmp.execute();

            conn.commit();
            psEmp.close();

            long elapsedTimeMillis1 = System.currentTimeMillis() - start1;
            // Get elapsed time in seconds
            float elapsedTimeSec1 = elapsedTimeMillis1 / 1000F;

            System.out.println("elapsed seconds: " + elapsedTimeSec1);

            conn.close();

        } catch (Exception e) {
            System.err.println("Got an exception!");
            e.printStackTrace();
        }
    }
}

ORM

Aaaaarggghh. 

ORM – Object Relational Mapping – seems to be the bane of a lot of Oracle Developers / DBAs these days.

I’m not talking about great Oracle features such as Oracle relational views using UDTs (User-Defined Types) and operators such as CAST, TABLE and MULTISET.

More the object/relational persistence layer type things like Hibernate, iBATIS, JDO, SDO, etc.

I get the whole thing about it saving application developers from writing so much code, with a consequent reduction in testing and errors, and the application getting written that much quicker (in theory), etc.

But often it’s like death by a thousand little generated SQL statements where one bigger one could do the job much more efficiently. Either that or you find the most horrific query you’ve ever seen has been generated, with seemingly hundreds of tables, silly aliases, and hordes of UNIONs and ORs, etc.

Maybe one of the problems has been that the DAO layer has never been particularly cool or trendy and that most application developers have never been into writing SQL – it’s always been a bit of a chore and boring to them. But SQL isn’t difficult. You’ve just got to think in sets.

And I’m sure this is one of those scenarios where the oft-quoted 80:20 “rule” can be applied – i.e. an ORM tool might make sense 80% of the time, particularly when SQL experts aren’t available. Trouble is that you can turn that silly rule right around and say that the 20% of code where these ORMs don’t work very well takes up 80% of the effort.

The problem for me with this is the database becomes more like just a bucket. And a bucket which is accessed by inefficient SQL. The database was born to store data and manipulate data using set operations. More often than not with ORM, we find row-by-row operations, we don’t see efficient set operations, we don’t see bulk operations, we see dynamically generated IN lists, we see lots of OR clauses, etc.

And then, more often than not, when there’s a problem with that SQL, there’s nothing that can be done about it.

Going back to the O-R features in Oracle, these have been steadily developed since 8i. I’m a big fan of creating a layer of O-R views to create more appropriate interfaces for the application to the database entities and have used them to great success in a variety of commercial JDBC applications. And it always comes as a surprise to the application developers that it is possible to pass Oracle UDT collections back and forth. Granted, the JDBC is a little fiddly, but it’s a good way of doing efficient set/bulk operations on entities that are a little more natural to the OO world than the base relational entities. It’s a pity that ODP.NET does not yet have the same support for these Oracle Types.

Maybe one day all the application developers (or 80% of them) will be replaced by code generators that work from a few boxes and a few lines put together on something like a Visio diagram. I hope not, because I consider myself an application developer/designer, starting from the database and going out to and including the application.

Alternatively, maybe boxes, memory and disk space will get so big and fast that these inefficiencies aren’t a concern any more (either that or the effects of the inefficiencies are magnified).

Varying IN lists – last bit

So, just a final say on this series of blog entries.

Often you start down one workaround path, then you find something that doesn’t quite work, so you work around that, and before you know it you’re a long way from where you’d like to be.

What started as a simple desire to reduce the impact of dynamically generated varying IN lists (some not using bind variables) from a couple of applications was severely complicated by ODP.NET’s lack of support / interoperability with Oracle UDTs and the OCI limit of 4000 characters in a VARCHAR2 bind.

As a result, the current choice is between the original situation – lots of similar SQL – and inserting the values into a GLOBAL TEMPORARY TABLE and then using a query which has a WHERE … IN subquery selecting from that GTT. More on that down below…

When I wrote previously, also under consideration was using a global packaged variable. However, this was eliminated as a possibility due to at least three frustrating issues. First up was an idea to use a function that would convert an associative array (supported by ODP.NET) to a similar UDT (just using a FOR LOOP that puts the values from one into the other). Using a bit of PL/SQL to demonstrate (8i, hence the lack of SYS_REFCURSOR and the need to declare a REF CURSOR in a package):


CREATE OR REPLACE TYPE tt_number AS TABLE OF NUMBER;
/


CREATE OR REPLACE PACKAGE pkg_types
AS
--
TYPE refcursor IS REF CURSOR;
TYPE aa_number IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
--
END pkg_types;
/


CREATE OR REPLACE FUNCTION f_tt_convert_error_demo (
i_associative_array IN pkg_types.aa_number
)
RETURN tt_number
AS
--
vt_number tt_number := tt_number();
--
BEGIN
--
FOR i IN i_associative_array.FIRST .. i_associative_array.LAST
LOOP
--
vt_number.EXTEND();
vt_number(i) := i_associative_array(i);
--
END LOOP;
--
RETURN vt_number;
--
END;
/


declare
v_number number;
v_cursor pkg_types.refcursor;
va_number pkg_types.aa_number;
begin
for i in 1 .. 10
loop
va_number(i) := i;
end loop;
open v_cursor for
select *
from table(cast(f_tt_convert_error_demo(va_number) as tt_number));
loop
fetch v_cursor into v_number;
exit when v_cursor%NOTFOUND;
dbms_output.put_line(v_number);
end loop;
close v_cursor;
end;
/

I should have known, but this cannot work due to an error “PLS-00425: in SQL, function argument and return types must be SQL type”.

Incidentally, if you try dynamic SQL, you’ll get a “PLS-00457: expressions have to be of SQL types” instead.

Secondly, just because you can do something in SQL doesn’t mean that the same statement will work in a PL/SQL routine (and vice versa, from memory – it tends to affect features that are relatively new). Following on from the SQL above:


CREATE OR REPLACE FUNCTION f_tt_error_demo
RETURN tt_number
AS
--
vt_number tt_number := tt_number();
--
BEGIN
--
FOR i IN 1 .. 10
LOOP
--
vt_number.EXTEND();
vt_number(i) := i;
--
END LOOP;
--
RETURN vt_number;
--
END;
/


SQL> select value(t)
2 from table (cast (f_tt_error_demo as tt_number)) t
3 /
VALUE(T)
----------
1
2
3
4
5
6
7
8
9
10


SQL> declare
2 begin
3 for a in (select VALUE(t) num
4 from table (cast (f_tt_error_demo as tt_number)) t)
5 loop
6 dbms_output.put_line(a.num);
7 end loop;
8 end;
9 /
declare
*
ERROR at line 1:
ORA-06550: line 0, column 0:
PLS-00382: expression is of wrong type
ORA-06550: line 3, column 13:
PL/SQL: SQL Statement ignored
ORA-06550: line 6, column 28:
PLS-00364: loop index variable 'A' use is invalid
ORA-06550: line 6, column 7:
PL/SQL: Statement ignored

But, with dynamic SQL:


SQL> declare
2 v_number number;
3 v_cursor pkg_types.refcursor;
4 begin
5 open v_cursor for
6 ' select value(t) '||
7 ' from table (cast (f_tt_error_demo as tt_number)) t ';
8 loop
9 fetch v_cursor into v_number;
10 exit when v_cursor%NOTFOUND;
11 dbms_output.put_line(v_number);
12 end loop;
13 close v_cursor;
14 end;
15 /
1
2
3
4
5
6
7
8
9
10
PL/SQL procedure successfully completed.

Thirdly, the above are distilled code examples of further errors that I came across while trying to put together a bit of example code for this blog entry to show the real problem that I had.

I actually wanted to use the function in a subquery like this:


create table error_demo
as
select rownum col1
from all_objects
where rownum < 11;

1 select *
2 from error_demo
3 where col1 IN (select value(t) num
4* from table (cast (f_tt_error_demo as tt_number)) t)
SQL> /
COL1
----------
1
2
3
4
5
6
7
8
9
10
10 rows selected.


SQL> declare
2 v_number number;
3 v_cursor pkg_types.refcursor;
4 begin
5 open v_cursor for
6 select col1
7 from error_demo
8 where col1 IN (select *
9 from table (cast (f_tt_error_demo as tt_number)) t);
10 loop
11 fetch v_cursor into v_number;
12 exit when v_cursor%NOTFOUND;
13 dbms_output.put_line(v_number);
14 end loop;
15 close v_cursor;
16 end;
17 /
from table (cast (f_tt_error_demo as tt_number)) t);
*
ERROR at line 9:
ORA-06550: line 9, column 37:
PLS-00220: simple name required in this context
ORA-06550: line 6, column 4:
PL/SQL: SQL Statement ignored

“PLS-00220: simple name required in this context” – that’s a new one for me, first time I’ve had that error, I think.

But again, with dynamic SQL:


SQL> ed
Wrote file afiedt.buf
1 declare
2 v_number number;
3 v_cursor pkg_types.refcursor;
4 begin
5 open v_cursor for
6 ' select col1 '||
7 ' from error_demo '||
8 ' where col1 IN (select * '||
9 ' from table (cast (f_tt_error_demo as tt_number)) t)';
10 loop
11 fetch v_cursor into v_number;
12 exit when v_cursor%NOTFOUND;
13 dbms_output.put_line(v_number);
14 end loop;
15 close v_cursor;
16* end;
SQL> /
1
2
3
4
5
6
7
8
9
10
PL/SQL procedure successfully completed.

So, as I mentioned way up above, these issues really have reduced it down to a choice between inserting the values in a GTT and then using that GTT in an IN subquery SELECT, or just to revert back to the IN lists as they were.

On the GTT front, we can bulk insert the values for the IN list from the application into the GTT using a simple procedure which accepts an associative array (like pkg_types.aa_number above).
One important note is that the GTT has to be created as “ON COMMIT PRESERVE ROWS”, otherwise the application will raise error “ORA-08103: object no longer exists”. This rings a bell from previous experiences with a JDBC app.

However, this then requires another sort of workaround: deleting the rows in the GTT as the first step of the procedure which bulk inserts the rows.
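For reference, a minimal sketch of this GTT approach – all object names here are hypothetical, not from the real application; note the ON COMMIT PRESERVE ROWS clause and the DELETE as the first step of the procedure:

```sql
create global temporary table gtt_in_list
(col1 NUMBER)
on commit preserve rows;

create or replace procedure p_load_in_list (
i_values IN pkg_types.aa_number
)
AS
BEGIN
--
-- Workaround: clear out rows left behind by a previous call in this session
delete from gtt_in_list;
--
-- Bulk insert; assumes the associative array is densely indexed from 1
FORALL i IN 1 .. i_values.COUNT
insert into gtt_in_list (col1) values (i_values(i));
--
END p_load_in_list;
/
```

The application binds the associative array, calls the procedure, then issues its real query with WHERE col1 IN (SELECT col1 FROM gtt_in_list).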

So this leaves it all in a funny place where you’ve got to weigh up the evils of all the IN lists against the inefficiency, faffing around and general “bad smell” of the GTT approach.

It also leaves two open questions. Firstly, should we have another look at a higher level at the application design to reconsider why these IN lists are being constructed / are required in the first place? And secondly, when will ODP.NET provide the same support for Oracle UDTs as it does for JDBC?

Varying IN lists – part II

I have blogged before about a solution to varying IN lists being issued by an application.

The problem related to hard parsing – when these varying IN lists used literals rather than binds (resolved by the cursor_sharing parameter) – and to shared pool usage, with every distinct number of binds in an IN list requiring its own shared pool entry (or a parse if absent therefrom).
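To make the shared pool point concrete, here’s a little standalone Java sketch of my own (nothing here talks to the database, and f_split is a hypothetical string-parsing function standing in for the delimited-string technique): every distinct argument count in the one-bind-per-argument style produces a distinct SQL text, whereas the single-bind style always produces the same text.

```java
public class InListShapes {

    // One "?" per argument: each distinct argument count yields a
    // distinct SQL text, hence a distinct shared pool entry / hard parse.
    static String perBindSql(int argCount) {
        StringBuilder binds = new StringBuilder();
        for (int i = 0; i < argCount; i++) {
            if (i > 0) binds.append(',');
            binds.append('?');
        }
        return "select * from emp where empno in (" + binds + ")";
    }

    // Delimited-string pattern: one bind, one SQL text, however many
    // arguments are packed into the string (f_split is hypothetical).
    static String singleBindSql() {
        return "select * from emp where empno in (select * from table(f_split(?)))";
    }

    public static void main(String[] args) {
        java.util.Set<String> texts = new java.util.HashSet<String>();
        for (int n = 1; n <= 100; n++) {
            texts.add(perBindSql(n));
        }
        // 100 distinct statements in the per-bind style...
        System.out.println("per-bind texts: " + texts.size());
        // ...versus a single statement in the delimited-string style
        System.out.println("single-bind text: " + singleBindSql());
    }
}
```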

Having gently guided the application developers down the path of this type of approach, I was feeling pretty pleased with progress. However, two problems have subsequently presented themselves which underline why the initial approach was taken and why the standard solutions approach is not a silver bullet in all circumstances.

One of the applications concerned uses ODP (Oracle Data Provider for .NET). This is one of those situations where my understanding (or lack thereof) leaves me feeling very vulnerable.
ODP sits on top of OCI in a way that is, at this time, beyond my comprehension in terms of driver internals.

The varying IN list “design pattern” uses a single delimited string as a single bind variable. As a result, no matter how many arguments are packed into that string, there is only a single statement in the shared pool for that base bit of SQL.

However, OCI has a limit of 4000 for a VARCHAR2 argument. Therefore, depending on the length of each argument in the delimited string, we are severely limited in the number of arguments that can be passed in.
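A rough back-of-the-envelope model of that constraint – my own sketch, assuming single-byte characters and a one-character delimiter. Client/server character-set conversion can shrink the real budget further, which presumably helps explain why the real-world figure mentioned below comes out lower than this naive arithmetic suggests.

```java
public class DelimitedBind {

    // Maximum argument count that fits a single VARCHAR2 bind of
    // maxChars characters: n args of width argWidth plus (n-1)
    // one-character delimiters must satisfy n*argWidth + (n-1) <= maxChars.
    static int maxArgs(int maxChars, int argWidth) {
        return (maxChars + 1) / (argWidth + 1);
    }

    // Pack the arguments into the single delimited-string bind value.
    static String packArgs(int[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(values[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 8-digit arguments in a 4000-character budget
        System.out.println("max args: " + maxArgs(4000, 8)); // prints 444
    }
}
```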

This brings me back to an old bug raised against the current application. In the old varying IN list approach, the application was limited to something like 1000 arguments in the IN list. There are obvious question marks against the design of any application which is dynamically building such IN lists – where are the arguments in the IN lists coming from? What is the application’s object-to-relational mapping such that this is happening? Can we present alternative structures / entities to avoid these scenarios? Etc.

However, by pushing the varying IN list solution, the limit of 1000 arguments is now considered a “nice-to-have”. Because at 8 digits, the maximum number of arguments is now closer to 250!

As the length of the single bind variable reaches the 4000 limit, the error raised is “ORA-01460: unimplemented or unreasonable conversion requested”. We have tried a LONG and CLOB alternative and both are met with the same result (although an erroneous implementation of a workaround can never be ruled out).

One of the frustrations here is that the application developers would love to use a more appropriate interface. The natural choice, IMHO, is to provide some sort of array of the arguments concerned. If we were using JDBC, then using proper Oracle collections (CREATE TYPE … AS TABLE OF) and the appropriate JDBC structures would be fantastic.

However, for some reason, ODP does not yet support the same functionality. According to the documentation, ODP.NET supports associative arrays / index-by tables / PL/SQL tables, but not SQL types. Which raises the question: why has this functionality been absent from ODP.NET for so long?

