OraStory

Loader

November 6, 2009 · Leave a Comment

It’s been years and years since I worked regularly with SQL*Loader-based feeds.

There are loads of tips and tricks regarding formats, encodings, character sets, etc.
I had forgotten nearly everything that I could ever have claimed to know in this area, all aged out of my personal buffer cache, at best distant memories, very distant.

However, I’ve recently had to resurrect some of these distant memories to feed some data via External Tables.

I thought it would be worthwhile to do an incoherent brain dump on some of the particular issues for future reference.

This article is my no means comprehensive and will just touch on some of the considerations specific to what I was doing.

The starting place for documentation on SQL*Loader is the Oracle Database Utilies Manual and similarly for External Tables.

The thing about external tables is that they can use the ORACLE_LOADER or the ORACLE_DATAPUMP access drivers.

My circumstances involved the ORACLE_LOADER driver.

As the names suggest, ORACLE_LOADER is related to SQL*LOADER. In fact, you can use the EXTERNAL_TABLE=GENERATE_ONLY parameter with SQL*LOADER to generate the syntax for your EXTERNAL TABLE.

I think things like this are brilliant but I never use them. I suppose I’m old skool. Neat features that auto generate code mean that a) I don’t necessarily need to understand what’s going on and b) skip the line-by-line sanity checks that I still believe in.

I work in an environment where it’s not so easy to get access to servers to ftp and view files, etc.

As a result, my approach would be to work with SQL*Loader early on and then graduate to external files nearer the time. However, I’m going to skip all that, fast forward past the SQL*Loader bit and move on to the issues.

So the main considerations for loading up data?
- Encoding / Character sets
- Delimiters – field and line
- File Transfer

(I’m going to ignore Endianness for the sake of “clarity”. But it can be a factor. See Byte Ordering for futher information.)

In my experience, when you get errors whilst loading, it’s due to one or more of the reasons above.
More than one?
Sure.

For example, the combination of file transfer mechanism and delimiters.
Ever wondered why there’s confusion over whether you should transfer a file as text (or ascii), as binary? What’s the difference?

As ever, it depends.
- SFTP doesn’t have a text mode, all transfers are binary.
- FTP lets you choose between ascii (default) and binary.
- What about WinSCP and similar tools with their automatic mode?

Automatic modes usually work off the file extension and would pick a text transfer for HTML, TXT, PHP, XML, etc and binary otherwise.

And the differences in mode?

For Text mode, there are two basic methods – either the tool is responsible for doing some conversion to the format supported by the destination or, more commonly, the client converts to a canonical format and the service then does a further conversion to its own format if necessary.

With Binary mode, the raw bytes are transferred as is, i.e. the file is transferred in its original form.

So, the main significance of this related to field and line delimiters – things like tabs and line feed characters are different between platforms, different between Unix and Windows for example.

This was particularly relevant for my file export from SQL Server to Oracle on Linux.

In Windows, a new line is often represented by two characters – one carriage return and one line feed.
In Unix, a new line is normally just a line feed.

Sometimes you see “^M” characters on *nix. What’s this about?
This represents the carriage return part of the newline for Windows as described above.
If you need to, you can get rid of that using dos2unix.

So, in the ACCESS PARAMETERS subsection, if you use the “RECORDS DELIMITED BY NEWLINE” syntax in your external table definiton, what does that mean?

The NEWLINE keyword uses the newline format for your platform – so just a line feed in the case of *nix.

For my process, I decided that I would just go with the format as extracted from SQL Server – a 50:50 decision that there’s no point on expanding on.

So, in my situation, the syntax for an external table should not use the NEWLINE keyword because the format was Windows new lines, the syntax for which can be:

RECORDS DELIMITED BY '\r\n'

Here’s a situation. Things have been going fine in DEV,etc and eventually go to PREPRODUCTION and there’s a problem.

The feeds don’t work. The external table loader isn’t finding the right delimiters.

The question really is how can you tell what characters are in your file?
So, there’s no point looking at the file prior to transfer in case the technological clue (or the person in a manual process) changes the file on transfer. We need to look at it in the destination directory on the destination server.

And what’s the best way to do that?

On a Unix/Linux, a useful command is the od command which dumps files in octal and other formats. For example,

od -c <filename>

which gives ascii characters or backslash escapes.
There’s a lot of output from this command but you don’t need to do the whole file, e.g.

head -2 <filename> | od -x | more

If the file suddenly doesn’t match the expected format, what’s the sort of errors you might get?
Well, if it should have transfered in binary but it was done in text by error, you might get:

KUP-04020: found record longer than buffer size supported

i.e. the lines are running into each other
or under slightly different circumstances

KUP-04023: field start is after end of record

etc.

That’s about it on transfers and delimiters.

What’s there to say about encoding and character sets.
Using the od command above we can see the hex codes to double check the encoding is as expected.
Other that that, in our external table definition we can specify the character set in the ACCESS PARAMETERS section using the CHARACTERSET keyword, e.g.

CREATE TABLE <my_tablename>
(<my_columns>)
ORGANIZATION EXTERNAL
  (  TYPE ORACLE_LOADER
     DEFAULT DIRECTORY <my_directory>
     ACCESS PARAMETERS
       ( RECORDS DELIMITED BY '\r\n'
       CHARACTERSET AL16UTF16
       FIELDS TERMINATED BY '\t'
       MISSING FIELD VALUES ARE NULL  )
     LOCATION (<my_directory>:'<my_filename>')
  )
REJECT LIMIT 0;

and then the expected characterset conversion will take place.
Note that SQLServer tends to use USC2 as its unicode characterset and UCS2 is a subset of AL16UTF16.

It’s Friday, it’s 5 o’clock, I’ve lost focus. I’m done.

→ Leave a CommentCategories: binary transfer · delimiter · external tables · newline · oracle · sql*loader

Learning

October 28, 2009 · Leave a Comment

Earlier this month I attended Tanel Poder’s course “Advanced Oracle Troubleshooting for DBAs and Performance Engineers”.

It was good – definitely worthwhile, lots of useful information and really puts into perspective those scripts of his, like LatchProf, LatchProfX, WaitProf and sample.sql. Plus a whole bunch of other stuff. So definitely worth thinking about if you’ve any training budget (that’s mine blown for forseeable future).

I’ve already mentioned the Metalink Headlines tip and I’ve had a little jump in traffic over the past couple of days from Tanel saying the same thing.

The other top tip was to read James Morle’s book Scaling Oracle 8i. James was there on the second day but that wasn’t why it was recommended. Ignore the 8i in the title, most of it remains as relevant as it ever was. I don’t know how I never read this at the time but I’m ploughing through it on the train and it’s a real, real benefit. It’s not just about Oracle. It’s as much about the stuff it interacts with – OS, IO subsystems, memory, etc.
It’s downloadable. Read it.

→ Leave a CommentCategories: oracle

Metalink headlines

September 16, 2009 · 4 Comments

When I was thinking about whether to go on Tanel Poder’s course, I read the testimonials and there’s a great piece of advice from Tanel via Coskan Gundogar:

I asked him how do you learn all these interesting stuff. He gave me a perfect clue. Sign up for metalink headlines mailing list. Bugs docs they are all there after they are created. Perfect resource.

I just want to emphasise what a great resource these Metalink updates are.

I’ve only been signed up for a couple of weeks but some of the nuggets that come into your mailbox are pure gold.

I’ve learnt some stuff that I would never come across, plus I’m building up links for some great articles and explanations, some of which are invaluable as a reference for yourself or to give others (I’m rubbish at explaining stuff in my own words).

For example, just from the last couple of days, I’ve been notified of these new or updated articles:
(Note: Login to Metalink first before clicking the links)

Character set and character set conversion (comes up all the time on the forums and from other teams at work):

Metalink Scripts:

General:

Interesting behaviour:

To name but a few…

It’s just invaluable stuff.

Sign up if you don’t already.

→ 4 CommentsCategories: Metalink · oracle

Data modelling in the modern world

August 6, 2009 · Leave a Comment

I first read Robyn Sands’s article on the importance of a good data model back when it was published in May.

However, the comments have since moved on and taken a life of their own – a sure sign of a great post – with some excellent points and poignant observations on the frequent neglect of the data model in the agile world.

If you’ve not read it or have not kept updated with the comments, then I urge you to.

→ Leave a CommentCategories: agile · database · design · oracle

Gotcha: Application Contexts

July 14, 2009 · 7 Comments

This is one of those feature behaviours which isn’t surprising, but you probably wouldn’t think about it unless you saw it.

The flop:

I make heavy use of application contexts in flexible, variable input, dynamically-built queries (one of the options mentioned in this asktom thread for example).

(In the latest edition of Oracle Magazine, Tom explains why this is not an approach he would recommend any more. I don’t necessarily agree but more of that another time perhaps)

The Turn:

My Query Builder – a set of code which I’ve always intended to make publicly available but have yet to do so – uses a context with single namespace and typically reuses attribute names between queries – e.g. a product id populated from an on-screen product selector will typically populate a context attribute called PRODUCT_ID.

And a Query Builder implementation will typically reset any context attributes(DBMS_SESSION.CLEAR_ALL_CONTEXT) at the start to clear any parameters from a previous query.

In one of the more complex scenarios, a controller proc constructs three different queries, based on the input parameters supplied, in order to return three resultsets to populate three different parts of a GUI. A bit like this dummy code:

CREATE OR REPLACE PROCEDURE p_get_matching_data (
  i_parameters             IN  some_array_type
  o_cursor1                 OUT SYS_REFCURSOR,
  o_cursor2                 OUT SYS_REFCURSOR,
  o_cursor3                 OUT SYS_REFCURSOR
)
AS
   l_query1 LONG;
   l_query2 LONG;
   l_query3 LONG;
BEGIN
   l_query1 := my_query_builder_pkg.build_query1(i_parameters);
   OPEN o_cursor1 FOR l_query1;
   l_query2 := my_query_builder_pkg.build_query2(i_parameters);
   OPEN o_cursor2 FOR l_query2;
   l_query3 := my_query_builder_pkg.build_query3(i_parameters);
   OPEN o_cursor3 FOR l_query3;
END;

The River:

The problem with SYS_CONTEXT is that it is not bound into the query like a bind variable, nor is it subject to the same read consistency mechanism as block data.

Witness:

DOMINIC@11gR1&gt;VAR r REFCURSOR
DOMINIC@11gR1&gt;CREATE CONTEXT demo_ctx USING context_demo_pkg;

Context created.

Elapsed: 00:00:00.00
DOMINIC@11gR1&gt;CREATE OR REPLACE PACKAGE context_demo_pkg
  2  AS
  3    --
  4    PROCEDURE p_set_context (
  5      i_attribute                IN     VARCHAR2,
  6      i_value                    IN     VARCHAR2
  7    );
  8    --
  9    PROCEDURE p_unset_context (
 10      i_attribute                IN     VARCHAR2
 11    );
 12    --
 13  END;
 14  /

Package created.

Elapsed: 00:00:00.00
DOMINIC@11gR1&gt;CREATE OR REPLACE PACKAGE BODY context_demo_pkg
  2  AS
  3    --
  4    k_namespace CONSTANT VARCHAR2(24) := 'DEMO_CTX';
  5    --
  6    PROCEDURE p_set_context (
  7      i_attribute                IN     VARCHAR2,
  8      i_value                    IN     VARCHAR2
  9    )
 10    AS
 11    BEGIN
 12       --
 13       DBMS_SESSION.SET_CONTEXT
 14       (namespace =&gt; k_namespace,
 15        attribute =&gt; i_attribute,
 16        value     =&gt; i_value);
 17       --
 18    END p_set_context;
 19    --
 20    PROCEDURE p_unset_context (
 21      i_attribute                IN     VARCHAR2
 22    )
 23    AS
 24    BEGIN
 25       --
 26       DBMS_SESSION.CLEAR_CONTEXT
 27       (namespace =&gt; k_namespace,
 28        attribute =&gt; i_attribute);
 29       --
 30    END p_unset_context;
 31    --
 32  END;
 33  /

Package body created.

Elapsed: 00:00:00.00
DOMINIC@11gR1&gt;

Now to demonstrate the different behaviour.

The bind variable:

DOMINIC@11gR1&gt;DECLARE
  2   l_value  varchar2(24) := 'My Test Bind';
  3  BEGIN
  4    open :r for select l_value from dual;
  5    l_value := null;
  6  END;
  7  /

PL/SQL procedure successfully completed.

:B1
--------------------------------
My Test Bind

Elapsed: 00:00:00.00
DOMINIC@11gR1&gt;

No surprises there.
Now the illustration of read consistency:

DOMINIC@11gR1&gt;CREATE TABLE test_data
  2  (col1 VARCHAR2(1));

Table created.

Elapsed: 00:00:01.02
DOMINIC@11gR1&gt;
DOMINIC@11gR1&gt;INSERT
  2  INTO   test_data
  3  VALUES ('A');

1 row created.

Elapsed: 00:00:00.00
DOMINIC@11gR1&gt;BEGIN
  2     OPEN :r FOR SELECT * FROM test_data;
  3     DELETE FROM test_data;
  4     COMMIT;
  5  END;
  6  /

PL/SQL procedure successfully completed.

C
-
A

Elapsed: 00:00:00.00
DOMINIC@11gR1&gt;

Again, no surprises. Read consistency around cursor opening/fetching as you might expect.
Next, the application context.

Firstly, just to prove that it works:

DOMINIC@11gR1&gt;BEGIN
  2     context_demo_pkg.p_set_context('VALUE1','My Test Bind');
  3     OPEN :r FOR SELECT SYS_CONTEXT('DEMO_CTX','VALUE1') FROM dual;
  4  END;
  5  /

PL/SQL procedure successfully completed.

SYS_CONTEXT('DEMO_CTX','VALUE1')
-------------------------------------
My Test Bind

Elapsed: 00:00:00.00
DOMINIC@11gR1&gt;

But if we reset it before we fetch from the cursor:

DOMINIC@11gR1&gt;BEGIN
  2     context_demo_pkg.p_set_context('VALUE1','My Test Bind');
  3     OPEN :r FOR SELECT SYS_CONTEXT('DEMO_CTX','VALUE1') FROM dual;
  4     context_demo_pkg.p_unset_context('VALUE1');
  5  END;
  6  /

PL/SQL procedure successfully completed.

SYS_CONTEXT('DEMO_CTX','VALUE1')
---------------------------------------

Elapsed: 00:00:00.00
DOMINIC@11gR1&gt;

So, once you see it, not entirely unsurprising but maybe not something that you’d considered before ( I hadn’t ). Behaviour neither like a bind nor read consistency like block data.

→ 7 CommentsCategories: 11.1.0.7 · context · oracle · sys_context

Documentation in need of confidence booster

July 5, 2009 · Leave a Comment

Some people attach a lot of weight to what the Oracle documentation says.

Rightly so, most of the time. It is a hugely significant resource.

But there are some documentation bugs / errors.

I was answering a character set posting on the OTN forums and when providing some links to the doco, I noticed the quote below in the official documentation:


I think these secions that describe encoding are out of date. For example, do Java and Microsoft Windows still use UCS-2? Or do they use UTF-16? I think UNIX uses UTF-32. So I am confused about what is the most current information for these sections.

How reassuring ;)

→ Leave a CommentCategories: 11g · Documentation · charactersets · database

Madness

May 29, 2009 · 5 Comments

Just a quick one…

Had a meeting request today to get an idea of the different authorisation groups that will be required when in the future we move to a model that carries the authorisation ACL with the data.

This is a part of an ongoing architectural initiative to turn a logic-heavy database effectively into a bit bucket.

I know these things are going on all the time everywhere in a quasi-religious battle.
I know I’ve talked about it before as have many others elsewhere.
I’m sure I resolved to let it wash over me and not to get bothered by it or get sucked into battles I can’t win.

But this is madness this aspiration that this database, probably one of the most expensive bits of this tech stack, should be reduced to a bit bucket at some point in the future.

With the middle tier doing the data joining…

The middle tier doing what should be the set operations in the database in some sort of horrific giant nested loop operation…

And the middle tier doing the data security, which should be right there with the data in the database.

It’s a mad world.

And in today’s economic climate, this can’t be what they mean by “doing more for less”.

→ 5 CommentsCategories: architecture · database · oracle

Don’t use One.com

May 20, 2009 · Leave a Comment

This is not an oracle-related post.

Just a warning about using one.com for domain hosting.

Unlike previous domain registration and hosting companies which I have used before, with one.com if you don’t give them 35 days notice before the end of your domain “lease” that you do not wish to renew, then you are obliged to pay for a further 12 months. If you then fail to pay to their schedule they then add late payment charges within a week of the domain renewal timeline with threats to raise late payment charges of 50% of the domain renewal fee per week.

Of course, caveat emptor.
I should have read the alleged terms and conditions.

But if you are looking at the various “deals” on domain registration and hosting, bear this opinion in mind.

→ Leave a CommentCategories: domain registration · hosting · one.com

YAGNI?

May 14, 2009 · 2 Comments

YAGNI, it’s a very common term in development teams these days. Overused even.

However, in today’s more austere and economically challenging times, I have some new acronyms for you.

I’m not really up on popular acronyms so these may already exist in some form or another, but here are some more appropriate sayings that I’m saying/seeing more and more, together especially:

I Do Bloody Well Need It (IBLOWENI – ha ha) … but … NO-one’s Gonna Pay For It (NOGOPAFI).

→ 2 CommentsCategories: development · oracle · yagni

Motivation

February 19, 2009 · 5 Comments

I’ve mentioned before that I’m lucky enough, given my own database-heavy focus, to be currently servicing a client whose applications tend to be database-heavy in terms of logic.

They are moving their applications gradually, yet wholesale and with no exceptions, towards an N-tier architecture throughout the company and their particular interpretation of N-tier means treating the database like a bit bucket (the only place I’ve been where they treat n-tier and database independent as synonymous although I know this is not an uncommon misunderstanding).

Today I had a conversation with one of the .Net developers along the lines of “you must be gutted given your database expertise.”

To which my response was that I was not particularly bothered which tier had the most lines of code – at the end of the day, it’s not a competition.

There are four things which primarily motivate me, or give me the most pleasure, in a professional capacity:

  1. Making slow, inefficient code fly.
  2. *Writing cool satisfying code.
  3. *Good design and good data modelling .
  4. Using databases efficiently.

*Incidentally, when designing and developing, I’m a great believer in the frequent need to not force it but wait for the right way to make itself known (at whatever time of day or night that might be).

So, moving a bunch of business logic (and in general very poorly written database logic) out of the database doesn’t bother me – the location of this logic doesn’t matter as long as it makes sense.

Remodelling and rearchitecting processes out of the database doesn’t bother me as long as it’s done accurately and properly wherever, and data is modelled appropriately.

In fact, the only thing which bothers me is that this rearchitecture is going to eventually involve Hibernate or similar, which will translate a single object save into a hundred individual insert or update statements, no bulk operations, nothing. I don’t like ORM (although I am very fond of OR views or an OR api in the database which can map one object save to as few operations in the database as is necessary).

→ 5 CommentsCategories: oracle