What happened to my day, Dominic?
May 16, 2007 6 Comments
Normally, I rely on a strengthening cup of Starbucks’ venti vanilla skinny latte and a bowl of Dorset Cereals (you can take the boy out of Dorset, but not the boy out of Dorset) to get my brain faintly ticking in the morning (all freebies gratefully received).
Today, however, a poser from Dominic Delmolino also kicked things off.
"
create a directory into a result set using tabs or links, and mimicking what you see on a set of encyclopedias. You know, A-B, C-D, etc. But what I want to do is give a bit more information about what’s in each book, and mimic what usually happens with the letter S (Sa-Sm and Sn-Sz)
"
No problem, I thought, I’ll just rattle this off. If only. What happened to my day?
It was interesting to note my slow but steady degradation from just rattling it off to oh-bollocks-just-get-something-that-bloody-works.
In the end, I think I overdosed a bit on the WITHs, but I think what I’ve done does the job even if it’s not the smartest way. I’m sure that I’ve missed a number of tricks along the way but hopefully on the scale of smartypants to dumbass, I’ve somewhere in the middle not too close to the dumbass end. It’s everyone’s fear that they end up being ridiculed on Oracle WTF. And quite often, one gets so used to doing complex things that one starts to overlook simple and common sense approaches.
The first part was easy – putting the entries into 15 buckets:
with
ac1 as
(select distinct atc.column_name
from all_tab_columns atc)
, buckets as
(select a.column_name
, ntile(15) over (order by a.column_name) nt
from ac1 a)
, contents as
(select b.nt volume
, min(b.column_name) firstentry
, max(b.column_name) lastentry
from buckets b
group by b.nt)
, fromto as
(select c.volume
, c.firstentry
, c.lastentry
from contents c)
select *
from fromto;
VOLUME | FIRSTENTRY | LASTENTRY |
---|---|---|
1 | #_OF_PRODUCTS | BLOCKS_COALESCED |
2 | BLOCKS_FREE | CONSUMED_SCN |
3 | CONSUMER_GROUP | DEFPCTFREE |
4 | DEFPCTTHRES | ENQ_TIME |
5 | ENQ_TIMESTAMP | FS_FAILOVER_CURRENT_TARGET |
6 | FS_FAILOVER_OBSERVER_HOST | INTCOLID |
7 | INTCOLS | LIST_PRICE |
8 | LITERAL_HASH_VALUE | MSGCNT |
9 | MSGID | OPEXPR_N2 |
10 | OPEXPR_N3 | POLSCH |
11 | POLTXT | REF_NAME |
12 | REF_TARGET_TYPE | SERVICE_ID |
13 | SERVICE_NAME | STORAGE |
14 | STORAGESIZE | TRUE_RULES |
15 | TRUNCATED | ZERO_RESULTS |
But it was the next bit that got me stumped.
Eventually, the approach I went for was to break the extremes of each bucket into all the possible substrings and match them against the previous extreme of the previous bucket and find the first substring that was different.
I’m not sure that I’ve explained that very well, if that’s even possible. But given bucket 1 from A – ABACUS and bucket 2 from ABRACADABRA – ANTELOPE, I proposed to break ABACUS into A, AB, ABA, ABAC, ABACU and ABACUS and compare that with ABRACADABRA broken into A, AB, ABR, ABRA, etc which would let me determine that bucket 1 was from A -> ABA and bucket 2 was from ABR -> etc.
Hopefully, you get what I mean.
Anyway, I didn’t find it that easy. Found a few hidden features in my implementation along the way. And I’m sure I missed some tricks along the way. Here it is:
with
ac1 as
(select distinct atc.column_name
from all_tab_columns atc)
, buckets as
(select a.column_name
, ntile(15) over (order by a.column_name) nt
from ac1 a)
, contents as
(select b.nt volume
, min(b.column_name) firstentry
, max(b.column_name) lastentry
from buckets b
group by b.nt)
, fromto as
(select c.volume
, c.firstentry
, c.lastentry
from contents c)
, firstentry_substrings as
(select ft.volume
, ft.volume+1 nextvolume
, ft.volume-1 lastvolume
, ft.firstentry
, it.pos substringno
, substr (ft.firstentry,1,it.pos) substring
from fromto ft
, (select rownum pos from dual connect by rownum <= (select max(length(column_name)) from ac1)) it
where it.pos <= length(ft.firstentry) )
, lastentry_substrings as
(select ft.volume
, ft.volume+1 nextvolume
, ft.volume-1 lastvolume
, ft.lastentry
, it.pos substringno
, substr (ft.lastentry,1,it.pos) substring
from fromto ft
, (select rownum pos from dual connect by rownum <= (select max(length(column_name)) from ac1)) it
where it.pos <= length(ft.lastentry) )
, nextentrycomparison as
(select *
from (select fs.volume volume
, fs.firstentry firstentry
, fs.substring firstalphabet
, ls.lastvolume lastvolume
, ls.lastentry lastentry
, ls.substring lastalphabet
, rank() over (partition by fs.volume order by fs.substringno) rnk
from firstentry_substrings fs
, lastentry_substrings ls
where fs.volume = ls.nextvolume (+)
and fs.substring = ls.substring (+)
and ls.substring IS NULL)
where rnk = 1
)
, lastentrycomparison as
(select * from (select iv.*
, rank() over (partition by volume order by lastalphapos) rnk
from (select ls.volume volume
, ls.lastentry lastentry
, ls.substring lastalphabet
, ls.substringno lastalphapos
, fs.nextvolume nextvolume
, fs.firstentry nextentry
, fs.substring nextalphabet
, lead(ls.substring) over (partition by ls.volume order by ls.substringno) xx
from firstentry_substrings fs
, lastentry_substrings ls
where ls.volume = fs.lastvolume (+)
and ls.substring = fs.substring (+)) iv
where nextalphabet IS NULL
OR xx IS NULL
)
where rnk = 1)
, finally as
(select na.volume
, na.firstalphabet||'... thru '||la.lastalphabet||' ..'
, na.firstentry
, la.lastentry
from nextentrycomparison na
, lastentrycomparison la
where na.volume = la.volume)
select *
from finally
VOLUME | SPINE | FIRSTENTRY | LASTENTRY |
---|---|---|---|
1 | #... thru BLOCKS_C .. | #_OF_PRODUCTS | BLOCKS_COALESCED |
2 | BLOCKS_F... thru CONSUMED .. | BLOCKS_FREE | CONSUMED_SCN |
3 | CONSUMER... thru DEFPCTF .. | CONSUMER_GROUP | DEFPCTFREE |
4 | DEFPCTT... thru ENQ_TIME .. | DEFPCTTHRES | ENQ_TIME |
5 | ENQ_TIMES... thru FS_FAILOVER_C .. | ENQ_TIMESTAMP | FS_FAILOVER_CURRENT_TARGET |
6 | FS_FAILOVER_O... thru INTCOLI .. | FS_FAILOVER_OBSERVER_HOST | INTCOLID |
7 | INTCOLS... thru LIS .. | INTCOLS | LIST_PRICE |
8 | LIT... thru MSGC .. | LITERAL_HASH_VALUE | MSGCNT |
9 | MSGI... thru OPEXPR_N2 .. | MSGID | OPEXPR_N2 |
10 | OPEXPR_N3... thru POLS .. | OPEXPR_N3 | POLSCH |
11 | POLT... thru REF_N .. | POLTXT | REF_NAME |
12 | REF_T... thru SERVICE_I .. | REF_TARGET_TYPE | SERVICE_ID |
13 | SERVICE_N... thru STORAGE .. | SERVICE_NAME | STORAGE |
14 | STORAGES... thru TRUE .. | STORAGESIZE | TRUE_RULES |
15 | TRUN... thru Z .. | TRUNCATED | ZERO_RESULTS |
Dunno. Can’t be right – looks mental.
Looks like the last bit of the query got cut off…, but wow! Pretty neat.
Not at all how I did it, so I learned something…
Yep – had a few problems with my formatting as usual. I’ve also reorganised the two posts back into one and copied your comment from the old now deleted second post onto this one.
Pingback: Combining Hierarchical Queries With Analytics : Ardent Performance Computing
Jeremy Schneider has a much smarter solution than mine:
http://www.ardentperf.com/2007/05/21/encyclopedia-spine-problem/
For the record I mostly copied the idea from the paper Dominic linked – I was thinking along the same lines as you until he posted that link. I wouldn’t have thought of it on my own. Since I didn’t actually come up with a query on my own, I find this more impressive. 😉
Pingback: ITC Test Agg » Blog Archive » Combining Hierarchical Queries With Analytics