ID (VARCHAR) | NAME (VARCHAR) | AMOUNT (INT) | LAST_UPDATED (TIMESTAMP) |
1 | Eddy | 80000.00 | 2006-03-14-00.10.31.999999 |
1 | Eddy | 90000.00 | 2006-03-14-00.09.31.999999 |
1 | Eddy | 90000.00 | 2006-03-14-00.11.31.999999 |
2 | Lee Sin Ti | 100.00 | 2006-03-14-00.10.31.999999 |
2 | Lee Sin Ti | 200.00 | 2006-03-14-00.09.31.999999 |
Table1
Based on this, your DB2 query is to remove all the outdated records while maintaining only the latest entries.
An idiotic first attempt:
DELETE FROM TABLE1
WHERE
ID || NAME || CAST (AMOUNT AS VARCHAR(32)) || CAST(LAST_UPDATED AS VARCHAR(64))
NOT IN
(
SELECT (ID || NAME || CAST(AMOUNT AS VARCHAR(32)) || CAST(MAX(LAST_UPDATED) AS VARCHAR(64)) ) AS KEY FROM TABLE1 GROUP BY ID, NAME
);
This attempt is definitely a NO-NO. Not only the string concatenations takes a huge amount of processing cycles, it is also UGLY in my point of view. Using a generated random data of 100k records, it takes an unacceptable amount of time to complete the delete query.
Then, my second attempt got to deal with DB2 support of row_number() function, which I greatly appreciated from IBM.
Second attempt:
DELETE FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY ID, NAME ORDER BY LAST_UPDATED DESC)
FROM TABLE1
) AS X (ROWNUM) WHERE ROWNUM > 1;
WOW, an optimized yet elegant query to achieve my goal. With the same random data set, it only took less than 1 minute to complete the query.
No comments:
Post a Comment