ETL (Extract, Transform, Load) has traditionally been the domain of data warehousing professionals, but it can be applied to any process or group of processes that loads data into databases. Data is the lifeblood of any business. However, data by itself is not too interesting; what is interesting is the information that the data can be processed into. Writing data transformation routines may be pervasive, but the actual transformation code is normally not very challenging.
More challenging is splitting the transformation into multiple threads and running them in parallel, since ETL jobs usually work with large data sets, and we want the job to complete in an acceptable time. Business application developers generally don't do multithreaded programming too well, mainly because they don't do it often enough.
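To make the idea of splitting a transformation across threads concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the row layout (`name`, `amount_cents`) and the functions `transform`, `chunked`, `transform_chunk`, and `parallel_transform` are hypothetical names invented for this example, and none of it reflects how Kettle itself works.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(row):
    """Hypothetical per-row transformation: tidy the name, convert cents to currency units."""
    return {
        "name": row["name"].strip().title(),
        "amount": row["amount_cents"] / 100,
    }


def chunked(rows, size):
    """Yield consecutive slices of `rows`, each holding at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def transform_chunk(chunk):
    """Transform every row in one chunk; this is the unit of work handed to a thread."""
    return [transform(row) for row in chunk]


def parallel_transform(rows, workers=4, chunk_size=1000):
    """Split `rows` into chunks, transform the chunks on a thread pool, and
    return the transformed rows in their original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order the chunks were submitted.
        results = pool.map(transform_chunk, chunked(rows, chunk_size))
        return [row for chunk in results for row in chunk]
```

In CPython, threads like these mostly pay off when each chunk spends its time waiting on I/O, such as database reads and writes. A CPU-bound transformation would more likely need a process pool.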
Furthermore, the transformation business logic lives inside the application code, which means it can't be sanity checked by the business person whose needs drove the transformation in the first place. I found out about Kettle, an open source ETL tool, from a colleague at a previous job, where he was using it to automate data transformations that pushed out denormalized versions of data from backend directories to front end databases. Unfortunately, I never got a chance to use it at work, but it stayed on my ToDo list as something I wanted to learn later. Early in my career, I worked for the MIS group of a manufacturing plant that made switchboards.
It occurred to me that …


