ETL (Extract, Transform, Load) has traditionally been the domain of data warehousing professionals, but it can be applied to any process or group of processes that loads data into databases. Data is the lifeblood of any business. However, data by itself is not too interesting; what is interesting is the information that the data can be processed into. Writing data transformation routines may be pervasive, but the actual transformation code is normally not very challenging.
More challenging is splitting the transformation into multiple threads and running them in parallel, since ETL jobs usually work on large data sets and we want the job to complete in an acceptable time. Business application developers generally don’t do multithreaded programming too well, mainly because they don’t do it often enough.
Furthermore, the transformation business logic lives inside the application code, which means it can’t be sanity checked by the business person whose needs drove the transformation in the first place. I found out about Kettle, an open source ETL tool, from a colleague at a previous job, where he was using it to automate data transformations that pushed out denormalized versions of data from backend databases to frontend databases. Unfortunately, I never got a chance to use it at work, but it stayed on my ToDo list as something I wanted to learn later. Early in my career, I worked for the MIS group of a manufacturing plant that made switchboards.
It occurred to me that one of the procedures for generating the regular factory-wide input costs would be a good candidate to convert to Kettle in order to understand its functionality. Part of the factory’s input costs for the month was the sum of the actual dollar amounts paid out to employees.
This was governed by the worker’s hourly rate and the number of hours worked. The number of hours was derived from the times recorded when the employee signed in and out of the factory. The values are reported by department. The figure below shows the flow. To replicate the procedure, I created a flat file for 5 employees in 2 departments (Electrical and Mechanical) which contained in and out times for these employees over a one-month period. The original databases involved were dBase-IV and Informix, with migration scripts written in Clipper and Informix-4GL; the ones in my case study were MySQL and PostgreSQL.
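To make the calculation concrete, here is a minimal Python sketch of the costing rule described above: hours are derived from the in and out times, multiplied by the hourly rate, and summed per department. This is not the Kettle transformation itself, just the arithmetic it has to reproduce. The record layout, employee IDs, rates, and timestamps are all made up for illustration and are not taken from my actual flat file.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample records: (employee, department, hourly_rate, time_in, time_out).
# The field layout and all values are assumptions for illustration only.
TIMECARDS = [
    ("E001", "Electrical", 20.0, "2000-01-03 08:00", "2000-01-03 16:30"),
    ("E002", "Electrical", 25.0, "2000-01-03 09:00", "2000-01-03 17:00"),
    ("E003", "Mechanical", 18.0, "2000-01-03 08:00", "2000-01-03 12:00"),
]

TIME_FORMAT = "%Y-%m-%d %H:%M"


def hours_worked(time_in, time_out):
    """Return the elapsed hours between a sign-in and a sign-out timestamp."""
    start = datetime.strptime(time_in, TIME_FORMAT)
    end = datetime.strptime(time_out, TIME_FORMAT)
    return (end - start).total_seconds() / 3600.0


def cost_by_department(timecards):
    """Sum hourly_rate * hours_worked for every record, grouped by department."""
    totals = defaultdict(float)
    for _employee, department, rate, time_in, time_out in timecards:
        totals[department] += rate * hours_worked(time_in, time_out)
    return dict(totals)


if __name__ == "__main__":
    for department, cost in sorted(cost_by_department(TIMECARDS).items()):
        print(f"{department}: {cost:.2f}")
```

With two departments in the input, the result has one total per department, which matches the two output rows the transformations are expected to produce.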
Kettle comes with four main components: Spoon, Pan, Chef, and Kitchen. Spoon is a GUI editor for building data transformations. Pan is a command-line tool for running a transformation created with Spoon. Chef is a GUI for building jobs, which are sets of transformations that should work together, and Kitchen is, again, a command-line tool for running jobs built with Chef. I balked at first at having to use a GUI to design transformations. I would have preferred a scripting language or some sort of XML configuration to do this, but I guess developers have typically never been the target market for ETL tools.
And I assume the objective of using Kettle is to avoid programming for data transformations, and to a certain extent, scripting is programming. Anyway, using Spoon was quite straightforward, and I was able to create three transformations that could be applied in sequence to my flat file dump to produce two rows in the CostingDB MySQL table. Each Spoon transformation produces a .ktr XML file as output.