Talend recently introduced a new family member: Talend Data Preparation Desktop. Let’s take a look at what it has to offer!
The Mac install is pretty straightforward: download, run, and move the application to the Applications folder. After launching the application, the first difference is visible almost immediately.
Data Preparation has its own local repository that keeps records of the preparations you make, and the application itself looks more like SaaS than a standalone application. This redesigned app is a step towards the cloud integration that Talend recently deployed. Apart from being more modern in approach, such a design allows you to set up bookmarks and to open multiple instances in separate tabs.
On the first start, you can go through a short tutorial that shows the very basics of the interface.
There are 2 main parts of the application: Preparations and Datasets.
The Datasets part allows you to add the files you will perform actions on. However, a data set remains static: you can export the data altered by your actions, but the export is separate from the initial data set, which stays unchanged. That’s why initial data sets are reusable in different preparations. There’s also an option to overwrite an existing data set with another file, but apart from that there’s nothing fancy to see here. However, for subscribed Talend users there’s a nice feature to add a data set directly from another Talend Job. This connects Talend Data Preparation with other applications from the Talend family. There’s also a tDataprepRun component in the ‘fully grown’ applications.
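The idea of a static data set shared by several preparations can be sketched in a few lines of Python. This only illustrates the concept and is not how Talend implements it; the names `apply_preparation`, `upper_name` and `drop_empty_phone`, as well as the sample rows, are made up for the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]

# The imported data set: no preparation ever modifies it.
DATASET: List[Row] = [
    {"name": "alice", "phone": "555-0100"},
    {"name": "bob", "phone": ""},
]


def upper_name(rows: List[Row]) -> List[Row]:
    """Hypothetical step: upper-case the 'name' column."""
    return [{**r, "name": r["name"].upper()} for r in rows]


def drop_empty_phone(rows: List[Row]) -> List[Row]:
    """Hypothetical step: drop rows whose 'phone' cell is empty."""
    return [r for r in rows if r["phone"]]


def apply_preparation(dataset: List[Row], steps: List[Step]) -> List[Row]:
    """Replay the steps on a copy, so the source data set stays untouched."""
    rows = [dict(r) for r in dataset]
    for step in steps:
        rows = step(rows)
    return rows


# Two different preparations reuse the same initial data set.
export_a = apply_preparation(DATASET, [upper_name])
export_b = apply_preparation(DATASET, [drop_empty_phone, upper_name])
```

Each export differs from the source, while `DATASET` itself keeps its original values and can feed any number of further preparations.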
In the Preparations part, there are 4 sample preparations provided with the installation package: CRM Phone Numbers, Clean HRMS Data, Marketing Upload, and Create Email Address, with their respective Datasets. Upon entering any of those, another tutorial is displayed, guiding you through the interface. Looking at the nomenclature used (‘Recipe’, ‘Ingredients’), it’s clear that Talend Data Preparation is not a tool designed for IT professionals but rather for teams like Marketing or Sales.
The design is very interactive and intuitive: each action done in the ‘Ingredients’ part is applied immediately to the data set in the middle window. However, it takes a surprisingly long time to process a very simple request like a column rename.
One of the nice features is the automatic Operation Suggestion when you select a specific row. For example, if you click on a phone number, you can add ‘Fill empty cells with text’ or ‘Delete the Rows with Empty Cell’.
A second unusual functionality is that when you select data in the middle section, the bottom right panel gives you a few more options to analyze the results of your actions.
There’s a nice feature that is easy to miss in the application design. By default, Talend Data Preparation assigns a column type by analyzing its content. The column header color, however, gives you additional information about data quality in that column. Green indicates valid rows, white indicates blank fields, and amber represents records that do not fit the column datatype.
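To make the three colors concrete, here is a minimal Python sketch of how such a quality summary could be computed for one column. It assumes the column has already been typed as integer; the validity rule and the function names `is_integer` and `column_quality` are assumptions for illustration, not Talend's actual logic.

```python
from typing import Dict, List


def is_integer(value: str) -> bool:
    """Assumed validity rule for a column typed as integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def column_quality(values: List[str]) -> Dict[str, int]:
    """Count valid (green), empty (white) and invalid (amber) cells."""
    counts = {"valid": 0, "empty": 0, "invalid": 0}
    for value in values:
        if value.strip() == "":
            counts["empty"] += 1      # shown as white in the header
        elif is_integer(value):
            counts["valid"] += 1      # shown as green in the header
        else:
            counts["invalid"] += 1    # shown as amber in the header
    return counts


quality = column_quality(["42", "", "7", "n/a", "13"])
```

The resulting counts are what a proportional green/white/amber bar in a column header would be drawn from.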
What can you do?
Basically, most simple Excel actions can be performed. A few examples:
- changing letter case in a cell
- concatenating values from cells in different columns
- deleting rows where the cell value matches a specified input
- removing or adding negative values
- comparing dates and changing the date format
- basic math operations like multiplying, subtracting, and min/max
- rounding values
- extracting values from a cell and splitting them, especially useful for extracting emails!
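For readers who think in code rather than in spreadsheets, a few of the actions listed above (changing case, concatenating columns, extracting an email) look roughly like this in plain Python. The sample rows, the column names, and the deliberately simple email pattern are assumptions for the sketch; the tool itself does all of this through its interface.

```python
import re
from typing import Dict, List, Optional

# Deliberately simple pattern; real-world email matching is messier.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_email(text: str) -> Optional[str]:
    """Return the first email-looking substring, or None if there is none."""
    match = EMAIL_RE.search(text)
    return match.group(0) if match else None


rows: List[Dict[str, str]] = [
    {"first": "ada", "last": "lovelace", "note": "reach me at ada@example.com today"},
    {"first": "alan", "last": "turing", "note": "no contact details"},
]

prepared = []
for row in rows:
    prepared.append({
        # Change letter case, then concatenate two columns.
        "full_name": f"{row['first'].title()} {row['last'].title()}",
        # Extract a value from a free-text cell.
        "email": extract_email(row["note"]),
    })
```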
On top of that, you can perform actions on the data set itself, like:
- Add data from lookup: allows you to match and add a column from another file
- You can edit the data directly by double-clicking a cell in the middle window. You can even perform the same action on all occurrences of the same value. Even such a direct change is added to the stack on the left, which is good for keeping a record of your actions.
Pros:
- Super fast creation of actions that you keep repeating on your source files
- Simple, straightforward, and easy-to-understand basic concepts
- Something to use in teams with little to no data transformation skills
Cons:
- Not an IT professional tool: no advanced features
- Defining some operations can take a while (slow tool responsiveness). However, this is partly caused by the worse native performance of Talend Data Preparation on macOS, described here: https://help.talend.com/pages/viewpage.action?pageId=270078310
Is it worth downloading? It’s hard to say. It looks like someone had a brilliant idea and was then forced to add features that didn’t really fit the purpose. In general, automating small manual changes and actions is a great thing, as they are a burden for departments without master-level Excel users on board. But adding graphs and box plots to data cleansing software? Talend Data Preparation would also like to be a solution for data discovery, which for me is a misfire. There are tools dedicated to that, like Tableau, or even Excel with properly formatted, clean data.