Top 10 Tableau Prep Features
Written by George Koursaros - 29-05-2018
Are You Ready For Tableau Prep?
With Tableau you often realise than your data is of a greta quality and you end up spending most of your time trying to structure, shape and format data properly before you can explore it. You can now do your data preparation with Tableau’s own ETL solution called Tableau Prep (formerly Project Maestro), allowing a seamless transition between your data preparation and data analysis.
Let’s dive into Tableau Prep and take a closer look at some of its features!
The Tableau Prep Interface
One of the first things that strikes you when working in Tableau Prep is its intuitive, easy-to-navigate and user-friendly interface. At its core, Tableau Prep is built around three different views which allow you to view your data on three different levels of granularity simultaneously. You get a complete understanding of what the data looks like and how it’s distributed. Let’s consider each of the three views in more detail.
(i) Flow Pane
This is the uppermost pane in the view, which acts as a visual guide showing all steps in the ETL process from beginning to end. The flow pane allows you to keep track of the sequence of steps.
To learn more about each of the steps please visit the official Tableau documentation.
(ii) Profile Pane
The profile pane is the middle pane in the view and acts as a high-level overview of the data, illustrating each fields’ profile. You can find out the number of unique values contained within each field and how they are distributed. This is one of the most accommodating pane since it facilitates most of the cleaning operations that you can perform on the data and it visually highlights the relationship between the fields.
(iii) Data Grid
This is the bottom pane in the view, which illustrates the lowest granularity of the data; that is the row level. Any edits performed on the data in the flow or profile pane are mirrored in the row level data, allowing you to consistently track the format of the data.
Tableau Prep Top 10 Features
1) Seamless Tableau Integration
There are many ETL tools out there in the market and the competition is only getting fiercer. So why should you use Tableau Prep? Well, for one, Tableau Prep is included in the Tableau Creator license and is essentially free.
Learn more about Tableau product pricing here.
Prep is fully integrated into the Tableau analytical workflow so you can quickly and easily move from data preparation to data exploration.
At any point in your workflow you can open your the result of your Tableau Prep flow in Tableau, to explore the data straight away. If you’re happy with the result you can quickly insert an output node and save the now prepared data locally or publish it as a data source.
2) Data Distribution Display in the Profile Pane
The first step for any analyst should be to understand the nature of the data they’re dealing with, their type, form and characteristics. When dealing with a broad dataset containing many fields; it is hard to maintain a visual reference of what the data looks like. Tableau Prep’s profile pane helps you navigate and understand the overall dataset faster.
For each field in the dataset you can see the number of unique values and their distribution via the use of histograms. When are dealing with strings or date fields, Tableau Prep will display a discrete histogram. If the number of fields is too large to fit in the view a summarized distribution is displayed on the right-hand side.
For numerical fields you have the option of using a discrete histogram or alternatively a continuous histogram with bins like the one below.
When you click on a value within a field, the associated data in all other fields are highlighted to give you a sense of the relationship between fields. In our example below clicking on California shows you the distribution of California products in the rest of the fields.
3) Union Files at Input Step
(Note: This feature is currently available for text and Excel files only.)
Tableau Prep offers you the ability to union files together during the input step, referred to as a wildcard union. Alternatively, this can be done using a Union step. You can specify the directory to search for files in and a Matching Pattern to find files that have those characters in the file name.
In the example above, we can see an input step for the file “orders_south_2015.csv”. If you choose Single table then the inputted data will be limited to the data contained in the file. By using a Wildcard union with the specified matching pattern, Tableau Prep can identify all files that match that string condition and union them straight away.
4) Group and Replace Algorithms
Grouping in Tableau Desktop is an operation that combines dimensions members into higher level categories. The process is done manually. It comes at a low cost when dealing with small grouping tasks but will take you longer if you’re dealing with a large dataset such as in the example below.
Tableau Prep also facilitates the use of two fuzzy match algorithms which can search for and automatically group similar values.
This grouping option uses the Metaphone 3 algorithm to group together field members that sound alike. It works by indexing words according to their pronunciation and it is most suitable for use with English words.
• Common Characters
This grouping method uses the ngram-fingerprint algorithm to group values that have letters or numbers in common. It indexes words based on their unique characters after having removed whitespace, punctuation and duplicates.
The use of either fuzzy match algorithm in the example above results in the output below, with a single click. How about that for a time-saver?
5) Quick Cleaning Operations for Multiple Strings
Many would agree that when it comes to data types, strings can be a real pain to deal with. Not only do strings take up more memory than floats, integers or booleans, they are also the slowest to process, often trapping the user in a long, iterative prison of mundane, string-correcting tasks. Tableau Prep to the rescue!
Tableau Prep includes a list of commonly used cleaning operations for strings, that is available for string fields in the profile pane.
Something worth mentioning here, is that these operations are automatically applied to every value in the string field with a single click. No scripting required! And since the operation is applied to every character present in the string, they account for small error variations between field values too!
6) Changes Pane
This is arguably one of Tableau Prep’s most useful features. You can think of the changes pane as a log which keeps record of every data cleaning operation the user performs, in the chronological order in which they occur.
Whenever changes are made to the data, in the flow pane, annotations are added to the associated step where the changes occur. The various cleaning operations are represented by an icon over the cleaning step. The same icon appears in the change log to denote the type of task performed.
In the example above, we can see the 6 cleaning operations performed in a particular step in the flow, each associated with their respective icon and accompanied by a description of the exact cleaning operation. In case the chronological order of operations is unsatisfactory, the user can re-arrange the cleaning tasks in a different order, ensuring that along with record-keeping abilities the changes pane provides editing flexibility as well.
7) Pivot Step
Sometimes, analyzing data in a crosstab format can be difficult in Tableau, or simply unsuitable for certain data sources. Oftentimes therefore, the user may need to convert data from a crosstab format to a columnar format, in order to accommodate an easier analysis in Tableau. This can easily be done using a Pivot Step. Take the example below for instance.
In the original dataset, data is recorded in a crosstab format, where each year is recorded as a separate field containing regional quotas information. For their use in Tableau however, we would much rather have the quota data displayed in two columns. A date column and a quotas column.
Using a pivot step, we can select the fields we want to pivot and rename them accordingly. The result is a columnar dataset in the appropriate format, ready to be explored further.
8) Aggregation Step
If you’re a hardened Tableau user, you have probably come across level of detail calculations and the concept of data granularity. Well, sometimes the user might need to adjust the granularity of the data to either reduce the amount of data produced from the flow or to align the granularity of data with data they might want to join or union with. Let’s consider the previous example of annual, regional quotas. What if we want to join this dataset with a dataset whose level of detail is at the individual transaction level rather than the regional level? In Tableau Prep we can easily account for this disparity using an Aggregation Step.
As illustrated above, an aggregation step allows you to aggregate data at the level of detail defined by the grouping fields. Since we want to join data at the annual, regional level, for this example we are grouping using the dimensions for Region and Year of Sale. In turn, the aggregated fields on the right-hand side are the measures in the dataset which we wish to aggregate to the chosen level of detail; such as discount, profit or sales.
9) Join Step
Whether you’re dealing with data stored in an SQL database, a data warehouse or an excel file stored locally, it’s often the case that the data you wish to analyze is made up of tables that are only related by specific fields. Joining is a process by which data is combined on their common fields, resulting in a table that’s typically extended horizontally.
In the example above, there are two data sources color-coded yellow and red which contain different fields. If we want to produce a single unified data set which includes all these fields, we need at least one common field on which to join. These fields are referred to as Join Clauses. In our example, we are joining the data sources on Product ID and Order ID. The resulting dataset is one which is wider, since it contains more fields and is color-coded green.
10) Union Step
Unions and joins are similar, in that both can be used to combine data into a unified source. Unlike joins however, unions are used to combine data by appending rows of a table unto another table. Intuitively, unions are suitable when the tables to be combined, contain the same fields but different data in the rows. When tables are combined using a union therefore, they typically produce a table that is extended vertically. This can easily be done in Tableau Prep!
In the above example there are 4 separate datasets, each of them color-coded differently and containing sales information for different regions of the US. If we wanted to create a single data set which contains sales information for the whole of the US, we would need to Union them as above. The color-coded bar below each field is there to help the user identify which data source the data for that field is coming from. We can quickly see, for instance, that the discount field is absent in the blue data source.
Over the course of its lifetime, Tableau has really grown to be the gold standard of intuitive and interactive data analytics and one of the biggest players in the business intelligence industry. Now, with Tableau Prep they are taking a step towards the ETL side of data and in that they have made a solid beginning. Tableau Prep can help you overcome various data quality issues in an intuitive and user-friendly environment which is fully integrated with Tableau. To top it all, it comes at no extra cost! It’s exciting to think about what the future of Tableau Prep might hold! Which features are your personal favorites? What other features would you like to see in the future versions? Tell us in the comments section below.