The first step in exploring your data with Tableau is examining how the data is presented. When an Excel data source (other than a cube) is already formatted as a cross-tabulation or is otherwise aggregated, options for viewing, aggregating, and grouping in Tableau are limited. Tableau cannot see underlying data points that have already been summarized into a higher level group or order. To take advantage of Tableau's full functionality, you need to normalize the data - that is, format it as raw data - before connecting to it from Tableau.
For example, consider the two workbooks shown below. The first one is a formatted report with repeated headers, empty rows, grand totals, and so on. When you open the workbook in Tableau, your data should instead be in a raw data table like the second one.
Incorrect - Formatted Report
Correct - Raw Data Table
Below are some tips for turning your formatted reports into a raw data table that is ready for analysis in Tableau.
Remove or exclude introductory and other unnecessary text
The first row in the entire file must contain your field headers (or column names). Many reports delivered as Excel workbooks have a block of introductory text at the top. This text may be titles, color legends, descriptions, and so on. Remove all this information before opening the data with Tableau.
Remove unnecessary information at the top of the file.
If you don't want to remove introductory text, you can alternatively create a Named Range that contains just the data. When opening Excel workbooks in Tableau, you can connect to an entire sheet or a named range within a sheet.
In Excel:
Step 1
Select the data.
Step 2
On the Formulas tab, in the Defined Names section, select Define Name.
Step 3
In the New Name dialog box, in the Name text box, Excel offers a name based on the content of the top left cell of the selected data range. Keep this name.
Step 4
In the Scope list, select Sheet1.
Step 5
When finished, click OK.
In Tableau:
Your named range is offered as a table when you connect to the Excel workbook.
Make sure each row contains only one piece of data
This example shows an Excel table that lists students and their grades in three subjects. In a crosstab layout, you have a column for each subject. In this table, each row contains three pieces of data: the student's grade in Math, grade in English, and grade in Science.
ID | Gender | School | Math | English | Science |
1 | M | West | 90 | 80 | 70 |
2 | F | South | 50 | 50 | 50 |
3 | M | Central | 90 | 80 | 90 |
4 | M | Central | 50 | 80 | 80 |
5 | M | West | 100 | 90 | 100 |
6 | F | West | 80 | 80 | 60 |
7 | F | South | 50 | 80 | 100 |
8 | F | Central | 80 | 50 | 100 |
9 | M | South | 70 | 80 | 80 |
Replace the columns Math, English, and Science with a single column: Subject. Now the table contains three rows for each student, but each row contains only one grade.
ID | Gender | School | Subject | Score |
1 | M | West | Math | 90 |
1 | M | West | English | 80 |
1 | M | West | Science | 70 |
2 | F | South | Math | 50 |
2 | F | South | English | 50 |
2 | F | South | Science | 50 |
3 | M | Central | Math | 90 |
3 | M | Central | English | 80 |
3 | M | Central | Science | 90 |
4 | M | Central | Math | 50 |
4 | M | Central | English | 80 |
4 | M | Central | Science | 80 |
5 | M | West | Math | 100 |
5 | M | West | English | 90 |
5 | M | West | Science | 100 |
6 | F | West | Math | 80 |
6 | F | West | English | 80 |
6 | F | West | Science | 60 |
7 | F | South | Math | 50 |
7 | F | South | English | 80 |
7 | F | South | Science | 100 |
8 | F | Central | Math | 80 |
8 | F | Central | English | 50 |
8 | F | Central | Science | 100 |
9 | M | South | Math | 70 |
9 | M | South | English | 80 |
9 | M | South | Science | 80 |
Limit headers to a single row
Not only should the first row contain your field headers, but also this should be the only row of headers. If you have headers that include some type of "categorical" breakdown above them, create a new column that contains the category.
In this example, East is removed as a categorical header, and a new column, Region, is added to the table.
Fill blank cells
If you have created a new column for categories, make sure to fill the blank cells so that the information is repeated for each row of data, not just the first occurrence. While this seems redundant, it is important that each record (or row) has data across all the columns.
Clean up aggregated and descriptive data
Make sure to remove the rows that do not contain raw data records. For example, an Excel report may have rows that contain descriptive information and Grand Totals rows. You can easily add totals in Tableau and do not need to calculate them in your data source.
Delete blank rows and duplicate headers
Remove any blank rows and rows that contain duplicate headers.
Add missing headers
If any column does not have a title, make sure to add one. Be descriptive when writing your column headers.
Reshape the entire structure of your Excel data using Tableau's Excel Reshaper Plug-in
Even if you have followed all the suggestions shown above, you may still have data in a format that is not ideal for Tableau, purely from an analytic perspective. For example, you may still have a column for each month of business data, which Tableau interprets as separate columns, making month-to-month comparisons difficult.
You can use the Tableau plug-in for Excel to reshape your data. You still need to follow any/all of the steps in this article. Download the plug-in from the Installing the Tableau Add-in for Reshaping Data in Excel knowledge base article.