How to Connect Tableau to Hadoop Hive

Tableau supports connections to data stored in Cloudera, Hortonworks, MapR, and, starting in 9.0, Amazon EMR (Elastic MapReduce) distributions using Hive and the distribution's Hive ODBC driver. Before you connect to Hadoop Hive with Tableau, review the guidelines in each section of this article to optimize your experience.

Prerequisites and external resources

For connections to Hive Server, you must have one of the following distributions:
  • Cloudera distribution including Apache Hadoop CDH3u1 or later, which includes Hive 0.7.1 or later. Refer to the Administering Hadoop and Hive for Tableau Connectivity article, or work with your IT team to ensure your cluster is ready.
  • Hortonworks
  • MapR Enterprise Edition (M5)
  • Amazon EMR
For connections to Hive Server 2, you must have a distribution that includes Hive Server 2; check your distribution's documentation for the minimum supported version.
Additionally, you must have the correct Hive ODBC driver installed on each machine running Tableau Desktop or Tableau Server. Download it from the Drivers page.
This article describes some advanced features and may refer to external sources of information.

Connect to Hadoop Hive

Hive is a technology for working with data in your Hadoop cluster by using a mixture of traditional SQL expressions and advanced, Hadoop-specific data analysis and transformation operations. Tableau works with Hadoop using Hive to provide a user experience that requires no programming.
The sections below describe how to get started with analyzing data in your Hadoop cluster using Tableau.
Install the Driver
For both Hive Server and Hive Server 2, you must install the Cloudera, Hortonworks, MapR, or Amazon EMR ODBC driver from the Drivers page. Ensure that the bit version of the driver you download matches the bit version of your operating system.
  • Cloudera (Hive): Cloudera ODBC Driver for Apache Hive 2.5.x, 32-bit or 64-bit.
    • For use with Tableau Server 8.0.0-8.0.7, or 8.1.0-8.1.3, use version 2.5.0.1000.
    • For use with Tableau Server 8.0.8 and later, or 8.1.4 and later, use driver version 2.5.0.1001 or later.
  • Cloudera (Impala): Cloudera ODBC Driver for Impala 2.5.x, 32-bit or 64-bit.
    Notes:
    • If connecting to Beeswax services on Cloudera Hadoop, use the Cloudera ODBC 1.2 Connector for Tableau Windows (32-bit or 64-bit) instead. You must uninstall version 2.5.x of the driver if you have installed it already.
    • Tableau recommends moving Beeswax services to Hive Server 2 as Tableau will be phasing out support for Beeswax in a future release.
  • Hortonworks: Hortonworks Hive ODBC Driver 1.2.x (32-bit or 64-bit)
  • MapR: MapR_odbc_2.1.0_x86.exe or later, or MapR_odbc_2.1.0_x64.exe or later
  • Amazon EMR: HiveODBC.zip or ImpalaODBC.zip
Note: If you have a different version of the driver installed, first uninstall that driver before installing the version provided on the Drivers page.
Connect in Tableau

Step 1 

In Tableau Desktop, select the appropriate connector (Cloudera Hadoop, Hortonworks Hadoop Hive, MapR Hadoop Hive, or Amazon EMR), and enter the information required to connect.

Step 2 

  • For Tableau Desktop 9.0, select the appropriate schema and table to build your data source, and then click the sheet tab to start your analysis.
  • For Tableau Desktop 8.2 or 8.3, select the appropriate schema and table to build your data source, and then click Go to Worksheet.
  • For Tableau Desktop 8.1 and earlier, complete the steps in the Hadoop Hive Connection dialog box, and click OK.
Notes:
  • If you are using Tableau Desktop 7.0 with the Cloudera connector, ensure that you select the Hive Server radio button in Step 2 of the dialog box.
  • Refer to the Administering Hadoop and Hive for Tableau Connectivity article for more information about configuring your Hadoop Hive connection. Refer to the Extra Capabilities for Hadoop Hive article for more information about specific capabilities that are available for Hadoop Hive connections in Tableau.
Once your workbook is connected to Hive, drag and drop fields to build views as you would when connected to any other database.

Work with date/time data

Hive recently introduced native support for date/time as a data type, but Tableau does not yet support it. However, Tableau offers rich support for operating on date/time data stored within strings. To work with date or date/time data stored in strings, right-click the field in the Data pane and select Change Data Type > String. Tableau provides the standard methods for visualizing and filtering date/time data, and Hive constructs the Map/Reduce jobs necessary to parse the string data as date/time to satisfy the queries that Tableau generates.
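One reason string-typed dates remain workable is that common formats such as ISO 8601 (`YYYY-MM-DD HH:MM:SS`) sort lexicographically in the same order as chronologically, so range comparisons on strings behave like date comparisons. A minimal Python sketch of the idea (Python stands in for the string comparisons Hive performs; the timestamps are made-up sample data):

```python
# ISO 8601 date/time strings compare lexicographically in chronological
# order, which is why range filters on string-typed dates behave like
# true date filters.
timestamps = [
    "2013-11-02 08:15:00",
    "2013-01-30 23:59:59",
    "2013-11-02 07:00:00",
]

# A plain string sort yields chronological order for this format.
chronological = sorted(timestamps)

# A "date range" filter expressed purely as string comparisons.
start, end = "2013-01-01 00:00:00", "2013-06-30 23:59:59"
in_range = [t for t in timestamps if start <= t <= end]

print(chronological[0])  # earliest timestamp
print(in_range)
```

This only holds for zero-padded, most-significant-unit-first formats; strings like `11/2/2013` do not sort chronologically, which is one reason the string operations fall short of full date semantics.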

Known limitations

The following list contains the known limitations of Hive and Hadoop compared to traditional databases.
High latency
Hive is a batch-oriented system and is not yet capable of answering simple queries with very quick turnaround. This limitation can make it difficult to explore a new data set or experiment with calculated fields. However, some of the performance suggestions in the Designing for Performance Using Hadoop Hive article may help. In addition, some of the newer SQL-on-Hadoop technologies, such as Cloudera's Impala and Hortonworks' Stinger initiative, are designed to address this limitation.
Query progress and cancellation
Canceling an operation in Hadoop Hive is not straightforward, especially when working on a machine that is not a part of the cluster. Hive is unable to offer a mechanism for canceling queries, so the queries that Tableau issues can only be "abandoned." You can continue your work in Tableau after abandoning a query, but the query will continue to run on the cluster and consume resources.
Additionally, the progress estimator for Map/Reduce jobs in Hadoop is simplistic and often produces inaccurate estimates. Hive is likewise unable to relay this progress information to Tableau to help you budget your time as you analyze data.
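Because queries can only be abandoned, client code that talks to Hive over ODBC often enforces its own timeout and simply stops waiting, while the job keeps running on the cluster. A hedged Python sketch of that pattern, using a stand-in function `run_hive_query` (hypothetical; any blocking ODBC call would sit in its place):

```python
import concurrent.futures
import time

def run_hive_query(sql):
    """Stand-in for a blocking Hive ODBC query (hypothetical)."""
    time.sleep(2)  # simulate a long-running Map/Reduce job
    return "rows"

# Submit the query on a worker thread and wait only briefly for it.
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(run_hive_query, "SELECT COUNT(*) FROM logs")
    try:
        result = future.result(timeout=0.1)
        abandoned = False
    except concurrent.futures.TimeoutError:
        # The client moves on, but, just as with Tableau, the job
        # itself continues to run on the cluster and consume resources.
        abandoned = True

print(abandoned)
```

Note that nothing here cancels the work; the timeout only frees the client, which mirrors the "abandoned query" behavior described above.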
Date/time processing
Hive offers substantial functionality for operating on string data that represents date/time values, and it recently added support for storing date/time as a native data type. Support for the native date/time type is planned for a future release of Tableau; for now, you can tell Tableau which fields contain date/time data. Keep in mind that the string operations for date/time data do not cover the complete SQL standard, especially calendar operations such as adding a month to an existing date.
Authentication
For connections to the traditional Hive Server, the original Hive ODBC drivers do not expose authentication operations, and the Hive authentication model and data security model are incomplete. The Hive Security Documentation provided by Cloudera describes the limitations in more detail. Tableau Server provides a data security model for situations like this. You can create User Filters in a Tableau workbook to express how the data in each visualization should be restricted, and Tableau Server will ensure these filters are enforced accordingly for each user who accesses an interactive visualization in their browser.
For connections to Hive Server 2, the latest Hive ODBC driver allows you to configure authentication. This authentication is handled directly in the Tableau Connection dialog box, which lets you choose among the authentication methods that the driver supports.
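As a concrete illustration, these drivers are typically configured through DSN properties. The sketch below uses key names in the style of the Cloudera Hive ODBC driver (`HiveServerType`, `AuthMech`); treat every name and value here as an assumption to verify against the installation guide for your distribution's driver:

```
; Hypothetical odbc.ini DSN sketch for a Hive Server 2 connection.
; Key names follow Cloudera Hive ODBC driver conventions; verify them
; against the installation guide for your distribution's driver.
[HiveServer2Sample]
Driver=Cloudera ODBC Driver for Apache Hive
; Hypothetical gateway host; Hive Server 2 listens on port 10000 by default.
Host=hive-gateway.example.com
Port=10000
; HiveServerType: 1 = Hive Server, 2 = Hive Server 2
HiveServerType=2
; AuthMech: 0 = No Authentication, 1 = Kerberos, 2 = User Name,
; 3 = User Name and Password
AuthMech=3
UID=analyst
```

On Windows the same properties are entered through the driver's DSN Setup dialog box rather than a file, but the choices map one-to-one.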

Test the connection to Hadoop Hive cluster

The latest ODBC drivers for Cloudera, Hortonworks, MapR, and Amazon EMR enable you to use the driver configuration utility to test the connection to your Hadoop Hive cluster. To do this, go to the 32- or 64-bit ODBC Administrator utility, and complete the procedure below.
  1. Click the System DSN tab.
  2. Click the Add button.
  3. Select the appropriate driver for your Hadoop distribution.
  4. Complete the information in the DSN Setup dialog box.
  5. Click the Test button. Test results display either "TESTS COMPLETED SUCCESSFULLY" or "TEST COMPLETED WITH ERROR."
Note: If the test completes with an error, contact the Hadoop Hive cluster administrator or the Hadoop support representative to help resolve the connection issue.