Tableau reads the data in a statistical file (for example, a SAS or R file) based on the file's character encoding or on information in an input file. R files typically use the character encoding of the operating system, whereas SAS and SPSS files include the character encoding information in the file itself. Sometimes you might need to specify a different encoding. For example, if a colleague sends you a statistical file with Greek character encoding, you must specify a Greek character set to use the file with Tableau. To use a different character set when reading from a statistical file, create a Tableau data source customization (TDC) file and specify the encoding in it.
Create the TDC file
A TDC file is an XML file that applies to a single data source and contains the vendor name and driver name of the data source provider. For the statistical file connector, the vendor name and the driver name are both stat-direct.
To create a TDC file
- Open a plain text editor, such as Notepad.
- Copy the information from the sample provided below, paste it into your text file, and then specify the source-charset value. (See the Appendix below for a list of user-defined encodings supported by the Statistical File connector.)
- Save the file with a .tdc extension (for example, r-statsfile.tdc) and place it in the My Tableau Repository\Datasources folder.
Sample TDC file that sets the source-charset value
The following TDC file example sets the source-charset value to shift-jis for a statistical file data source.
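A minimal sketch of such a file, assuming the usual TDC layout of a connection-customization root element that contains vendor, driver, and customizations elements. The stat-direct names and the shift-jis value come from the text above. The version value shown is a placeholder assumption and may need to match your Tableau version.

```xml
<connection-customization class='stat-direct' enabled='true' version='10.0'>
  <!-- version='10.0' is an assumed placeholder; adjust it for your Tableau version -->
  <!-- Vendor and driver names for the statistical file connector -->
  <vendor name='stat-direct' />
  <driver name='stat-direct' />
  <customizations>
    <!-- Character set Tableau should use when reading the statistical file -->
    <customization name='source-charset' value='shift-jis' />
  </customizations>
</connection-customization>
```

To read a file in a different character set, for instance the Greek case mentioned earlier, replace shift-jis with one of the names listed in the appendix, such as ISO-8859-7.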
Important: Tableau does not test or support TDC files. These files should be used as a tool to explore or occasionally address issues with your data connection. Creating and maintaining TDC files requires careful manual editing, and there is no support for sharing these files.
Appendix: User-defined encodings supported by the Statistical File connector
This appendix lists the character encodings supported by the Tableau Statistical File connector. The list includes single-byte, multi-byte, and Unicode user-defined encodings, as well as single-byte and multi-byte encodings that cannot currently be mapped to the corresponding SAS encodings.
Single-byte user-defined encodings
ASCII
CSASCII
US-ASCII
US
ISO_646.IRV:1991
ISO646-US
646
ISO-IR-6
IBM367
CP367
ANSI_X3.4-1986
ANSI_X3.4-1968
ISO-8859-1
CSISOLATIN1
LATIN1
L1
ISO_8859-1:1987
ISO8859-1
ISO-IR-100
ISO-8859-1
IBM819
CP819
ISO-8859-15
LATIN-9
ISO_8859-15:1998
ISO_8859-15
ISO8859-15
ISO-IR-203
IBM850
CSPC850MULTILINGUAL
CP850
850
WINDOWS-1252
MS-ANSI
CP1252
ISO-8859-7
CSISOLATINGREEK
ISO_8859-7:1987
ISO_8859-7
ISO-IR-126
ISO-8859-7
GREEK8
GREEK
ELOT_928
ECMA-118
WINDOWS-1253
MS-GREEK
CP1253
ISO-8859-10
CSISOLATIN6
LATIN6
L6
ISO_8859-10:1992
ISO_8859-10
ISO8859-10
ISO-IR-157
WINDOWS-1257
WINBALTRIM
CP1257
ISO-8859-2
CSISOLATIN2
LATIN2
L2
ISO_8859-2:1987
ISO_8859-2
ISO8859-2
ISO-IR-101
IBM852
CSPCP852
CP852
852
WINDOWS-1250
MS-EE
CP1250
ISO-8859-5
CSISOLATINCYRILLIC
ISO_8859-5:1988
ISO_8859-5
ISO8859-5
ISO-IR-144
CYRILLIC
WINDOWS-1251
MS-CYRL
CP1251
CP866
CSIBM866
IBM866
866
TIS-620
TIS620.2533-1
TIS620.2533-0
TIS620.2529-1
TIS620-0
TIS620
ISO-IR-166
ISO-8859-11
CP874
CSISOLATIN5
LATIN5
L5
ISO_8859-9:1989
ISO_8859-9
ISO8859-9
ISO-8859-9
ISO-IR-148
CSIBM857
IBM857
CP857
857
WINDOWS-1254
MS-TURK
CP1254
CP1129
VPS
WINDOWS-1258
CP1258
ISO-8859-6
CSISOLATINARABIC
ISO_8859-6:1987
ISO_8859-6
ISO8859-6
ISO-IR-127
ECMA-114
ASMO-708
ARABIC
WINDOWS-1256
MS-ARAB
CP1256
ISO-8859-8
CSISOLATINHEBREW
ISO_8859-8:1988
ISO_8859-8
ISO8859-8
ISO-IR-138
HEBREW
IBM864
CSIBM864
CP864
WINDOWS-1255
MS-HEBR
CP1255
IBM862
CSPC862LATINHEBREW
CP862
862
Multi-byte user-defined encodings
CP936
WINDOWS-936
MS936
GBK
GB2312
CSISO58GB231280
ISO-IR-58
GB_2312-80
CHINESE
ISO-2022-CN
CP950
windows-950
ms-950
ms950
CSBIG5
CN-BIG5
BIGFIVE
BIG5
BIG-FIVE
BIG-5
BIG5HKSCS
BIG5-HKSCS
EUC-TW
CSEUCTW
EUCTW
EUC-JP
CSEUCPKDFMTJAPANESE
EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE
EUCJP
EUC-JP
ISO-2022-JP
CSISO2022JP
ISO-2022-JP
CSSHIFTJIS
SJIS
SHIFT_JIS
SHIFT-JIS
MS_KANJI
CP932
EUC-KR
CSEUCKR
EUCKR
EUC-KR
UHC
CP949
EUC-CN
CSGB2312
GB2312
EUCCN
CN-GB
Unicode user-defined encodings
UTF-8
UCS-2
UCS-2BE
UCS-2LE
UCS-4
UCS-4BE
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32LE
UTF-32BE
UTF-7
Single-byte encodings that cannot be mapped to SAS encodings
MACROMAN
CSMACINTOSH
MACINTOSH
MAC
ISO-8859-14
LATIN8
L8
ISO_8859-14:1998
ISO_8859-14
ISO8859-14
ISO-IR-199
ISO-CELTIC
MACGREEK
MACICELAND
ISO-8859-3
CSISOLATIN3
LATIN3
L3
ISO_8859-3:1988
ISO_8859-3
ISO8859-3
ISO-IR-109
ISO-8859-4
CSISOLATIN4
LATIN4
L4
ISO_8859-4:1988
ISO_8859-4
ISO8859-4
ISO-IR-110
ISO-8859-13
LATIN7
L7
ISO_8859-13
ISO8859-13
ISO-IR-179
ISO-8859-13
MACCENTRALEUROPE
MACCROATIAN
IBM855
CSIBM855
CP855
855
KOI8-R
CSKOI8R
MACCYRILLIC
KOI8-U
CSKOI8R
MACUKRAINIAN
ISO-8859-16
LATIN10
L10
ISO_8859-16:2001
ISO_8859-16
ISO8859-16
ISO-IR-226
MACROMANIAN
ARMSCII-8
GEORGIAN-ACADEMY
MACTURKISH
TCVN
VISCII
CSVISCII
VISCII1.1-1
MACARABIC
MACHEBREW
WINDOWS-874
Multi-byte encodings that cannot be mapped to SAS encodings
GB18030
HZ
HZ-GB-2312
CSISO2022JP
ISO-2022-JP
JOHAB
JOHAB
CP1361
ISO-2022-KR
CSISO2022KR
ISO-2022-KR
ISO-2022-JP
CSISO2022JP
ISO-2022-JP-1
ISO-2022-JP-2
CSISO2022JP2
ISO-2022-CN
CSISO2022CN
ISO-2022-CN-EXT