Switch from WE8 Character Set to Unicode

As mentioned in my blog post “Character Set Conversion with CSALTER – Does that actually work?”, it is indeed possible to change the character set of a database with CSSCAN and CSALTER. But that does not help if you have stored characters outside of US7ASCII in the database, as is quite common in Europe, because only the data dictionary is adjusted, not the data in the tables.

So there is nothing left but to export the data from the WE8 database and import it into a new Unicode database. As the character set for the target system I again use AL32UTF8, i.e. a character set with at least one byte per character (UTF8) and a maximum of four bytes (AL32). For German text this means that an umlaut (Ä, Ö, Ü) or the “ß” needs 2 bytes, and the euro sign (“€”) even needs 3 bytes.
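In an AL32UTF8 database this is easy to verify with LENGTH and LENGTHB (LENGTHB returns the length in bytes); a small example, the name is of course arbitrary:

SELECT LENGTH('Müller')  AS len_chars,
       LENGTHB('Müller') AS len_bytes
FROM   dual;
-- returns 6 and 7, because the "ü" needs two bytes in AL32UTF8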

Attempt 1: Export/Import with Data Pump

As my test databases run Oracle 11g Release 2 (11.2.0.2) on my favorite operating system, OEL6, I will of course use Data Pump. On older systems it works similarly with the classic exp/imp.

First the schemas (DEMO and DEMOSTAMM) are exported from the source system and then imported into the target system. The directory object “DATA_PUMP_DIR” used here is, by the way, created automatically when an Oracle 11g Release 2 database is installed and points to the directory $ORACLE_BASE/admin/<ORACLE_SID>/dpdump.
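Whether the directory object exists and where it points to can be checked quickly like this:

SELECT directory_name, directory_path
FROM   dba_directories
WHERE  directory_name = 'DATA_PUMP_DIR';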

Export Parameter File:

DUMPFILE="demoexpwe8.dmp"
LOGFILE="exp_demoexpwe8.log"
DIRECTORY=DATA_PUMP_DIR
CONTENT=ALL
SCHEMAS=('DEMOSTAMM', 'DEMO')
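Assuming this parameter file is saved as demoexpwe8.dat (the file name is only an assumption), the export is started on the source system like this:

expdp system parfile=demoexpwe8.dat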

Import Parameter File:

DUMPFILE="demoexpwe8.dmp"
LOGFILE="imp_demoimputf.log"
DIRECTORY=DATA_PUMP_DIR
STREAMS_CONFIGURATION=n
TABLE_EXISTS_ACTION=APPEND
SKIP_UNUSABLE_INDEXES=y
CONTENT=ALL
PARTITION_OPTIONS=none
SCHEMAS=('DEMOSTAMM', 'DEMO')
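The import on the target system is started accordingly; the parameter file name demoimputf.dat is the one that also appears in the log below:

impdp system parfile=demoimputf.dat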

The log of the import now looks like this:

...
Master table "SYSTEM"."SYS_IMPORT_SCHEMA_01" successfully loaded/unloaded
Starting "SYSTEM"."SYS_IMPORT_SCHEMA_01":  system/******** parfile=demoimputf.dat
Processing object type SCHEMA_EXPORT/USER
Processing object type SCHEMA_EXPORT/SYSTEM_GRANT
Processing object type SCHEMA_EXPORT/ROLE_GRANT
Processing object type SCHEMA_EXPORT/DEFAULT_ROLE
Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA
Processing object type SCHEMA_EXPORT/SEQUENCE/SEQUENCE
Processing object type SCHEMA_EXPORT/TABLE/TABLE
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA
. . imported "DEMO"."POSITIONEN"                           727 KB   30267 rows
ORA-02374: conversion error loading table "DEMO"."AUFTRAEGE"
ORA-12899: value too large for column AUFSTATUS (actual: 2, maximum: 1)
ORA-02372: data for row: AUFSTATUS : 0X'DC'
ORA-02374: conversion error loading table "DEMO"."AUFTRAEGE"
ORA-12899: value too large for column AUFSTATUS (actual: 2, maximum: 1)
ORA-02372: data for row: AUFSTATUS : 0X'DC'
...
. . imported "DEMO"."AUFTRAEGE"                          299.3 KB    9934 out of 10000 rows
. . imported "DEMOSTAMM"."NACHNAMEN"                     177.4 KB   11552 rows
...
ORA-02374: conversion error loading table "DEMO"."STATUS"
ORA-12899: value too large for column STATUSID (actual: 2, maximum: 1)
ORA-02372: data for row: STATUSID : 0X'DC'
. . imported "DEMO"."STATUS"                             6.093 KB       5 out of 6 rows
...
ORA-39083: Object type REF_CONSTRAINT failed to create with error:
ORA-02298: cannot validate (DEMO.FK_POSITIONEN_AUFTRAEGE) - parent keys not
found
Failing sql is: ALTER TABLE "DEMO"."POSITIONEN" ADD CONSTRAINT "FK_POSITIONEN_AUFTRAEGE" FOREIGN KEY ("AUFID") REFERENCES
"DEMO" ."AUFTRAEGE" ("AUFID") ON DELETE CASCADE ENABLE
Job "SYSTEM"."SYS_IMPORT_SCHEMA_01" completed with 2 error(s) at 14:58:03

What happened?

Well, if you look at the error messages you see the byte 0x'DC', which is the umlaut “Ü” in the WE8 character set. The message “ORA-12899: value too large for column AUFSTATUS (actual: 2, maximum: 1)” gives the reason: the order status column was defined with a maximum length of 1 byte, and the umlaut, which needs only one byte in the WE8 character set, no longer fits because it now takes two bytes in AL32UTF8.

So this attempt has failed as well!

Solution:

The tables must be adjusted to the new character lengths. You could simply widen all columns accordingly, e.g. turn a CHAR(1) into a CHAR(3). But since version 9i Oracle offers the possibility to define the length of a character column not in “BYTE” but in “CHAR”. This is called length semantics.
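As a minimal sketch, using the STATUS table from the example, a single column could be switched to character semantics like this:

-- widen the column from 1 byte to 1 character (table and column from the example)
ALTER TABLE demo.status MODIFY (statusid CHAR(1 CHAR));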

Unfortunately DESCRIBE in SQL*Plus does not show the length semantics, so you have to resort to the data dictionary views.

An example:

NLS_LENGTH_SEMANTICS=BYTE

SELECT column_name,
       data_type,
       data_length,
       decode(char_used,'B','BYTE','C','CHAR') Semantic
FROM   user_tab_columns
WHERE  table_name = 'STATUS';

COLUMN_NAME                    DATA_TYPE  DATA_LENGTH SEMANTIC
------------------------------ ---------- ----------- ----------
STATUSID                       CHAR                 1 BYTE
KURZBESCHREIBUNG               VARCHAR2            10 BYTE
BESCHREIBUNG                   VARCHAR2           255 BYTE

In another database with NLS_LENGTH_SEMANTICS=CHAR:

COLUMN_NAME                    DATA_TYPE  DATA_LENGTH SEMANTIC
------------------------------ ---------- ----------- ----------
STATUSID                       CHAR                 4 CHAR
KURZBESCHREIBUNG               VARCHAR2            40 CHAR
BESCHREIBUNG                   VARCHAR2          1020 CHAR

The catch is that Data Pump, and also the classic export/import, carries the length semantics along. That means if the tables were created with BYTE semantics, they keep it in the new database, regardless of the length semantics set at the system level (ALTER SYSTEM) or at the session level (ALTER SESSION).
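For completeness, this is how the default length semantics is set at both levels; as just said, it has no effect on tables that Data Pump creates itself:

-- instance-wide default for new sessions (SCOPE=BOTH requires an spfile)
ALTER SYSTEM SET NLS_LENGTH_SEMANTICS = CHAR SCOPE = BOTH;
-- for the current session only
ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR;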

So the next step is to extract the table definitions. Be aware that the length semantics is not extracted along with them. With the function dbms_metadata.get_ddl these definitions can be generated as SQL commands. It is then sufficient to set the length semantics on the target system to “CHAR” and create the tables.

Example for a SQL*Plus Script:

set heading off
set feedback off
set trimspool ON
set pages 0
set linesize 1000
set long 2000000
spool create_tables.sql
PROMPT ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR
PROMPT /
--
-- no old storage clauses (MINEXTENTS etc.)
--
EXECUTE DBMS_METADATA.SET_TRANSFORM_PARAM (DBMS_METADATA.SESSION_TRANSFORM, 'STORAGE',FALSE);
--
-- ";" as statement terminator after each command
--
EXECUTE DBMS_METADATA.SET_TRANSFORM_PARAM (DBMS_METADATA.SESSION_TRANSFORM, 'SQLTERMINATOR',TRUE);
--
-- keep segment attributes such as PCTFREE etc. (this is the default)
--
EXECUTE DBMS_METADATA.SET_TRANSFORM_PARAM (DBMS_METADATA.SESSION_TRANSFORM, 'SEGMENT_ATTRIBUTES',TRUE);
SELECT DBMS_METADATA.GET_DDL ('TABLE' ,T.table_name,T.owner)
FROM dba_tables T
WHERE owner IN ('DEMO','DEMOSTAMM');
spool off

The created SQL*Plus script create_tables.sql now looks like this:

ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR
/
...
CREATE TABLE "DEMO"."STATUS"
(  "STATUSID" CHAR(1) CONSTRAINT "SYS_C003083" NOT NULL ENABLE,
   "KURZBESCHREIBUNG" VARCHAR2(10),
   "BESCHREIBUNG" VARCHAR2(255),
      CONSTRAINT "PK_STATUS" PRIMARY KEY ("STATUSID")
USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255
TABLESPACE "USERS"  ENABLE
) SEGMENT CREATION IMMEDIATE
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
TABLESPACE "USERS" ;
...

The ALTER SESSION was built in right away, so the script can be run directly on the target system.

Afterwards the Data Pump import can follow; just take care that the parameter TABLE_EXISTS_ACTION is set to APPEND or TRUNCATE and not to SKIP, so the data is actually inserted. In addition, in this case the schemas have to be created first.
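Creating the schemas could look like this; this is only a sketch, the passwords, tablespace, quota and privileges are assumptions:

CREATE USER demostamm IDENTIFIED BY demostamm
  DEFAULT TABLESPACE users QUOTA UNLIMITED ON users;
CREATE USER demo IDENTIFIED BY demo
  DEFAULT TABLESPACE users QUOTA UNLIMITED ON users;
GRANT CREATE SESSION, CREATE TABLE, CREATE SEQUENCE TO demostamm, demo;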

But unfortunately this import fails, too!

The reason is that when the objects are created, the constraints (above all the foreign keys) are created and enabled as well. A subsequent import then runs the risk that the insert order does not follow the master-detail relationships, which results in “ORA-02291: integrity constraint violated – parent key not found” errors.

So to perform the final import, all foreign key constraints must be disabled.

The easiest way is again a little SQL*Plus script that creates two more scripts: disable_cons.sql and enable_cons.sql.

set heading off
set feedback off
set trimspool ON
set pages 0
set linesize 1000
spool disable_cons.sql
SELECT 'ALTER TABLE '||owner||'.'||table_name||' DISABLE CONSTRAINT '||constraint_name||';'
  FROM all_constraints
 WHERE owner IN ('DEMO','DEMOSTAMM')
   AND constraint_type='R';
spool off
spool enable_cons.sql
SELECT 'ALTER TABLE '||owner||'.'||table_name||' ENABLE CONSTRAINT '||constraint_name||';'
  FROM all_constraints
 WHERE owner IN ('DEMO','DEMOSTAMM')
   AND constraint_type='R';
spool off

With this, the data import into the target database can finally be carried out.
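Roughly, the order of the steps on the target system is then as follows (a sketch; the script and parameter file names are the ones used above):

-- in SQL*Plus on the target database
@create_tables.sql
@disable_cons.sql
-- then the Data Pump import from the operating system prompt:
--   impdp system parfile=demoimputf.dat
-- and afterwards, back in SQL*Plus, re-enable the foreign keys
@enable_cons.sql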

Of course, this trial and error can only be performed on a test system whose data is identical to production. Artificial test data or invalid data carries the risk that errors caused by character lengths go undetected. In the example this was still very obvious, as a CHAR(1 BYTE) column practically cries out for a change.

It becomes more difficult when the length of a column is several hundred bytes or, even worse, when the maximum size of VARCHAR2 is reached: this is still 4000 bytes, not 4000 characters. That is a hard limit which can only be exceeded by changing the column to a CLOB.
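Such a change cannot be done with a simple ALTER TABLE ... MODIFY; a sketch with purely hypothetical table and column names, using an intermediate column:

-- add a CLOB column, copy the data, then drop and rename (names are examples only)
ALTER TABLE demo.dokumente ADD (text_neu CLOB);
UPDATE demo.dokumente SET text_neu = text;
COMMIT;
ALTER TABLE demo.dokumente DROP COLUMN text;
ALTER TABLE demo.dokumente RENAME COLUMN text_neu TO text;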

CLOB = UCS-2

One more word about CLOB: when changing to AL32UTF8, CLOB data is always stored in a UCS-2-compatible format, which means every character takes 2 (!) bytes. That is why you should think about the impact a switch to Unicode has on your tables with CLOB columns.

This is the advice from the Oracle documentation (Globalization Support Guide, Chapter 6):

“Data in CLOB columns is stored in a format that is compatible with UCS-2 when the database character set is multibyte, such as UTF8 or AL32UTF8. This means that the storage space required for an English document doubles when the data is converted. Storage for an Asian language document in a CLOB column requires less storage space than the same document in a LONG column using UTF8, typically around 30% less, depending on the contents of the document.”

In the next blog post I am going to show possibilities to simplify this procedure (e.g. by using my favorite tool Toad) and how to minimize the application downtime.
