Note:[1][2] The range was initially part of the Private Use Area in Unicode 1.0.0,[3] and was removed from it in Unicode 1.0.1.
CJK Compatibility Ideographs is a Unicode block created to contain mostly Han characters that were encoded in multiple locations in other established character encodings, in addition to their CJK Unified Ideographs assignments, in order to retain round-trip compatibility between Unicode and those encodings. However, it also contains 12 unified ideographs sourced from Japanese character sets from IBM.
The block has dozens of ideographic variation sequences registered in the Unicode Ideographic Variation Database (IVD).[4][5]
These sequences specify the desired glyph variant for a given Unicode character.
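Structurally, an ideographic variation sequence is a base ideograph followed by one variation selector from the range U+E0100–U+E01EF (VS17 to VS256). The following is a minimal sketch of recognising that shape in Python; the function and constant names are illustrative, and the check does not consult the IVD, so it cannot tell whether a given sequence is actually registered.

```python
# Variation selectors VS17..VS256, the range used by ideographic variation sequences.
IVS_SELECTOR_FIRST = 0xE0100
IVS_SELECTOR_LAST = 0xE01EF


def split_ivs(text: str):
    """Return (base, selector) if text is shaped like an IVS, else None.

    Only the shape is checked (one base code point followed by one selector
    in the IVS range); registration in the IVD is NOT verified.
    """
    if len(text) != 2:
        return None
    base, selector = text
    if IVS_SELECTOR_FIRST <= ord(selector) <= IVS_SELECTOR_LAST:
        return base, selector
    return None


# Hypothetical example sequence: U+FA11 followed by VS17.
# Whether this particular pair is registered is not claimed here.
candidate = "\uFA11\U000E0100"
parts = split_ivs(candidate)
if parts is not None:
    base, selector = parts
    print(f"base U+{ord(base):04X}, selector U+{ord(selector):04X}")
```

A renderer that supports the sequence picks the glyph variant it identifies; one that does not is expected to fall back to the default glyph of the base character.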
Character sources
Sources for the original collection of CJK Compatibility Ideographs include:
South Korean KS X 1001 (U+F900–U+FA0B, 268 characters; see that page for the explanation)
Japanese IBM 32 (32 characters), of which:
19 are unifiable with characters in the URO (Unified Repertoire and Ordering), and are therefore compatibility ideographs in the strict sense.
12 are kokuji which are actually unified ideographs: they have the Unified_Ideograph property and do not change upon normalisation. Despite their placement in the CJK Compatibility Ideographs block and their algorithmically generated character names beginning with "CJK COMPATIBILITY IDEOGRAPH", they do not duplicate any character in the original CJK Unified Ideographs block.[6][7] 11 of these 12 have no duplicate encoding at all, while U+FA23 﨣 CJK COMPATIBILITY IDEOGRAPH-FA23 was later unintentionally duplicated in CJK Unified Ideographs Extension B as U+27EAF 𧺯 CJK UNIFIED IDEOGRAPH-27EAF. They are placed in this block because they have no URO encoding, yet come from IBM 32, one of the encodings for which duplicate encodings are a concern. All of them are rarely used or are variants of common kanji. They are as follows:
U+FA0E 﨎 CJK COMPATIBILITY IDEOGRAPH-FA0E
U+FA0F 﨏 CJK COMPATIBILITY IDEOGRAPH-FA0F
U+FA11 﨑 CJK COMPATIBILITY IDEOGRAPH-FA11
U+FA13 﨓 CJK COMPATIBILITY IDEOGRAPH-FA13
U+FA14 﨔 CJK COMPATIBILITY IDEOGRAPH-FA14
U+FA1F 﨟 CJK COMPATIBILITY IDEOGRAPH-FA1F
U+FA21 﨡 CJK COMPATIBILITY IDEOGRAPH-FA21
U+FA23 﨣 CJK COMPATIBILITY IDEOGRAPH-FA23
U+FA24 﨤 CJK COMPATIBILITY IDEOGRAPH-FA24
U+FA27 﨧 CJK COMPATIBILITY IDEOGRAPH-FA27
U+FA28 﨨 CJK COMPATIBILITY IDEOGRAPH-FA28
U+FA29 﨩 CJK COMPATIBILITY IDEOGRAPH-FA29
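The statement that these 12 characters do not change upon normalisation can be checked with Python's standard `unicodedata` module. This is a minimal sketch, assuming the interpreter's bundled Unicode data covers these code points; the names `UNIFIED_IN_COMPAT_BLOCK` and `is_normalisation_stable` are illustrative and not defined by Unicode.

```python
import unicodedata

# The 12 code points listed above (illustrative constant name).
UNIFIED_IN_COMPAT_BLOCK = [
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29,
]


def is_normalisation_stable(cp: int) -> bool:
    """Return True if the character is unchanged by NFC, NFD, NFKC and NFKD."""
    ch = chr(cp)
    return all(
        unicodedata.normalize(form, ch) == ch
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )


for cp in UNIFIED_IN_COMPAT_BLOCK:
    # Print the code point, its formal name, and whether it survives normalisation.
    print(f"U+{cp:04X}", unicodedata.name(chr(cp)), is_normalisation_stable(cp))
```

By contrast, a true compatibility ideograph such as U+F900 from the KS X 1001 range is replaced by its unified counterpart under every normalisation form, so the same function returns `False` for it.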
Uniquely, U+FA20 蘒 CJK COMPATIBILITY IDEOGRAPH-FA20 is intended as the kyūjitai form of a kokuji whose (extended) shinjitai form is encoded separately as U+8612 蘒 CJK UNIFIED IDEOGRAPH-8612. The URO encoded only the shinjitai form, and uses its stroke count to place it in this position. The character is furthermore one of the many variants of the jinmeiyō kanji U+8429 萩 CJK UNIFIED IDEOGRAPH-8429 (i.e. Kummerowia). U+FA20 was assigned a normalisation to U+8612, even though the 龜 and 亀 components, while both forms of radical 213, are not usually considered unifiable.[8]
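The normalisation of U+FA20 to U+8612 can be observed directly. The sketch below uses only Python's standard `unicodedata` module and assumes its bundled Unicode data includes both characters; the helper name `normalise_all_forms` is illustrative.

```python
import unicodedata

COMPAT = "\uFA20"   # CJK COMPATIBILITY IDEOGRAPH-FA20
UNIFIED = "\u8612"  # CJK UNIFIED IDEOGRAPH-8612


def normalise_all_forms(ch: str) -> dict:
    """Map each normalisation form name to the result of normalising ch."""
    return {
        form: unicodedata.normalize(form, ch)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }


# The decomposition field holds the target code point as hexadecimal text.
print(unicodedata.decomposition(COMPAT))

# Every normalisation form replaces U+FA20 with U+8612.
for form, result in normalise_all_forms(COMPAT).items():
    print(form, f"U+{ord(result):04X}", result == UNIFIED)
```

In practice this means that any text pipeline which normalises its input loses the distinction between the two forms, which is one reason variation sequences are used when the distinction has to survive.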
Sato, T. K.; Kobayashi, Tatsuo; Pak, Tong Gi (2002-05-22), Proposal to add 122 compatibility Hanja code table of the D P R of Korea into the CJK Compatibility Ideographs of ISO/IEC 10646-1:2000
Suignard, Michel (2002-12-12), "USA T.5 e, USA T.8", Proposed disposition of comments on SC2 N 3624 (FPDAM text for Amendment 2 to ISO/IEC 10646-1:2000)
Freytag, Asmus; McGowan, Rick; Whistler, Ken (2021-06-14). "Known Anomalies in Unicode Character Names". Unicode Consortium. Unicode Technical Note #27. These 12 characters are unified CJK ideographs, not compatibility ideographs, despite their names.