Codepage 51932 is a superset of euc-jp. On Windows this encoding is called "euc-jp", but it is not the same as the encoding called "euc-jp" on Unix: 51932 covers a few more kanji (Chinese characters) and special symbols than the Unix euc-jp. Microsoft designed 51932 for interconversion with cp932. Internet Explorer, Firefox, and Hidemaru (a famous Japanese text editor) use 51932 under the name "euc-jp". Tcl's euc-jp has only the basic code set, so we should use 51932 to convert web pages written in euc-jp. The following table shows the code sets of 51932.
Codeset | cp51932 1st byte | cp51932 2nd byte | Unicode |
JIS X 0201 Latin | 0x00 - 0x7F | --- | U+0000 - U+007F |
JIS X 0201 Katakana | 0x8E | 0xA1 - 0xDF | U+FF61 - U+FF9F |
JIS X 0208:1997 | 0xA1 - 0xA8, 0xB0 - 0xF4 | 0xA1 - 0xFE | convert by a table |
NEC special characters | 0xAD | 0xA1 - 0xFE | convert by a table |
NEC selection of IBM extensions | 0xF9 - 0xFC | 0xA1 - 0xFE | convert by a table |
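The lead-byte ranges in the table can be read as a simple classifier. The following is a minimal Python sketch of that reading; the function name `codeset_of` and the "unassigned" label for bytes outside the listed ranges are assumptions made for illustration, not part of any library.

```python
def codeset_of(lead: int) -> str:
    """Name the cp51932 code set selected by a lead byte (ranges from the table above)."""
    if 0x00 <= lead <= 0x7F:
        return "JIS X 0201 Latin"
    if lead == 0x8E:
        return "JIS X 0201 Katakana"
    if 0xA1 <= lead <= 0xA8 or 0xB0 <= lead <= 0xF4:
        return "JIS X 0208:1997"
    if lead == 0xAD:
        return "NEC special characters"
    if 0xF9 <= lead <= 0xFC:
        return "NEC selection of IBM extensions"
    # Anything else is not listed in the table (label chosen for this sketch).
    return "unassigned"


if __name__ == "__main__":
    for lead in (0x41, 0x8E, 0xB0, 0xAD, 0xF9, 0xA9):
        print(f"0x{lead:02X}: {codeset_of(lead)}")
```

The two rows that Tcl's euc-jp lacks are exactly the lead bytes 0xAD and 0xF9 - 0xFC.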
Tcl's euc-jp encoding doesn't have the NEC special characters or the NEC selection of IBM extensions. The following tables list all of these characters.
NEC special characters
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
ADA0 | | ① | ② | ③ | ④ | ⑤ | ⑥ | ⑦ | ⑧ | ⑨ | ⑩ | ⑪ | ⑫ | ⑬ | ⑭ | ⑮ |
ADB0 | ⑯ | ⑰ | ⑱ | ⑲ | ⑳ | Ⅰ | Ⅱ | Ⅲ | Ⅳ | Ⅴ | Ⅵ | Ⅶ | Ⅷ | Ⅸ | Ⅹ | |
ADC0 | ㍉ | ㌔ | ㌢ | ㍍ | ㌘ | ㌧ | ㌃ | ㌶ | ㍑ | ㍗ | ㌍ | ㌦ | ㌣ | ㌫ | ㍊ | ㌻ |
ADD0 | ㎜ | ㎝ | ㎞ | ㎎ | ㎏ | ㏄ | ㎡ | | | | | | | | | ㍻ |
ADE0 | 〝 | 〟 | № | ㏍ | ℡ | ㊤ | ㊥ | ㊦ | ㊧ | ㊨ | ㈱ | ㈲ | ㈹ | ㍾ | ㍽ | ㍼ |
ADF0 | ≒ | ≡ | ∫ | ∮ | ∑ | √ | ⊥ | ∠ | ∟ | ⊿ | ∵ | ∩ | ∪ |
NEC selection of IBM extensions
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
F9A0 | | 纊 | 褜 | 鍈 | 銈 | 蓜 | 俉 | 炻 | 昱 | 棈 | 鋹 | 曻 | 彅 | 丨 | 仡 | 仼 |
F9B0 | 伀 | 伃 | 伹 | 佖 | 侒 | 侊 | 侚 | 侔 | 俍 | 偀 | 倢 | 俿 | 倞 | 偆 | 偰 | 偂 |
F9C0 | 傔 | 僴 | 僘 | 兊 | 兤 | 冝 | 冾 | 凬 | 刕 | 劜 | 劦 | 勀 | 勛 | 匀 | 匇 | 匤 |
F9D0 | 卲 | 厓 | 厲 | 叝 | 﨎 | 咜 | 咊 | 咩 | 哿 | 喆 | 坙 | 坥 | 垬 | 埈 | 埇 | 﨏 |
F9E0 | 塚 | 增 | 墲 | 夋 | 奓 | 奛 | 奝 | 奣 | 妤 | 妺 | 孖 | 寀 | 甯 | 寘 | 寬 | 尞 |
F9F0 | 岦 | 岺 | 峵 | 崧 | 嵓 | 﨑 | 嵂 | 嵭 | 嶸 | 嶹 | 巐 | 弡 | 弴 | 彧 | 德 | |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
FAA0 | | 忞 | 恝 | 悅 | 悊 | 惞 | 惕 | 愠 | 惲 | 愑 | 愷 | 愰 | 憘 | 戓 | 抦 | 揵 |
FAB0 | 摠 | 撝 | 擎 | 敎 | 昀 | 昕 | 昻 | 昉 | 昮 | 昞 | 昤 | 晥 | 晗 | 晙 | 晴 | 晳 |
FAC0 | 暙 | 暠 | 暲 | 暿 | 曺 | 朎 | 朗 | 杦 | 枻 | 桒 | 柀 | 栁 | 桄 | 棏 | 﨓 | 楨 |
FAD0 | 﨔 | 榘 | 槢 | 樰 | 橫 | 橆 | 橳 | 橾 | 櫢 | 櫤 | 毖 | 氿 | 汜 | 沆 | 汯 | 泚 |
FAE0 | 洄 | 涇 | 浯 | 涖 | 涬 | 淏 | 淸 | 淲 | 淼 | 渹 | 湜 | 渧 | 渼 | 溿 | 澈 | 澵 |
FAF0 | 濵 | 瀅 | 瀇 | 瀨 | 炅 | 炫 | 焏 | 焄 | 煜 | 煆 | 煇 | 凞 | 燁 | 燾 | 犱 |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
FBA0 | | 犾 | 猤 | 猪 | 獷 | 玽 | 珉 | 珖 | 珣 | 珒 | 琇 | 珵 | 琦 | 琪 | 琩 | 琮 |
FBB0 | 瑢 | 璉 | 璟 | 甁 | 畯 | 皂 | 皜 | 皞 | 皛 | 皦 | 益 | 睆 | 劯 | 砡 | 硎 | 硤 |
FBC0 | 硺 | 礰 | 礼 | 神 | 祥 | 禔 | 福 | 禛 | 竑 | 竧 | 靖 | 竫 | 箞 | 精 | 絈 | 絜 |
FBD0 | 綷 | 綠 | 緖 | 繒 | 罇 | 羡 | 羽 | 茁 | 荢 | 荿 | 菇 | 菶 | 葈 | 蒴 | 蕓 | 蕙 |
FBE0 | 蕫 | 﨟 | 薰 | 蘒 | 﨡 | 蠇 | 裵 | 訒 | 訷 | 詹 | 誧 | 誾 | 諟 | 諸 | 諶 | 譓 |
FBF0 | 譿 | 賰 | 賴 | 贒 | 赶 | 﨣 | 軏 | 﨤 | 逸 | 遧 | 郞 | 都 | 鄕 | 鄧 | 釚 | |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
FCA0 | | 釗 | 釞 | 釭 | 釮 | 釤 | 釥 | 鈆 | 鈐 | 鈊 | 鈺 | 鉀 | 鈼 | 鉎 | 鉙 | 鉑 |
FCB0 | 鈹 | 鉧 | 銧 | 鉷 | 鉸 | 鋧 | 鋗 | 鋙 | 鋐 | 﨧 | 鋕 | 鋠 | 鋓 | 錥 | 錡 | 鋻 |
FCC0 | 﨨 | 錞 | 鋿 | 錝 | 錂 | 鍰 | 鍗 | 鎤 | 鏆 | 鏞 | 鏸 | 鐱 | 鑅 | 鑈 | 閒 | 隆 |
FCD0 | 﨩 | 隝 | 隯 | 霳 | 霻 | 靃 | 靍 | 靏 | 靑 | 靕 | 顗 | 顥 | 飯 | 飼 | 餧 | 館 |
FCE0 | 馞 | 驎 | 髙 | 髜 | 魵 | 魲 | 鮏 | 鮱 | 鮻 | 鰀 | 鵰 | 鵫 | 鶴 | 鸙 | 黑 | |
FCF0 | | ⅰ | ⅱ | ⅲ | ⅳ | ⅴ | ⅵ | ⅶ | ⅷ | ⅸ | ⅹ | ¬ | ¦ | ' | " |
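For readers who want to check entries of these tables without Tcl, here is a rough Python sketch. Python has no cp51932 codec, so the sketch rests on an assumption: that every double-byte 51932 code maps to the cp932 code at the same JIS row and cell, which is consistent with 51932 being designed for interconversion with cp932. The function names `euc_pair_to_sjis` and `decode_cp51932` are made up for this example.

```python
def euc_pair_to_sjis(b1: int, b2: int) -> bytes:
    """Convert one EUC double-byte code to the Shift_JIS code of the same row/cell."""
    j1, j2 = b1 - 0x80, b2 - 0x80           # EUC bytes -> 7-bit JIS bytes
    s1 = ((j1 - 0x21) >> 1) + 0x81           # two JIS rows share one SJIS lead byte
    if s1 > 0x9F:
        s1 += 0x40                           # skip the half-width katakana area
    if j1 & 1:                               # odd row: first half of the trail range
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1                          # 0x7F is never a trail byte
    else:                                    # even row: second half
        s2 = j2 + 0x7E
    return bytes([s1, s2])


def decode_cp51932(data: bytes) -> str:
    """Decode 51932 bytes by rewriting them as cp932 (assumption stated above)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                         # JIS X 0201 Latin, one byte
            out.append(b)
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated multi-byte sequence")
        if b == 0x8E:                        # half-width katakana: same byte in cp932
            out.append(data[i + 1])
        else:
            out += euc_pair_to_sjis(b, data[i + 1])
        i += 2
    return out.decode("cp932")


if __name__ == "__main__":
    # First NEC special character and first NEC-selected IBM extension.
    print(decode_cp51932(b"\xad\xa1"), decode_cp51932(b"\xf9\xa1"))
```

This only covers decoding. It is meant for spot-checking the tables, not as a replacement for the encoding file.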
I used mlang.dll to map the code sets (using mlenc), and I found some mapping differences between 51932 and euc-jp. The Unicode (Tcl internal) to 51932/euc-jp mappings have no problem, but the 51932 or euc-jp to Unicode mappings may cause some trouble. Keep this difference in mind if you use 51932 instead of euc-jp.
| 51932 | euc-jp |
0xA1C1 | U+FF5E | U+301C |
0xA1C2 | U+2225 | U+2016 |
0xA1DD | U+FF0D | U+2212 |
0xA1F1 | U+FFE0 | U+00A2 |
0xA1F2 | U+FFE1 | U+00A3 |
0xA2CC | U+FFE2 | U+00AC |
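If text decoded with 51932 has to be compared with text decoded with euc-jp, one possible workaround is to normalize these six code points after decoding. The following Python sketch only restates the table above as a translation map; the names `CP51932_TO_EUCJP`, `to_eucjp_style`, and `to_cp51932_style` are hypothetical.

```python
# Code points that differ after decoding, copied from the table above:
# result under 51932 -> result under euc-jp.
CP51932_TO_EUCJP = {
    0xFF5E: 0x301C,  # 0xA1C1
    0x2225: 0x2016,  # 0xA1C2
    0xFF0D: 0x2212,  # 0xA1DD
    0xFFE0: 0x00A2,  # 0xA1F1
    0xFFE1: 0x00A3,  # 0xA1F2
    0xFFE2: 0x00AC,  # 0xA2CC
}
# The reverse direction, for text that was decoded with euc-jp.
EUCJP_TO_CP51932 = {v: k for k, v in CP51932_TO_EUCJP.items()}


def to_eucjp_style(text: str) -> str:
    """Replace the six 51932-style code points with their euc-jp counterparts."""
    return text.translate(CP51932_TO_EUCJP)


def to_cp51932_style(text: str) -> str:
    """Replace the six euc-jp-style code points with their 51932 counterparts."""
    return text.translate(EUCJP_TO_CP51932)


if __name__ == "__main__":
    sample = "1\uff5e2"                      # "1～2" as decoded by 51932
    print([hex(ord(c)) for c in to_eucjp_style(sample)])
```

Note that such a normalization is lossy for U+00A2, U+00A3, and U+00AC if the text can also contain those characters from another source, so it should be applied only to text known to come from one of the two decoders.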
By the way...
I also tried to make an encoding file for eucJP-ms, which is also called eucJP-open. Its aim is the same as that of cp51932 (interconversion between euc-jp and cp932), and it is also a superset of euc-jp. It was designed by the TOG/JVC CDE/Motif Technical WG. MySQL has supported it for several years, and it is used by MySQL, PostgreSQL, and some Unix applications.
It would be more convenient if Tcl had this encoding, but I couldn't make the encoding file because Tcl's encoding conversion system doesn't support 3-byte input.
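To illustrate where the 3-byte input comes from: in EUC-JP the lead byte 0x8F introduces a three-byte sequence, and eucJP-ms uses that code set as well, whereas 51932 as tabulated above never needs more than two bytes. The following Python sketch splits a byte string into characters by lead byte; `char_length` and `split_chars` are hypothetical helper names, and the sketch does not validate trail bytes.

```python
def char_length(lead: int) -> int:
    """Length in bytes of an EUC-JP character, judged from its lead byte."""
    if lead == 0x8F:                 # three-byte sequence (not used by 51932)
        return 3
    if lead == 0x8E or 0xA1 <= lead <= 0xFE:
        return 2                     # half-width katakana or a double-byte code
    return 1                         # JIS X 0201 Latin / control bytes


def split_chars(data: bytes) -> list:
    """Split EUC-JP bytes into per-character byte strings (no trail-byte checks)."""
    chars = []
    i = 0
    while i < len(data):
        n = char_length(data[i])
        if i + n > len(data):
            raise ValueError("truncated multi-byte sequence")
        chars.append(data[i:i + n])
        i += n
    return chars


if __name__ == "__main__":
    # One ASCII byte, one three-byte sequence, one two-byte sequence.
    print(split_chars(b"A\x8f\xb0\xa1\xa4\xa2"))
```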
http://reddog.s35.xrea.com/software/cp51932.enc.zip