Edit history

Answer edit history

Addendum

2022/06/16 02:41

answer CHANGED

@@ -2,16 +2,54 @@
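 Inside this function, the extracted part is converted to a byte sequence, which is then interpreted as 'utf-16be' and turned into a string.
 The [:-1] applied to the strings in the code below is not essential to this problem, so I will skip the explanation.
 (It is covered in [Python raw strings and trailing backslash [duplicate]](https://stackoverflow.com/questions/2870730/python-raw-strings-and-trailing-backslash).)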
+## Addendum
+In addition to `X2`, the code now also handles `X4`, which is used for characters such as emoji.
+References:
+[How to decoding IFC using Ruby](https://stackoverflow.com/questions/43417411/how-to-decoding-ifc-using-ruby)
+[STEP-file, ISO 10303-21](https://www.loc.gov/preservation/digital/formats/fdd/fdd000448.shtml)
+[6.4.3.3 Encoding ISO 10646 characters within a string](https://www.steptools.com/stds/step/IS_final_p21e3.html)
 ```Python
 import re
-def to_str(x):
+def func2(x):
     b = bytes.fromhex(x.group(1))
     return b.decode('utf-16be')
+def func4(x):
+    # Prepend "0" to each 7-character group (this assumes 7 hex digits per character in the input)
+    #
+    # https://www.steptools.com/stds/step/IS_final_p21e3.html
-ESCAPE_SEQUENCE_EXPR = r'\\X2\\(.*?)\\X0\\='[:-1]
+    # 6.4.3.3 Encoding ISO 10646 characters within a string
+    #-----
+    # NOTE This use of eight hexadecimal characters in the "\X4\" encoding predates the restriction of the UCS codespace to a maximum value of 10FFFF.
+    # The first two characters in each eight character group will always be digit zero.
+    #-----
+    s = x.group(1)
+    s = ''.join(['0'+s[i:i+7] for i in range(0,len(s),7)])
+    b = bytes.fromhex(s)
+    return b.decode('utf-32be')
+def decode_ifc_str(s):
+    EXPR2 = r'\\X2\\(.*?)\\X0\\='[:-1]
+    EXPR4 = r'\\X4\\(.*?)\\X0\\='[:-1]
+    for expr, func in [(EXPR2, func2), (EXPR4, func4)]:
+        s = re.sub(expr, func, s)
+    return s
+lst = [
+    r'abc=',
-src = r'\X2\3010914D7BA1301151B75A92914D7BA1\X0\ (b)15.9\X2\03C600D7\X0\9.5\X2\03C6\X0\='[:-1]
+    r'\X2\3010914D7BA1301151B75A92914D7BA1\X0\ (b)15.9\X2\03C600D7\X0\9.5\X2\03C6\X0\=',
+    r'\X2\03B103B203B3\X0\=',    # Greek letters alpha, beta, gamma (αβγ)
+    r'\X4\001F600\X0\=',         # grinning face (emoji, 😀)
+    r'\X4\001F600001F638\X0\='   # grinning face and grinning cat face (two emoji, 😀😸)
+]
+with open('ret.txt', 'w', encoding='utf-8') as f:
+    for src in lst:
-dst = re.sub(ESCAPE_SEQUENCE_EXPR, to_str, src)
+        dst = decode_ifc_str(src)
+        line = f'[{src}]->[{dst}]'
-print(dst) # 【配管】冷媒配管 (b)15.9φ×9.5φ
+        print(line) # emoji may not display correctly in some environments
+        f.write(line+'\n')
 ```
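
The decoding steps in the diff above (extract the hex payload, split it into fixed-width groups, turn each group into a character) can also be sketched without padding and `utf-32be`. This is only an illustrative variant, not the answer's code: the names `decode_x2`, `decode_x4`, `decode_escapes`, and the `x4_digits` parameter are hypothetical. It assumes 7 hex digits per `\X4\` character by default, as in the sample data above; since the ISO 10303-21 note quoted in the code describes eight-digit groups, the width is left configurable.

```python
import re

# Hypothetical names; a sketch under the assumptions stated above.
# r'...\\' is a valid raw string because it ends with an even number of backslashes.
X2_RE = re.compile(r'\\X2\\([0-9A-Fa-f]+?)\\X0\\')
X4_RE = re.compile(r'\\X4\\([0-9A-Fa-f]+?)\\X0\\')


def decode_x2(match):
    # The payload is a run of 4-hex-digit UTF-16BE code units.
    return bytes.fromhex(match.group(1)).decode('utf-16be')


def decode_x4(match, digits=7):
    # Assumption: each character is written with `digits` hex digits.
    payload = match.group(1)
    if len(payload) % digits != 0:
        return match.group(0)  # leave sequences of unexpected length untouched
    groups = [payload[i:i + digits] for i in range(0, len(payload), digits)]
    # int(..., 16) ignores leading zeros, so no padding is needed.
    return ''.join(chr(int(g, 16)) for g in groups)


def decode_escapes(text, x4_digits=7):
    # Decode \X2\...\X0\ first, then \X4\...\X0\.
    text = X2_RE.sub(decode_x2, text)
    return X4_RE.sub(lambda m: decode_x4(m, x4_digits), text)
```

For input that uses eight digits per character, calling `decode_escapes(s, x4_digits=8)` should handle it the same way. Note that the test strings need ordinary (non-raw) literals such as `'\\X4\\001F600\\X0\\'`, because a raw string literal cannot end with a single backslash.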