Using Python 2.7.9 on Windows 8.1 Enterprise 64-bit
I’m using the following code to search for any Korean characters (http://lcweb2.loc.gov/diglib/codetables/9.3.html):
line = ['x', 'y', 'z', '쭌', 'a']
if any([re.search("[%s-%s]" % ("\xE3\x84\xB1".decode('utf-8'), "\xEC\xAD\x8C".decode('utf-8')), x) for x in line[3:]]):
    print "found character"
Whenever I run the script and give it the character 쭌, the console shows ∞¡î, which I’m guessing is a result of IDLE / Command Prompt being unable to display Korean characters. 쭌 is the last character that I was hoping to match in the regex.
So is the above search correct at least? I’d prefer to know I have the right pattern before spending time trying to make the console show the proper Korean characters.
I’ve tried running chcp 1252 in Command Prompt, but nothing changed. The script never prints “found character”, so I can’t tell whether the pattern matches.
If it helps, the script is receiving text from an IRC channel where Korean is usually spoken.
Use Unicode strings (note the “u” prefixes):
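# -*- coding: utf-8 -*-
import re

line = [u'x', u'y', u'z', u'쭌', u'a']
# \u3131 and \ucb4c are the code points of the two UTF-8 byte sequences in the question
if any([re.search(u'[\u3131-\ucb4c]', x) for x in line[3:]]):
    print "found character"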
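Since the script reads its text from IRC, the input probably arrives as byte strings rather than Unicode strings, so it would need decoding before the pattern can match. Here is a minimal sketch of that step. It assumes the channel sends UTF-8 (the question does not say which encoding is in use), and `contains_korean` and `KOREAN_RE` are hypothetical names made up for this example:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import re

# Same range as above: U+3131 through U+CB4C (the character from the question).
KOREAN_RE = re.compile(u'[\u3131-\ucb4c]')


def contains_korean(raw, encoding='utf-8'):
    """Return True if raw (bytes or text) has a character in the range."""
    if isinstance(raw, bytes):
        # Assumption: the bytes are UTF-8. Undecodable bytes become U+FFFD,
        # which lies outside the range and so never produces a false match.
        raw = raw.decode(encoding, 'replace')
    return KOREAN_RE.search(raw) is not None


# Byte strings, as they might arrive from a socket.
line = [b'x', b'y', b'z', b'\xec\xad\x8c', b'a']
if any(contains_korean(x) for x in line[3:]):
    print("found character")
```

The message only contains ASCII, so it prints correctly whatever code page the console uses. To inspect the received text itself in Python 2, printing repr() of the decoded string shows escapes such as u'\ucb4c' instead of relying on the console to render Korean.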