Is this the right way to make code work with Python 3?

I am updating some Python 2 code written by others, and this part:

def exec(self, content, query):
    # query = "city_68"
    content = content.strip().strip(',').decode('utf-8', 'ignore')
    query = query.decode('utf-8', 'ignore')
    query_list = query.split('|')

This gives an error in Python 3:

File "/Users/cong/bexec.py", line 708, in bexec
    content = content.strip().strip(',').decode('utf-8', 'ignore')
AttributeError: 'str' object has no attribute 'decode'

The parameters content and query are both strings. So I removed the decode part:

content = content.strip().strip(',')
# query = query.decode('utf-8', 'ignore')

Now it no longer complains. Is this safe to do? I guess Python 3 doesn’t need decode() any more.

Answer

Correct. Since content and query are already str values (as the AttributeError confirms for content), there is nothing left to decode. In Python 3, if you have a str value, you can assume it is a proper sequence of Unicode code points, not a sequence of bytes that needs to be decoded from (say) UTF-8 to a Unicode string. If you have a bytes value, you must decode it first in order to get a proper Unicode string.
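As a rough illustration of that distinction, here is a minimal sketch. The variable names and sample values are made up for the example and are not taken from your code:

```python
# Python 3: bytes must be decoded to become text; str is already text.
raw = "café,".encode("utf-8")          # a bytes value, e.g. as received from a socket
text = raw.decode("utf-8", "ignore")   # decoding bytes gives a str

# A value that is already str needs only the string clean-up, no decode().
already_text = " city_68, "
cleaned = already_text.strip().strip(",")

print(type(raw).__name__, type(text).__name__, repr(cleaned))
```

If the values passed to your method are always str, the strip() calls alone are enough.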

In Python 2, the boundaries were looser. A unicode value was definitely a proper Unicode string (and was renamed str in Python 3), while a str value could be a “real” ASCII-only string value or arbitrary binary data: you couldn’t tell just from the type.

As such, the Python 2 str type supported both encode and decode methods, so that a value could be treated either as text to encode or as bytes to decode.

In Python 3, with more strictly defined roles, you can call str.encode to get a bytes value, or you can call bytes.decode to get a str value. You cannot decode a str or further encode a bytes. str.decode and bytes.encode simply do not exist.
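If you are not sure whether every caller passes str, one possible approach is a small normalizing function. The sketch below assumes a hypothetical helper named to_text (not part of your code or the standard library) and reuses the 'ignore' error handler from the original code:

```python
def to_text(value, encoding="utf-8", errors="ignore"):
    """Hypothetical helper: return a str whether given bytes or str."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding, errors)
    return value

# The methods on the "wrong" side really are absent in Python 3.
assert not hasattr(str, "decode")
assert not hasattr(bytes, "encode")

# Both kinds of input end up as the same str.
query_list = to_text(b"city_68|city_69").split("|")
same_list = to_text("city_68|city_69").split("|")
print(query_list, same_list)
```

Whether such a helper is worth having depends on where content and query come from. If they are always str, simply dropping decode() as you did is the simpler choice.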


In some sense, all files are binary files: they consist of a stream of bytes. What we call a text file is just a file whose bytes are intended to be decoded using a particular text decoder, like ASCII or UTF-8, as opposed to something like a JPEG decoder, or a JVM, or your CPU itself.

When you use open to open a file in text mode (the default), its read method returns str values, which result from applying the file object’s decoder to the raw bytes read from the file.

When you use open to open a file in binary mode, its read method returns bytes values, the raw bytes being left undecoded for you to handle as you see fit.
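The two modes can be compared with a short sketch like the following. It writes a throwaway temporary file, and the file contents are invented for the example:

```python
import os
import tempfile

payload = "naïve café\n".encode("utf-8")  # the raw bytes stored on disk

# Write the bytes to a temporary file (NamedTemporaryFile opens in binary mode).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # Text mode: the file object decodes the bytes and read() returns str.
    with open(path, "r", encoding="utf-8") as f:
        as_text = f.read()
    # Binary mode: read() returns the undecoded bytes.
    with open(path, "rb") as f:
        as_bytes = f.read()
finally:
    os.remove(path)

print(type(as_text).__name__, type(as_bytes).__name__)
```

Passing encoding explicitly in text mode avoids depending on the platform's default encoding.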