There are a lot of Persian subtitles saved with the wrong encoding. Video players have options to display these files correctly, but I know of only one Windows program that actually fixes the file and saves it with the correct encoding. I want to do this in Python. I've tried many things but could not get it to work. Notepad says the file is ANSI, so I opened it as ‘Latin-1’ in Python and tried to decode it and re-encode it as UTF-8, but that just gives me the original file back. The file can be downloaded from https://ufile.io/np0rodjg
The file fixed with the mentioned software can be downloaded from https://ufile.io/ignop48m
How can this be done using Python?
The file is most likely encoded in cp1256, aka Windows-1256, the Windows code page for Arabic script, which is used for Persian and Urdu. To create a UTF-8 version of the file, you just need to read it using this code page and write it back out as UTF-8:
with open("source.srt", "rt", encoding="cp1256") as f:
    data = f.read()

with open("fixed.srt", "wt", encoding="utf_8_sig") as f:
    f.write(data)
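If you have a whole folder of subtitles, some of them may already be UTF-8, and decoding those as cp1256 would garble them. Here is a minimal sketch of a helper that guards against that. The function name `convert_to_utf8` and its parameters are my own, not part of any library, and it assumes that anything which is not valid UTF-8 is cp1256, which may not hold for every file.

```python
def convert_to_utf8(src, dst, source_encoding="cp1256"):
    """Re-encode a subtitle file as UTF-8 (with BOM).

    Hypothetical helper: returns the name of the encoding used to decode src.
    """
    with open(src, "rb") as f:
        raw = f.read()

    try:
        # Valid UTF-8 (with or without a BOM) is kept as it is.
        text = raw.decode("utf-8-sig")
        used = "utf-8"
    except UnicodeDecodeError:
        # Assumption: every non-UTF-8 file is in the legacy code page.
        text = raw.decode(source_encoding)
        used = source_encoding

    # newline="" writes the line endings exactly as they were read.
    with open(dst, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)
    return used
```

Usage would be something like `convert_to_utf8("source.srt", "fixed.srt")`. As for why the Latin-1 attempt did not help: Latin-1 maps every byte to the code point with the same number, so the bytes decode to Western European letters rather than Persian ones, and encoding them again cannot recover the intended text.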