I found some random codes in my text file, apparently left over from image files, and I want to remove them. They start with letters or numbers and always end with “PM”. For example, given this text:
An ideal result would be:
but I don’t know how to use re to remove them.
You want to remove every contiguous run of Latin letters and Arabic numerals that ends with PM. This can be done with a simple regular expression:
a-z describes the range of all lowercase Latin letters; A-Z and 0-9 do the same for uppercase letters and digits.
* means any number of characters from that set (including none), since the segment can have arbitrary length.
PM is the fixed end string.
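Put together, the three pieces described above would give a pattern along the lines of `[a-zA-Z0-9]*PM`. Here is a small sketch to check what such a pattern matches; both the exact pattern and the sample string are assumptions made for illustration, not taken from the question:

```python
import re

# Assumed pattern: any run of letters/digits, followed by the literal "PM".
PATTERN = re.compile(r"[a-zA-Z0-9]*PM")

# Made-up sample text (hypothetical, not the asker's data).
sample = "first line x7Gq21PM second line 09ab3PM end"

# findall returns every non-overlapping match, left to right.
matches = PATTERN.findall(sample)
print(matches)
```

Note that the character set itself contains P and M, so the regex engine first consumes the whole run and then backtracks two characters to match the fixed PM ending.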
Of course, you have to make sure these strings don’t contain special characters like ü. If they do, add those characters to the character set as appropriate.
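To illustrate why this matters, the sketch below compares a basic character set with one extended by ü. The sample string and both pattern names are hypothetical:

```python
import re

# Basic set: ASCII letters and digits only.
basic = re.compile(r"[a-zA-Z0-9]*PM")
# Extended set: the same, plus the special character ü.
extended = re.compile(r"[a-zA-Z0-9ü]*PM")

# Made-up sample containing a code with a non-ASCII letter.
sample = "keep this abü12PM and this"

basic_result = basic.sub("", sample)        # leaves "abü" behind
extended_result = extended.sub("", sample)  # removes the whole code

print(repr(basic_result))
print(repr(extended_result))
```

With the basic set, the match can only start after the ü, so part of the code survives; the extended set removes the whole segment.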
The actual Python code would then be:
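A minimal sketch using `re.sub`, assuming the pattern `[a-zA-Z0-9]*PM` as described above; the function name `remove_pm_codes` and the sample text are made up for illustration:

```python
import re

# Assumed pattern: a run of letters/digits ending in the literal "PM".
CODE_PATTERN = re.compile(r"[a-zA-Z0-9]*PM")


def remove_pm_codes(text: str) -> str:
    """Return text with every segment matching CODE_PATTERN removed."""
    return CODE_PATTERN.sub("", text)


# Hypothetical input, not the asker's original text.
sample = "Dear all,4f9aK2PM please see the note.77xyPM Thanks"
cleaned = remove_pm_codes(sample)
print(cleaned)
```

Be aware that this also removes a legitimate standalone "PM" (for example in a time such as "3:45 PM"), so it may be worth checking the matches before deleting them.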