How to tokenize on regex pattern and sort the resulting list?

I have a file that looks something like this:

select a,b,c FROM Xtable
select a,b,c FROM Vtable
select a,b,c FROM Atable
select a,b,c FROM Atable
select d,e,f FROM Atable

I want to get a sortedMap:

{
"Atable":["select a,b,c FROM Atable", "select d,e,f FROM Atable"],
"Vtable":["select a,b,c FROM Vtable"],
"Xtable":["select a,b,c FROM Xtable"]
}

The keys of the sorted map should be the table names, and the values should be lists of the lines that reference each table.

I started off with this, but I'm stuck on tokenizing each line to extract the table name:

import re

f = open('mytext.txt', 'r')
x = f.readlines()
print(x)
f.close()
for i in x:
    p = re.search(".* FROM ", i)
    # now how to tokenize and get the value that follows FROM?

Answer

You can use a combination of defaultdict and regular expressions. Let lines be a list of your lines:

import re
from collections import defaultdict

pattern = r"(select .+ from (\S+).*)"
results = defaultdict(list)

for line in lines:
    query, table = re.findall(pattern, line.strip(), flags=re.I)[0]
    results[table].append(query)
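
As a quick sanity check against one of the sample lines: the outer group captures the whole query and the inner (\S+) captures the table name, so findall returns one tuple per match:

import re

pattern = r"(select .+ from (\S+).*)"
line = "select a,b,c FROM Atable"

# One tuple per match: (whole query, table name)
print(re.findall(pattern, line, flags=re.I))
# [('select a,b,c FROM Atable', 'Atable')]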

Actually, the right way to read the file would be:

with open('mytext.txt') as infile:
    for line in infile:
        query, table = re.findall(pattern, line.strip(), flags=re.I)[0]
        results[table].append(query)

The result is, naturally, a defaultdict. If you want a dictionary sorted by table name, pass the sorted items to the OrderedDict constructor:

from collections import OrderedDict
OrderedDict(sorted(results.items()))
#OrderedDict([('Atable', ['select a,b,c FROM Atable', ...
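
On Python 3.7 and later a plain dict preserves insertion order, so if you don't need OrderedDict specifically, the following gives the same sorted result:

dict(sorted(results.items()))
#{'Atable': ['select a,b,c FROM Atable', ...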

You can make the pattern more robust to keep track of commas, valid identifiers, etc.
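
For instance, here is a stricter sketch (assuming table names are plain identifiers; adjust to your SQL dialect) that anchors the match and spells out the column list:

import re

# Stricter sketch: require at least one column, allow a comma-separated
# column list, and limit the table name to a valid identifier.
pattern = r"^(select\s+\w+(?:\s*,\s*\w+)*\s+from\s+([A-Za-z_]\w*))"

m = re.match(pattern, "select a,b,c FROM Atable", flags=re.I)
if m:
    query, table = m.groups()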