Parsing text files containing lists of files

I have txt files containing information about catalogs and their files on a server. From each file I only need two catalogs and their filenames, which I want to put into an array for later comparison with local files.

I thought about reading the files line by line, but I am stuck. In particular, if ‘SDU__DACS’ appears in a filename in another catalog that I am not interested in, that file gets written under the previous catalog name.

This is what I tried:

import glob
import os

pathSDU = []
pathSCI = []
filesDict = {}
for file in glob.glob('/foo/bar/catalog/*.txt'):
    with open(os.path.join('/foo/bar/catalog', file), 'r') as openFile:
        print('opening file ' + file)
        for line in openFile:
            if '/ACS/SDU_:' in line:
                pathSDU = line
            else:
                if 'SDU__DACS' in line:
                    if 'manifest' not in line:
                        filesDict.update({line: pathSDU})
            if '/ACS/ScienceDataFile:' in line:
                pathSCI = line
            else:
                if 'SCI__DACS' in line:
                    if 'manifest' not in line:
                        filesDict.update({line: pathSCI})

Example of txt files content:

/data/foo/bar/ACS/SDU_:
68421952
17660866 2021-09-06 09:56 SDU__DACS_69DC_0241DB01_2021-246T08-13-26__00001.EXM
17660866 2021-09-06 09:41 SDU__DACS_69DB_0241DB01_2021-246T08-12-37__00001.EXM
17660866 2021-09-06 09:24 SDU__DACS_69DA_0241DB01_2021-246T08-11-46__00001.EXM
17660866 2021-09-06 08:27 SDU__DACS_69D9_0241DB01_2021-246T08-10-56__00001.EXM

/data/foo/bar/TGO/ACS/ScienceDataFile:
69881252
 14759936 2021-09-05 21:51 SCI__DACS__0241DA01_2021-246T04-26-15__00001.EXM
       53 2021-09-05 21:51 SCI__DACS__0241DA01_2021-246T04-26-15__00001.EXM.manifest
318758912 2021-09-05 14:42 SCI__DACS__0241D801_2021-246T00-30-32__00001.EXM

Answer

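Try the following. It treats every line ending in a colon as the start of a new folder, skips the number-only line right after it, and collects the file lines until the next blank line or the next folder.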

filesDict = {}
path = None
files = []

with open('data.txt') as openFile:
    for line in openFile:
        line = line.rstrip()

        # Empty line: the current folder listing is finished.
        # Store what was collected, if anything, and start a new collection.
        if not line.strip():
            if files and path is not None:
                filesDict[path] = files
            path = None
            files = []
            continue

        # Paths end with a colon; no separate variable is needed per path.
        if line.endswith(':'):
            # Save the previous results, if any.
            if files and path is not None:
                filesDict[path] = files

            path = line
            files = []
            next(openFile, None)  # skip the line with only a number
            continue

        # Only collect lines while a path is defined.
        if path and 'manifest' not in line:
            files.append(line)

# The last folder read from the file has not been stored yet.
if path:
    filesDict[path] = files

Sample run

for p, l in filesDict.items():
  print(p)
  for f in l:
    print('\t', f)

Output

/data/foo/bar/ACS/SDU_:
     17660866 2021-09-06 09:56 SDU__DACS_69DC_0241DB01_2021-246T08-13-26__00001.EXM
     17660866 2021-09-06 09:41 SDU__DACS_69DB_0241DB01_2021-246T08-12-37__00001.EXM
     17660866 2021-09-06 09:24 SDU__DACS_69DA_0241DB01_2021-246T08-11-46__00001.EXM
     17660866 2021-09-06 08:27 SDU__DACS_69D9_0241DB01_2021-246T08-10-56__00001.EXM
/data/foo/bar/TGO/ACS/ScienceDataFile:
      14759936 2021-09-05 21:51 SCI__DACS__0241DA01_2021-246T04-26-15__00001.EXM
     318758912 2021-09-05 14:42 SCI__DACS__0241D801_2021-246T00-30-32__00001.EXM
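
The code above stores every folder it finds and keeps the whole line (size, date, time, name). If only the two catalogs from the question are wanted, and only the bare filenames, something along these lines might work. It is a minimal sketch, not a tested solution for the real listings: `WANTED_SUFFIXES` and `parse_listing` are names made up for this example, the `/data/foo/bar/OTHER/Dir:` block in the sample is invented to show an unwanted catalog being skipped, and it assumes every file line has the form "size date time name".

```python
import io

# Assumption: only catalogs whose header ends with one of these are wanted.
WANTED_SUFFIXES = ('/ACS/SDU_:', '/ACS/ScienceDataFile:')


def parse_listing(lines, wanted=WANTED_SUFFIXES):
    """Map each wanted catalog header to the bare filenames listed under it."""
    result = {}
    current = None  # header of the catalog being read, or None while skipping
    for raw in lines:
        line = raw.strip()
        if not line:
            current = None  # a blank line closes the current catalog
        elif line.endswith(':'):
            # Header line: track it only if it is one of the wanted catalogs.
            current = line if line.endswith(wanted) else None
            if current is not None:
                result.setdefault(current, [])
        elif current is not None:
            # Split into at most 4 fields: size, date, time, name.
            # The number-only line under the header has 1 field and is skipped.
            fields = line.split(None, 3)
            if len(fields) == 4 and not fields[3].endswith('.manifest'):
                result[current].append(fields[3])
    return result


# Sample input; the OTHER/Dir block is hypothetical.
sample = """\
/data/foo/bar/ACS/SDU_:
68421952
17660866 2021-09-06 09:56 SDU__DACS_69DC_0241DB01_2021-246T08-13-26__00001.EXM

/data/foo/bar/OTHER/Dir:
1234
17660866 2021-09-06 09:56 SDU__DACS_FFFF_0241DB01_2021-246T08-13-26__00001.EXM

/data/foo/bar/TGO/ACS/ScienceDataFile:
69881252
 14759936 2021-09-05 21:51 SCI__DACS__0241DA01_2021-246T04-26-15__00001.EXM
       53 2021-09-05 21:51 SCI__DACS__0241DA01_2021-246T04-26-15__00001.EXM.manifest
"""

catalogs = parse_listing(io.StringIO(sample))
for path, names in catalogs.items():
    print(path, names)
```

Because the catalog is decided by the header line and not by the filename, a file containing 'SDU__DACS' in some other catalog is never attached to the previous one. The function accepts any iterable of lines, so an open file object can be passed in place of the `io.StringIO` sample. The resulting lists could then be compared with local files, for example by taking the set difference against `os.listdir()` of a local directory.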