Numpy genfromtxt() skip invalid lines

I am using NumPy's genfromtxt() function to load a large amount of text data into an array. The data is provided in the following format:

2020-05-20 16:54:01.807645  1033.074    2392.555    256.8516    2700.547    1029.691    2108.094    3256.539    90.94727    1775.043    4.770321    48.875

The log file also contains lines like the following:

2020-05-20 17:05:21.864533  DUT stopped

I want to skip those lines. Is there a way of doing so?

My current approach:

values = np.genfromtxt(fi, dtype="S32,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8", missing_values='', delimiter="\t", invalid_raise=False, filling_values=0)

Thanks for any help

Answer

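According to the manual, you can also pass np.genfromtxt() a generator that yields byte strings (one per line), so you can filter the lines before NumPy ever sees them. The documentation for the fname parameter says: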

fname : file, str, pathlib.Path, list of str, generator
    File, filename, list, or generator to read. If the filename extension is gz or bz2, the file is first decompressed. Note that generators must return byte strings. The strings in a list or produced by a generator are treated as lines.

import numpy as np


def skip_stopped_lines(fi):
    """Yield every line of `fi` except the "DUT stopped" status lines."""
    for line in fi:
        if b"DUT stopped" in line:
            # Don't yield this line, we don't want it
            continue
        yield line


# NB: call genfromtxt() inside the `with` block, so the file is still
#     open while the generator is being consumed
with open("somefile.txt", "rb") as fi:  # note binary mode
    values = np.genfromtxt(
        skip_stopped_lines(fi),
        dtype="S32,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8",
        missing_values="",
        delimiter="\t",  # tab-separated columns
        invalid_raise=False,
        filling_values=0,
    )

This might do the trick for you.
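
If the log can contain other status messages besides "DUT stopped", a more general filter could keep only the lines that have the expected number of tab-separated fields. The following is just a sketch under that assumption: keep_complete_lines, EXPECTED_FIELDS, and the shortened in-memory sample (a timestamp plus three values, with one made-up timestamp) are hypothetical stand-ins, not part of your file or of NumPy.

```python
import io

import numpy as np

EXPECTED_FIELDS = 4  # hypothetical: timestamp + 3 values in this toy sample


def keep_complete_lines(lines, n_fields, sep=b"\t"):
    """Yield only the lines that split into exactly n_fields columns."""
    for line in lines:
        if len(line.rstrip(b"\r\n").split(sep)) == n_fields:
            yield line


# In-memory stand-in for the log file (shortened, tab-separated rows)
sample = io.BytesIO(
    b"2020-05-20 16:54:01.807645\t1033.074\t2392.555\t256.8516\n"
    b"2020-05-20 17:05:21.864533\tDUT stopped\n"
    b"2020-05-20 17:05:22.100000\t1029.691\t2108.094\t3256.539\n"
)

values = np.genfromtxt(
    keep_complete_lines(sample, EXPECTED_FIELDS),
    dtype="S32,f8,f8,f8",
    delimiter="\t",
)

# Fields of the structured array get the default names f0, f1, ...
print(values.shape)
print(values["f1"])
```

For the real file you would set EXPECTED_FIELDS to 12 and pass the file object opened in binary mode instead of the BytesIO sample. Compared to matching one message, this drops any line that does not look like a full data row, which may or may not be what you want.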