I have a log file that starts with what's known as a header section, followed by a large amount of data. The header section contains certain key-value pairs that tell a db table about the file.
One of my tasks is to parse out some of this header info. The other task is to go through the entire file and count how often certain strings occur. For the latter part I have a function, attached below:
with open(filename, 'rb') as f:
    time_data_count = 0
    while True:
        memcap = f.read(102400)
        # f.seek(-tdatlength, 1)
        poffset_set = set(config_offset.keys())
        # need logic to check if key value exists
        time_data_count += memcap.count(b'TIME_DATA')
        if len(memcap) <= 8:
            break
    if time_data_count > 20:
        print("time_data complete")
    else:
        print("incomplete time_data data")
    print(time_data_count)
The issue is that this function does not process the file line by line, because that would take a lot of time. I want to read only the first 50 lines of the log and parse them, then have the rest of the function go through the file without going line by line and do the counting.
Is it possible to extract the first 50 lines without going through the entire file? The first 50 lines have header info of the form
What I really need is to get the value of ProdID in that log file
You can read line by line for the first 50 lines by using a for loop or a list comprehension to just read the next line 50 times. This moves the read pointer down through the file, so when you later call .read() or any other method, you won't get anything you've already consumed. You can then process the rest as a batch, or however else you need to:
with open(filename, 'rb') as f:
    first_50_lines = [next(f) for _ in range(50)]  # first 50 lines
    remainder_of_file = f.read()  # however much of the file remains
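One caveat: next(f) raises StopIteration if the file has fewer than 50 lines. A minimal sketch of a more forgiving version uses itertools.islice, which simply stops early. The helper names read_header and find_prod_id are made up for illustration, and the header layout (ProdID=value or ProdID: value) is only an assumption, since the exact format is not shown in the question.

```python
import io
import re
from itertools import islice

# ASSUMPTION: header lines look like b"ProdID=ABC123" or b"ProdID: ABC123".
# Adjust the pattern to whatever the real header format is.
PROD_ID_RE = re.compile(rb"^\s*ProdID\s*[=:]\s*(\S+)")

def read_header(f, max_lines=50):
    """Return up to max_lines lines from f; short files do not raise."""
    return list(islice(f, max_lines))

def find_prod_id(header_lines):
    """Return the ProdID value as text, or None if no header line matches."""
    for line in header_lines:
        match = PROD_ID_RE.match(line)
        if match:
            return match.group(1).decode("ascii", errors="replace")
    return None

# In-memory stand-in for a real log opened with open(filename, 'rb').
sample = io.BytesIO(b"Version=2\nProdID=ABC123\nTIME_DATA 1\nTIME_DATA 2\n")
header = read_header(sample, max_lines=2)
print(find_prod_id(header))  # ABC123
rest = sample.read()         # continues right after the header lines
```

Because only the header lines are consumed, the later read starts at the first data line.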
You can alternate between various methods of reading the file, as long as the same file object (f in this case) is in play the entire time: line by line, in sized chunks, or all at once (though .read() with no size argument always precludes further processing, since it consumes everything that remains in one go).
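As a rough sketch of mixing the two styles on one file object, the function below reads the header with readline() and then counts a marker in fixed-size chunks. The function name and parameters are hypothetical. It also carries over the last few bytes of each chunk, since a marker split across a chunk boundary would otherwise be missed by a plain per-chunk count.

```python
import io

def count_after_header(f, needle=b"TIME_DATA", header_lines=50, chunk_size=102400):
    """Read up to header_lines lines, then count needle in the rest by chunks."""
    header = []
    for _ in range(header_lines):
        line = f.readline()
        if not line:          # file ended before the header was complete
            break
        header.append(line)

    count = 0
    tail = b""                # trailing bytes carried over from the previous chunk
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        count += buf.count(needle)
        # Keep len(needle) - 1 bytes: enough to complete a match that straddles
        # the boundary, too short to hold a full match and be counted twice.
        tail = buf[-(len(needle) - 1):] if len(needle) > 1 else b""
    return header, count

# Tiny chunk size to force matches across chunk boundaries.
data = b"ProdID=ABC123\nMode=fast\n" + b"xxTIME_DATAxx" * 3
header, count = count_after_header(io.BytesIO(data), header_lines=2, chunk_size=5)
print(count)  # 3
```

Note that this counts only occurrences after the header lines. If the header can contain the marker too, add the per-line counts from header to the total.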