re.VERBOSE and a dict

I have this .txt file with some logs (sample below). I'm trying to assign the value "-" to "user_name" when a log line has no user name. But in those cases the output dict shows "user_name": "". Do you know what I'm doing wrong? I also tried item.groupdict["user_name"] = "-", but clearly this is not the root of the problem…

import re
with open("logdata.txt", "r") as file:
    logdata = file.read()
logs = []
pattern = r"""
(?P<host>[\d.]+)[-\s]+
(?P<user_name>\w*)\s+
\[(?P<time>[^][]+)\]\s+
"(?P<request>[^"]+)"
"""

for item in re.finditer(pattern,logdata,re.VERBOSE):
    if item.groupdict("user_name") == None:
        item["user_name"] = '-'
    logs.append(item.groupdict())

Here is a sample of the txt:

71.172.239.195 - dooley1853 [21/Jun/2019:15:45:32 -0700] "PUT /cutting-edge HTTP/2.0" 406 24498
180.95.121.94 - mohr6893 [21/Jun/2019:15:45:34 -0700] "PATCH /extensible/reinvent HTTP/1.1" 201 27330
144.23.247.108 - auer7552 [21/Jun/2019:15:45:35 -0700] "POST /extensible/infrastructures/one-to-one/enterprise HTTP/1.1" 100 22921

Answer

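The check never fires because item.groupdict("user_name") returns the whole dict (the argument is only the default for unmatched groups), so it is never None. Also, \w* is allowed to match nothing, so the group ends up as "" rather than missing. You could either save item.groupdict() in a variable and modify its content before appending it to logs: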

for item in re.finditer(pattern, logdata, re.VERBOSE):
    params = item.groupdict()
    if params["user_name"] == "":
        params["user_name"] = "-"

    logs.append(params)
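
A quick way to confirm this behaviour is to run the question's pattern on a single line. This is only a sketch: the log line below is made up for illustration (it is not from the real logdata.txt), and the names whole_dict and assignment_error are hypothetical.

```python
import re

# Same pattern as in the question, with \w* for the user name.
pattern = r"""
(?P<host>[\d.]+)[-\s]+
(?P<user_name>\w*)\s+
\[(?P<time>[^][]+)\]\s+
"(?P<request>[^"]+)"
"""

# Hypothetical log line with "-" in place of the user name (not from the real file).
line = '10.0.0.1 - - [21/Jun/2019:15:45:36 -0700] "GET /index HTTP/1.1" 200 512'

match = re.search(pattern, line, re.VERBOSE)

# groupdict("user_name") treats "user_name" as the default value for
# unmatched groups and returns the whole dict, so it is never None.
whole_dict = match.groupdict("user_name")
print(type(whole_dict).__name__)

# \w* may match nothing, so the group takes part in the match
# and holds an empty string instead of None.
print(repr(match["user_name"]))

# Match objects are read-only: item assignment raises TypeError.
try:
    match["user_name"] = "-"
    assignment_error = None
except TypeError as exc:
    assignment_error = exc
print(assignment_error)
```

So the fix has to operate on the dict returned by groupdict(), not on the match object itself.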

or you could change your regular expression to make the second group optional (?) while only matching names with at least one word character (\w+). A group that does not take part in the match is None, so you can then use the default parameter of item.groupdict() to fill it in:

import re

with open("logdata.txt", "r") as file:
    logdata = file.read()

logs = []
pattern = r"""
(?P<host>[\d.]+)[-\s]+
(?P<user_name>\w+)?\s+
\[(?P<time>[^][]+)\]\s+
"(?P<request>[^"]+)"
"""

for item in re.finditer(pattern, logdata, re.VERBOSE):
    logs.append(item.groupdict(default="-"))
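
To try the second approach without the file, here is a minimal sketch that runs it on an inline string. The first line is taken from the sample above; the second line, with "-" instead of a user name, is an assumption about what such an entry looks like in the real file.

```python
import re

# Optional user_name group: if it does not take part in the match, its value is None.
pattern = r"""
(?P<host>[\d.]+)[-\s]+
(?P<user_name>\w+)?\s+
\[(?P<time>[^][]+)\]\s+
"(?P<request>[^"]+)"
"""

# First line: from the sample in the question.
# Second line: hypothetical entry with no user name.
logdata = '''\
71.172.239.195 - dooley1853 [21/Jun/2019:15:45:32 -0700] "PUT /cutting-edge HTTP/2.0" 406 24498
10.0.0.1 - - [21/Jun/2019:15:45:36 -0700] "GET /index HTTP/1.1" 200 512
'''

logs = []
for item in re.finditer(pattern, logdata, re.VERBOSE):
    # default="-" replaces the value of every group that did not participate.
    logs.append(item.groupdict(default="-"))

user_names = [entry["user_name"] for entry in logs]
print(user_names)
```

Note that default applies to every unmatched group, not just user_name; here that is harmless because the other groups are mandatory.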
