Extracting a date from a log file and creating a file with unique dates

I would like to extract the dates, in DD.MM.YYYY format, from a log file. The date always comes first on a line. Here is an example of the entries:

15.04.2016 13:13:30,228 INFO    [wComService] [mukumukuko@system/3] Call created with id:VoiceConnector$mukumukuko@system$D1:1:0:CB:SESSION$D1:1:0:DB:mukumukuko@system$D1:1:0:HB:_TARGET^M
15.04.2016 13:14:10,886 INFO    [wComService] Call 5303 from device +41999999999^M
15.04.2016 13:14:20,967 INFO    [AddressTranslatorService][mukumukuko@system/3] </convertLocalToGNF>^M
15.04.2016 13:14:20,992 INFO    [wComService] [mukumukuko@system/3] Call created with id: VoiceConnector$mukumukuko@system$D1:1:0:MB:SESSION$D1:1:0:NB:mukumukuko@system$D1:1:0:RB:_TARGET^M
15.04.2016 13:15:18,760 INFO    [OSMCService] SessionManager Thread - Heartbeat (1clients connected)^M

The file contains one week of activity, so it also contains other dates, e.g. 16.04.2016, 17.04.2016 and 18.04.2016.

The file can also contain Java exception stack traces such as these:

    at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

I have tried the following:

cat fac.log | sed 's/^.*\([0-9]\{2\}.[0-9]\{2\}.[0-9]\{4\}\).*$/\1/' > datesF1

but “datesF1” then contains the desired dates along with the Java exception lines.

What I would like is a file that lists each date only once; for example, “datesF1” should contain:

15.04.2016
16.04.2016
17.04.2016
18.04.2016

Do you know whether that is possible with sed, or is it better to use grep?

Answer

The reason your sed command doesn’t work is that it assumes there is a date on every line, which is not the case when some lines come from multi-line error messages. When a line does not match the pattern, sed performs no substitution and prints the line unchanged, so the call stack listings you saw stay in the output.
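If you want to stay with sed, a common fix is to turn off automatic printing with -n and add the p flag to the s command, so a line is printed only when the substitution succeeded. The sketch below is not from the original answer: it writes a few hypothetical sample lines to sample.log (a stand-in for fac.log), anchors the date at the start of the line, and escapes the dots so they only match a literal period. It uses POSIX basic regular expressions, hence the backslash-escaped group and intervals.

```shell
#!/bin/sh
# Hypothetical sample input standing in for fac.log
printf '%s\n' \
  '15.04.2016 13:13:30,228 INFO    [wComService] Call created' \
  '    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)' \
  '16.04.2016 09:00:00,000 INFO    [OSMCService] Heartbeat' > sample.log

# -n suppresses sed's default output; the trailing p prints a line only
# when the substitution matched, so stack-trace lines are dropped.
sed -n 's/^\([0-9]\{2\}\.[0-9]\{2\}\.[0-9]\{4\}\).*/\1/p' sample.log > datesF1
cat datesF1
```

This still prints one date per log entry, so the result needs de-duplicating just like the other approaches.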

To get only the dates from the lines that start with one, you have several options:

grep:

grep -Eo '^[0-9.]+' fac.log 

-o tells grep to print only the matching part instead of the whole line, and -E enables “extended” regular expressions.
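A stricter variant spells out the DD.MM.YYYY shape with interval expressions and escapes the dots. POSIX extended regular expressions include {N} intervals, so grep -E should accept them. The sample lines and the output file name dates_raw are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample input standing in for fac.log
printf '%s\n' \
  '15.04.2016 13:14:10,886 INFO    [wComService] Call 5303 from device +41999999999' \
  '    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)' \
  '15.04.2016 13:15:18,760 INFO    [OSMCService] SessionManager Thread - Heartbeat' > sample.log

# ^ anchors the match at the start of the line; \. is a literal dot.
# -o prints only the matched date, once per matching line.
grep -Eo '^[0-9]{2}\.[0-9]{2}\.[0-9]{4}' sample.log > dates_raw
cat dates_raw
```

Because of the ^ anchor, the long phone number in the first line cannot be mistaken for a date.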

awk:

awk '/^[0-9.]+/ {print $1}' fac.log

The first part of the awk program is a regular-expression match; the part in braces says what to do with a matching line, here print the first whitespace-separated field.
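awk can also drop the duplicates in the same pass, using the common !seen[...]++ idiom; this is a swapped-in technique, not part of the original answer. The sketch spells out each digit with [0-9] instead of an interval such as {2}, which some awk implementations do not support. The array name seen and the sample lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample input standing in for fac.log
printf '%s\n' \
  '15.04.2016 13:13:30,228 INFO    [wComService] Call created' \
  '    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)' \
  '15.04.2016 13:14:10,886 INFO    [wComService] Call 5303' \
  '16.04.2016 08:00:00,000 INFO    [OSMCService] Heartbeat' > sample.log

# seen[$1]++ evaluates to 0 (false) the first time a date appears, so
# !seen[$1]++ is true only for the first line carrying each date.
awk '/^[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9][0-9]/ && !seen[$1]++ { print $1 }' sample.log > datesF1
cat datesF1
```

Dates come out in order of first appearance, which for a log is chronological order.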

Perl:

perl -lne 'print $1 if /^([0-9.]+)/' fac.log

-l: strip the newline from each input line and add one to each print; -n: run the code for every line of input (like awk); -e: the program is given on the command line rather than in a file.

In all cases you get one line of output per matching line of input, so the dates repeat. Piping the result through | sort | uniq (or just sort -u) is probably the simplest way to remove the duplicates.
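A sketch of the whole pipeline producing datesF1, with two caveats that are assumptions rather than part of the original answer. If the log is written in time order, all lines for one day are adjacent and uniq alone removes the repeats while keeping log order. A plain sort compares DD.MM.YYYY as text, which would put 01.05.2016 before 30.04.2016; sorting on year, month, then day with -t. and -k avoids that. The sample data and the file name datesF1_sorted are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample log spanning a month boundary
printf '%s\n' \
  '30.04.2016 23:59:00,000 INFO    [wComService] Call created' \
  '    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)' \
  '30.04.2016 23:59:30,000 INFO    [OSMCService] Heartbeat' \
  '01.05.2016 00:00:10,000 INFO    [wComService] Call created' > sample.log

# Log already in time order: uniq drops adjacent duplicates, keeps order.
grep -Eo '^[0-9.]+' sample.log | uniq > datesF1

# Any order: split on '.', sort by year (field 3), month (2), day (1),
# and let -u keep one copy of each date.
grep -Eo '^[0-9.]+' sample.log | sort -u -t. -k3,3 -k2,2 -k1,1 > datesF1_sorted
cat datesF1
```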

Note that I was lazy and used ^[0-9.]+ instead of the longer, more exact pattern. This is related to the reason I like to use Perl instead of sed, awk and friends: Perl regular expressions are always the same, regardless of what you are doing. In Perl there’s also no need to remember which operators are supported by default and which require -E or some other flag. Then there are the differences between implementations: apparently my Debian systems have mawk instead of GNU awk by default, and it doesn’t seem to support interval expressions such as {2}, so the more exact pattern didn’t work. Whoops.

GNU awk manual: “Interval expressions were not traditionally available in awk. They were added as part of the POSIX standard to make awk and egrep consistent with each other.” (ref. https://www.gnu.org/software/gawk/manual/html_node/Regexp-Operators.html#Regexp-Operators)