I working on HTTP Traffic Data set which is composed of complete POST and GET request Like given below. I have written code in java that has separated each of these request and saved it as string element in array list. Now i am confused how to parse these raw HTTP request in java is there any method better than manual parsing?
GET http://localhost:8080/tienda1/imagenes/3.gif/ HTTP/1.1 User-Agent: Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.8 (like Gecko) Pragma: no-cache Cache-control: no-cache Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Accept-Encoding: x-gzip, x-deflate, gzip, deflate Accept-Charset: utf-8, utf-8;q=0.5, *;q=0.5 Accept-Language: en Host: localhost:8080 Cookie: JSESSIONID=FB018FFB06011CFABD60D8E8AD58CA21 Connection: close
I [am] working on [an] HTTP Traffic Data set which is composed of complete POST and GET request[s]
So you want to parse a file or list that contains multiple HTTP requests. What data do you want to extract? Anyway here is a Java HTTP parsing class, which can read the method, version and URI used in the request-line, and that reads all headers into a Hashtable.
You can use that one or write one yourself if you feel like reinventing the wheel. Take a look at the RFC to see what a request looks like in order to parse it correctly:
Request = Request-Line ; Section 5.1 *(( general-header ; Section 4.5 | request-header ; Section 5.3 | entity-header ) CRLF) ; Section 7.1 CRLF [ message-body ] ; Section 4.3