Context: I am getting dates from a third-party data source in the format shown below, and I am using a Pig script to transform this string into a date.
a = LOAD '/user/hit_data.tsv' USING PigStorage('\t');
b = FOREACH a GENERATE $0 AS post_t_time_info;
c = FOREACH b GENERATE ToDate(post_t_time_info, 'DD/MM/YYYY HH:mm:ss e ZZZ');
Sample values the date field takes:
17/6/2015 5:50:22 5 -120
17/6/2015 0:7:6 5 240
I am unable to understand what -120 and 240 represent. I tried time zone (ZZZ) and milliseconds (SSS), but both appear to be incorrect.
The format I currently use is 'DD/MM/YYYY HH:mm:ss e X', where X is the unknown part, and I am looking for the appropriate pattern for it.
Chances are that -120 and 240 are indeed time zone offsets. They are likely expressed in minutes, not hours. However, there's no standard for that, so they could be minutes east of GMT or minutes west of GMT.
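In other words,
-120 could be UTC-02:00 (if the values are minutes east of GMT) or UTC+02:00 (if they are minutes west of GMT).
240 could be UTC+04:00 (minutes east of GMT) or UTC-04:00 (minutes west of GMT).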
If the values came from JavaScript's Date getTimezoneOffset function, the sign will be the opposite of what you might expect: it returns positive values for zones west of GMT, while the usual ISO 8601 convention uses positive values for zones east of GMT.
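To make the two readings concrete, here is a small Java sketch (Java because Pig runs on the JVM and its built-in functions are written in it) that converts each sample value under both conventions using java.time.ZoneOffset. The class and method names are hypothetical; nothing here tells you which convention your source actually uses.

```java
import java.time.ZoneOffset;

class OffsetInterpretations {
    // Reading A: the value is minutes EAST of GMT (ISO 8601 sign convention).
    static ZoneOffset minutesEast(int minutes) {
        return ZoneOffset.ofTotalSeconds(minutes * 60);
    }

    // Reading B: the value is minutes WEST of GMT (the sign convention of
    // JavaScript's getTimezoneOffset), so the sign must be flipped.
    static ZoneOffset minutesWest(int minutes) {
        return ZoneOffset.ofTotalSeconds(-minutes * 60);
    }

    public static void main(String[] args) {
        for (int value : new int[] {-120, 240}) {
            // ZoneOffset.toString() renders offsets such as "-02:00" or "+04:00".
            System.out.println(value + " -> east: UTC" + minutesEast(value)
                    + ", west: UTC" + minutesWest(value));
        }
    }
}
```

Running it prints `-120 -> east: UTC-02:00, west: UTC+02:00` and `240 -> east: UTC+04:00, west: UTC-04:00`, which is exactly the ambiguity the data source's documentation has to resolve.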
Since you are the one obtaining the data, you are in a much better position than we are to identify the source and disambiguate. If it's from a third party, look in their specs, or contact them and ask.
Also, you said you were using Apache Pig, but according to its documentation, the ToDate function uses Java's SimpleDateFormat, which does not use the same format qualifiers as ICU, nor does it have a format qualifier that recognizes time zone offsets expressed in minutes. You will likely need to write your own function instead of relying on just the built-in ToDate function.
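If you do write your own function, the parsing itself is small. The Java sketch below (class and method names are hypothetical) splits the fields by hand instead of using a date pattern, so unpadded values like `0:7:6` are not a problem. It makes two assumptions you must confirm against the source's specs: that the last field is minutes west of GMT, as with getTimezoneOffset, and that the date and time are local wall-clock time at that offset. The third field (the `5`) is ignored because its meaning isn't established. Wrapping this in a Pig UDF (an `EvalFunc` subclass) is not shown.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

class HitDateParser {
    // ASSUMPTION: the last field is minutes WEST of GMT (getTimezoneOffset style).
    // Set this to false if the source documents minutes EAST of GMT instead.
    static final boolean OFFSET_IS_MINUTES_WEST = true;

    // Parses values such as "17/6/2015 5:50:22 5 -120".
    // ASSUMPTION: the date and time are local wall-clock time at the given offset.
    static OffsetDateTime parse(String value) {
        String[] fields = value.trim().split("\\s+");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + value);
        }
        String[] date = fields[0].split("/"); // day/month/year, not zero-padded
        String[] time = fields[1].split(":"); // hour:minute:second, not zero-padded
        // fields[2] is deliberately not interpreted: its meaning is unknown.
        int minutes = Integer.parseInt(fields[3]);

        LocalDateTime local = LocalDateTime.of(
                Integer.parseInt(date[2]), Integer.parseInt(date[1]), Integer.parseInt(date[0]),
                Integer.parseInt(time[0]), Integer.parseInt(time[1]), Integer.parseInt(time[2]));
        int offsetSeconds = (OFFSET_IS_MINUTES_WEST ? -minutes : minutes) * 60;
        return OffsetDateTime.of(local, ZoneOffset.ofTotalSeconds(offsetSeconds));
    }

    public static void main(String[] args) {
        System.out.println(parse("17/6/2015 5:50:22 5 -120"));
        System.out.println(parse("17/6/2015 0:7:6 5 240"));
    }
}
```

With the sample values this prints `2015-06-17T05:50:22+02:00` and `2015-06-17T00:07:06-04:00`; setting `OFFSET_IS_MINUTES_WEST` to `false` yields offsets of `-02:00` and `+04:00` instead.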