What format is this date? [17/6/2015 5:50:22 5 -120]

Context: I am getting dates from a third-party data source in the format shown below, and I am using a Pig script to transform this string into a date.

Script:

a = LOAD '/user/hit_data.tsv' using PigStorage('\t');  -- tab-separated input
b = FOREACH a GENERATE $0 as post_t_time_info;         -- first column holds the date string
c = FOREACH b GENERATE ToDate(post_t_time_info, 'DD/MM/YYYY HH:mm:ss e ZZZ');  -- attempted pattern

Sample values of this field:

17/6/2015 5:50:22 5 -120
17/6/2015 0:7:6 5 240

I can't work out what -120 and 240 represent. I tried a time zone pattern (ZZZ) and milliseconds (SSS), but neither appears to be correct.

My current format is 'DD/MM/YYYY HH:mm:ss e X', where X is the unknown part, and I am looking for the appropriate pattern for it.

Thanks!

References:
http://userguide.icu-project.org/formatparse/datetime
http://www.unicode.org/reports/tr35/tr35-25.html#Time_Zone_Fallback

Answer

Chances are that -120 and 240 are indeed time zone offsets. They are likely expressed in minutes, not hours. However, there's no standard for this representation, so the values could be minutes east of GMT or minutes west of GMT.

In other words, -120 could be UTC+02:00, or UTC-02:00. 240 could be UTC+04:00, or UTC-04:00.
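To make the ambiguity concrete, here is a small Java sketch (using java.time; the class and method names are just for illustration) that turns each raw value into a UTC offset under both possible sign conventions:

```java
import java.time.ZoneOffset;

public class OffsetInterpretations {
    // Interpretation 1: minutes EAST of UTC (ISO 8601 sign convention), so -120 -> -02:00
    static ZoneOffset asMinutesEast(int minutes) {
        return ZoneOffset.ofTotalSeconds(minutes * 60);
    }

    // Interpretation 2: minutes WEST of UTC (sign flipped), so -120 -> +02:00
    static ZoneOffset asMinutesWest(int minutes) {
        return ZoneOffset.ofTotalSeconds(-minutes * 60);
    }

    public static void main(String[] args) {
        for (int raw : new int[] {-120, 240}) {
            System.out.println(raw + " -> east: UTC" + asMinutesEast(raw)
                    + ", west: UTC" + asMinutesWest(raw));
        }
    }
}
```

Running it prints `-120 -> east: UTC-02:00, west: UTC+02:00` and `240 -> east: UTC+04:00, west: UTC-04:00`. Nothing in the string itself tells you which line is right.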

For example, if the value was obtained from the JavaScript Date object's getTimezoneOffset function, the sign will be the opposite of what you might expect: it has positive values west of UTC, while the usual ISO 8601 convention has positive values east of UTC.
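As a worked example, *assuming* the values follow the getTimezoneOffset convention (offset = UTC minus local time, in minutes), converting a local time to UTC means adding the offset, and the ISO 8601 offset is the negated value. A minimal Java sketch of that arithmetic, with illustrative names:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class JsOffsetToUtc {
    // ASSUMPTION: offsetMinutes follows JavaScript getTimezoneOffset(),
    // i.e. offsetMinutes = UTC - local, so positive values are west of UTC.
    static LocalDateTime localToUtc(LocalDateTime local, int offsetMinutes) {
        return local.plusMinutes(offsetMinutes);
    }

    // The equivalent ISO 8601 offset (positive east of UTC) flips the sign.
    static ZoneOffset isoOffset(int offsetMinutes) {
        return ZoneOffset.ofTotalSeconds(-offsetMinutes * 60);
    }

    public static void main(String[] args) {
        LocalDateTime local = LocalDateTime.of(2015, 6, 17, 5, 50, 22);
        System.out.println(localToUtc(local, -120));          // 2015-06-17T03:50:22
        System.out.println(local.atOffset(isoOffset(-120)));  // 2015-06-17T05:50:22+02:00
    }
}
```

Under this assumption, `17/6/2015 5:50:22 ... -120` would be 03:50:22 UTC, and `17/6/2015 0:7:6 ... 240` would be 04:07:06 UTC. Under the opposite convention both results change, which is why the source's documentation matters.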

Since you are the one obtaining the data, you are in a much better position than us to identify the source and disambiguate. If it's from a third party, look in their specs, or contact them and ask.

Also, you said you were using Apache Pig, but according to its documentation, the ToDate function uses Java's SimpleDateFormat, which does not use the same format qualifiers as ICU, nor does it have a format qualifier that recognizes time zone offsets expressed in minutes. You will likely need to write your own function instead of relying on the built-in ToDate alone.
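As a rough idea of what such a function could do, here is a hypothetical Java sketch that splits the string itself, parses the date and time with java.time, and applies the minute offset under a sign convention you choose. It deliberately ignores the third field (the `5`), whose meaning isn't established here, and the `minutesWest` flag is an assumption you would set once the source confirms its convention:

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimeInfoParser {
    // Non-padded day/month/hour/minute/second, e.g. "17/6/2015 0:7:6".
    private static final DateTimeFormatter LOCAL_PART =
            DateTimeFormatter.ofPattern("d/M/uuuu H:m:s");

    /**
     * Parses strings like "17/6/2015 5:50:22 5 -120".
     * The third field is skipped because its meaning is unknown.
     * minutesWest = true assumes the getTimezoneOffset convention (positive = west);
     * minutesWest = false assumes minutes east of UTC. Confirm with the data provider.
     */
    static OffsetDateTime parse(String value, boolean minutesWest) {
        String[] parts = value.trim().split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + value);
        }
        LocalDateTime local = LocalDateTime.parse(parts[0] + " " + parts[1], LOCAL_PART);
        int minutes = Integer.parseInt(parts[3]);
        int eastMinutes = minutesWest ? -minutes : minutes;
        return local.atOffset(ZoneOffset.ofTotalSeconds(eastMinutes * 60));
    }

    public static void main(String[] args) {
        System.out.println(parse("17/6/2015 5:50:22 5 -120", true)); // 2015-06-17T05:50:22+02:00
        System.out.println(parse("17/6/2015 0:7:6 5 240", true));    // 2015-06-17T00:07:06-04:00
    }
}
```

To call logic like this from the script, it would need to be wrapped in a Pig UDF (a class extending `org.apache.pig.EvalFunc`); that wiring is not shown here.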
