Importing .dat files in python without knowing how it is structured

I am trying to load and view the contents of the data that can be downloaded from here, after which I need to analyze it. I had already posted one question about this, but could not get a solution.

Now, I went through their label file located here. In that, it is mentioned that

"Will code useful Python based letters to describe each object
// see http://docs.python.org/library/struct.html for codes
// formats will be comma separated beginning with "RJW," as key then
// {NAME}, {FORMAT}, {Number of dims}, {Size Dim 1}, {Size Dim 2}, ...
// where {FORMAT} is the Python code for the type, i.e. I for uint32
// and there are as many Size Dim's as number of dimensions."

So, I guess one can try Python, and I do have a working knowledge of Python. I started with this program, which I got from here (for simplicity, the Python file and the data files are in the same folder):

import numpy as np
data = np.genfromtxt('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.dat')
print(data)

I got the error “UnicodeDecodeError: 'cp949' codec can't decode byte 0xff in position 65: illegal multibyte sequence”.

If I change the code to (as mentioned here):

data=open('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.DAT', encoding='utf-8')
print(data)

The error message disappears, but all I get is:

<_io.TextIOWrapper name='JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.DAT' mode='r' encoding='utf-8'>

I have checked other answers on StackOverflow, but could not find a solution. My question may be closely similar to what is posted here.

I need to first see the contents of this .dat file and then export them to another format, say .csv.

Any help will be deeply appreciated…

Answer

You need to open the file in binary mode.

with open('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.DAT', 'rb') as f:
    while True:
        chunk = f.read(160036)  # record size as per the LBL file
        if not chunk:  # stop at end of file
            break
        print(chunk)
        # the file is huge, so pause after each chunk;
        # hit Enter for the next one, Ctrl+C to interrupt
        input('Hit Enter...')

Note, you can parse the LBL file, construct a format string to use with the struct module, and parse each chunk into meaningful fields. That is what the comment you quoted describes.

"""Example of reading NASA JUNO JADE CALIBRATED SCIENCE DATA
https://pds-ppi.igpp.ucla.edu/search/view/?f=yes&id=pds://PPI/JNO-J_SW-JAD-3-CALIBRATED-V1.0/DATA/2018/2018091/ELECTRONS/JAD_L30_LRS_ELC_ANY_CNT_2018091_V03&o=1
https://stackoverflow.com/a/66687113/4046632
"""

import struct
from functools import reduce
from operator import mul
from collections import namedtuple

__author__ = "Boyan Kolev, https://stackoverflow.com/users/4046632/buran"

with open('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.LBL') as f:
    rjws = [line.strip('\n/* ') for line in f if line.startswith('/* RJW')]

# create the format string for struct
rjws = rjws[2:] # exclude first 2 RJW comments related to file itself
names = []
FMT = '='
print(f'Number of objects: {len(rjws)}')
for idx, rjw in enumerate(rjws):
    _, name, fmt, num_dim, *dims = rjw.split(', ')
    fstr = f'{reduce(mul, map(int, dims))}{fmt}'
    FMT = f'{FMT} {fstr}'
    names.append(name)
    print(f'{idx}:{name}, {fstr}')
FMT = FMT.replace('c', 's') # for convenience treat 21c as s (char[])
print(f"Format string: {repr(FMT)}")

# parse DAT file
s = struct.Struct(FMT)
print(f'Struct size:{s.size}')
with open('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.DAT', 'rb') as f:
    n = 0
    while True: # in python3.8+ this loop can be simplified with walrus operator
        chunk = f.read(s.size)
        if not chunk:
            break
        data = s.unpack_from(chunk)
        # process data further, e.g. split data in 2D containers where appropriate
        n += 1

print(f'Number of records: {n}')

# make a named tuple to represent first 10 fields
# for nice display. This basic use of namedtuple works only
# for first 23 objects, which have single item.
num_fields = 10
Record = namedtuple('Record', names[:num_fields])
record = Record(*data[:num_fields])
print('\n----------------------\n')
print(f'First {num_fields} fields of the last record.')
print(record)

output:

Number of objects: 49
0:DIM0_UTC, 21c
1:PACKETID, 1B
2:DIM0_UTC_UPPER, 21c

--- omitted for sake of brevity ---

46:DIM2_AZIMUTH_DESPUN_LOWER, 3072f
47:MAG_VECTOR, 3f
48:ESENSOR, 1H
Format string: '= 21s 1B 21s 1b 21s 1b 1H 1B 1B 1B 1B 1h 1h 1f 1f 1f 1f 1f 1f 1f 1f 1f 1f 3f 3f 3f 1f 9f 9f 9f 1f 1I 1I 1H 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3f 1H'
Struct size:160036
Number of records: 1101

----------------------

First 10 fields of the last record.
Record(DIM0_UTC=b'2018-091T23:56:08.925', PACKETID=106, DIM0_UTC_UPPER=b'2018-092T00:01:08.925', PACKET_MODE=1, DIM0_UTC_LOWER=b'2018-091T23:51:08.925', PACKET_SPECIES=-1, ACCUMULATION_TIME=600, DATA_UNITS=2, SOURCE_BACKGROUND=3, SOURCE_DEAD_TIME=0)
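To get from parsed records to the .csv the question asks for, the csv module can write one row per record. A minimal, self-contained sketch; note that `fmt`, `names`, and the miniature 3-field record here are illustrative stand-ins, not the real 49-field JADE layout. With the real file you would reuse the `FMT`, `names`, and `Struct` built above and decide how to flatten the large array fields.

```python
import csv
import io
import struct

# Hypothetical miniature record: one 21-byte UTC string and two floats,
# standing in for the real 160036-byte JADE record.
fmt = struct.Struct('=21s2f')
names = ['DIM0_UTC', 'VAL_A', 'VAL_B']

# Fake binary "DAT" data: two records packed back to back.
raw = (fmt.pack(b'2018-091T23:56:08.925', 1.5, 2.5)
       + fmt.pack(b'2018-092T00:01:08.925', 3.5, 4.5))

with open('records.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerow(names)  # header row from the parsed LBL names
    buf = io.BytesIO(raw)   # with the real file: open(..., 'rb')
    while True:
        chunk = buf.read(fmt.size)
        if not chunk:
            break
        rec = fmt.unpack(chunk)
        # decode bytes fields so the CSV cells are readable text
        writer.writerow(v.decode() if isinstance(v, bytes) else v
                        for v in rec)
```

The same loop structure as the record reader above applies; only the destination changes from print to a CSV row.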

Link to GitHub gist
