"""cedar is the module that allows the creation and editing of Cedar files.
With Madrigal 3.0, this module was rewritten to work with the power of the hdf5 I/O library. Editing a file
now never needs to load the file into memory. Instead, only the slice needing modification is read in.
Any input file must now be an Hdf5 one. However, this module can output a file in any format.
The underlying data structure is a h5py dataset object, which is read from the hdf5 file under the group "Data" and the
dataset name "Table Layout". Data not yet written in may also be stored in a numpy recarray with the
same dtype.
With Madrigal 2.X, this module abstracts away many of the details of the Cedar file format, which may change in the future,
and instead simply follows the Cedar data model (Prolog-1d parms-2d parms). For example, all values are accessed and set via doubles, so
users do not need to deal with scale factors, "additional increment" parameters, etc. The old Cedar file format had
limited dynamic range, so for now an exception is raised if any value would violate this limited range. Furthermore,
the Cedar data model defines time in the prolog. Unfortunately, time may also be set in the data itself, leading
to the possibility of inconsistent data. This module will issue warnings in that case.
With Madrigal 3.0, this module is built on top of the h5py hdf5 interface
$Id: cedar.py 7044 2019-10-07 19:13:16Z brideout $
"""
# standard python imports
import os, os.path, sys
import array
import types
import time
import datetime
import traceback
import itertools
import re
import warnings
import subprocess
import shlex
import shutil
import copy
import collections
# third party imports
import numpy
import numpy.lib.recfunctions
import netCDF4
import h5py
# Millstone imports
import madrigal.metadata
import madrigal.data
import madrigal.admin
# cedar special values
missing = numpy.nan
assumed = -1.0
knownbad = -2.0
timeParms = 30 # any Cedar parameter below this number is a time parameter that conflicts with prolog
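# Example (hypothetical sketch): how the special values above appear in data.
# Missing measurements are stored as nan; for error parameters, assumed (-1.0)
# and knownbad (-2.0) flag the quality of the error estimate.  is_error_parm
# below is an illustrative flag, not part of this module.
#
#   import numpy
#   if numpy.isnan(value):
#       print('measurement is missing')
#   elif is_error_parm and value == assumed:
#       print('error value is assumed')
#   elif is_error_parm and value == knownbad:
#       print('error value is known bad')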
def getLowerCaseList(l):
"""getLowerCaseList returns a new list that is all lowercase given an input list
that may have upper case strings
"""
return([s.lower() for s in l])
def parseArraySplittingParms(hdf5Filename):
"""parseArraySplittingParms returns a (possibly empty) list of parameter mnemonic used to split
the array layout for a Madrigal Hdf5 file
Input: hdf5Filename - Madrigal Hdf5 filename
Raise IOError if not a valid Madrigal Hdf5 file
"""
with h5py.File(hdf5Filename, 'r') as f:
try:
dataGroup = f['Data']
except:
raise IOError('Hdf5 file %s does not have required top level Data group' % (hdf5Filename))
retList = []
if 'Array Layout' not in list(dataGroup.keys()):
return(retList) # no array layout in this file
arrGroup = dataGroup['Array Layout']
for key in list(arrGroup.keys()):
if key.find('Array with') != -1:
items = key.split()
for item in items:
if item.find('=') != -1:
subitems = item.split('=')
parm = subitems[0].lower()
parm = parm.encode('ascii','ignore')
retList.append(parm)
return(retList)
# None found
return(retList)
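# Example (hypothetical sketch): list the parameters used to split the array
# layout of an existing Madrigal Hdf5 file.  The file path is illustrative.
#
#   import madrigal.cedar
#   parms = madrigal.cedar.parseArraySplittingParms('/tmp/mlh980120g.003.hdf5')
#   print(parms)  # e.g. [] if no splitting, or [b'beamid'] (mnemonics are bytes)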
def listRecords(hdf5Filename, newFilename=None, addedLinkFormat=None):
"""listRecords outputs a summary of records in the hdf5Filename. Is lower memory footprint than
loading full file into memory, then calling loadNextRecords. However, if the file is already fully
in memory, than classing the MadrigalCedarFile method loadNextRecords is faster.
Inputs:
hdf5Filename - input Madrigal hdf5 file to use
newFilename - name of new file to create and write to. If None, the default, write to stdout
addedLinkFormat - if not None, add link to end of each record with value addedLinkFormat % (recno).
If None (the default) no added link. Must contain one and only one integer format to be
filled in by recno.
"""
madParmObj = madrigal.data.MadrigalParameters()
madDataObj = madrigal.data.MadrigalFile(hdf5Filename)
formatStr = '%6i: %s %s'
headerStr = ' record start_time end_time'
parms = []
kinstList = madDataObj.getKinstList()
if len(kinstList) > 1:
formatStr += ' %i'
headerStr += ' kinst'
parms.append('kinst')
kindatList = madDataObj.getKindatList()
if len(kindatList) > 1:
formatStr += ' %i'
headerStr += ' kindat'
parms.append('kindat')
if newFilename is not None:
f = open(newFilename, 'w')
else:
f = sys.stdout
if not addedLinkFormat is None:
formatStr += ' ' + addedLinkFormat
headerStr += ' record_plot'
formatStr += '\n'
f.write('%s\n' % headerStr)
# read in data from file
with h5py.File(hdf5Filename, 'r') as fi:
table = fi['Data']['Table Layout']
recno = table['recno']
ut1_unix = table['ut1_unix']
ut2_unix = table['ut2_unix']
if 'kinst' in parms:
kinst = table['kinst']
if 'kindat' in parms:
kindat = table['kindat']
max_recno = int(recno[-1])
for index in range(max_recno + 1):
i = numpy.searchsorted(recno, index)
this_ut1_unix = ut1_unix[i]
this_ut2_unix = ut2_unix[i]
if 'kinst' in parms:
this_kinst = kinst[i]
if 'kindat' in parms:
this_kindat = kindat[i]
sDT = datetime.datetime.utcfromtimestamp(this_ut1_unix)
sDTStr = sDT.strftime('%Y-%m-%d %H:%M:%S')
eDT = datetime.datetime.utcfromtimestamp(this_ut2_unix)
eDTStr = eDT.strftime('%Y-%m-%d %H:%M:%S')
data = [index, sDTStr, eDTStr]
if 'kinst' in parms:
data.append(this_kinst)
if 'kindat' in parms:
data.append(this_kindat)
if not addedLinkFormat is None:
data += [index]
f.write(formatStr % tuple(data))
if f != sys.stdout:
f.close()
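# Example (hypothetical sketch): write a one-line-per-record summary of a
# Madrigal Hdf5 file, appending a plot link to each line.  The paths and
# link format are illustrative.
#
#   import madrigal.cedar
#   madrigal.cedar.listRecords('/tmp/mlh980120g.003.hdf5',
#                              newFilename='/tmp/records.txt',
#                              addedLinkFormat='plotRecord.py?recno=%i')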
class MadrigalCedarFile:
"""MadrigalCedarFile is an object that allows the creation and editing of Cedar files.
This class emulates a python list, and so users may treat it just like a python list. The
restriction enforced is that all items in the list must be either MadrigalCatalogRecords,
MadrigalHeaderRecords, or MadrigalDataRecords (all also defined in the madrigal.cedar module).
Each of these three classes supports the method getType(), which returns 'catalog', 'header',
and 'data', respectively.
There are two programming patterns to choose from when using this module. For smaller input or output files,
read in files using the default maxRecords=None, so that the entire file is read into memory. Output files
using write for hdf5 or netCDF4 output, or writeText with the default append=False for text files. This
will be somewhat faster than using the larger file pattern below.
For larger files, init the reading with maxRecords = some value, and then read in the rest using loadNextRecords.
Write hdf5 file with a series of dumps, and then close with close(). Write text files using writeText with append
= True. Write large netCDF4 files by first creating a large Hdf5 file as above, and then use convertToNetCDF4
to create the large netCDF4 file. This approach is somewhat slower, but has a limited memory footprint.
Usage example::
# the following example inserts a catalog record at the beginning of an existing file
import madrigal.cedar, time
cedarObj = madrigal.cedar.MadrigalCedarFile('/opt/madrigal/experiments/1998/mlh/20jan98/mil980120g.003.hdf5')
startTime = time.mktime((1998,1,20,0,0,0,0,0,0)) - time.timezone
endTime = time.mktime((1998,1,21,23,59,59,0,0,0)) - time.timezone
# catLines is a list of 80 character lines to be included in catalog record
catObj = madrigal.cedar.MadrigalCatalogRecord(31, 1000, 1998,1,20,0,0,0,0,
1998,1,21,23,59,59,99, catLines)
cedarObj.insert(0, catObj)
cedarObj.write()
Non-standard Python modules used: numpy, h5py, netCDF4
Change history:
Major rewrite in Jan 2013 as moved to Hdf5 and Madrigal 3.0
Written by "Bill Rideout":mailto:wrideout@haystack.mit.edu April. 6, 2005
"""
# cedar special values
missing = numpy.nan
missing_int = numpy.iinfo(numpy.int64).min
requiredFields = ('year', 'month', 'day', 'hour', 'min', 'sec', 'recno', 'kindat', 'kinst',
'ut1_unix', 'ut2_unix')
def __init__(self, fullFilename,
createFlag=False,
startDatetime=None,
endDatetime=None,
maxRecords=None,
recDset=None,
arraySplitParms=None,
skipArray=False):
"""__init__ initializes MadrigalCedarFile by reading in existing file, if any.
Inputs:
fullFilename - either the existing Cedar file in Hdf5 format,
or a file to be created. May also be None if this
data is simply derived parameters that will be written to stdout.
createFlag - tells whether this is a file to be created. If False and
fullFilename cannot be read, an error is raised. If True and
fullFilename already exists, or fullFilename cannot be created,
an error is raised.
startDatetime - if not None (the default), reject all input records where
record end time < startDatetime (datetime.datetime object).
Ignored if createFlag == True
endDatetime - if not None (the default), reject all input records where
record start time > endDatetime (datetime.datetime object)
Ignored if createFlag == True
maxRecords - the maximum number of records to read into memory
Ignored if createFlag == True
recDset - a numpy recarray with column names the names of all parameters, starting
with requiredFields. Values are 1 for 1D (all the required parms are 1D), 2 for
dependent 2D, and 3 for independent spatial 2D parameters. If None, self._recDset
not set until first data record appended.
arraySplitParms - if None (the default), read in arraySplitParms from the existing file.
Otherwise set self._arraySplitParms to arraySplitParms, which is a list of 1D or 2D parms
where each unique set of values of the parms in this list will be used to split the full
data into separate arrays in Hdf5 or netCDF4 files. For example arraySplitParms=['beamid']
would split the data into separate arrays for each beamid. If None and a new file is being
created, no array splitting (self._arraySplitParms = []).
skipArray - if False and any 2D parms, create array layout (the default). If True, skip array
layout (typically used when there are too many ind parm value combinations - generally not recommended).
Affects: populates self._privList if file exists. self._privList is the underlying
list of MadrigalDataRecords, MadrigalCatalogRecords, and MadrigalHeaderRecords.
Also populates:
self._tableDType - the numpy dtype to use to build the table layout
self._nextRecord - the index of the next record to read from the input file. Not used if
createFlag = True
(The following are the input arguments described above)
self._fullFilename
self._createFlag
self._startDatetime
self._endDatetime
self._maxRecords
self._totalDataRecords - number of data records appended (may differ from len(self._privList)
if dump called).
self._minMaxParmDict - a dictionary with key = parm mnems, value = tuple of
min, max values (may be nan)
self._arrDict - a dictionary with key = list of array split parm values found in file,
('' if no splitting), and values = dict of key = 'ut1_unix' and ind 2d parm names (excluding
array splitting parms, if also ind 2D parm), and values
= python set of all unique values. Populated only if createFlag=True. Used to create
Array Layout
self._recIndexList - a list of (startIndex, endIndex) for each data record added. Used to slice out
data records from Table Layout
self._num2DSplit - Number of arraySplitParms that are 2D
self._closed - a boolean used to determine if the file being created was already closed
Returns: void
"""
self._privList = []
self._fullFilename = fullFilename
self._startDatetime = startDatetime
self._endDatetime = endDatetime
self._maxRecords = maxRecords
self._totalDataRecords = 0
self._nextRecord = 0
self._tableDType = None # will be set to the dtype of Table Layout
self._oneDList = None # will be set when first data record appended
self._twoDList = None # will be set when first data record appended
self._ind2DList = None # will be set when first data record appended
self._arraySplitParms = arraySplitParms
self._skipArray = bool(skipArray)
if createFlag:
self._closed = False
else:
self._closed = True # no need to close file only being read
self._hdf5Extensions = ('.hdf5', '.h5', '.hdf')
# keep track of earliest and latest record times
self._earliestDT = None
self._latestDT = None
# summary info
self._experimentParameters = None
self._kinstList = [] # a list of all kinsts integers in file
self._kindatList = [] # a list of all kindats integers in file
self._status = 'Unknown' # can be externally set
self._format = None # used to check that partial writes via dump are consistent
if createFlag not in (True, False):
raise ValueError('in MadrigalCedarFile, createFlag must be either True or False')
self._createFlag = createFlag
if createFlag == False:
if not os.access(fullFilename, os.R_OK):
raise ValueError('in MadrigalCedarFile, fullFilename %s does not exist' % (str(fullFilename)))
if not fullFilename.endswith(self._hdf5Extensions):
raise IOError('MadrigalCedarFile can only read in CEDAR Hdf5 files, not %s' % (fullFilename))
if createFlag == True:
if fullFilename != None: # if None, this data will never be persisted - only written to stdout
if os.access(fullFilename, os.R_OK):
raise ValueError('in MadrigalCedarFile, fullFilename %s already exists' % (str(fullFilename)))
if not os.access(os.path.dirname(fullFilename), os.W_OK):
raise ValueError('in MadrigalCedarFile, fullFilename %s cannot be created' % (str(fullFilename)))
if not fullFilename.endswith(self._hdf5Extensions):
raise IOError('All Madrigal files must end with hdf5 extension, <%s> does not' % (str(fullFilename)))
if self._arraySplitParms is None:
self._arraySplitParms = []
# create needed Madrigal objects
self._madDBObj = madrigal.metadata.MadrigalDB()
self._madInstObj = madrigal.metadata.MadrigalInstrument(self._madDBObj)
self._madParmObj = madrigal.data.MadrigalParameters(self._madDBObj)
self._madKindatObj = madrigal.metadata.MadrigalKindat(self._madDBObj)
if not self._arraySplitParms is None:
self._arraySplitParms = [self._madParmObj.getParmMnemonic(p).lower() for p in self._arraySplitParms]
self._minMaxParmDict = {}
self._arrDict = {}
self._num2DSplit = None # will be set to the number of 2D arraySplitParms when first record added
self._recIndexList = []
if recDset is not None:
self._recDset = recDset
else:
self._recDset = None
if createFlag == False:
self.loadNextRecords(self._maxRecords)
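# Example (hypothetical sketch): the small-file pattern - read an entire
# existing file into memory (the default maxRecords=None) and loop over its
# records.  The file path is illustrative.
#
#   import madrigal.cedar
#   cedarObj = madrigal.cedar.MadrigalCedarFile('/tmp/mlh980120g.003.hdf5')
#   for rec in cedarObj:
#       if rec.getType() == 'data':
#           print(rec.getStartDatetime(), rec.getEndDatetime())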
def loadNextRecords(self, numRecords=None, removeExisting=True):
"""loadNextRecords loads a maximum of numRecords. Returns tuple of the the number of records loaded, and boolean of whether complete.
May be less than numRecords if not enough records in the input file. Returns 0 if no records left.
Inputs:
numRecords - number of records to try to load. If None, load all remaining records
removeExisting - if True (the default), remove existing records before loading new
ones. If False, append new records to existing records.
Returns:
tuple of the number of records loaded, and a boolean of whether complete.
May be less than numRecords if not enough records.
Raises error if file opened with createFlag = True
"""
if self._createFlag:
raise IOError('Cannot call loadNextRecords when creating a new MadrigalCedarFile')
if removeExisting:
self._privList = []
isComplete = False
hdfFile = h5py.File(self._fullFilename, 'r')
tableDset = hdfFile["Data"]["Table Layout"]
metadataGroup = hdfFile["Metadata"]
recDset = metadataGroup["_record_layout"]
if self._nextRecord == 0:
if self._recDset is None:
self._recDset = recDset[()]
elif not numpy.array_equal(self._recDset, recDset[()]): # element-wise compare; a bare != on arrays is ambiguous
raise IOError('recDset in first record <%s> does not match expected recDset <%s>' % \
(str(recDset), str(self._recDset)))
self._verifyFormat(tableDset, recDset)
self._tableDType = tableDset.dtype
self._experimentParameters = numpy.array(hdfFile["Metadata"]['Experiment Parameters'])
self._kinstList = self._getKinstList(self._experimentParameters)
self._kindatList = self._getKindatList(self._experimentParameters)
if self._arraySplitParms is None:
self._arraySplitParms = self._getArraySplitParms(hdfFile["Metadata"])
if 'Experiment Notes' in list(hdfFile["Metadata"].keys()):
self._appendCatalogRecs(hdfFile["Metadata"]['Experiment Notes'])
self._appendHeaderRecs(hdfFile["Metadata"]['Experiment Notes'])
if self._ind2DList is not None:
parmObjList = (self._oneDList, self._twoDList, self._ind2DList) # used for performance in load
else:
parmObjList = None
# get indices for each record
recLoaded = 0
recTested = 0
if not hasattr(self, 'recnoArr'):
self.recnoArr = tableDset['recno']
# read all the records in at once for performance
if not numRecords is None:
indices = numpy.searchsorted(self.recnoArr, numpy.array([self._nextRecord, self._nextRecord + numRecords]))
tableIndices = numpy.arange(indices[0], indices[1])
if len(tableIndices) > 0:
fullTableSlice = tableDset[tableIndices[0]:tableIndices[-1]+1]
fullRecnoArr = fullTableSlice['recno']
else:
fullTableSlice = tableDset
fullRecnoArr = self.recnoArr
while(True):
if not numRecords is None:
if len(tableIndices) == 0:
isComplete = True
break
if numRecords:
if recTested >= numRecords:
break
# get slices of tableDset and recDset to create next MadrigalDataRecord
indices = numpy.searchsorted(fullRecnoArr, numpy.array([self._nextRecord, self._nextRecord + 1]))
tableIndices = numpy.arange(indices[0], indices[1])
if len(tableIndices) == 0:
isComplete = True
break
tableSlice = fullTableSlice[tableIndices[0]:tableIndices[-1]+1]
self._recIndexList.append((tableIndices[0],tableIndices[-1]+1))
self._nextRecord += 1
firstRow = tableSlice[0]
startDT = datetime.datetime.utcfromtimestamp(firstRow['ut1_unix'])
stopDT = datetime.datetime.utcfromtimestamp(firstRow['ut2_unix'])
if firstRow['kinst'] not in self._kinstList:
self._kinstList.append(firstRow['kinst'])
if firstRow['kindat'] not in self._kindatList:
self._kindatList.append(firstRow['kindat'])
# find earliest and latest times
if self._earliestDT is None:
self._earliestDT = startDT
self._latestDT = stopDT
else:
if startDT < self._earliestDT:
self._earliestDT = startDT
if stopDT > self._latestDT:
self._latestDT = stopDT
recTested += 1 # increment here because the next step may reject it
# check if datetime filter should be applied
if not self._startDatetime is None or not self._endDatetime is None:
if not self._startDatetime is None:
if stopDT < self._startDatetime:
continue
if not self._endDatetime is None:
if startDT > self._endDatetime:
isComplete = True
break
if self._ind2DList is None:
try:
indParmList = metadataGroup['Independent Spatial Parameters']['mnemonic']
indParms = [item.decode('utf-8') for item in indParmList]
except:
indParms = []
else:
indParms = self._ind2DList
newMadDataRec = MadrigalDataRecord(madInstObj=self._madInstObj, madParmObj=self._madParmObj,
dataset=tableSlice, recordSet=self._recDset,
parmObjList=parmObjList, ind2DList=indParms)
if self._ind2DList is None:
self._oneDList = newMadDataRec.get1DParms()
self._twoDList = newMadDataRec.get2DParms()
self._ind2DList = newMadDataRec.getInd2DParms()
parmObjList = (self._oneDList, self._twoDList, self._ind2DList) # used for performance in load
# set self._num2DSplit
twoDSet = set([o.mnemonic for o in self._twoDList])
arraySplitSet = set(self._arraySplitParms)
self._num2DSplit = len(twoDSet.intersection(arraySplitSet))
self._privList.append(newMadDataRec)
recLoaded += 1
hdfFile.close()
# update minmax
if self._totalDataRecords > 0:
self.updateMinMaxParmDict()
return((recLoaded, isComplete))
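# Example (hypothetical sketch): the large-file pattern - page through a file
# with a limited memory footprint, 100 records at a time.  The file path is
# illustrative.
#
#   import madrigal.cedar
#   cedarObj = madrigal.cedar.MadrigalCedarFile('/tmp/mlh980120g.003.hdf5',
#                                               maxRecords=100)
#   isComplete = False
#   while not isComplete:
#       # process the records currently in cedarObj here, then load the next
#       # batch; removeExisting=True (the default) drops the processed ones
#       numLoaded, isComplete = cedarObj.loadNextRecords(100)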
def write(self, format='hdf5', newFilename=None, refreshCatHeadTimes=True,
arraySplittingParms=None, skipArrayLayout=False, overwrite=False):
"""write persists a MadrigalCedarFile to file.
Note: There are two ways to write to a MadrigalCedarFile. Either this method (write) is called after all the
records have been appended to the MadrigalCedarFile, or dump is called after a certain number of records are appended,
and then at the end dump is called a final time if there were any records not yet dumped, followed by close.
The __del__ method will automatically call close if needed, and print a warning that the user should add it to
their code.
write has the advantage of being simpler, but has the disadvantage for larger files of keeping all those records
in memory. dump/close has the advantage of significantly reducing the memory footprint, but is somewhat more complex.
Inputs:
format - a format to save the file in. For now, the allowed values are
'hdf5' and 'netCDF4'. Defaults to 'hdf5'. Use writeText method to get text output.
newFilename - a filename to save to. Defaults to self._fullFilename passed into initializer if not given.
refreshCatHeadTimes - if True (the default), update start and end times in the catalog and header
records to represent the times in the data. If False, use existing times in those records.
skipArrayLayout - if True, do not include Array Layout even if there are independent spatial
parameters. If False (the default) write Array Layout if there are independent spatial
parameters and format = 'hdf5'
arraySplittingParms - a list of parameters as mnemonics used to split
arrays into subarrays. For example, beamcode would split data with separate beamcodes
into separate arrays. The number of separate arrays will be up to the product of the number of
unique values found for each parameter, with the restriction that combinations with no records will
not create a separate array. If default None passed in, then set to self._arraySplitParms,
set when CEDAR file read in.
overwrite - if False (the default) do not overwrite existing file. If True, overwrite the file if it already exists.
Outputs: None
Affects: writes a MadrigalCedarFile to file
"""
if self._format != None:
raise ValueError('Cannot call write method after calling dump method')
if newFilename is None:
newFilename = self._fullFilename
if format not in ('hdf5', 'netCDF4'):
raise ValueError('Illegal format <%s> - must be hdf5 or netCDF4' % (format))
if os.access(newFilename, os.R_OK) and not overwrite:
raise IOError('newFilename <%s> already exists' % (newFilename))
self._format = format
if arraySplittingParms is None:
arraySplittingParms = self._arraySplitParms
if arraySplittingParms is None:
arraySplittingParms = []
if self._format == 'hdf5':
if not newFilename.endswith(self._hdf5Extensions):
raise IOError('filename must end with %s, <%s> does not' % (str(self._hdf5Extensions), newFilename))
try:
# we need to make sure this file is closed and then deleted if an error
f = None # used if next line fails
f = h5py.File(newFilename, 'w')
self._writeHdf5Metadata(f, refreshCatHeadTimes)
self._writeHdf5Data(f)
if len(self.getIndSpatialParms()) > 0:
self._createArrayLayout(f, arraySplittingParms)
f.close()
except:
# on any error, close and delete file, then reraise error
if f:
f.close()
if os.access(newFilename, os.R_OK):
os.remove(newFilename)
raise
elif self._format == 'netCDF4':
try:
# we need to make sure this file is closed and then deleted if an error
f = None # used if next line fails
f = netCDF4.Dataset(newFilename, 'w', format='NETCDF4')
self._writeNetCDF4(f, arraySplittingParms)
f.close()
except:
# on any error, close and delete file, then reraise error
if f:
f.close()
if os.access(newFilename, os.R_OK):
os.remove(newFilename)
raise
self._closed = True # write ends with closed file
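# Example (hypothetical sketch): create a new file in memory and persist it
# once with write.  The file path and split parameter are illustrative.
#
#   import madrigal.cedar
#   cedarObj = madrigal.cedar.MadrigalCedarFile('/tmp/new_file.hdf5',
#                                               createFlag=True)
#   # ... append MadrigalDataRecord objects to cedarObj here ...
#   cedarObj.write(arraySplittingParms=['beamid'])  # hdf5 is the default format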
def dump(self, format='hdf5', newFilename=None, parmIndexDict=None):
"""dump appends all the present records in MadrigalCedarFile to file, and removes present data records from MadrigalCedarFile.
Can be used to append records to a file. Catalog and header records are maintained.
Typically close is called after all calls to dump. The __del__ method will automatically call
close if needed, and print a warning that the user should add it to their code.
Inputs:
format - a format to save the file in. The format argument only exists for backwards
compatibility - only hdf5 is allowed. IOError raised if any other argument is given.
newFilename - a filename to save to. Defaults to self._fullFilename passed into initializer if not given.
parmIndexDict - used only for dumping netCDF4
Outputs: None
Affects: writes a MadrigalCedarFile to file
"""
if self._format != None:
if self._format != format:
raise ValueError('Previous dump format was %s, cannot now use %s' % (str(self._format), str(format)))
if format not in ('hdf5', 'netCDF4'):
raise ValueError('Format must be hdf5 for dump, not %s' % (str(format)))
if newFilename is None:
newFilename = self._fullFilename
if self._format is None:
# first write - run checks, and create all possible metadata and data
if os.access(newFilename, os.R_OK):
raise IOError('newFilename <%s> already exists' % (newFilename))
if format == 'hdf5':
if not newFilename.endswith(tuple(list(self._hdf5Extensions) + ['.nc'])):
raise IOError('filename must end with %s, <%s> does not' % (str(tuple(list(self._hdf5Extensions) + ['.nc'])), newFilename))
elif format == 'netCDF4':
if not newFilename.endswith('.nc'):
raise IOError('filename must end with %s, <%s> does not' % ('.nc', newFilename))
if len(self._privList) == 0:
# nothing to dump
return
if format == 'hdf5':
try:
# we need to make sure this file is closed and then deleted if an error
f = None # used if next line fails
f = h5py.File(newFilename, 'a')
self._closed = False
if self.hasArray(f):
raise IOError('Cannot call dump for hdf5 after write or close')
self._writeHdf5Data(f)
f.close()
except:
# on any error, close and delete file, then reraise error
if f:
f.close()
if os.access(newFilename, os.R_OK):
os.remove(newFilename)
raise
elif format == 'netCDF4':
if len(self._arraySplitParms) != 0:
raise IOError('Cannot dump netCDF4 files with arraySplitParms - write to Hdf5 and then convert')
if self._format is None:
# first write
f = netCDF4.Dataset(newFilename, 'w', format='NETCDF4')
self._firstDumpNetCDF4(f, parmIndexDict)
f.close()
else:
f = netCDF4.Dataset(newFilename, 'a', format='NETCDF4')
self._appendNetCDF4(f, parmIndexDict)
f.close()
self._format = format
# dump data records out of memory
self._privList = [rec for rec in self._privList if not rec.getType() == 'data']
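# Example (hypothetical sketch): build a large file incrementally - dump each
# batch of records as it is appended, then finish with close.  The file path
# is illustrative.
#
#   import madrigal.cedar
#   cedarObj = madrigal.cedar.MadrigalCedarFile('/tmp/big_file.hdf5',
#                                               createFlag=True)
#   # ... append a batch of MadrigalDataRecord objects ...
#   cedarObj.dump()   # writes the batch and frees that memory
#   # ... append and dump further batches as needed ...
#   cedarObj.close()  # writes metadata (and array layout if ind parms)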
def close(self):
"""close closes an open MadrigalCedarFile. It calls _writeHdf5Metadata and _addArray if ind parms.
Most be called directly when dump used.
"""
if self._closed:
# nothing to do
return
with h5py.File(self._fullFilename, 'a') as f:
self._writeHdf5Metadata(f, refreshCatHeadTimes=True)
if len(self.getIndSpatialParms()) > 0:
if not self._skipArray:
self._addArrayDump()
self._closed = True
def writeText(self, newFilename=None, summary='plain', showHeaders=False, selectParms=None,
filterList=None, missing=None, assumed=None, knownbad=None, append=False,
firstWrite=False):
"""writeText writes text to new filename
Inputs:
newFilename - name of new file to create and write to. If None, the default, write to stdout
summary - type of summary line to print at top. Allowed values are:
'plain' - text only mnemonic names, but only if not showHeaders
'html' - mnemonic names wrapped in standard javascript code to allow descriptive popups
'summary' - print overview of file and filters used. Also text only mnemonic names,
but only if not showHeaders
None - no summary line
showHeaders - if True, print header in format for each record. If False, the default,
do not.
selectParms - If None, simply print all parms that are in the file. If a list
of parm mnemonics, print only those parms in the order specified.
filterList - a list of madrigal.derivation.MadrigalFilter objects to be described in the
summary. Default is None, in which case not described in summary. Ignored if summary
is not 'summary'
missing, assumed, knownbad - how to print Cedar special values. Default is None for
all, so that the value printed is the value in the numpy table as per the spec.
append - if True, open newFilename in append mode, and dump records after writing. If False,
open in write mode. Used to allow writing in conjunction with loadNextRecords.
firstWrite - True if this is the first group of records added, and append mode is True.
Used to know whether to write summary lines. If False and append is True, no summary
lines are added; if True and append is True, summary lines are added. If append is not
True, this argument ignored.
"""
# constants
_underscore = 95 # used to indicate missing character
if newFilename is not None:
if append:
f = open(newFilename, 'a')
else:
f = open(newFilename, 'w')
else:
f = sys.stdout
if summary not in ('plain', 'summary', 'html', None):
raise ValueError('Illegal summary value <%s>' % (str(summary)))
# cache information needed to replace special values if needed
# helps performance when replacing
if missing is not None:
missing = str(missing)
missing_len = len(missing)
missing_search = '\ ' * max(0, missing_len-3) + 'nan'
if missing_len < 3:
missing = ' ' * (3-missing_len) + missing
if assumed is not None:
assumed = str(assumed)
assumed_len = len(assumed)
assumed_search = '\ ' * max(0, assumed_len-3) + 'inf'
if assumed_len < 3:
assumed = ' ' * (3-assumed_len) + assumed
if knownbad is not None:
knownbad = str(knownbad)
knownbad_len = len(knownbad)
knownbad_search = '\ ' * max(0, knownbad_len-4) + '-inf'
if knownbad_len < 4:
knownbad = ' ' * (4-knownbad_len) + knownbad
# create format string and header strings
formatStr = ''
parmStr = ''
if not selectParms is None:
names = selectParms
else:
names = self._tableDType.names
for parm in names:
parm = parm.upper()
format = self._madParmObj.getParmFormat(parm)
try:
# first handle float formats
dataWidth = int(format[1:format.find('.')])
# make sure width is big enough for special values
newDataWidth = dataWidth
if missing is not None:
newDataWidth = max(newDataWidth, len(missing)+1)
if self._madParmObj.isError(parm):
if assumed is not None:
newDataWidth = max(newDataWidth, dataWidth, len(assumed)+1)
if knownbad is not None:
newDataWidth = max(newDataWidth, dataWidth, len(knownbad)+1)
if newDataWidth > dataWidth:
# we need to expand format
format = '%%%i%s' % (newDataWidth, format[format.find('.'):])
dataWidth = newDataWidth
except ValueError:
# now handle integer or string formats - assumed to never be error values
if format.find('i') != -1:
if len(format) == 2:
# we need to insert a length
format = '%%%ii' % (self._madParmObj.getParmWidth(parm)-1)
dataWidth = self._madParmObj.getParmWidth(parm)
else:
dataWidth = int(format[1:-1])
elif format.find('S') != -1 or format.find('s') != -1:
dataWidth = int(format[1:-1])
else:
raise
width = max(self._madParmObj.getParmWidth(parm), dataWidth)
formatStr += '%s' % (format)
formatStr += ' ' * (max(1, width-dataWidth)) # sets spacing between numbers
if len(parm) >= width-1:
# need to truncate name
if summary != 'html':
parmStr += parm[:width-1] + ' '
else:
parmStr += "%s " % (parm[:width-1].upper(),
self._madParmObj.getSimpleParmDescription(parm),
self._madParmObj.getParmUnits(parm),
parm[:width-1].upper())
else:
# pad evenly on both sides
firstHalfSpace = int((width-len(parm))/2)
secHalfSpace = int((width-len(parm)) - firstHalfSpace)
if summary != 'html':
parmStr += ' ' * firstHalfSpace + parm.upper() + ' ' * secHalfSpace
else:
parmStr += ' ' * firstHalfSpace
# (HTML markup lost in extraction; assumed to match the anchor format above)
parmStr += '<a href="#%s" title="%s (%s)">%s</a>' % (parm[:width-1].upper(),
self._madParmObj.getSimpleParmDescription(parm),
self._madParmObj.getParmUnits(parm),
parm[:width-1].upper())
parmStr += ' ' * secHalfSpace
formatStr += '\n'
firstHeaderPrinted = False # state variable for adding extra space between lines
if summary == 'summary':
if not append or (append and firstWrite):
self._printSummary(f, filterList)
if summary in ('plain', 'summary', 'html') and not showHeaders:
if not append or (append and firstWrite):
# print single header at top
f.write('%s\n' % (parmStr))
if summary == 'html':
f.write('<br>\n') # HTML line break; tag assumed, lost in extraction
if len(self._privList) == 0:
# nothing more to write
if f != sys.stdout:
f.close()
return
# see if only 1D parms are selected, which implies printing only a single line per record
is1D = False
if not selectParms is None:
#make sure its a lowercase list
selectParms = list(selectParms)
selectParms = getLowerCaseList(selectParms)
# see if only 1D parameters are being printed, so that we should only print the first row
is1D = True
recordset = self.getRecordset()
for parm in selectParms:
if recordset[parm][0] != 1:
is1D = False
break
for rec in self._privList:
if rec.getType() != 'data':
continue
if showHeaders:
kinst = rec.getKinst()
instDesc = self._madInstObj.getInstrumentName(kinst)
sDT = rec.getStartDatetime()
sDTStr = sDT.strftime('%Y-%m-%d %H%M:%S')
eDT = rec.getEndDatetime()
eDTStr = eDT.strftime('%H%M:%S')
headerStr = '%s: %s-%s\n' % (instDesc, sDTStr, eDTStr)
if firstHeaderPrinted or summary is None:
f.write('\n%s' % (headerStr))
else:
f.write('%s' % (headerStr))
firstHeaderPrinted = True
f.write('%s\n' % (parmStr))
dataset = rec.getDataset()
if not selectParms is None:
recnoSet = dataset['recno'].copy() # used to see if we are at a new record
dataset_view = dataset[selectParms].copy()
else:
dataset_view = dataset
# modify special values if required
if assumed is not None or knownbad is not None:
for name in dataset_view.dtype.names:
if self._madParmObj.isError(name) and not self.parmIsInt(name):
if assumed is not None:
# set all -1 values to inf
assumedIndices = numpy.where(dataset_view[name] == -1.0)
if len(assumedIndices):
dataset_view[name][assumedIndices] = numpy.Inf
if knownbad is not None:
# set all -2 values to ninf
knownbadIndices = numpy.where(dataset_view[name] == -2.0)
if len(knownbadIndices):
dataset_view[name][knownbadIndices] = numpy.NINF
lastRecno = None
for i in range(len(dataset_view)):
if not selectParms is None:
thisRecno = recnoSet[i]
if is1D and (thisRecno == lastRecno):
continue
data = tuple(list(dataset_view[i]))
try:
text = formatStr % data
except:
# something bad happened - give up and just convert data to a string
textList = [str(item) for item in data]
delimiter = ' '
text = delimiter.join(textList) + '\n'
# modify special values if required
if missing is not None:
if text.find('nan') != -1:
text = re.sub(missing_search, missing, text)
if knownbad is not None:
if text.find('-inf') != -1:
text = re.sub(knownbad_search, knownbad, text)
if assumed is not None:
if text.find('inf') != -1:
text = re.sub(assumed_search, assumed, text)
if summary != 'html':
f.write(text)
else:
f.write(text.replace(' ', '&nbsp;')) # entity assumed; original lost in extraction
if summary == 'html':
f.write('<br>\n') # HTML line break; tag assumed, lost in extraction
if not selectParms is None:
lastRecno = thisRecno
if f != sys.stdout:
f.close()
if append:
# remove all records
self._privList = []
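# Example (hypothetical sketch): write selected parameters as text with a
# summary header.  The paths and mnemonics are illustrative.
#
#   import madrigal.cedar
#   cedarObj = madrigal.cedar.MadrigalCedarFile('/tmp/mlh980120g.003.hdf5')
#   cedarObj.writeText('/tmp/mlh980120g.003.txt', summary='summary',
#                      selectParms=['gdalt', 'ti', 'dti'],
#                      missing='missing', assumed='assumed', knownbad='knownbad')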
def getDType(self):
"""getDType returns the dtype of the table array in this file
"""
return(self._tableDType)
def setDType(self, dtype):
"""setDType sets the dtype of the table array
"""
self._tableDType = dtype
def getRecDType(self):
"""getRecDType returns the dtype of _record_layout
"""
return(self._recDset.dtype)
def getRecordset(self):
"""getRecordset returns the recordset array from the first data record.
Raises IOError if None
"""
if self._recDset is None:
raise IOError('self._recDset is None')
return(self._recDset)
def get1DParms(self):
"""get1DParms returns a list of mnemonics of 1D parms
in file. May be empty if none.
Raises ValueError if self._oneDList is None, since parameters unknown
"""
if self._oneDList is None:
raise ValueError('get1DParms cannot be called before any data records added to this file')
retList = []
for parm in self._oneDList:
retList.append(parm.mnemonic)
return(retList)
def get2DParms(self):
"""get2DParms returns a list of mnemonics of dependent 2D parms
in file. May be empty if none.
Raises ValueError if self._twoDList is None, since parameters unknown
"""
if self._twoDList is None:
raise ValueError('get2DParms cannot be called before any data records added to this file')
retList = []
for parm in self._twoDList:
retList.append(parm.mnemonic)
return(retList)
def getIndSpatialParms(self):
"""getIndSpatialParms returns a list of mnemonics of independent spatial parameters
in file. May be empty if none.
Raises ValueError if self._ind2DList is None, since parameters unknown
"""
if self._ind2DList is None:
raise ValueError('getIndSpatialParms cannot be called before any data records added to this file')
retList = []
for parm in self._ind2DList:
retList.append(parm.mnemonic)
return(retList)
def getArraySplitParms(self):
"""getArraySplitParms returns a list of mnemonics of parameters used to split array. May be empty or None.
"""
return(self._arraySplitParms)
def getParmDim(self, parm):
"""getParmDim returns the dimension (1,2, or 3 for independent spatial parms) of input parm
Raises ValueError if no data records yet.
Raise KeyError if that parameter not found in file
"""
if self._ind2DList is None:
raise ValueError('getParmDim cannot be called before any data records added to this file')
for obj in self._oneDList:
if obj.mnemonic.lower() == parm.lower():
return(1)
# do ind 2D next since they are in both lists
for obj in self._ind2DList:
if obj.mnemonic.lower() == parm.lower():
return(3)
for obj in self._twoDList:
if obj.mnemonic.lower() == parm.lower():
return(2)
raise KeyError('Parm <%s> not found in data' % (str(parm)))
def getStatus(self):
"""getStatus returns the status string
"""
return(self._status)
def setStatus(self, status):
"""setStatus sets the status string
"""
self._status = str(status)
def getEarliestDT(self):
"""getEarliestDT returns the earliest datetime found in file, or None if no data
"""
return(self._earliestDT)
def getLatestDT(self):
"""getLatestDT returns the latest datetime found in file, or None if no data
"""
return(self._latestDT)
def getKinstList(self):
"""getKinstList returns the list of kinst integers in the file
"""
return(self._kinstList)
def getKindatList(self):
"""getKindatList returns the list of kindat integers in the file
"""
return(self._kindatList)
def getRecIndexList(self):
"""getRecIndexList returns a list of record indexes into Table Layout
"""
return(self._recIndexList)
def parmIsInt(self, parm):
"""parmIsInt returns True if this parm (mnemonic) is integer type, False if not
Raises ValueError if parm not in record. or table dtype not yet set
"""
if self._tableDType is None:
raise ValueError('Cannot call parmIsInt until a data record is added')
try:
typeStr = str(self._tableDType[parm.lower()])
except KeyError:
raise ValueError('Parm <%s> not found in file' % (str(parm)))
if typeStr.find('int') != -1:
return(True)
else:
return(False)
def parmIsString(self, parm):
"""parmIsString returns True if this parm (mnemonic) is string type, False if not
Raises ValueError if parm not in record. or table dtype not yet set
"""
if self._tableDType is None:
raise ValueError('Cannot call parmIsString until a data record is added')
try:
typeStr = str(self._tableDType[parm.lower()])
except KeyError:
raise ValueError('Parm <%s> not found in file' % (str(parm)))
if typeStr.lower().find('s') == -1:
return(False)
else:
return(True)
def getStringFormat(self, parm):
"""getStringFormat returns string format string. Raises error if not string type,
or parm not in record. or table dtype not yet set
"""
if not self.parmIsString(parm):
raise ValueError('parm %s not a string, cannot call getStringFormat' % (str(parm)))
return(str(self._tableDType[parm.lower()]))
def hasArray(self, f):
"""hasArray returns True in f['Data']['Array Layout'] exists, False otherwise
"""
if 'Data' in list(f.keys()):
if 'Array Layout' in list(f['Data'].keys()):
return(True)
return(False)
def getMaxMinValues(self, mnemonic, verifyValid=False):
"""getMaxMinValues returns a tuple of (minimum value, maximum value) of the value
of parm in this file. If verifyValid is True, then only lines with valid 2D data
are included. If no valid values, returns (NaN, NaN). Also updates self._minMaxParmDict
Raise IOError if parm not found
"""
parm = mnemonic.lower()
# for string data, always return (Nan, Nan)
if self._madParmObj.isString(parm):
self._minMaxParmDict[parm] = [numpy.NaN, numpy.NaN]
return((numpy.NaN, numpy.NaN))
# create a merged dataset
datasetList = []
for rec in self._privList:
if rec.getType() == 'data':
datasetList.append(rec._dataset)
if len(datasetList) == 0:
if parm in self._minMaxParmDict:
return(self._minMaxParmDict[parm])
else:
raise IOError('No data records in file')
merged_dataset = numpy.concatenate(datasetList)
if not verifyValid:
# very simple - just use numpy methods
try:
data = merged_dataset[parm]
except:
raise IOError('parm %s not found in file' % (parm))
with warnings.catch_warnings():
warnings.simplefilter("ignore")
minValue = numpy.nanmin(data)
maxValue = numpy.nanmax(data)
if parm not in self._minMaxParmDict:
self._minMaxParmDict[parm] = [minValue, maxValue]
else:
orgMin, orgMax = self._minMaxParmDict[parm]
self._minMaxParmDict[parm] = [min(minValue, orgMin), max(maxValue, orgMax)]
return((minValue, maxValue))
# we need to find the minimum and maximum for only valid data
# first sort by parm so we just need to walk until we find a valid row starting at the top and bottom
sorted_indices = numpy.argsort(merged_dataset[parm])
# find min
minValue = None
for i in sorted_indices:
if numpy.isnan(merged_dataset[parm][i]):
continue
for twoDParm in self.get2DParms():
if self._madParmObj.isString(twoDParm):
continue
if numpy.isnan(merged_dataset[twoDParm][i]):
continue
# make sure its not a special error value
if self._madParmObj.isError(twoDParm) and merged_dataset[twoDParm][i] < 0:
continue
# minimum found
minValue = merged_dataset[parm][i]
break
if not minValue is None:
break
# find max
maxValue = None
for i in reversed(sorted_indices):
if numpy.isnan(merged_dataset[parm][i]):
continue
for twoDParm in self.get2DParms():
if self._madParmObj.isString(twoDParm):
continue
if numpy.isnan(merged_dataset[twoDParm][i]):
continue
# make sure its not a special error value
if self._madParmObj.isError(twoDParm) and merged_dataset[twoDParm][i] < 0:
continue
# maximum found
maxValue = merged_dataset[parm][i]
break
if not maxValue is None:
break
if minValue is None:
minValue = numpy.nan
if maxValue is None:
maxValue = numpy.nan
if parm not in self._minMaxParmDict:
self._minMaxParmDict[parm] = [minValue, maxValue]
else:
orgMin, orgMax = self._minMaxParmDict[parm]
self._minMaxParmDict[parm] = [min(minValue, orgMin), max(maxValue, orgMax)]
return((minValue, maxValue))
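# Example (hypothetical sketch): get the range of a parameter over the records
# currently in memory.  The mnemonic is illustrative.
#
#   minVal, maxVal = cedarObj.getMaxMinValues('gdalt')
#   # restrict to rows where the 2D data is valid
#   minVal, maxVal = cedarObj.getMaxMinValues('ti', verifyValid=True)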
def refreshSummary(self):
"""refreshSummary rebuilds the recarray self._experimentParameters
"""
inst = int(self.getKinstList()[0])
delimiter = ','
kinstCodes = []
kinstNames = []
for code in self.getKinstList():
kinstCodes.append(str(int(code)))
kinstNames.append(str(self._madInstObj.getInstrumentName(int(code))))
instrumentCodes = delimiter.join(kinstCodes)
instrumentName = delimiter.join(kinstNames)
categoryStr = self._madInstObj.getCategory(inst)
piStr = self._madInstObj.getContactName(inst)
piEmailStr = self._madInstObj.getContactEmail(inst)
startDateStr = self.getEarliestDT().strftime('%Y-%m-%d %H:%M:%S UT')
endDateStr = self.getLatestDT().strftime('%Y-%m-%d %H:%M:%S UT')
cedarFileName = str(os.path.basename(self._fullFilename))
statusDesc = self._status
instLat = self._madInstObj.getLatitude(inst)
instLon = self._madInstObj.getLongitude(inst)
instAlt = self._madInstObj.getAltitude(inst)
# create kindat description based on all kindats
kindatList = self.getKindatList()
kindatDesc = ''
kindatListStr = ''
if len(kindatList) > 1:
kindatDesc = 'This experiment has %i kinds of data. They are:' % (len(kindatList))
for i, kindat in enumerate(kindatList):
thisKindatDesc = self._madKindatObj.getKindatDescription(kindat, inst)
if not thisKindatDesc:
raise IOError('kindat %i undefined - please add to typeTab.txt' % (kindat))
thisKindatDesc = thisKindatDesc.strip()
kindatDesc += ' %i) %s (code %i)' % (i+1, thisKindatDesc, kindat)
kindatListStr += '%i' % (kindat)
if i < len(kindatList) - 1:
kindatDesc += ', '
kindatListStr += ', '
else:
kindatDesc = self._madKindatObj.getKindatDescription(kindatList[0], inst)
if not kindatDesc:
raise IOError('kindat for %s undefined - please add to typeTab.txt' % (str((kindatList[0], inst))))
kindatDesc = kindatDesc.strip()
kindatListStr += '%i' % (kindatList[0])
# create an expSummary numpy recarray
summArr = numpy.recarray((14,), dtype = [('name', h5py.special_dtype(vlen=str) ),
('value', h5py.special_dtype(vlen=str) )])
summArr['name'][0] = 'instrument'
summArr['name'][1] = 'instrument code(s)'
summArr['name'][2] = 'kind of data file'
summArr['name'][3] = 'kindat code(s)'
summArr['name'][4] = 'start time'
summArr['name'][5] = 'end time'
summArr['name'][6] = 'Cedar file name'
summArr['name'][7] = 'status description'
summArr['name'][8] = 'instrument latitude'
summArr['name'][9] = 'instrument longitude'
summArr['name'][10] = 'instrument altitude'
summArr['name'][11] = 'instrument category'
summArr['name'][12] = 'instrument PI'
summArr['name'][13] = 'instrument PI email'
summArr['value'][0] = instrumentName
summArr['value'][1] = instrumentCodes
summArr['value'][2] = kindatDesc
summArr['value'][3] = kindatListStr
summArr['value'][4] = startDateStr
summArr['value'][5] = endDateStr
summArr['value'][6] = cedarFileName
summArr['value'][7] = statusDesc
summArr['value'][8] = str(instLat)
summArr['value'][9] = str(instLon)
summArr['value'][10] = str(instAlt)
summArr['value'][11] = categoryStr
summArr['value'][12] = piStr
summArr['value'][13] = piEmailStr
self._experimentParameters = summArr
def createCatalogTimeSection(self):
"""createCatalogTimeSection will return all the lines in the catalog record that
describe the start and end time of the data records.
Inputs: None
Returns: a tuple with three items 1) a string in the format of the time section of a
catalog record, 2) earliest datetime, 3) latest datetime
"""
earliestStartTime = self.getEarliestDT()
latestEndTime = self.getLatestDT()
sy = 'IBYRE %4s Beginning year' % (str(earliestStartTime.year))
sd = 'IBDTE %4s Beginning month and day' % (str(earliestStartTime.month*100 + \
earliestStartTime.day))
sh = 'IBHME %4s Beginning UT hour and minute' % (str(earliestStartTime.hour*100 + \
earliestStartTime.minute))
totalCS = earliestStartTime.second*100 + (earliestStartTime.microsecond//10000)
ss = 'IBCSE %4s Beginning centisecond' % (str(totalCS))
ey = 'IEYRE %4s Ending year' % (str(latestEndTime.year))
ed = 'IEDTE %4s Ending month and day' % (str(latestEndTime.month*100 + \
latestEndTime.day))
eh = 'IEHME %4s Ending UT hour and minute' % (str(latestEndTime.hour*100 + \
latestEndTime.minute))
totalCS = latestEndTime.second*100 + (latestEndTime.microsecond//10000)
es = 'IECSE %4s Ending centisecond' % (str(totalCS))
retStr = ''
retStr += sy + (80-len(sy))*' '
retStr += sd + (80-len(sd))*' '
retStr += sh + (80-len(sh))*' '
retStr += ss + (80-len(ss))*' '
retStr += ey + (80-len(ey))*' '
retStr += ed + (80-len(ed))*' '
retStr += eh + (80-len(eh))*' '
retStr += es + (80-len(es))*' '
return((retStr, earliestStartTime, latestEndTime))
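# Example (hypothetical sketch): the returned string packs eight 80-column
# catalog lines, e.g. 'IBYRE 1998 Beginning year ...'.
#
#   timeSection, firstDT, lastDT = cedarObj.createCatalogTimeSection()
#   assert len(timeSection) == 8 * 80  # eight fixed-width catalog lines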
def createHeaderTimeSection(self, dataRecList=None):
"""createHeaderTimeSection will return all the lines in the header record that
describe the start and end time of the data records.
Inputs:
dataRecList - if given, examine only those MadrigalDataRecords in dataRecList.
If None (the default), examine all MadrigalDataRecords in this
MadrigalCedarFile
Returns: a tuple with three items 1) a string in the format of the time section of a
header record, 2) earliest datetime, 3) latest datetime
"""
if dataRecList is None:
earliestStartTime = self.getEarliestDT()
latestEndTime = self.getLatestDT()
else:
earliestStartTime = None
latestEndTime = None
for rec in dataRecList:
if rec.getType() != 'data':
continue
#earliest time
thisTime = rec.getStartDatetime()
if earliestStartTime is None:
earliestStartTime = thisTime
if earliestStartTime > thisTime:
earliestStartTime = thisTime
#latest time
thisTime = rec.getEndDatetime()
if latestEndTime is None:
latestEndTime = thisTime
if latestEndTime < thisTime:
latestEndTime = thisTime
sy = 'IBYRT %4s Beginning year' % (str(earliestStartTime.year))
sd = 'IBDTT %4s Beginning month and day' % (str(earliestStartTime.month*100 + \
earliestStartTime.day))
sh = 'IBHMT %4s Beginning UT hour and minute' % (str(earliestStartTime.hour*100 + \
earliestStartTime.minute))
totalCS = earliestStartTime.second*100 + (earliestStartTime.microsecond//10000)
ss = 'IBCST %4s Beginning centisecond' % (str(totalCS))
ey = 'IEYRT %4s Ending year' % (str(latestEndTime.year))
ed = 'IEDTT %4s Ending month and day' % (str(latestEndTime.month*100 + \
latestEndTime.day))
eh = 'IEHMT %4s Ending UT hour and minute' % (str(latestEndTime.hour*100 + \
latestEndTime.minute))
totalCS = latestEndTime.second*100 + (latestEndTime.microsecond//10000)
es = 'IECST %4s Ending centisecond' % (str(totalCS))
retStr = ''
retStr += sy + (80-len(sy))*' '
retStr += sd + (80-len(sd))*' '
retStr += sh + (80-len(sh))*' '
retStr += ss + (80-len(ss))*' '
retStr += ey + (80-len(ey))*' '
retStr += ed + (80-len(ed))*' '
retStr += eh + (80-len(eh))*' '
retStr += es + (80-len(es))*' '
return((retStr, earliestStartTime, latestEndTime))
def updateMinMaxParmDict(self):
"""updateMinMaxParmDict updates self._minMaxParmDict
"""
for parm in self.get1DParms() + self.get2DParms():
self.getMaxMinValues(parm, True)
def _writeHdf5Metadata(self, f, refreshCatHeadTimes):
"""_writeHdf5Metadata is responsible for writing Metadata group in Hdf5 file
Can be called multiple times, but will only write "Experiment Notes" if any catalog or header
records found
Inputs:
f - the open h5py.File object
refreshCatHeadTimes - if True, update start and end times in the catalog and header
records to represent the times in the data. If False, use existing times in those records.
"""
if "Metadata" not in list(f.keys()):
# metadata tables that are only updated once
metadataGroup = f.create_group("Metadata")
self._addDataParametersTable(metadataGroup)
if self._experimentParameters is None:
self.refreshSummary()
metadataGroup.create_dataset('Experiment Parameters', data=self._experimentParameters)
# create Independent Spatial Parameters recordset
indParmList = self.getIndSpatialParms()
indParmDesc = []
longestMnemStr = 1
longestDescStr = 1
for indParm in indParmList:
if len(indParm) > longestMnemStr:
longestMnemStr = len(indParm)
indParmDesc.append(self._madParmObj.getSimpleParmDescription(indParm))
if len(indParmDesc[-1]) > longestDescStr:
longestDescStr = len(indParmDesc[-1])
indSpatialArr = numpy.recarray((len(indParmList),),
dtype = [('mnemonic', '|S%i' % (longestMnemStr)),
('description', '|S%i' % (longestDescStr))])
for i, indParm in enumerate(indParmList):
indSpatialArr[i]['mnemonic'] = indParm
indSpatialArr[i]['description'] = indParmDesc[i]
metadataGroup.create_dataset('Independent Spatial Parameters', data=indSpatialArr)
else:
metadataGroup = f["Metadata"]
self._writeRecordLayout(metadataGroup)
self.writeExperimentNotes(metadataGroup, refreshCatHeadTimes)
def _addArrayDump(self):
"""_addArrayDump adds Array Layout to an Hdf5 file created by dump.
Inputs: None
Outputs: None
Affects: adds "Array Layout" group to f['Data']
"""
if self._format != 'hdf5':
raise ValueError('Can only call _addArrayDump for Hdf5 files written using dump')
self._createArrayLayout2()
# try to gzip 2d array data
filename, file_extension = os.path.splitext(self._fullFilename)
# tmp file name to use to run h5repack
tmpFile = filename + '_tmp' + file_extension
cmd = 'h5repack -i %s -o %s --filter=\"Data/Array Layout\":GZIP=4' % (self._fullFilename, tmpFile)
try:
subprocess.check_call(shlex.split(cmd))
except:
traceback.print_exc()
return
shutil.move(tmpFile, self._fullFilename)
def _writeHdf5Data(self, f):
"""_writeHdf5Data is responsible for writing Data group in Hdf5 file
Input: f - the open h5py.File object
"""
tableName = 'Table Layout'
# create a merged dataset
datasetList = []
nrows = None
for rec in self._privList:
if rec.getType() == 'data':
datasetList.append(rec._dataset)
if nrows is None:
nrows = rec.getNrow()
if len(datasetList) == 0:
raise IOError('No data records in file')
merged_dataset = numpy.concatenate(datasetList)
if "Data" not in list(f.keys()):
dataGroup = f.create_group("Data")
dset = dataGroup.create_dataset(tableName, data=merged_dataset, compression='gzip', maxshape=(None,),
chunks=True)
else:
# append
dataGroup = f["Data"]
added_len = merged_dataset.shape[0]
dset = dataGroup[tableName]
dset.resize((dset.shape[0] + added_len,))
dset.write_direct(merged_dataset,None,numpy.s_[-1*added_len:])
del(merged_dataset)
def _createArrayLayout(self, f, arraySplittingParms):
"""_createArrayLayout will append an Array Layout to the open Hdf5 file f
arraySplittingParms - a list of parameters as mnemonics used to split
arrays into subarrays. For example, beamcode would split data with separate beamcodes
into separate arrays. The number of separate arrays will be up to the product of the number of
unique values found for each parameter, with the restriction that combinations with no records will
not create a separate array.
IOError raised if Array Layout already exists - this can only be called once
"""
if self._skipArray:
return
# get info from recarrays that already exist
table = f['Data']['Table Layout']
recLayout = f['Metadata']['_record_layout']
metadataGroup = f['Metadata']
# inputs
indParmList = self.getIndSpatialParms()
if "Array Layout" in list(f['Data'].keys()):
raise IOError('Array Layout already created - this can only be created once.')
# add "Parameters Used to Split Array Data" to Metadata
if len(arraySplittingParms) > 0:
arrSplitParmDesc = []
longestMnemStr = 0
longestDescStr = 0
for arrSplitParm in arraySplittingParms:
if len(arrSplitParm) > longestMnemStr:
longestMnemStr = len(arrSplitParm)
arrSplitParmDesc.append(self._madParmObj.getSimpleParmDescription(arrSplitParm))
if len(arrSplitParmDesc[-1]) > longestDescStr:
longestDescStr = len(arrSplitParmDesc[-1])
arrSplitArr = numpy.recarray((len(arraySplittingParms),),
dtype = [('mnemonic', '|S%i' % (longestMnemStr)),
('description', '|S%i' % (longestDescStr))])
for i, arrSplitParm in enumerate(arraySplittingParms):
arrSplitArr[i]['mnemonic'] = arrSplitParm
arrSplitArr[i]['description'] = arrSplitParmDesc[i]
metadataGroup.create_dataset('Parameters Used to Split Array Data', data=arrSplitArr)
arrGroup = f['Data'].create_group("Array Layout")
arraySplittingList = [] # list of lists of all existing values for each array splitting parm
for parm in arraySplittingParms:
arraySplittingList.append(numpy.unique(table[parm]))
tableSubsets = []
for combo in itertools.product(*arraySplittingList):
tableSubsets.append(_TableSubset(arraySplittingParms, combo, table))
for tableSubset in tableSubsets:
uniqueIndValueDict = {}
for indParm in indParmList:
uniqueIndValueDict[indParm] = numpy.unique(tableSubset.table[indParm])
unique_times = numpy.unique(tableSubset.table['ut1_unix'])
group_name = tableSubset.getGroupName()
if group_name != None:
thisGroup = arrGroup.create_group(tableSubset.getGroupName())
else:
thisGroup = arrGroup # no splitting, so no subgroup needed
self._addLayoutDescription(thisGroup)
ts_dset = thisGroup.create_dataset("timestamps", data=unique_times)
for indParm in indParmList:
thisGroup.create_dataset(indParm, data=uniqueIndValueDict[indParm])
# one D parm arrays
oneDGroup = thisGroup.create_group('1D Parameters')
self._addDataParametersTable(oneDGroup, 1)
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 1:
dset = tableSubset.table[parm][tableSubset.oneDIndices]
oneDGroup.create_dataset(parm, data=dset)
# two D parm arrays
twoDGroup = thisGroup.create_group('2D Parameters')
self._addDataParametersTable(twoDGroup, 2)
# get shape of 2D data (number of dimensions dynamic)
twoDShape = []
for indParm in indParmList:
twoDShape.append(len(uniqueIndValueDict[indParm]))
twoDShape.append(len(unique_times))
dsetDict = {} # key = parm, value 2D dataset
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
dsetDict[parm] = numpy.zeros(twoDShape, dtype=table.dtype[parm])
if self.parmIsInt(parm):
dsetDict[parm][:] = self.missing_int
else:
dsetDict[parm][:] = self.missing
# precalculate the indices
# time index
            time_indices = numpy.zeros((1, len(tableSubset.table)), numpy.int64)
times = tableSubset.table['ut1_unix']
for i in range(len(unique_times)):
t = unique_times[i]
indices = numpy.argwhere(times == t)
time_indices[0, indices] = i
# ind parm indexes
indParmIndexDict = {}
for indParm in indParmList:
values = tableSubset.table[indParm]
                indParmIndexDict[indParm] = numpy.zeros((1, len(tableSubset.table)), numpy.int64)
for i in range(len(uniqueIndValueDict[indParm])):
v = uniqueIndValueDict[indParm][i]
indices = numpy.argwhere(values == v)
indParmIndexDict[indParm][0, indices] = i
# concatenate
tableIndex = None
for indParm in indParmList:
if tableIndex is None:
tableIndex = indParmIndexDict[indParm]
else:
tableIndex = numpy.concatenate((tableIndex, indParmIndexDict[indParm]), 0)
tableIndex = numpy.concatenate((tableIndex, time_indices), 0)
# set 2D parms
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
if len(indParmList) == 1:
dsetDict[parm][tableIndex[0], tableIndex[1]] = tableSubset.table[parm]
elif len(indParmList) == 2:
dsetDict[parm][tableIndex[0], tableIndex[1], tableIndex[2]] = tableSubset.table[parm]
elif len(indParmList) == 3:
dsetDict[parm][tableIndex[0], tableIndex[1], tableIndex[2], tableIndex[3]] = tableSubset.table[parm]
else:
raise ValueError('Can not handle more than 3 independent spatial parms - there are %i' % (len(indParmList)))
# write the datasets out
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
twoDGroup.create_dataset(parm, data=dsetDict[parm], compression='gzip')
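    # A minimal sketch of the splitting step above, in isolation (assumes a structured
    # numpy array `table`; 'beamcode' is an illustrative mnemonic, not a required one):
    # each unique combination of splitting-parameter values selects one candidate
    # subarray, and combinations with no records are dropped.
    #
    #     import itertools
    #     import numpy
    #     splitParms = ['beamcode']
    #     splitValues = [numpy.unique(table[p]) for p in splitParms]
    #     for combo in itertools.product(*splitValues):
    #         mask = numpy.ones(len(table), dtype=bool)
    #         for parm, value in zip(splitParms, combo):
    #             mask &= (table[parm] == value)
    #         if mask.any():
    #             subset = table[mask]   # becomes one subarray group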
def _createArrayLayout2(self):
"""_createArrayLayout2 will append an Array Layout to the Hdf5 file. Called after dump.
IOError raised if Array Layout already exists - this can only be called once
"""
f = h5py.File(self._fullFilename, 'a')
# get info from recarrays that already exist
table = f['Data']['Table Layout']
recLayout = f['Metadata']['_record_layout']
metadataGroup = f['Metadata']
# inputs
indParmList = self.getIndSpatialParms()
if "Array Layout" in list(f['Data'].keys()):
raise IOError('Array Layout already created - this can only be created once.')
# update metadata now that file finished
self._writeHdf5Metadata(f, refreshCatHeadTimes=True)
if self._skipArray:
f.close()
return
        # now that self._arrDict is completely filled out, create a similar dict, except that
        # all sets are replaced by ordered python arrays
        total_allowed_records = 0 # make sure all ind parameters are declared by checking that the product
                                  # of all the ind parm value lengths times the number of times is equal
                                  # to or greater than the total number of records
arrDict = {}
for key in list(self._arrDict.keys()):
total_ind_parm_lens = []
thisDict = self._arrDict[key]
arrDict[key] = {}
for key2 in list(thisDict.keys()):
thisSet = thisDict[key2]
# convert to ordered numpy array
thisList = list(thisSet)
thisList.sort()
total_ind_parm_lens.append(len(thisList))
if self._madParmObj.isInteger(key2):
data = numpy.array(thisList, dtype=numpy.int64)
elif self._madParmObj.isString(key2):
strLen = self._madParmObj.getStringLen(key2)
data = numpy.array(thisList, dtype=numpy.dtype((str, strLen)))
else:
data = numpy.array(thisList, dtype=numpy.float64)
arrDict[key][key2] = data
# add the max number of records for this group
tmp = total_ind_parm_lens[0]
for v in total_ind_parm_lens[1:]:
tmp *= v
total_allowed_records += tmp
# protect against too many ind parm combinations (too sparse an array)
total_ind_combos = total_ind_parm_lens[1]
for v in total_ind_parm_lens[2:]:
total_ind_combos *= v
if total_ind_combos > 1000000:
                print('Skipping array creation since %i independent parm combinations would create too big an array' % (total_ind_combos))
f.close()
return
if len(table) > total_allowed_records:
raise ValueError('Found %i lines in table, but values of times and ind parms %s allow maximum of %i values in file %s' % \
(len(table), str(indParmList), total_allowed_records, self._fullFilename))
# add "Parameters Used to Split Array Data" to Metadata
        if self._arraySplitParms != []:
arrSplitParmDesc = []
longestMnemStr = 0
longestDescStr = 0
for arrSplitParm in self._arraySplitParms:
if len(arrSplitParm) > longestMnemStr:
longestMnemStr = len(arrSplitParm)
arrSplitParmDesc.append(self._madParmObj.getSimpleParmDescription(arrSplitParm))
if len(arrSplitParmDesc[-1]) > longestDescStr:
longestDescStr = len(arrSplitParmDesc[-1])
arrSplitArr = numpy.recarray((len(self._arraySplitParms),),
dtype = [('mnemonic', '|S%i' % (longestMnemStr)),
('description', '|S%i' % (longestDescStr))])
for i, arrSplitParm in enumerate(self._arraySplitParms):
arrSplitArr[i]['mnemonic'] = arrSplitParm
arrSplitArr[i]['description'] = arrSplitParmDesc[i]
metadataGroup.create_dataset('Parameters Used to Split Array Data', data=arrSplitArr)
arrGroup = f['Data'].create_group("Array Layout")
# stage 1 - write all needed tables with nan values
for key in list(arrDict.keys()):
if key != '':
groupName = self._getGroupName(key)
thisGroup = arrGroup.create_group(groupName)
else:
thisGroup = arrGroup # no subgroups needed
self._addLayoutDescription(thisGroup)
"""thisDict is dict of key = 'ut1_unix' and ind 2d parm names (possibly minus arraySplitParms), values
= ordered numpy array of all unique values"""
thisDict = arrDict[key]
unique_times = thisDict['ut1_unix']
ts_dset = thisGroup.create_dataset("timestamps", data=unique_times)
for indParm in indParmList:
if indParm in self._arraySplitParms:
# not needed
continue
dataset = thisDict[indParm]
thisGroup.create_dataset(indParm, data=dataset)
# one D parm arrays
oneDGroup = thisGroup.create_group('1D Parameters')
self._addDataParametersTable(oneDGroup, 1)
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 1:
if self._madParmObj.isInteger(parm):
dset = numpy.zeros((len(unique_times),), dtype=numpy.int64)
dset[:] = numpy.iinfo(numpy.int64).min
elif self._madParmObj.isString(parm):
strLen = self._madParmObj.getStringLen(parm)
dset = numpy.zeros((len(unique_times),), dtype=numpy.dtype((str, strLen)))
else:
dset = numpy.zeros((len(unique_times),), dtype=numpy.float64)
dset[:] = numpy.nan
oneDGroup.create_dataset(parm, data=dset)
# two D parm arrays
twoDGroup = thisGroup.create_group('2D Parameters')
self._addDataParametersTable(twoDGroup, 2)
# get shape of 2D data (number of dimensions dynamic)
twoDShape = []
for indParm in indParmList:
if indParm in self._arraySplitParms:
# not needed
continue
twoDShape.append(len(thisDict[indParm]))
twoDShape.append(len(unique_times))
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
if self._madParmObj.isInteger(parm):
dset = numpy.zeros(twoDShape, dtype=numpy.int64)
dset[:] = numpy.iinfo(numpy.int64).min
elif self._madParmObj.isString(parm):
strLen = self._madParmObj.getStringLen(parm)
dset = numpy.zeros(twoDShape, dtype=numpy.dtype((str, strLen)))
else:
dset = numpy.zeros(twoDShape, dtype=numpy.float64)
dset[:] = numpy.nan
twoDGroup.create_dataset(parm, data=dset)
# flush file
f.close()
f = h5py.File(self._fullFilename, 'a')
table = f['Data']['Table Layout']
recLayout = f['Metadata']['_record_layout']
# now loop through Table Layout and populate all the 1 and 2 d arrays
step = 10 # number of records to load at once
total_steps = int(len(self._recIndexList) / step)
if total_steps * step < len(self._recIndexList):
total_steps += 1
for i in range(total_steps):
startTimeIndex = i*step
if (i+1)*step < len(self._recIndexList) - 1:
endTimeIndex = (i+1)*step - 1
else:
endTimeIndex = len(self._recIndexList) - 1
table_data = table[self._recIndexList[startTimeIndex][0]:self._recIndexList[endTimeIndex][1]]
# loop through all groups
for key in list(arrDict.keys()):
tableSubset = _TableSubset(self._arraySplitParms, key, table_data)
                # it's possible there is no data in this slice for this subset
if len(tableSubset.table) == 0:
continue
timestamps = arrDict[key]['ut1_unix']
# get index of first and last time found
first_ut1_unix = tableSubset.table[0]['ut1_unix']
last_ut1_unix = tableSubset.table[-1]['ut1_unix']
time_index_1 = numpy.searchsorted(timestamps, first_ut1_unix)
time_index_2 = numpy.searchsorted(timestamps, last_ut1_unix) + 1
groupName = tableSubset.getGroupName()
# get shape of 2D data (number of dimensions dynamic)
twoDShape = []
for indParm in indParmList:
if indParm in self._arraySplitParms:
# not needed
continue
twoDShape.append(len(arrDict[key][indParm]))
twoDShape.append(time_index_2 - time_index_1)
# ind parm indexes
indParmIndexDict = {}
for indParm in indParmList:
if indParm in self._arraySplitParms:
# not needed
continue
values = tableSubset.table[indParm]
                    indParmIndexDict[indParm] = numpy.zeros((len(tableSubset.table),), numpy.int64)
                    for j in range(len(arrDict[key][indParm])):
                        v = arrDict[key][indParm][j]
                        indices = numpy.argwhere(values == v)
                        indParmIndexDict[indParm][indices] = j
# finally time dimension
values = tableSubset.table['ut1_unix']
                timeIndices = numpy.zeros((len(tableSubset.table),), numpy.int64)
                thisTimestampArr = numpy.unique(tableSubset.table['ut1_unix'])
                for j in range(len(thisTimestampArr)):
                    v = thisTimestampArr[j]
                    indices = numpy.argwhere(values == v)
                    timeIndices[indices] = j
# concatenate
tableIndex = []
for indParm in indParmList:
if indParm in self._arraySplitParms:
# not needed
continue
tableIndex.append(indParmIndexDict[indParm])
tableIndex.append(timeIndices)
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 1:
dset = tableSubset.table[parm][tableSubset.oneDIndices]
                        if groupName is not None:
f['Data']['Array Layout'][groupName]['1D Parameters'][parm][time_index_1:time_index_2] = dset
else:
f['Data']['Array Layout']['1D Parameters'][parm][time_index_1:time_index_2] = dset
elif recLayout[parm][0] == 2:
if self._madParmObj.isInteger(parm):
dset2 = numpy.zeros(tuple(twoDShape), dtype=numpy.int64)
dset2[:] = numpy.iinfo(numpy.int64).min
elif self._madParmObj.isString(parm):
strLen = self._madParmObj.getStringLen(parm)
dset2 = numpy.zeros(tuple(twoDShape), dtype=numpy.dtype((str, strLen)))
dset2[:] = ''
else:
dset2 = numpy.zeros(tuple(twoDShape), dtype=numpy.float64)
dset2[:] = numpy.nan
dset2[tuple(tableIndex)] = tableSubset.table[parm]
                        if groupName is not None:
fdata = f['Data']['Array Layout'][groupName]['2D Parameters'][parm]
else:
fdata = f['Data']['Array Layout']['2D Parameters'][parm]
if len(indParmList) - self._num2DSplit == 1:
fdata[:,time_index_1:time_index_2] = dset2
elif len(indParmList) - self._num2DSplit == 2:
fdata[:,:,time_index_1:time_index_2] = dset2
elif len(indParmList) - self._num2DSplit == 3:
fdata[:,:,:,time_index_1:time_index_2] = dset2
elif len(indParmList) - self._num2DSplit == 4:
fdata[:,:,:,:,time_index_1:time_index_2] = dset2
elif len(indParmList) - self._num2DSplit == 5:
fdata[:,:,:,:,:,time_index_1:time_index_2] = dset2
else:
raise ValueError('Can not handle more than 5 independent spatial parms - there are %i' % (len(indParmList)))
f.close()
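    # A hedged sketch of reading back the layout written above, for an unsplit file
    # (the path and the 'gdalt'/'ti' mnemonics are illustrative assumptions):
    #
    #     import h5py
    #     with h5py.File('/tmp/example.hdf5', 'r') as f:
    #         layout = f['Data']['Array Layout']
    #         timestamps = layout['timestamps'][:]       # unique ut1_unix values
    #         gdalt = layout['gdalt'][:]                 # one independent spatial parm
    #         ti = layout['2D Parameters']['ti'][:]      # shape: (len(gdalt), len(timestamps))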
def _writeNetCDF4(self, f, arraySplittingParms):
"""_writeNetCDF4 will write to a netCDF4 file f
arraySplittingParms - a list of parameters as mnemonics used to split
arrays into subarrays. For example, beamcode would split data with separate beamcodes
into separate arrays. The number of separate arrays will be up to the product of the number of
unique values found for each parameter, with the restriction that combinations with no records will
not create a separate array.
"""
# create merged datasets table and recLayout
datasetList = []
recordList = []
for rec in self._privList:
if rec.getType() == 'data':
datasetList.append(rec._dataset)
if len(recordList) == 0:
recordList.append(rec._recordSet)
elif rec.getType() == 'catalog':
f.catalog_text = rec.getText()
elif rec.getType() == 'header':
f.header_text = rec.getText()
if len(datasetList) == 0:
raise IOError('No data records in file')
table = numpy.concatenate(datasetList)
recLayout = numpy.concatenate(recordList)
if self._experimentParameters is None:
self.refreshSummary()
# write Experiment Parameters
for i in range(len(self._experimentParameters)):
name = self._experimentParameters['name'][i]
# make text acceptable attribute names
if type(name) in (bytes, numpy.bytes_):
name = name.replace(b' ', b'_')
name = name.replace(b'(s)', b'')
else:
name = name.replace(' ', '_')
name = name.replace('(s)', '')
f.setncattr(name, self._experimentParameters['value'][i])
indParmList = self.getIndSpatialParms()
# add "Parameters Used to Split Array Data" to Metadata
if len(arraySplittingParms) > 0:
arrSplitParmDesc = ''
for arrSplitParm in arraySplittingParms:
arrSplitParmDesc += '%s: ' % (arrSplitParm)
arrSplitParmDesc += '%s' % (self._madParmObj.getSimpleParmDescription(arrSplitParm))
if arrSplitParm != arraySplittingParms[-1]:
arrSplitParmDesc += ' -- '
f.parameters_used_to_split_data = arrSplitParmDesc
arraySplittingList = [] # list of lists of all existing values for each array splitting parm
for parm in arraySplittingParms:
if type(parm) in (bytes, numpy.bytes_):
parm = parm.decode('utf-8')
arraySplittingList.append(numpy.unique(table[parm]))
tableSubsets = []
for combo in itertools.product(*arraySplittingList):
tableSubsets.append(_TableSubset(arraySplittingParms, combo, table))
for tableSubset in tableSubsets:
uniqueIndValueDict = {}
for indParm in indParmList:
uniqueIndValueDict[indParm] = numpy.unique(tableSubset.table[indParm])
unique_times = numpy.unique(tableSubset.table['ut1_unix'])
group_name = tableSubset.getGroupName()
            if group_name is not None:
group_name = group_name.strip().replace(' ', '_')
thisGroup = f.createGroup(group_name)
else:
thisGroup = f # no splitting, so no subgroup needed
# next step - create dimensions
dims = []
thisGroup.createDimension("timestamps", len(unique_times))
timeVar = thisGroup.createVariable("timestamps", 'f8', ("timestamps",),
zlib=True)
timeVar.units = 'Unix seconds'
timeVar.description = 'Number of seconds since UT midnight 1970-01-01'
timeVar[:] = unique_times
dims.append("timestamps")
for indParm in indParmList:
thisGroup.createDimension(indParm, len(uniqueIndValueDict[indParm]))
if self._madParmObj.isInteger(indParm):
thisVar = thisGroup.createVariable(indParm, 'i8', (indParm,), zlib=True)
elif self._madParmObj.isString(indParm):
thisVar = thisGroup.createVariable(indParm, self.getStringFormat(indParm), (indParm,), zlib=True)
else:
thisVar = thisGroup.createVariable(indParm, 'f8', (indParm,), zlib=True)
thisVar[:] = uniqueIndValueDict[indParm]
thisVar.units = self._madParmObj.getParmUnits(indParm)
thisVar.description = self._madParmObj.getSimpleParmDescription(indParm)
dims.append(indParm)
# create one and two D parm arrays, set 1D
twoDVarDict = {} # key = parm name, value = netCDF4 variable
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 1:
dset = tableSubset.table[parm][tableSubset.oneDIndices]
if self.parmIsInt(parm):
oneDVar = thisGroup.createVariable(parm, 'i8', (dims[0],), zlib=True)
elif self.parmIsString(parm):
oneDVar = thisGroup.createVariable(parm, self.getStringFormat(parm), (dims[0],),
zlib=True)
else: # float
oneDVar = thisGroup.createVariable(parm, 'f8', (dims[0],), zlib=True)
oneDVar.units = self._madParmObj.getParmUnits(parm)
oneDVar.description = self._madParmObj.getSimpleParmDescription(parm)
                    try:
                        oneDVar[:] = dset
                    except Exception:
                        raise ValueError('There may be an issue with array splitting, because there are more records than times')
elif recLayout[parm][0] == 2:
if self.parmIsInt(parm):
twoDVarDict[parm] = thisGroup.createVariable(parm, 'i8', dims, zlib=True)
elif self.parmIsString(parm):
twoDVarDict[parm] = thisGroup.createVariable(parm, self.getStringFormat(parm), dims, zlib=True)
else:
twoDVarDict[parm] = thisGroup.createVariable(parm, 'f8', dims, zlib=True)
twoDVarDict[parm].units = self._madParmObj.getParmUnits(parm)
twoDVarDict[parm].description = self._madParmObj.getSimpleParmDescription(parm)
# two D parm arrays
# get shape of 2D data (number of dimensions dynamic)
twoDShape = []
twoDShape.append(len(unique_times))
for indParm in indParmList:
twoDShape.append(len(uniqueIndValueDict[indParm]))
dsetDict = {} # key = parm, value 2D dataset
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
dsetDict[parm] = numpy.zeros(twoDShape, dtype=table.dtype[parm])
if self.parmIsInt(parm):
dsetDict[parm][:] = self.missing_int
else:
dsetDict[parm][:] = self.missing
# precalculate the indices
# time index
            time_indices = numpy.zeros((1, len(tableSubset.table)), numpy.int64)
times = tableSubset.table['ut1_unix']
for i in range(len(unique_times)):
t = unique_times[i]
indices = numpy.argwhere(times == t)
time_indices[0, indices] = i
# ind parm indexes
indParmIndexDict = {}
for indParm in indParmList:
values = tableSubset.table[indParm]
                indParmIndexDict[indParm] = numpy.zeros((1, len(tableSubset.table)), numpy.int64)
for i in range(len(uniqueIndValueDict[indParm])):
v = uniqueIndValueDict[indParm][i]
indices = numpy.argwhere(values == v)
indParmIndexDict[indParm][0, indices] = i
# concatenate
tableIndex = time_indices
for indParm in indParmList:
tableIndex = numpy.concatenate((tableIndex, indParmIndexDict[indParm]), 0)
# set 2D parms
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
if len(indParmList) == 1:
dsetDict[parm][tableIndex[0], tableIndex[1]] = tableSubset.table[parm]
elif len(indParmList) == 2:
dsetDict[parm][tableIndex[0], tableIndex[1], tableIndex[2]] = tableSubset.table[parm]
elif len(indParmList) == 3:
dsetDict[parm][tableIndex[0], tableIndex[1], tableIndex[2], tableIndex[3]] = tableSubset.table[parm]
elif len(indParmList) == 0:
continue
else:
raise ValueError('Can not handle more than 3 independent spatial parms - there are %i' % (len(indParmList)))
# write the datasets out
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
if len(indParmList) == 1:
twoDVarDict[parm][:,:] = dsetDict[parm]
elif len(indParmList) == 2:
twoDVarDict[parm][:,:,:] = dsetDict[parm]
elif len(indParmList) == 3:
twoDVarDict[parm][:,:,:,:] = dsetDict[parm]
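    # The netCDF4 dimension/variable pattern used above, in isolation (a sketch; the
    # output path and sizes are illustrative assumptions):
    #
    #     import numpy
    #     import netCDF4
    #     with netCDF4.Dataset('/tmp/example.nc', 'w') as nc:
    #         nc.createDimension('timestamps', 3)
    #         timeVar = nc.createVariable('timestamps', 'f8', ('timestamps',), zlib=True)
    #         timeVar.units = 'Unix seconds'
    #         timeVar[:] = numpy.array([0.0, 60.0, 120.0])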
def _firstDumpNetCDF4(self, f, parmIndexDict):
"""_firstDumpNetCDF4 will dump initial data to a netCDF4 file f. Called via dump
parmIndexDict - is a dictionary with key = timestamps and ind spatial parm names,
value = dictionary of keys = unique values, value = index of that value. Can only be used when
arraySplittingParms == []
"""
# create merged datasets table and recLayout
datasetList = []
recordList = []
for rec in self._privList:
if rec.getType() == 'data':
datasetList.append(rec._dataset)
if len(recordList) == 0:
recordList.append(rec._recordSet)
elif rec.getType() == 'catalog':
f.catalog_text = rec.getText()
elif rec.getType() == 'header':
f.header_text = rec.getText()
if len(datasetList) == 0:
raise IOError('No data records in file')
table = numpy.concatenate(datasetList)
recLayout = numpy.concatenate(recordList)
if self._experimentParameters is None:
self.refreshSummary()
# write Experiment Parameters
for i in range(len(self._experimentParameters)):
name = self._experimentParameters['name'][i]
# make text acceptable attribute names
if type(name) in (bytes, numpy.bytes_):
name = name.replace(b' ', b'_')
name = name.replace(b'(s)', b'')
else:
name = name.replace(' ', '_')
name = name.replace('(s)', '')
f.setncattr(name, self._experimentParameters['value'][i])
indParmList = self.getIndSpatialParms()
uniqueIndValueDict = {}
for indParm in indParmList:
uniqueIndValueDict[indParm] = numpy.array(list(parmIndexDict[indParm].keys()))
unique_times = numpy.array(list(parmIndexDict['ut1_unix'].keys()))
thisGroup = f # no splitting, so no subgroup needed
# next step - create dimensions
dims = []
thisGroup.createDimension("timestamps", len(unique_times))
timeVar = thisGroup.createVariable("timestamps", 'f8', ("timestamps",),
zlib=True)
timeVar.units = 'Unix seconds'
timeVar.description = 'Number of seconds since UT midnight 1970-01-01'
timeVar[:] = unique_times
dims.append("timestamps")
for indParm in indParmList:
thisGroup.createDimension(indParm, len(uniqueIndValueDict[indParm]))
if self._madParmObj.isInteger(indParm):
thisVar = thisGroup.createVariable(indParm, 'i8', (indParm,), zlib=False)
elif self._madParmObj.isString(indParm):
thisVar = thisGroup.createVariable(indParm, self.getStringFormat(indParm), (indParm,), zlib=False)
else:
thisVar = thisGroup.createVariable(indParm, 'f8', (indParm,), zlib=False)
thisVar[:] = uniqueIndValueDict[indParm]
thisVar.units = self._madParmObj.getParmUnits(indParm)
thisVar.description = self._madParmObj.getSimpleParmDescription(indParm)
dims.append(indParm)
# create one and two D parm arrays, set 1D
twoDVarDict = {} # key = parm name, value = netCDF4 variable
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 1:
dset = table[parm][:]
if self.parmIsInt(parm):
oneDVar = thisGroup.createVariable(parm, 'i8', (dims[0],), zlib=False)
elif self.parmIsString(parm):
oneDVar = thisGroup.createVariable(parm, self.getStringFormat(parm), (dims[0],),
zlib=False)
else: # float
oneDVar = thisGroup.createVariable(parm, 'f8', (dims[0],), zlib=False)
oneDVar.units = self._madParmObj.getParmUnits(parm)
oneDVar.description = self._madParmObj.getSimpleParmDescription(parm)
lastTS = 0.0
for i, ts in enumerate(table['ut1_unix']):
if ts != lastTS:
# set it
oneDVar[parmIndexDict['ut1_unix'][ts]] = dset[i]
lastTS = ts
elif recLayout[parm][0] == 2:
if self.parmIsInt(parm):
twoDVarDict[parm] = thisGroup.createVariable(parm, 'i8', dims, zlib=False)
elif self.parmIsString(parm):
twoDVarDict[parm] = thisGroup.createVariable(parm, self.getStringFormat(parm), dims, zlib=False)
else:
twoDVarDict[parm] = thisGroup.createVariable(parm, 'f8', dims, zlib=False, fill_value=numpy.nan)
twoDVarDict[parm].units = self._madParmObj.getParmUnits(parm)
twoDVarDict[parm].description = self._madParmObj.getSimpleParmDescription(parm)
# set 2D parms
for i in range(len(table)):
parmIndices = [parmIndexDict['ut1_unix'][table['ut1_unix'][i]]]
for indParm in indParmList:
item = table[indParm][i]
if type(item) in (bytes, numpy.bytes_):
item = item.decode('utf-8')
parmIndices.append(parmIndexDict[indParm][item])
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
if len(indParmList) == 1:
twoDVarDict[parm][parmIndices[0], parmIndices[1]] = table[parm][i]
elif len(indParmList) == 2:
twoDVarDict[parm][parmIndices[0], parmIndices[1], parmIndices[2]] = table[parm][i]
elif len(indParmList) == 3:
twoDVarDict[parm][parmIndices[0], parmIndices[1], parmIndices[2], parmIndices[3]] = table[parm][i]
elif len(indParmList) == 0:
continue
else:
raise ValueError('Can not handle more than 3 independent spatial parms - there are %i' % (len(indParmList)))
def _appendNetCDF4(self, f, parmIndexDict):
"""_appendNetCDF4 will dump appended data to a netCDF4 file f. Called via dump
parmIndexDict - is a dictionary with key = timestamps and ind spatial parm names,
value = dictionary of keys = unique values, value = index of that value. Can only be used when
arraySplittingParms == []
"""
# create merged datasets table and recLayout
datasetList = []
recordList = []
for rec in self._privList:
if rec.getType() == 'data':
datasetList.append(rec._dataset)
if len(recordList) == 0:
recordList.append(rec._recordSet)
elif rec.getType() == 'catalog':
f.catalog_text = rec.getText()
elif rec.getType() == 'header':
f.header_text = rec.getText()
if len(datasetList) == 0:
raise IOError('No data records in file')
table = numpy.concatenate(datasetList)
recLayout = numpy.concatenate(recordList)
indParmList = self.getIndSpatialParms()
uniqueIndValueDict = {}
for indParm in indParmList:
uniqueIndValueDict[indParm] = numpy.array(list(parmIndexDict[indParm].keys()))
unique_times = numpy.array(list(parmIndexDict['ut1_unix'].keys()))
thisGroup = f # no splitting, so no subgroup needed
# next step - create dimensions
dims = []
timeVar = thisGroup.variables["timestamps"]
dims.append("timestamps")
for indParm in indParmList:
thisVar = thisGroup.variables[indParm]
dims.append(indParm)
# create one and two D parm arrays, set 1D
twoDVarDict = {} # key = parm name, value = netCDF4 variable
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 1:
dset = table[parm][:]
oneDVar = thisGroup.variables[parm]
lastTS = 0.0
for i, ts in enumerate(table['ut1_unix']):
if ts != lastTS:
# set it
oneDVar[parmIndexDict['ut1_unix'][ts]] = dset[i]
lastTS = ts
elif recLayout[parm][0] == 2:
twoDVarDict[parm] = thisGroup.variables[parm]
# set 2D parms
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
for i in range(len(table)):
parmIndices = [parmIndexDict['ut1_unix'][table['ut1_unix'][i]]]
for indParm in indParmList:
item = table[indParm][i]
if type(item) in (bytes, numpy.bytes_):
item = item.decode('utf-8')
parmIndices.append(parmIndexDict[indParm][item])
if len(indParmList) == 1:
twoDVarDict[parm][parmIndices[0], parmIndices[1]] = table[parm][i]
elif len(indParmList) == 2:
twoDVarDict[parm][parmIndices[0], parmIndices[1], parmIndices[2]] = table[parm][i]
elif len(indParmList) == 3:
twoDVarDict[parm][parmIndices[0], parmIndices[1], parmIndices[2], parmIndices[3]] = table[parm][i]
elif len(indParmList) == 0:
continue
else:
raise ValueError('Can not handle more than 3 independent spatial parms - there are %i' % (len(indParmList)))
def _addLayoutDescription(self, group):
"""_addLayoutDescription added a Layout Description dataset to h5py group
"""
indSpatialParms = self.getIndSpatialParms()
layoutDesc = self._getLayoutDescription() % (str(indSpatialParms),
str(indSpatialParms),
1+len(indSpatialParms),
len(indSpatialParms),
str(indSpatialParms),
str(indSpatialParms))
LayoutDescription = layoutDesc.split('\n')
# create a recarray to hold this text
textArr = numpy.recarray((len(LayoutDescription),), dtype=[('Layout Description', h5py.special_dtype(vlen=str))])
for i in range(len(LayoutDescription)):
textArr['Layout Description'][i] = LayoutDescription[i]
group.create_dataset('Layout Description', data=textArr)
def _getLayoutDescription(self):
"""_getLayoutDescription returns a description of the layout selected.
Returns:
            LayoutDescription: a template string (with %s/%i placeholders) summarizing the Layout Description; the caller fills in the independent spatial parameters
Affects: Nothing
Exceptions: None
"""
LayoutDescription = """
This data layout contains reshaped data from the Table Layout. The reshaped data is stored as an array, with time and
        the independent spatial parameters (%s) in different dimensions. It creates an array for each parameter found in the file.
This layout contains:
- "1D parameters" group: contains one 1D-array for each 1d parameter
            stored in the file. These parameters depend only on time.
- "2D parameters" group: contains one 2D-array for each 2d parameter
stored in the file. Time and %s are independent parameters.
Every 2D array has %i dimensions - one for time, and %i for
the independent spatial parameters (%s).
- timestamps: Time vector in seconds from 1/1/1970.
- %s : The independent spatial parameters for this file"""
return(LayoutDescription)
def _addDataParametersTable(self, group, dim=None):
"""_addDataParametersTable adds the "Data Parameters" table to the h5py Group group if any parameters found
Inputs:
group - the h5py Group to add the dataset to
dim - if None, include all parameters. If 1 or 2, just include non-required 1 or 2 D parms
"""
if dim not in (None, 1, 2):
raise ValueError('dim must be in (None, 1, 2), not <%s>' % (str(dim)))
# this first pass is just to set the maximum length of all the strings
longestMnemStr = 0
longestDescStr = 0
longestUnitsStr = 0
longestCategoryStr = 0
count = 0
for i, parm in enumerate(self._tableDType.names):
if dim in (1,2) and i < len(self.requiredFields):
# skipping default parms
continue
if dim in (1,2):
if self.getParmDim(parm) != dim:
continue
count += 1
if len(self._madParmObj.getParmMnemonic(parm)) > longestMnemStr:
longestMnemStr = len(self._madParmObj.getParmMnemonic(parm))
if len(self._madParmObj.getSimpleParmDescription(parm)) > longestDescStr:
longestDescStr = len(self._madParmObj.getSimpleParmDescription(parm))
if len(self._madParmObj.getParmUnits(parm)) > longestUnitsStr:
longestUnitsStr = len(self._madParmObj.getParmUnits(parm))
if len(self._madParmObj.getParmCategory(parm)) > longestCategoryStr:
longestCategoryStr = len(self._madParmObj.getParmCategory(parm))
if count == 0: # no parms to add
return
parmArr = numpy.recarray((count,),
dtype = [('mnemonic', '|S%i' % (longestMnemStr)),
('description', '|S%i' % (longestDescStr)),
('isError', 'int'),
('units', '|S%i' % (longestUnitsStr)),
('category', '|S%i' % (longestCategoryStr))])
# set all the values
count = 0
for i, parm in enumerate(self._tableDType.names):
if dim in (1,2) and i < len(self.requiredFields):
# skipping default parms
continue
if dim in (1,2):
if self.getParmDim(parm) != dim:
continue
parmArr['mnemonic'][count] = self._madParmObj.getParmMnemonic(parm)
parmArr['description'][count] = self._madParmObj.getSimpleParmDescription(parm)
parmArr['isError'][count] = self._madParmObj.isError(parm)
parmArr['units'][count] = self._madParmObj.getParmUnits(parm)
parmArr['category'][count] = self._madParmObj.getParmCategory(parm)
count += 1
group.create_dataset("Data Parameters", data=parmArr)
def _writeRecordLayout(self, metadataGroup):
"""_writeRecordLayout adds the "_record_layout" table to the Metadata group metadataGroup if needed
"""
tableName = '_record_layout'
if self._recDset is None:
raise IOError('self._recDset not yet specified')
if tableName not in list(metadataGroup.keys()):
dset = metadataGroup.create_dataset(tableName, data=self._recDset)
            dset.attrs['description'] = 'This is meant to be internal data. For each Madrigal record and parameter, it has a 2 if it is a 2D parameter, 1 if it is a 1D parameter, and 0 if there is no data.'
def _verifyFormat(self, tableDset, recDset):
"""verifyFormat raises an exception if any problem with the Hdf5 input file.
Inputs:
tableDset - dataset from hdfFile["Data"]["Table Layout"]
recDset - dataset from hdfFile["Metadata"]["_record_layout"]
Rules: 1. self._tableDset must start with int columns 'year', 'month', 'day', 'hour', 'min', 'sec',
'recno', 'kindat', 'kinst' and float columns 'ut1_unix', 'ut2_unix'
2. The len of self._recDset must be 1 and must start with the same columns
"""
for i, requiredField in enumerate(self.requiredFields):
if requiredField != tableDset.dtype.names[i]:
raise IOError('Field %s not found in table dset of Hdf5 file %s' % (requiredField, str(self._fullFilename)))
if requiredField != self._recDset.dtype.names[i]:
raise IOError('Field %s not found in record dset of Hdf5 file %s' % (requiredField, str(self._fullFilename)))
if len(recDset) != 1:
            raise IOError('recDset must have len 1, not %i' % (len(recDset)))
for field in self._recDset.dtype.names:
if self._recDset[0][field] not in (1,2,3):
raise IOError('For field %s, got illegal recordset value %s' % (field, str(self._recDset[0][field])))
def _appendCatalogRecs(self, expNotesDataset):
"""_appendCatalogRecs will append 0 or more MadrigalCatalogRecords to self._privList based
on the contents in h5py dataset expNotesDataset
"""
start_delimiter = 'Catalog information from record'
end_delimiter = 'Header information from record'
in_catalog = False
catalog_text = ''
start_found = False
if len(self._kinstList) > 0:
kinst = self._kinstList[0]
else:
kinst = None
for line in expNotesDataset:
line = line[0]
if type(line) in (bytes, numpy.bytes_):
line = line.decode('utf8')
if not in_catalog and line.find(start_delimiter) != -1:
in_catalog = True
continue
if in_catalog and (line.find(end_delimiter) != -1 or line.find(start_delimiter) != -1):
start_found = False
if len(catalog_text) > 0:
self._privList.append(MadrigalCatalogRecord(kinst,None,None,None,None,
None,None,None,None,None,
None,None,None,None,None,
None,None,self._madInstObj, '',
catalog_text))
catalog_text = ''
if line.find(start_delimiter) == -1:
in_catalog = False
continue
if in_catalog and not start_found and len(line.split()) == 0:
continue
if in_catalog:
start_found = True
catalog_text += line + ' ' * (80 - len(line))
# see if last part was a catalog
if len(catalog_text) > 0 and len(catalog_text.split()) > 0:
self._privList.append(MadrigalCatalogRecord(kinst,None,None,None,None,
None,None,None,None,None,
None,None,None,None,None,
None,None,self._madInstObj,
catalog_text))
catalog_text = ''
def _appendHeaderRecs(self, expNotesDataset):
"""_appendHeaderRecs will append 0 or more MadrigalHeaderRecords to self._privList based
on the contents in h5py dataset expNotesDataset
"""
start_delimiter = 'Header information from record'
end_delimiter = 'Catalog information from record'
in_header = False
header_text = ''
start_found = False
if len(self._kinstList) > 0:
kinst = self._kinstList[0]
else:
kinst = None
if len(self._kindatList) > 0:
kindat = self._kindatList[0]
else:
kindat = None
for line in expNotesDataset:
line = line[0]
if type(line) in (bytes, numpy.bytes_):
line = line.decode('utf8')
if not in_header and line.find(start_delimiter) != -1:
in_header = True
continue
if in_header and (line.find(end_delimiter) != -1 or line.find(start_delimiter) != -1):
start_found = False
if len(header_text) > 0:
self._privList.append(MadrigalHeaderRecord(kinst,kindat,None,None,None,None,
None,None,None,None,None,None,
None,None,None,None,None,None,
None,self._madInstObj,
self._madKindatObj, header_text))
header_text = ''
if line.find(start_delimiter) == -1:
in_header = False
continue
if in_header and not start_found and len(line.split()) == 0:
continue
if in_header:
header_text += line + ' ' * (80 - len(line))
# see if last part was a header
if len(header_text) > 0 and len(header_text.split()) > 0:
self._privList.append(MadrigalHeaderRecord(kinst,kindat,None,None,None,None,
None,None,None,None,None,None,
None,None,None,None,None,None,
None,self._madInstObj,
self._madKindatObj, header_text))
header_text = ''
def _getArraySplitParms(self, metadataGroup):
"""_getArraySplitParms appends a list of parameters used to split arrays (if any) from
metadataGroup["Parameters Used to Split Array Data"]. If no such table or empty, returns
empty list
"""
retList2 = []
try:
dset = metadataGroup["Parameters Used to Split Array Data"]
retList2 = [mnem.lower() for mnem in dset['mnemonic']]
        except Exception:
return(retList2)
# verify ascii
retList = []
for mnem in retList2:
if type(mnem) in (bytes, numpy.bytes_):
retList.append(mnem.decode("ascii"))
else:
retList.append(mnem)
return(retList)
def writeExperimentNotes(self, metadataGroup, refreshCatHeadTimes):
"""writeExperimentNotes writes the "Experiment Notes" dataset to the h5py group metadataGroup
if any catalog or header records found.
        refreshCatHeadTimes - if True, update start and end times in the catalog and header
records to represent the times in the data. If False, use existing times in those records.
"""
# templates
cat_template = 'Catalog information from record %i:'
head_template = 'Header information from record %i:'
if "Experiment Notes" in list(metadataGroup.keys()):
# already exists
return
        recDict = {} # key = rec number, value = tuple of (recarray of lines, 'Catalog' or 'Header' str)
for i, rec in enumerate(self._privList):
if rec.getType() == 'catalog':
if refreshCatHeadTimes:
sDT = self.getEarliestDT()
eDT = self.getLatestDT()
rec.setTimeLists(sDT.year, sDT.month, sDT.day,
sDT.hour, sDT.minute, sDT.second, int(sDT.microsecond/10000),
eDT.year, eDT.month, eDT.day,
eDT.hour, eDT.minute, eDT.second, int(eDT.microsecond/10000))
recarray = rec.getLines()
recDict[i] = (recarray, 'Catalog')
elif rec.getType() == 'header':
if refreshCatHeadTimes:
sDT = self.getEarliestDT()
eDT = self.getLatestDT()
rec.setTimeLists(sDT.year, sDT.month, sDT.day,
sDT.hour, sDT.minute, sDT.second, int(sDT.microsecond/10000),
eDT.year, eDT.month, eDT.day,
eDT.hour, eDT.minute, eDT.second, int(eDT.microsecond/10000))
recarray = rec.getLines()
recDict[i] = (recarray, 'Header')
keys = list(recDict.keys())
keys.sort()
if len(keys) == 0:
return
recarray = None
for key in keys:
new_recarray = numpy.recarray((2,), dtype=[('File Notes', h5py.special_dtype(vlen=str))])
if recDict[key][1] == 'Catalog':
topStr = cat_template % (key)
else:
topStr = head_template % (key)
new_recarray[0]['File Notes'] = topStr + ' ' * (80 - len(topStr))
new_recarray[1]['File Notes'] = ' ' * 80
if recarray is None:
recarray = new_recarray
else:
recarray = numpy.concatenate((recarray, new_recarray))
recarray = numpy.concatenate((recarray, recDict[key][0]))
metadataGroup.create_dataset('Experiment Notes', data=recarray)
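    # Variable-length string columns for HDF5 use h5py.special_dtype(vlen=str), as
    # above; a self-contained sketch of one 80-column note line:
    #
    #     import numpy
    #     import h5py
    #     arr = numpy.recarray((1,), dtype=[('File Notes', h5py.special_dtype(vlen=str))])
    #     arr[0]['File Notes'] = 'Catalog information from record 0:'.ljust(80)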
def _getKinstList(self, recarr):
"""_getKinstList returns an array of instrument code ints by parsing a numpy recarray
with columns name and value. If name is "instrument code(s)", then parse comma separated
kinst int values in values column. Returns empty list if not found
"""
retList = []
for i in range(len(recarr)):
try:
                if recarr[i]['name'].decode('utf8').find('instrument code(s)') != -1:
retList = [int(float(kinst)) for kinst in recarr[i]['value'].decode('utf8').split(',')]
break
except AttributeError:
# not binary
                if recarr[i]['name'].find('instrument code(s)') != -1:
retList = [int(float(kinst)) for kinst in recarr[i]['value'].split(',')]
break
return(retList)
def _getKindatList(self, recarr):
"""_getKindatList returns an array of kind of data code ints by parsing a numpy recarray
with columns name and value. If name is "kindat code(s)", then parse comma separated
kindat int values in values column. Returns empty list if not found
"""
retList = []
for i in range(len(recarr)):
try:
if recarr[i]['name'].decode('utf8').find('kindat code(s)') != -1:
retList = [int(float(kindat)) for kindat in recarr[i]['value'].decode('utf8').split(',')]
break
except AttributeError:
# not binary
if recarr[i]['name'].find('kindat code(s)') != -1:
retList = [int(float(kindat)) for kindat in recarr[i]['value'].split(',')]
break
return(retList)
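    # Both parsers above scan a (name, value) recarray; a sketch of the expected
    # input rows (codes illustrative):
    #
    #     import numpy
    #     recarr = numpy.array([(b'instrument code(s)', b'30,31'),
    #                           (b'kindat code(s)', b'3408')],
    #                          dtype=[('name', 'S32'), ('value', 'S32')])
    #     # self._getKinstList(recarr)  -> [30, 31]
    #     # self._getKindatList(recarr) -> [3408]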
def _printSummary(self, f, filterList):
"""_printSummary prints an overview of the original filename and filters used if any
to open file f (may be stdout)
Inputs:
f - open file to write to
filterList - a list of madrigal.derivation.MadrigalFilter objects to be described in the
summary. Default is None, in which case not described in summary. Ignored if summary
is not 'summary'
"""
if self._fullFilename is not None:
f.write('Data derived from file %s:\n' % (self._fullFilename))
if filterList is None:
return
if len(filterList) == 0:
return
f.write('Filters used:\n')
for i in range(len(filterList)):
f.write('Filter %i:\n' % (i+1))
f.write('%s\n' % (str(filterList[i])))
def _getGroupName(self, indValues):
"""_getGroupName returns the name of a array group when split
Input:
indValues - a list of values, one for each array splitting parameter
"""
groupName = 'Array with '
for i, parm in enumerate(self._arraySplitParms):
if type(parm) == bytes:
parmString = parm.decode('utf8')
else:
parmString = parm
groupName += '%s=%s ' % (parmString, str(indValues[i]))
if i < len(indValues)-1:
groupName += 'and '
return(groupName)
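    # Naming sketch (values illustrative): with self._arraySplitParms == ['beamcode', 'pl'],
    #
    #     self._getGroupName([64157.0, 0.00048])
    #     # -> 'Array with beamcode=64157.0 and pl=0.00048 '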
""" the following methods are added to allow this class to emulate a list."""
def __len__(self):
return len(self._privList)
def __getitem__(self, key):
return self._privList[key]
def __setitem__(self, key, value):
# check that value in (MadrigalCatalogRecord, MadrigalHeaderRecord, MadrigalDataRecord)
if not isinstance(value, MadrigalCatalogRecord) and \
not isinstance(value, MadrigalHeaderRecord) and \
not isinstance(value, MadrigalDataRecord):
# check that its not an empty list (used to delete records)
okay = False
if type(value) == list:
if len(value) == 0:
okay = True
if not okay:
raise ValueError('In MadrigalCedarFile, can only add MadrigalCatalogRecord, MadrigalHeaderRecord, or MadrigalDataRecord')
self._privList[key] = value
def __getslice__(self, i, j):
return self._privList[i:j]
def __setslice__(self,i,j,seq):
# check every item in seq
for item in seq:
            if not isinstance(item, MadrigalCatalogRecord) and \
                not isinstance(item, MadrigalHeaderRecord) and \
                not isinstance(item, MadrigalDataRecord):
raise ValueError('In MadrigalCedarFile, can only add MadrigalCatalogRecord, MadrigalHeaderRecord, or MadrigalDataRecord')
self._privList[max(0, i):max(0, j):] = seq
def __delslice__(self, i, j):
del self._privList[max(0, i):max(0, j):]
def __delitem__(self, key):
del self._privList[key]
def __iter__(self):
return iter(self._privList)
    def __contains__(self, other):
        for item in self._privList:
            if item == other:
                return True
        # not found
        return False
def __str__(self):
retStr = ''
for item in self._privList:
retStr += '%s\n' % (str(item))
return retStr
def append(self, item):
# check that value in (MadrigalCatalogRecord, MadrigalHeaderRecord, MadrigalDataRecord)
if not isinstance(item, MadrigalCatalogRecord) and \
not isinstance(item, MadrigalHeaderRecord) and \
not isinstance(item, MadrigalDataRecord):
raise ValueError('In MadrigalCedarFile, can only add MadrigalCatalogRecord, MadrigalHeaderRecord, or MadrigalDataRecord')
if isinstance(item, MadrigalDataRecord):
            if self._tableDType is not None:
if item.getDType() != self._tableDType:
raise ValueError('Varying dtypes found: %s versus %s' % (str(item.getDType()), str(self._tableDType)))
else:
self.setDType(item.getDType())
if self._recDset is not None:
if item.getRecordset() != self._recDset:
raise ValueError('Varying recordsets found: %s versus %s' % (str(item.getRecordset()), str(self._recDset)))
else:
self._recDset = item.getRecordset()
if self._oneDList is None:
# set all internal data structures set by data records only if not yet set
self._oneDList = item.get1DParms()
self._twoDList = item.get2DParms()
self._ind2DList = item.getInd2DParms()
# set self._num2DSplit
twoDSet = set([o.mnemonic for o in self._twoDList])
arraySplitSet = set(self._arraySplitParms)
self._num2DSplit = len(twoDSet.intersection(arraySplitSet))
if self._earliestDT is None:
self._earliestDT = item.getStartDatetime()
self._latestDT = item.getEndDatetime()
else:
if item.getStartDatetime() < self._earliestDT:
self._earliestDT = item.getStartDatetime()
if item.getEndDatetime() > self._latestDT:
self._latestDT = item.getEndDatetime()
if item.getKinst() not in self._kinstList:
self._kinstList.append(item.getKinst())
if item.getKindat() not in self._kindatList:
self._kindatList.append(item.getKindat())
item.setRecno(self._totalDataRecords)
self._totalDataRecords += 1
# update self._recIndexList
if len(self._recIndexList) > 0:
lastIndex = self._recIndexList[-1][1]
else:
lastIndex = 0
self._recIndexList.append((lastIndex, lastIndex + len(item.getDataset())))
if len(self._ind2DList) > 0:
dataset = item.getDataset()
rowsToCheck = dataset
for i, thisRow in enumerate(rowsToCheck):
# update self._arrDict
                if self._arraySplitParms != []:
arraySplitParms = []
for parm in self._arraySplitParms:
if type(parm) == bytes:
arraySplitParms.append(parm.decode('utf8'))
else:
arraySplitParms.append(parm)
key = tuple([thisRow[parm] for parm in arraySplitParms])
# array splitting parameters can never be nan
for this_value in key:
                        if this_value.dtype.type is not numpy.string_:
if numpy.isnan(this_value):
raise ValueError('parm %s is an array splitting parameter, so its illegal to have a nan value for it anywhere in the file' % (str(parm)))
else:
key = '' # no splitting
if key not in list(self._arrDict.keys()):
self._arrDict[key] = {}
# first add ut1_unix if needed
if 'ut1_unix' in self._arrDict[key]:
if thisRow['ut1_unix'] not in self._arrDict[key]['ut1_unix']:
self._arrDict[key]['ut1_unix'] = self._arrDict[key]['ut1_unix'].union([thisRow['ut1_unix']])
else:
self._arrDict[key]['ut1_unix'] = set([thisRow['ut1_unix']])
# now deal with all ind parms
for parm in self._ind2DList:
mnem = parm.mnemonic
if mnem in self._arraySplitParms:
# no need to create separate dimension since already split out
continue
if mnem in self._arrDict[key]:
if thisRow[mnem] not in self._arrDict[key][mnem]:
self._arrDict[key][mnem] = self._arrDict[key][mnem].union([thisRow[mnem]])
else:
self._arrDict[key][mnem] = set([thisRow[mnem]])
# enforce nan rule for ind2DList
thisList = list(self._arrDict[key][mnem])
if len(thisList) > 0:
skip = False
                        if isinstance(thisList[0], bytes):
                            skip = True
if type(thisList[0]) == numpy.ndarray:
if thisList[0].dtype.type is numpy.string_:
skip = True
if not skip:
if numpy.any(numpy.isnan(thisList)):
raise ValueError('Cannot have nan in ind parm %s: %s' % (mnem, str(self._arrDict[key][mnem])))
self._privList.append(item)
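    # Usage sketch (hedged): `cedarFile` is an open MadrigalCedarFile and `rec` a
    # MadrigalDataRecord whose dtype and record layout match the file's; append()
    # also updates the file's time span, kinst/kindat lists, and recno bookkeeping.
    #
    #     cedarFile.append(rec)    # raises ValueError on a dtype or recordset mismatch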
def count(self, other):
return self._privList.count(other)
def index(self, other):
return self._privList.index(other)
def insert(self, i, x):
self._privList.insert(i, x)
def pop(self, i):
return self._privList.pop(i)
def remove(self, x):
self._privList.remove(x)
def reverse(self):
self._privList.reverse()
def sort(self):
self._privList.sort()
def __del__(self):
if not self._closed and self._createFlag and self._format == 'hdf5':
            print('Warning - created file %s being closed by __del__. Best practice is to call close() directly, to avoid this warning' % (str(self._fullFilename)))
self.close()
class MadrigalDataRecord:
"""MadrigalDataRecord holds all the information in a Cedar data record."""
# cedar special values
missing = numpy.nan
assumed = -1.0
knownbad = -2.0
missing_int = numpy.iinfo(numpy.int64).min
assumed_int = -1
knownbad_int = -2
# standard parms
_stdParms = ['year', 'month', 'day', 'hour', 'min', 'sec',
'recno', 'kindat', 'kinst', 'ut1_unix', 'ut2_unix']
def __init__(self,kinst=None,
kindat=None,
sYear=None,sMonth=None,sDay=None,sHour=None,sMin=None,sSec=None,sCentisec=None,
eYear=None,eMonth=None,eDay=None,eHour=None,eMin=None,eSec=None,eCentisec=None,
oneDList=None,
twoDList=None,
nrow=None,
madInstObj=None,
madParmObj=None,
ind2DList=None,
dataset=None, recordSet=None, verbose=True,
parmObjList=None,
madDB=None):
"""__init__ creates a MadrigalDataRecord with all missing data.
Note: all inputs have default values because there are two ways to populate this structure:
1) with all inputs from kinst to nrow when new data is being created, or
2) with numpy arrays dataset and recordSet from existing Hdf5 file.
Inputs:
        kinst - the kind of instrument code. A warning will be raised if not in instTab.txt.
            Default is None, in which case dataset and recordSet must be given.
        kindat - kind of data code. Must be a non-negative integer.
            Default is None, in which case dataset and recordSet must be given.
        sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - record start time. sCentisec must be 0-99
            Default is None, in which case dataset and recordSet must be given.
        eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - record end time. eCentisec must be 0-99
            Default is None, in which case dataset and recordSet must be given.
        oneDList - list of one-dimensional parameters in record. Parameters can be defined as codes
            (integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or CedarParameter objects.
            Default is None, in which case dataset and recordSet must be given.
        twoDList - list of two-dimensional parameters in record. Parameters can be defined as codes
            (integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or CedarParameter objects.
            Default is None, in which case dataset and recordSet must be given.
        nrow - number of rows of 2D data to create. Until set, all values default to missing.
            Default is None, in which case dataset and recordSet must be given.
madInstObj - a madrigal.metadata.MadrigalInstrument object. If None, one will be created.
Used to verify kinst.
madParmObj - a madrigal.data.MadrigalParameter object. If None, one will be created.
            Used to verify and convert parameters to codes.
        ind2DList - list of independent spatial two-dimensional parameters in record.
            Parameters can be defined as codes (integers) or case-insensitive mnemonic
            strings (eg, "Gdalt"), or CedarParameter objects. Each must also be listed
            in twoDList. Default is None, in which case dataset and recordSet must be given.
dataset - an h5py dataset, as found in the Hdf5 group "Data", dataset "Table Layout".
Set to None if this is a new record.
recordSet - an h5py dataset, as found in the Hdf5 group "Metadata", dataset "_record_layout".
Set to None if this is a new record.
        verbose - if True (the default), print all warnings. If False, suppress warnings
parmObjList - a list or tuple of three lists: self._oneDList, self._twoDList, and
self._ind2DList described below. Used only to speed performance. Default is
None, in which case new copies are created.
madDB - madrigal.metadata.MadrigalDB object. If None, one will be created.
Outputs: None
Returns: None
Affects:
Creates attributes:
self._madInstObj - madrigal.metadata.MadrigalInstrument object
self._madParmObj - madrigal.data.MadrigalParameters object
            self._dataset - h5py dataset in the form of a Table Layout numpy recarray
            self._recordSet - h5py dataset in the form of a _record_layout numpy recarray
self._verbose - bool indicating verbose or not
self._oneDList - a list of 1D CedarParameter objects in this MadrigalDataRecord
self._twoDList - a list of 2D CedarParameter objects in this MadrigalDataRecord
self._ind2DList - a list of independent spatial parameters in self._twoDList
"""
if madDB is None:
self._madDB = madrigal.metadata.MadrigalDB()
else:
self._madDB = madDB
# create any needed Madrigal objects, if not passed in
if madInstObj is None:
self._madInstObj = madrigal.metadata.MadrigalInstrument(self._madDB)
else:
self._madInstObj = madInstObj
if madParmObj is None:
self._madParmObj = madrigal.data.MadrigalParameters(self._madDB)
else:
self._madParmObj = madParmObj
if twoDList is None:
twoDList = []
if ind2DList is None:
ind2DList = []
if dataset is None or recordSet is None:
if ind2DList is None:
# get it from cachedFiles.ini
extraParms, ind2DList, splitParms = self._madDB.getKinstKindatConfig(kinst, kindat)
# verify there are independent spatial parms if there are 2D parms
            if len(twoDList) > 0 and len(ind2DList) == 0:
raise ValueError('Cannot have 2D parms without an independent spatial parm set')
self._createArraysFromArgs(kinst,kindat,sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec,
oneDList,twoDList,nrow,ind2DList)
else:
# verify there are independent spatial parms if there are 2D parms
            if len(twoDList) > 0 and len(ind2DList) == 0:
raise ValueError('Cannot have 2D parms without an independent spatial parm set')
self._dataset = dataset
self._recordSet = recordSet
if parmObjList is not None:
self._oneDList = copy.deepcopy(parmObjList[0])
self._twoDList = copy.deepcopy(parmObjList[1])
self._ind2DList = copy.deepcopy(parmObjList[2])
else:
# populate self._oneDList, self._twoDList, and self._ind2DList
self._oneDList = []
self._twoDList = []
self._ind2DList = []
for parm in self._recordSet.dtype.names[len(self._stdParms):]:
if self.parmIsInt(parm):
isInt = True
else:
isInt = False
newCedarParm = CedarParameter(self._madParmObj.getParmCodeFromMnemonic(parm),
parm, self._madParmObj.getParmDescription(parm),
isInt)
if self._recordSet[parm][0] == 1:
self._oneDList.append(newCedarParm)
if self._recordSet[parm][0] in (2,3):
self._twoDList.append(newCedarParm)
if self._recordSet[parm][0] == 3:
self._ind2DList.append(newCedarParm)
self._verbose = bool(verbose)
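    # A minimal construction sketch using path (1) of the docstring (all values are
    # illustrative assumptions; an independent spatial parm like 'gdalt' must appear
    # in both twoDList and ind2DList):
    #
    #     import madrigal.cedar
    #     rec = madrigal.cedar.MadrigalDataRecord(
    #         kinst=30, kindat=3408,
    #         sYear=2020, sMonth=1, sDay=1, sHour=0, sMin=0, sSec=0, sCentisec=0,
    #         eYear=2020, eMonth=1, eDay=1, eHour=0, eMin=1, eSec=0, eCentisec=0,
    #         oneDList=['azm'], twoDList=['gdalt', 'ti'], nrow=10,
    #         ind2DList=['gdalt'])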
def getType(self):
""" returns the type 'data'"""
return 'data'
def getDType(self):
"""getDType returns the dtype of the table array with this data
"""
return(self._dataset.dtype)
def getRecDType(self):
"""getRecDType returns the dtype of _record_array
"""
return(self._recordSet.dtype)
def getDataset(self):
"""getDataset returns the dataset table
"""
return(self._dataset)
def getRecordset(self):
"""getRecordset returns the recordSet table
"""
return(self._recordSet)
def parmIsInt(self, parm):
"""parmIsInt returns True if this parm (mnemonic) is integer type, False if float or string
Raises ValueError if parm not in record
"""
try:
typeStr = str(self._dataset.dtype[parm.lower()].kind)
except KeyError:
raise ValueError('Parm <%s> not found in file' % (str(parm)))
if typeStr.find('i') != -1:
return(True)
else:
return(False)
def parmIsString(self, parm):
"""parmIsString returns True if this parm (mnemonic) is String type, False if float or int
        Raises ValueError if parm not in record
"""
try:
typeStr = str(self._dataset.dtype[parm.lower()].kind)
except KeyError:
raise ValueError('Parm <%s> not found in file' % (str(parm)))
if typeStr.find('S') != -1:
return(True)
else:
return(False)
def getStrLen(self, parm):
"""getStrLen returns True if this parm (mnemonic) is integer type, False if float or string
Raises ValueError is parm not in record, or is not String
"""
if not self.parmIsString(parm):
raise ValueError('Parm <%s> not string type' % (str(parm)))
return(self._dataset.dtype[parm.lower()].itemsize)
def add1D(self, oneDParm):
"""add1D adds a new one-dim parameter to a MadrigalDataRecord
Input: oneDParm - Parameter can be defined as codes (integer) or case-insensitive
mnemonic string (eg, "Gdalt")
        Affects: 1) adds a new column to self._dataset with all values NaN, and 2) adds a
            value to the end of self._recordSet with value = 1 since 1D parm
        If this addition makes self._dataset.dtype differ from that in MadrigalCedarFile, appending this
        MadrigalDataRecord to MadrigalCedarFile will raise an IOError. Also raises an error if the parm already exists.
"""
self.addParm(oneDParm, 1)
def add2D(self, twoDParm):
"""add2D adds a new two-dim parameter to a MadrigalDataRecord
Input: twoDParm - Parameter can be defined as codes (integer) or case-insensitive
mnemonic string (eg, "Gdalt")
        Affects: 1) adds a new column to self._dataset with all values NaN, and 2) adds a
            value to the end of self._recordSet with value = 2 since 2D parm
        If this addition makes self._dataset.dtype differ from that in MadrigalCedarFile, appending this
        MadrigalDataRecord to MadrigalCedarFile will raise an IOError. Also raises an error if the parm already exists.
"""
self.addParm(twoDParm, 2)
def addParm(self, newParm, dim):
"""addParm adds a new one or two-dim parameter to a MadrigalDataRecord
Input: newParm - Parameter can be defined as codes (integer) or case-insensitive
mnemonic string (eg, "Gdalt")
dim - either 1 for scalar, or 2 for vector parm
        Affects: 1) adds a new column to self._dataset with all values NaN, and 2) adds a
            value to the end of self._recordSet with value = dim
        If this addition makes self._dataset.dtype differ from that in MadrigalCedarFile, appending this
        MadrigalDataRecord to MadrigalCedarFile will raise an IOError. Also raises an error if the parm already exists.
"""
if dim not in (1,2):
raise ValueError('dim must be 1 or 2, not %s' % (str(dim)))
        # see if it's an integer
        try:
            code = int(newParm)
            isInt = True
        except (TypeError, ValueError):
            isInt = False
if isInt:
# try to look up mnemonic
mnem = self._madParmObj.getParmMnemonic(int(newParm)).lower()
if mnem == str(newParm):
raise IOError('Cannot use unknown parm %i' % (int(newParm)))
else:
# this must succeed or an exception raised
try:
code = self._madParmObj.getParmCodeFromMnemonic(newParm.lower())
except ValueError:
raise IOError('Mnem %s not found' % (newParm))
mnem = newParm.lower()
        # issue warning if an unneeded time parameter is being added
        if self._verbose and abs(code) < timeParms:
            sys.stderr.write('WARNING: Parameter %s is a time parameter that potentially conflicts with prolog times\n' % (mnem))
# figure out dtype
format = self._madParmObj.getParmFormat(mnem)
if format[-1] == 'i':
dtype = numpy.int64
else:
dtype = numpy.float64
        data = numpy.zeros((len(self._dataset),), dtype)
        if dtype == numpy.int64:
            data[:] = self.missing_int   # nan cannot be stored in an integer column
        else:
            data[:] = numpy.nan
        self._dataset = numpy.lib.recfunctions.append_fields(self._dataset, mnem, data, usemask=False)
data = numpy.array([dim], numpy.int64)
self._recordSet = numpy.lib.recfunctions.append_fields(self._recordSet, mnem, data, usemask=False)
newCedarParm = CedarParameter(self._madParmObj.getParmCodeFromMnemonic(mnem),
mnem, self._madParmObj.getParmDescription(mnem),
isInt)
if dim == 1:
self._oneDList.append(newCedarParm)
else:
self._twoDList.append(newCedarParm)
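    # Usage sketch for the wrappers above ('azm' and 'ti' are illustrative mnemonics;
    # `rec` is an existing MadrigalDataRecord): each call appends a column of missing
    # values and registers the parm's dimension.
    #
    #     rec.add1D('azm')    # new 1D column, all values missing
    #     rec.add2D('ti')     # new 2D column, all values missing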
def set1D(self, parm, value):
"""set1D sets a 1D value for a given 1D parameter
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
        value - double (or string convertible to double) value to set the 1D parameter to. To set special Cedar values, the global values
            missing, assumed, or knownbad may be used, or the strings "missing", "assumed", or "knownbad".
            May also be an int or string if the parm is of that type
Outputs: None
"""
parm = self._madParmObj.getParmMnemonic(parm).lower()
if value == 'missing':
if self.parmIsInt(parm):
value = self.missing_int
elif self.parmIsString(parm):
value = ' ' * self.getStrLen(parm)
else:
value = self.missing
if self._madParmObj.isError(parm):
if value == 'assumed':
if self.parmIsInt(parm):
value = self.assumed_int
else:
value = self.assumed
elif value == 'knownbad':
if self.parmIsInt(parm):
value = self.knownbad_int
else:
value = self.knownbad
elif value in ('assumed', 'knownbad'):
raise ValueError('It is illegal to set the non-error parm %s to %s' % (parm, value))
# make sure this is a one-d parm
try:
dim = self._recordSet[parm]
except (KeyError, ValueError):
raise ValueError('parm %s does not exist' % (str(parm)))
if dim != 1:
raise ValueError('parm %s is 2D, not 1D' % (str(parm)))
# set it
self._dataset[parm] = value
def set2D(self, parm, row, value):
"""set2D sets a 2D value for a given 2D parameter and row
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
row - row number to set data. Starts at 0.
value - double (or string convertible to double) value to set the 2D parameter to. To set special Cedar values, the global values
missing, assumed, or knownbad may be used, or the strings "missing", "assumed", or "knownbad".
May also be an int or string if the parameter is of that type.
Outputs: None
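Example (illustrative sketch; assumes rec has a 2D parm "ti" and at least three rows):
    rec.set2D('ti', 0, 1050.0)        # set row 0 to an ordinary double value
    rec.set2D('ti', 2, 'missing')     # set row 2 to the special missing value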
"""
if row >= len(self._dataset) or row < 0:
raise ValueError('Illegal value of row %i with nrow = %i' % (row,len(self._dataset)))
parm = self._madParmObj.getParmMnemonic(parm).lower()
isString = self._madParmObj.isString(parm)
if value == 'missing':
if self.parmIsInt(parm):
value = self.missing_int
elif self.parmIsString(parm):
value = ' ' * self.getStrLen(parm)
else:
value = self.missing
if self._madParmObj.isError(parm):
if value == 'assumed':
if self.parmIsInt(parm):
value = self.assumed_int
else:
value = self.assumed
elif value == 'knownbad':
if self.parmIsInt(parm):
value = self.knownbad_int
else:
value = self.knownbad
elif value in ('assumed', 'knownbad'):
raise ValueError('It is illegal to set the non-error parm %s to %s' % (parm, value))
# make sure this is a two-d parm
try:
dim = self._recordSet[parm]
except (KeyError, ValueError):
raise ValueError('parm %s does not exist' % (str(parm)))
if dim not in (2, 3):
raise ValueError('parm %s is 1D, not 2D' % (str(parm)))
# if it's an ind parm, make sure it's not nan
if parm in self._ind2DList:
if not isString:
if numpy.isnan(value):
raise ValueError('Cannot set ind parm %s to nan at row %i' % (parm, row))
# set it
self._dataset[parm][row] = value
def set2DParmValues(self, parm, values):
"""set2DParmValues sets all 2D value in all rows for a given 2D parameter
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
values - list, tuple, or numpy array of int64 or float64 type. Length must equal the number of 2D rows. User is responsible
for having set all special values to missing, assumed, and knownbad as defined at the top
of this class for either ints or floats.
Outputs: None
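Example (illustrative sketch; assumes rec has a float 2D parm "ti" and exactly three rows;
numpy.nan encodes missing for float parms):
    rec.set2DParmValues('ti', [1050.0, numpy.nan, 1200.0])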
"""
# make sure this is a two-d parm
parm = self._madParmObj.getParmMnemonic(parm).lower()
isString = self._madParmObj.isString(parm)
try:
dim = self._recordSet[parm]
except (KeyError, ValueError):
raise ValueError('parm %s does not exist' % (str(parm)))
if dim not in (2, 3):
raise ValueError('parm %s is 1D, not 2D' % (str(parm)))
if parm in self._ind2DList:
if not isString:
if numpy.any(numpy.isnan(values)):
raise ValueError('Cannot set any ind parm %s value to nan: %s' % (parm, str(values)))
# set it
self._dataset[parm] = values
def get1D(self, parm):
"""get1D returns the 1D value for a given 1D parameter
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
Outputs: double value, or the strings "missing", "assumed", or "knownbad"
"""
# make sure this is a one-d parm
parm = self._madParmObj.getParmMnemonic(parm).lower()
isString = self._madParmObj.isString(parm)
try:
dim = self._recordSet[parm]
except (KeyError, ValueError):
raise ValueError('parm %s does not exist' % (str(parm)))
if dim != 1:
raise ValueError('parm %s is 2D, not 1D' % (str(parm)))
value = self._dataset[parm][0]
# check for special values
if not isString:
if numpy.isnan(value):
return('missing')
# if it's an error parameter, allow assumed or knownbad
if self._madParmObj.isError(parm):
if int(value) == self.assumed_int:
return('assumed')
if int(value) == self.knownbad_int:
return('knownbad')
return value
def get2D(self, parm, row):
"""get2D returns the 2D value for a given 2D parameter
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
row - row number to get data. Starts at 0.
Outputs: double value, or the strings "missing", "assumed", or "knownbad"
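Example (illustrative sketch; assumes rec has a 2D parm "ti"):
    value = rec.get2D('ti', 0)
    if value == 'missing':
        print('row 0 has no ti value')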
"""
if row >= len(self._dataset) or row < 0:
raise ValueError('Illegal value of row %i with nrow = %i' % (row,len(self._dataset)))
# make sure this is a two-d parm
parm = self._madParmObj.getParmMnemonic(parm).lower()
isString = self._madParmObj.isString(parm)
try:
dim = self._recordSet[parm]
except (KeyError, ValueError):
raise ValueError('parm %s does not exist' % (str(parm)))
if dim not in (2,3):
raise ValueError('parm %s is 1D, not 2D' % (str(parm)))
value = self._dataset[parm][row]
# check for special values
if not isString:
if numpy.isnan(value):
return('missing')
# if it's an error parameter, allow assumed or knownbad
if self._madParmObj.isError(parm):
if int(value) == self.assumed_int:
return('assumed')
if int(value) == self.knownbad_int:
return('knownbad')
return value
def getRow(self, row):
"""getRow returns the row of data in order defined in self._dataset.dtype
Input: row number
IndexError raised if not a valid row index
"""
return(self._dataset[row])
def setRow(self, row, values):
"""setRow sets an entire row of data at once
Inputs:
row - row number to set
values - a tuple of values in the right format to match self._dataset.dtype
"""
self._dataset[row] = values
def delete1D(self, parm):
"""delete1D removes the given 1D parameter from the record
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
Outputs: None
Raise exception if 1D parm does not exist. If this deletion makes self._dataset.dtype differ from that in
MadrigalCedarFile, appending this MadrigalDataRecord to MadrigalCedarFile will raise an IOError.
"""
# make sure this is a one-d parm
parm = self._madParmObj.getParmMnemonic(parm).lower()
try:
dim = self._recordSet[parm]
except (KeyError, ValueError):
raise ValueError('parm %s does not exist' % (str(parm)))
if dim != 1:
raise ValueError('parm %s is 2D, not 1D' % (str(parm)))
self._dataset = numpy.lib.recfunctions.drop_fields(self._dataset, parm)
self._recordSet = numpy.lib.recfunctions.drop_fields(self._recordSet, parm)
# find index to delete from self._oneDList
index = None
for i, parmObj in enumerate(self._oneDList):
if parmObj.mnemonic == parm:
index = i
break
if index is None:
raise ValueError('Did not find parm %s in self._oneDList' % (str(parm)))
del self._oneDList[index]
def delete2DParm(self, parm):
"""delete2DParm removes the given 2D parameter from every row in the record
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
Outputs: None
Raise exception if 2D parm does not exist. If this deletion makes self._dataset.dtype differ from that in
MadrigalCedarFile, appending this MadrigalDataRecord to MadrigalCedarFile will raise an IOError.
"""
# make sure this is a two-d parm
parm = self._madParmObj.getParmMnemonic(parm).lower()
try:
dim = self._recordSet[parm]
except (KeyError, ValueError):
raise ValueError('parm %s does not exist' % (str(parm)))
if dim not in (2,3):
raise ValueError('parm %s is 1D, not 2D' % (str(parm)))
self._dataset = numpy.lib.recfunctions.drop_fields(self._dataset, parm)
self._recordSet = numpy.lib.recfunctions.drop_fields(self._recordSet, parm)
# find index to delete from self._twoDList
index = None
for i, parmObj in enumerate(self._twoDList):
if parmObj.mnemonic == parm:
index = i
break
if index is None:
raise ValueError('Did not find parm %s in self._twoDList' % (str(parm)))
del self._twoDList[index]
def delete2DRows(self, rows):
"""delete2DRows removes the given 2D row or rows in the record (first is row 0)
Inputs:
row number (integer) or list of row numbers to delete (first is row 0)
Outputs: None
Raise exception if row does not exist
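Example (illustrative sketch; assumes rec has at least four rows):
    rec.delete2DRows(0)        # delete the first row
    rec.delete2DRows([1, 2])   # delete two rows at once (indices refer to the renumbered rows)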
"""
# make sure rows is a list
if isinstance(rows, (int, numpy.integer)):
    rows = [rows]
keepIndices = []
count = 0 # make sure all rows actually exist
for i in range(self.getNrow()):
if i not in rows:
keepIndices.append(i)
else:
count += 1
if count != len(rows):
raise ValueError('Some row in %s out of range, total number of rows is %i' % (str(rows), self.getNrow()))
self._dataset = self._dataset[keepIndices]
def getKinst(self):
"""getKinst returns the kind of instrument code (int) for a given data record.
Inputs: None
Outputs: the kind of instrument code (int) for a given data record.
"""
return(self._dataset['kinst'][0])
def setKinst(self, newKinst):
"""setKinst sets the kind of instrument code (int) for a given data record.
Inputs: newKinst - new instrument code (integer)
Outputs: None
Affects: sets self._dataset['kinst']
"""
newKinst = int(newKinst)
if newKinst < 0:
raise ValueError('Kinst must not be less than 0, not %i' % (newKinst))
# verify and set kinst
instList = self._madInstObj.getInstrumentList()
found = False
for inst in instList:
if inst[2] == newKinst:
self._instrumentName = inst[0]
found = True
break
if not found:
self._instrumentName = 'Unknown instrument'
sys.stderr.write('Warning: kinst %i not found in instTab.txt\n' % (newKinst))
self._dataset['kinst'] = newKinst
def getKindat(self):
"""getKindat returns the kind of data code (int) for a given data record.
Inputs: None
Outputs: the kind of data code (int) for a given data record.
"""
return(self._dataset['kindat'][0])
def setKindat(self, newKindat):
"""setKindat sets the kind of data code (int) for a given data record.
Inputs: newKindat (integer)
Outputs: None
Affects: sets self._dataset['kindat']
"""
if int(newKindat) < 0:
raise ValueError('kindat cannot be negative: %i' % (int(newKindat)))
self._dataset['kindat'] = int(newKindat)
def getRecno(self):
"""getRecno returns the recno (int) for a given data record.
Inputs: None
Outputs: the recno (int) for a given data record. May be 0 if not yet in a file
"""
return(self._dataset['recno'][0])
def setRecno(self, newRecno):
"""setRecno sets the recno (int) for a given data record.
Inputs: newRecno (integer)
Outputs: None
Affects: sets self._dataset['recno']
"""
if int(newRecno) < 0:
raise ValueError('recno cannot be negative: %i' % (int(newRecno)))
self._dataset['recno'] = int(newRecno)
def getNrow(self):
"""getNrow returns the number of 2D data rows (int) for a given data record.
Inputs: None
Outputs: the number of 2D data rows.
"""
return(len(self._dataset))
def getStartTimeList(self):
"""getStartTimeList returns a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec
Inputs: None
Outputs: a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec.
"""
startDT = self.getStartDatetime()
return((startDT.year, startDT.month, startDT.day,
startDT.hour, startDT.minute, startDT.second, int(startDT.microsecond/1.0E4)))
def getStartDatetime(self):
"""getStartDatetime returns a start record datetime
Inputs: None
Outputs: a datetime.datetime object representing the start time of the record
"""
return(datetime.datetime.utcfromtimestamp(self._dataset['ut1_unix'][0]))
def setStartTimeList(self, sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec=0):
"""setStartTimeList changes the data record start time
Inputs: integers sYear, sMonth, sDay, sHour, sMin, sSec. sCentisec defaults to 0
Outputs: None
Affects: changes self._dataset fields ut1_unix, year, month, day, hour, min,sec
Prints warning if new start time after present end time
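Example (illustrative sketch):
    rec.setStartTimeList(1998, 1, 20, 15, 0, 0)   # sCentisec defaults to 0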
"""
# check validity of input time
sCentisec = int(sCentisec)
if sCentisec < 0 or sCentisec > 99:
raise ValueError('Illegal sCentisec %i' % (sCentisec))
try:
sDT = datetime.datetime(sYear, sMonth, sDay, sHour, sMin, sSec, int(sCentisec*1E4))
except:
raise ValueError('Illegal datetime %s' % (str((sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec))))
if sDT > self.getEndDatetime():
sys.stderr.write('Warning: New starting time %s after present ending time %s\n' % (str(sDT),
str(self.getEndDatetime())))
ut1_unix = (sDT - datetime.datetime(1970,1,1)).total_seconds()
self._dataset['ut1_unix'] = ut1_unix
# need to reset average time
aveDT = sDT + (self.getEndDatetime() - sDT)/2
self._dataset['year'] = aveDT.year
self._dataset['month'] = aveDT.month
self._dataset['day'] = aveDT.day
self._dataset['hour'] = aveDT.hour
self._dataset['min'] = aveDT.minute
self._dataset['sec'] = aveDT.second
def getEndTimeList(self):
"""getEndTimeList returns a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec
Inputs: None
Outputs: a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec.
"""
endDT = self.getEndDatetime()
return((endDT.year, endDT.month, endDT.day,
endDT.hour, endDT.minute, endDT.second, int(endDT.microsecond/1.0E4)))
def getEndDatetime(self):
"""getEndDatetime returns a end record datetime
Inputs: None
Outputs: a datetime.datetime object representing the end time of the record
"""
return(datetime.datetime.utcfromtimestamp(self._dataset['ut2_unix'][0]))
def setEndTimeList(self, eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec=0):
"""setEndTimeList changes the data record end time
Inputs: integers eYear, eMonth, eDay, eHour, eMin, eSec. eCentisec defaults to 0
Outputs: None
Affects: changes self._dataset fields ut2_unix, year, month, day, hour, min,sec
Prints warning if new end time before present start time
"""
# check validity of input time
eCentisec = int(eCentisec)
if eCentisec < 0 or eCentisec > 99:
raise ValueError('Illegal eCentisec %i' % (eCentisec))
try:
eDT = datetime.datetime(eYear, eMonth, eDay, eHour, eMin, eSec, int(eCentisec*1E4))
except:
raise ValueError('Illegal datetime %s' % (str((eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec))))
if eDT < self.getStartDatetime():
sys.stderr.write('Warning: New ending time %s before present starting time %s\n' % (str(eDT),
str(self.getStartDatetime())))
ut2_unix = (eDT - datetime.datetime(1970,1,1)).total_seconds()
self._dataset['ut2_unix'] = ut2_unix
# need to reset average time
aveDT = eDT - (eDT - self.getStartDatetime())/2
self._dataset['year'] = aveDT.year
self._dataset['month'] = aveDT.month
self._dataset['day'] = aveDT.day
self._dataset['hour'] = aveDT.hour
self._dataset['min'] = aveDT.minute
self._dataset['sec'] = aveDT.second
def get1DParms(self):
"""get1DParms returns a list of 1D parameters in the MadrigalDataRecord.
Inputs: None
Outputs: a list of 1D CedarParameter objects in the MadrigalDataRecord.
"""
return(self._oneDList)
def get2DParms(self):
"""get2DParms returns a list of 2D parameters in the MadrigalDataRecord.
Inputs: None
Outputs: a list of 2D CedarParameter objects in the MadrigalDataRecord. Includes
both independent and dependent parms.
"""
return(self._twoDList)
def getInd2DParms(self):
"""getInd2DParms returns a list of the subset 2D parameters ithat are independent parmeters.
Inputs: None
Outputs: a list of independent 2D CedarParameter objects in the MadrigalDataRecord.
"""
return(self._ind2DList)
def getParmDim(self, parm):
"""getParmDim returns the dimension (1, 2, or 3 for independent spatial parm) of a given parm mnemonic
Raise KeyError if that parameter not found in file
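Example (illustrative sketch; assumes "gdalt" was declared an independent spatial parm
in this record):
    rec.getParmDim('gdalt')   # returns 3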
"""
for obj in self._oneDList:
if obj.mnemonic.lower() == parm.lower():
return(1)
# do ind 2D next since they are in both lists
for obj in self._ind2DList:
if obj.mnemonic.lower() == parm.lower():
return(3)
for obj in self._twoDList:
if obj.mnemonic.lower() == parm.lower():
return(2)
raise KeyError('Parm <%s> not found in data' % (str(parm)))
def getHeaderKodLines(self):
"""getHeaderKodLines creates the lines in the Madrigal header record that start KOD and describe parms
Inputs: None
Returns: a string of length 80*num parms. Each 80 characters contains a description
of a single parm according to the Cedar Standard
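Line layout, as produced by the code below (the sample line is hypothetical): columns 1-10
hold the "KODS(i)" or "KODM(i)" label, 11-20 the right-justified code, 21-70 the description,
and 71-80 the units, e.g.:
    KODS(0)         10 Year (universal time)                            y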
"""
# create a list of oneDCedar codes for the data record.
# Each item has three elements: (code, parameter description, units)
oneDCedarCodes = []
for parm in self._oneDList:
oneDCedarCodes.append((parm.code, self._madParmObj.getSimpleParmDescription(parm.code),
self._madParmObj.getParmUnits(parm.code)))
oneDCedarCodes.sort(key=compareParms)
# create a list of twoDCedar codes for the data record.
# Each item has three elements: (code, parameter description, units)
twoDCedarCodes = []
for parm in self._twoDList:
twoDCedarCodes.append((parm.code, self._madParmObj.getSimpleParmDescription(parm.code),
self._madParmObj.getParmUnits(parm.code)))
twoDCedarCodes.sort(key=compareParms)
# write out lines - one D
retStr = ''
if len(oneDCedarCodes) > 0:
retStr += 'C 1D Parameters:' + (80 - len('C 1D Parameters:'))*' '
for i in range(len(oneDCedarCodes)):
code = oneDCedarCodes[i][0]
desc = oneDCedarCodes[i][1]
units = oneDCedarCodes[i][2]
line = 'KODS(%i)' % (i)
line += (10-len(line))*' '
codeNum = str(code)
codeNum = (10-len(codeNum))* ' ' + codeNum
line += codeNum
if len(desc) > 48:
desc = ' ' + desc[:48] + ' '
else:
desc = ' ' + desc + (49-len(desc))* ' '
line += desc
units = units[:10] + (10-len(units[:10]))*' '
line += units
retStr += line
# two D
if len(twoDCedarCodes) > 0:
retStr += 'C 2D Parameters:' + (80 - len('C 2D Parameters:'))*' '
for i in range(len(twoDCedarCodes)):
code = twoDCedarCodes[i][0]
desc = twoDCedarCodes[i][1]
units = twoDCedarCodes[i][2]
line = 'KODM(%i)' % (i)
line += (10-len(line))*' '
codeNum = str(code)
codeNum = (10-len(codeNum))* ' ' + codeNum
line += codeNum
if len(desc) > 48:
desc = ' ' + desc[:48] + ' '
else:
desc = ' ' + desc + (49-len(desc))* ' '
line += desc
units = units[:10] + (10-len(units[:10]))*' '
line += units
retStr += line
return(retStr)
def _createArraysFromArgs(self,kinst,kindat,sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec,
oneDList,twoDList,nrow,ind2DList):
"""_createArraysFromArgs creates a table layout array and record array numpy array based in input arguments.
Inputs:
kinst - the kind of instrument code. A warning will be raised if not in instTab.txt.
Default is None, in which case recno, dataset, and recordSet must be given.
kindat - kind of data code. Must be a non-negative integer.
Default is None, in which case recno, dataset, and recordSet must be given.
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - record start time. sCentisec must be 0-99
Default is None, in which case recno, dataset, and recordSet must be given.
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - record end time. eCentisec must be 0-99
Default is None, in which case recno, dataset, and recordSet must be given.
oneDList - list of one-dimensional parameters in record. Parameters can be defined as codes
(integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or CedarParameter objects.
Default is None, in which case recno, dataset, and recordSet must be given.
twoDList - list of two-dimensional parameters in record. Parameters can be defined as codes
(integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or CedarParameter objects.
Default is None, in which case recno, dataset, and recordSet must be given.
nrow - number of rows of 2D data to create. Until set, all values default to missing.
Default is None, in which case recno, dataset, and recordSet must be given.
ind2DList - list of independent spatial two-dimensional parameters in record. Parameters can be
defined as codes (integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or
CedarParameter objects. Each must also be listed in twoDList.
Default is None, in which case dataset and recordSet must be given.
"""
defaultParms = self._stdParms
dataDtype = [] # data type for the Table Layout recarray
recDType = [] # data type for _record_layout recarray
recDims = [] # dimension of each parameter (1 for 1D, 2 for dependent 2D, 3 for independent 2D)
parmsAddedSoFar = [] # mnemonics added so far
# the following is simply to ensure that independent 2D parms are also listed in twoDList
twoDParms = []
for parm in twoDList:
if isinstance(parm, CedarParameter):
parm = parm.mnemonic
mnem = self._madParmObj.getParmMnemonic(parm)
if mnem in twoDParms:
raise ValueError('Duplicate parameter %s in twoDList' % (mnem))
twoDParms.append(mnem)
# default parms
for parm in defaultParms:
mnem = self._madParmObj.getParmMnemonic(parm)
if self._madParmObj.isInteger(mnem):
dataDtype.append((mnem.lower(), int))
else: # default parms cannot be strings
dataDtype.append((mnem.lower(), float))
recDType.append((parm.lower(), int))
recDims.append(1)
parmsAddedSoFar.append(mnem)
# one D parms
for parm in oneDList:
if isinstance(parm, CedarParameter):
parm = parm.mnemonic
mnem = self._madParmObj.getParmMnemonic(parm)
if mnem in parmsAddedSoFar:
continue # legal because it may be a default parm
if self._madParmObj.isInteger(mnem):
dataDtype.append((mnem.lower(), int))
elif self._madParmObj.isString(mnem):
strLen = self._madParmObj.getStringLen(mnem)
dataDtype.append((mnem.lower(), numpy.string_, strLen))
else:
dataDtype.append((mnem.lower(), float))
recDType.append((parm.lower(), int))
recDims.append(1)
parmsAddedSoFar.append(mnem)
for parm in ind2DList:
if isinstance(parm, CedarParameter):
parm = parm.mnemonic
mnem = self._madParmObj.getParmMnemonic(parm)
if mnem in parmsAddedSoFar:
raise ValueError('Duplicate parameter %s' % (mnem))
if mnem not in twoDParms:
raise ValueError('Independent 2D parm %s not found in twoDList' % (mnem))
if self._madParmObj.isInteger(mnem):
dataDtype.append((mnem.lower(), int))
elif self._madParmObj.isString(mnem):
strLen = self._madParmObj.getStringLen(mnem)
dataDtype.append((mnem.lower(), numpy.string_, strLen))
else:
dataDtype.append((mnem.lower(), float))
recDType.append((parm.lower(), int))
recDims.append(3)
parmsAddedSoFar.append(mnem)
for parm in twoDList:
if isinstance(parm, CedarParameter):
parm = parm.mnemonic
mnem = self._madParmObj.getParmMnemonic(parm)
if mnem in parmsAddedSoFar:
continue # legal because may be independent parm
if self._madParmObj.isInteger(mnem):
dataDtype.append((mnem.lower(), int))
elif self._madParmObj.isString(mnem):
strLen = self._madParmObj.getStringLen(mnem)
dataDtype.append((mnem.lower(), numpy.string_, strLen))
else:
dataDtype.append((mnem.lower(), float))
recDType.append((parm.lower(), int))
recDims.append(2)
# create two recarrays
self._dataset = numpy.recarray((max(nrow, 1),), dtype = dataDtype)
self._recordSet = numpy.array([tuple(recDims),], dtype = recDType)
# set prolog values
sDT = datetime.datetime(int(sYear),int(sMonth),int(sDay),int(sHour),int(sMin),int(sSec),int(sCentisec)*10000)
eDT = datetime.datetime(int(eYear),int(eMonth),int(eDay),int(eHour),int(eMin),int(eSec),int(eCentisec)*10000)
midDT = sDT + ((eDT-sDT)/2)
self._dataset['year'] = midDT.year
self._dataset['month'] = midDT.month
self._dataset['day'] = midDT.day
self._dataset['hour'] = midDT.hour
self._dataset['min'] = midDT.minute
self._dataset['sec'] = midDT.second
self._dataset['recno'] = 0
self._dataset['kindat'] = kindat
self.setKinst(kinst)
self._dataset['ut1_unix'] = madrigal.metadata.getUnixUTFromDT(sDT)
self._dataset['ut2_unix'] = madrigal.metadata.getUnixUTFromDT(eDT)
# set all other values to default
for i in range(len(defaultParms), len(dataDtype)):
if dataDtype[i][1] == float:
self._dataset[dataDtype[i][0]] = self.missing
elif dataDtype[i][1] == int:
self._dataset[dataDtype[i][0]] = self.missing_int
else:
# string type only one left
strLen = self._madParmObj.getStringLen(dataDtype[i][0])
self._dataset[dataDtype[i][0]] = ' ' * strLen
def __get2DValueList__(self, parm):
"""__get2DValueList__ returns a list containing all the 2D values of a given parameter.
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
Outputs: a list containing all the 2D values of a given parameter. Special values will
be given the values 'missing', 'assumed', or 'knownbad'
"""
retList = []
nrow = self.getNrow()
for i in range(nrow):
retList.append(self.get2D(parm,i))
return(retList)
def __get2DMainValueList__(self, code, scaleFactor):
"""__get2DMainValueList__ returns a list containing all the 2D values of a given main parameter.
Inputs:
code - parameter code (integer). Must be a parameter with an additional increment parameter.
scaleFactor - the parameter's scale factor (double), used to split off the additional increment part.
Outputs: a list containing all the 2D values of a given main parameter that has an
additional increment parameter. Special values will be given the values 'missing', 'assumed', or 'knownbad'
"""
retList = []
nrow = self.getNrow()
for i in range(nrow):
value = self.get2D(code,i)
if not isinstance(value, (bytes, str)):  # skip string data and the special values 'missing' etc.
# subtract off additional increment part
addIncr = value % scaleFactor
if value < 0:
addIncr = -1.0 * (scaleFactor - addIncr)
value = value - addIncr
retList.append(value)
return(retList)
def __get2DIncrValueList__(self, code, scaleFactor):
"""__get2DIncrValueList__ returns a list containing all the additional increment 2D values of a given main parameter.
Inputs:
code - parameter code (integer). Must be a parameter with an additional increment parameter.
scaleFactor - the parameter's scale factor (double), used to extract the additional increment part.
Outputs: a list containing all the additional increment 2D values of a given main parameter.
Special values will be given the values 'missing', 'assumed', or 'knownbad'
"""
retList = []
nrow = self.getNrow()
for i in range(nrow):
value = self.get2D(code,i)
if not isinstance(value, (bytes, str)):  # skip string data and the special values 'missing' etc.
# get additional increment part
incr = value % scaleFactor
if value < 0:
incr = -1.0 * (scaleFactor - incr)
value = incr
retList.append(value)
return(retList)
def __str__(self):
""" returns a string representation of a MadrigalDataRecord """
retStr = 'Data record:\n'
retStr += 'kinst = %i (%s)\n' % (self.getKinst(), self._instrumentName)
retStr += 'kindat = %i\n' % (self.getKindat())
startTimeList = self.getStartTimeList()
endTimeList = self.getEndTimeList()
retStr += 'record start: %04i-%02i-%02i %02i:%02i:%02i.%02i\n' % (tuple(startTimeList))
retStr += 'record end: %04i-%02i-%02i %02i:%02i:%02i.%02i\n' % (tuple(endTimeList))
retStr += 'one-dim parameters:\n'
for parm in self._oneDList:
retStr += '\t%s\n' % (str(parm))
try:
retStr += '%s\n' % (str(self._oneDData))
except AttributeError:
pass # there may not be oneDData
retStr += 'two-dim parameters:\n'
for parm in self._twoDList:
retStr += '\t%s\n' % (str(parm))
try:
retStr += '%s\n' % (str(self._twoDData))
except AttributeError:
pass # there may not be twoDData
return(retStr)
def __cmp__(self, other):
    """cmpRecords compares two cedar records to allow them to be sorted.
    Note: Python 3 does not call __cmp__ directly; wrap it with functools.cmp_to_key when sorting.
    """
    def _cmp(a, b):
        # local replacement for the removed Python 2 cmp builtin
        return (a > b) - (a < b)
    if other is None:
        return(1)
    # compare record start times
    fList = self.getStartTimeList()
    sList = other.getStartTimeList()
    fDT = datetime.datetime(*fList)
    sDT = datetime.datetime(*sList)
    result = _cmp(fDT, sDT)
    if result:
        return(result)
    # compare record type
    typeList = ('catalog', 'header', 'data')
    fType = self.getType()
    sType = other.getType()
    result = _cmp(typeList.index(fType), typeList.index(sType))
    if result:
        return(result)
    # compare record stop times
    fList = self.getEndTimeList()
    sList = other.getEndTimeList()
    fDT = datetime.datetime(*fList)
    sDT = datetime.datetime(*sList)
    result = _cmp(fDT, sDT)
    if result:
        return(result)
    # compare kindat if both data
    if fType == 'data' and sType == 'data':
        result = _cmp(self.getKindat(), other.getKindat())
        if result:
            return(result)
    return(0)
class MadrigalCatalogRecord:
"""MadrigalCatalogRecord holds all the information in a Cedar catalog record."""
def __init__(self,kinst = None,
modexp = None,
sYear = None, sMonth = None, sDay = None,
sHour = None, sMin = None, sSec = None, sCentisec = None,
eYear = None, eMonth = None, eDay = None,
eHour = None, eMin = None, eSec = None, eCentisec = None,
text = None,
madInstObj = None, modexpDesc = '',
expNotesLines=None):
"""__init__ creates a MadrigalCatalogRecord.
Note: all inputs have default values because there are two ways to populate this structure:
1) with all inputs from kinst to text when new data is being created, or
2) with catalog line list from existing Hdf5 file Experiment Notes metadata, plus non-default inputs
Inputs:
kinst - the kind of instrument code. A warning will be raised if not in instTab.txt.
modexp - Code to indicate experimental mode employed. Must be a non-negative integer.
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - experiment start time. sCentisec must be 0-99
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - experiment end time. eCentisec must be 0-99
text - string containing text in catalog record. Length must be divisible by 80. No linefeeds
allowed.
madInstObj - a madrigal.metadata.MadrigalInstrument object. If None, one will be created.
Used to verify kinst.
modexpDesc - string describing the modexp
expNotesLines - a string of 80-character lines from an existing catalog section of the "Experiment Notes"
metadata table. All the above attributes are parsed from this text.
Outputs: None
Returns: None
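Example (illustrative sketch; all argument values are hypothetical):
    catRec = MadrigalCatalogRecord(kinst=31, modexp=1,
                                   sYear=1998, sMonth=1, sDay=20,
                                   sHour=0, sMin=0, sSec=0, sCentisec=0,
                                   eYear=1998, eMonth=1, eDay=21,
                                   eHour=0, eMin=0, eSec=0, eCentisec=0,
                                   text=' ' * 80)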
"""
# create any needed Madrigal objects, if not passed in
if madInstObj is None:
self._madInstObj = madrigal.metadata.MadrigalInstrument()
else:
self._madInstObj = madInstObj
if expNotesLines is not None:
    # get all information from this dataset
    self._parseExpNotesLines(expNotesLines)
if kinst is not None:
    # kinst set via catalog record overrides kinst argument
    try:
        self.getKinst()
    except AttributeError:
        self.setKinst(kinst)
# verify kinst set, or raise error
try:
    self.getKinst()
except AttributeError:
    raise ValueError('kinst not set when MadrigalCatalogRecord created - required')
if modexp is not None:
    self.setModexp(modexp)
if len(modexpDesc) > 0:
    self.setModexpDesc(modexpDesc)
try:
    self.setTimeLists(sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec,
                      eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec)
except (TypeError, ValueError):
    # times are optional at construction; leave unset if arguments are incomplete
    pass
if text is not None:
    self.setText(text)
def getType(self):
""" returns the type 'catalog'"""
return 'catalog'
def getKinst(self):
"""getKinst returns the kind of instrument code (int) for a given catalog record.
Inputs: None
Outputs: the kind of instrument code (int) for a given catalog record.
"""
return(self._kinst)
def setKinst(self, kinst):
"""setKinst sets the kind of instrument code (int) for a given catalog record.
Inputs: kind of instrument code (integer)
Outputs: None
Affects: sets the kind of instrument code (int) (self._kinst) for a given catalog record.
Prints warning if kinst not found in instTab.txt
"""
kinst = int(kinst)
# verify and set kinst
instList = self._madInstObj.getInstrumentList()
found = False
for inst in instList:
if inst[2] == kinst:
self._instrumentName = inst[0]
found = True
break
if not found:
self._instrumentName = 'Unknown instrument'
sys.stderr.write('Warning: kinst %i not found in instTab.txt\n' % (kinst))
self._kinst = kinst
def getModexp(self):
"""getModexp returns the mode of experiment code (int) for a given catalog record.
Inputs: None
Outputs: the mode of experiment code (int) for a given catalog record. Returns -1 if not set
"""
try:
return(self._modexp)
except AttributeError:
return(-1)
def setModexp(self, modexp):
"""setModexp sets the mode of experiment code (int) for a given catalog record.
Inputs: the mode of experiment code (int)
Outputs: None
Affects: sets the mode of experiment code (int) (self._modexp)
"""
self._modexp = int(modexp)
def getModexpDesc(self):
"""getModexp returns the description of the mode of experiment code for a given catalog record.
Inputs: None
Outputs: the description of the mode of experiment code for a given catalog record (string).
Returns empty string if not set
"""
try:
return(self._modexpDesc)
except AttributeError:
return('')
def setModexpDesc(self, modexpDesc):
"""setModexpDesc sets the description of the mode of experiment code for a given catalog record.
Inputs: the description mode of experiment code (string)
Outputs: None
Affects: sets the description of the mode of experiment code (string) (self._modexpDesc)
"""
self._modexpDesc = str(modexpDesc)
def getText(self):
"""getText returns the catalog text.
Inputs: None
Outputs: the catalog text.
"""
return(self._text)
def getTextLineCount(self):
"""getTextLineCount returns the number of 80 character lines in self._text
"""
return(len(self._text) // 80)
def setText(self, text):
"""setText sets the catalog text.
Inputs: text: text to be set. Must be length divisible by 80, and not contain line feeds.
Outputs: None.
Affects: sets self._text
Raises TypeError if problem with text
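Example (illustrative sketch; pads a single short line to the required 80 characters):
    line = 'CSREM   quiet geomagnetic conditions'
    catRec.setText(line + ' ' * (80 - len(line)))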
"""
if not isinstance(text, str):
raise TypeError('text must be of type string')
if len(text) % 80 != 0:
raise TypeError('text length must be divisible by 80: len is %i' % (len(text)))
if text.find('\n') != -1:
raise TypeError('text must not contain linefeed character')
self._text = text
def getStartTimeList(self):
"""getStartTimeList returns a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec
Inputs: None
Outputs: a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec.
"""
return((self._sYear,
self._sMonth,
self._sDay,
self._sHour,
self._sMin,
self._sSec,
self._sCentisec))
def getEndTimeList(self):
"""getEndTimeList returns a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec
Inputs: None
Outputs: a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec.
"""
return((self._eYear,
self._eMonth,
self._eDay,
self._eHour,
self._eMin,
self._eSec,
self._eCentisec))
def setTimeLists(self, sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec):
"""setTimeList resets start and end times
Inputs:
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - experiment start time. sCentisec must be 0-99
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - experiment end time. eCentisec must be 0-99
Outputs: None
Affects: sets all time attributes (see code).
Exceptions: Raises ValueError if startTime > endTime
"""
# verify times
sTime = datetime.datetime(sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec*10000)
eTime = datetime.datetime(eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec*10000)
if eTime < sTime:
raise ValueError('Starting time cannot be after ending time')
self._sTime = madrigal.metadata.getMadrigalUTFromDT(sTime)
self._eTime = madrigal.metadata.getMadrigalUTFromDT(eTime)
self._sYear = sYear
self._sMonth = sMonth
self._sDay = sDay
self._sHour = sHour
self._sMin = sMin
self._sSec = sSec
self._sCentisec = sCentisec
self._eYear = eYear
self._eMonth = eMonth
self._eDay = eDay
self._eHour = eHour
self._eMin = eMin
self._eSec = eSec
self._eCentisec = eCentisec
def getLines(self):
"""getLines returns a numpy recarray of the format expected by the "Experiment Notes" dataset
"""
# templates
kreccStr = 'KRECC 2001 Catalogue Record, Version 1'
kinstTempStr = 'KINSTE %i %s'
modExpTempStr = 'MODEXP %i %s'
byearTempStr = 'IBYRT %04i Beginning year'
bmdTempStr = 'IBDTT %04i Beginning month and day'
bhmTempStr = 'IBHMT %04i Beginning UT hour and minute'
bcsTempStr = 'IBCST %04i Beginning centisecond'
eyearTempStr = 'IEYRT %04i Ending year'
emdTempStr = 'IEDTT %04i Ending month and day'
ehmTempStr = 'IEHMT %04i Ending UT hour and minute'
ecsTempStr = 'IECST %04i Ending centisecond'
numLines = int(self.getTextLineCount() + 12) # 8 times lines, KRECC, KINSTE, MODEXP, and final blank
textArr = numpy.recarray((numLines,), dtype=[('File Notes', h5py.special_dtype(vlen=str))])
for i in range(numLines-9):
if i == 0:
textArr[i]['File Notes'] = kreccStr + ' ' * (80 - len(kreccStr))
elif i == 1:
kinstName = self._madInstObj.getInstrumentName(self.getKinst())
kinstStr = kinstTempStr % (self.getKinst(), kinstName)
if len(kinstStr) > 80:
kinstStr = kinstStr[:80]
textArr[i]['File Notes'] = kinstStr + ' ' * (80 - len(kinstStr))
elif i == 2:
modExpStr = modExpTempStr % (self.getModexp(), self.getModexpDesc())
if len(modExpStr) > 80:
modExpStr = modExpStr[:80]
textArr[i]['File Notes'] = modExpStr + ' ' * (80 - len(modExpStr))
else:
textArr[i]['File Notes'] = self.getText()[(i-3)*80:(i-2)*80]
# finally add time lines
sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec = self.getStartTimeList()
eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec = self.getEndTimeList()
ibdtt = sMonth*100 + sDay
ibhmt = sHour*100 + sMin
ibcst = sSec*100 + sCentisec
iedtt = eMonth*100 + eDay
iehmt = eHour*100 + eMin
iecst = eSec*100 + eCentisec
sYearStr = byearTempStr % (sYear)
textArr[i+1]['File Notes'] = sYearStr + ' ' * (80 - len(sYearStr))
sMDStr = bmdTempStr % (ibdtt)
textArr[i+2]['File Notes'] = sMDStr + ' ' * (80 - len(sMDStr))
sHMStr = bhmTempStr % (ibhmt)
textArr[i+3]['File Notes'] = sHMStr + ' ' * (80 - len(sHMStr))
sCSStr = bcsTempStr % (ibcst)
textArr[i+4]['File Notes'] = sCSStr + ' ' * (80 - len(sCSStr))
eYearStr = eyearTempStr % (eYear)
textArr[i+5]['File Notes'] = eYearStr + ' ' * (80 - len(eYearStr))
eMDStr = emdTempStr % (iedtt)
textArr[i+6]['File Notes'] = eMDStr + ' ' * (80 - len(eMDStr))
eHMStr = ehmTempStr % (iehmt)
textArr[i+7]['File Notes'] = eHMStr + ' ' * (80 - len(eHMStr))
eCSStr = ecsTempStr % (iecst)
textArr[i+8]['File Notes'] = eCSStr + ' ' * (80 - len(eCSStr))
textArr[i+9]['File Notes'] = ' ' * 80
return(textArr)
def _parseExpNotesLines(self, expNotesLines):
"""_parseExpNotesLines populates all attributes in MadrigalCatalogRecord
from text from metadata table "Experiment Notes"
"""
if len(expNotesLines) % 80 != 0:
raise ValueError('Len of expNotesLines must be divisible by 80, len %i is not' % (len(expNotesLines)))
self._text = '' # init to empty
self._modexpDesc = ''
self._modexp = 0
delimiter = ' '
# default times
bsec = 0
bcsec = 0
esec = 0
ecsec = 0
for i in range(int(len(expNotesLines) / 80)):
line = expNotesLines[i*80:(i+1)*80]
items = line.split()
if len(items) == 0:
# blank line
self.setText(self.getText() + line)
continue
elif items[0].upper() == 'KRECC':
# ignore
continue
elif items[0].upper() == 'KINSTE':
self.setKinst(int(items[1]))
elif items[0].upper() == 'MODEXP':
try:
self.setModexp(int(items[1]))
except:
self.setModexp(0)
if len(items) > 2:
self.setModexpDesc(delimiter.join(items[2:]))
# start time
elif items[0].upper() == 'IBYRE':
byear = int(items[1])
elif items[0].upper() == 'IBDTE':
    ibdte = int(items[1])
    bmonth = ibdte // 100
    bday = ibdte % 100
elif items[0].upper() == 'IBHME':
    ibhme = int(items[1])
    bhour = ibhme // 100
    bmin = ibhme % 100
elif items[0].upper() == 'IBCSE':
    ibcse = int(float(items[1]))
    bsec = ibcse // 100
    bcsec = ibcse % 100
# end time
elif items[0].upper() == 'IEYRE':
eyear = int(items[1])
elif items[0].upper() == 'IEDTE':
    iedte = int(items[1])
    emonth = iedte // 100
    eday = iedte % 100
elif items[0].upper() == 'IEHME':
    iehme = int(items[1])
    ehour = iehme // 100
    emin = iehme % 100
elif items[0].upper() == 'IECSE':
    iecse = int(float(items[1]))
    esec = iecse // 100
    ecsec = iecse % 100
else:
self.setText(self.getText() + line)
try:
# set times
self.setTimeLists(byear, bmonth, bday, bhour, bmin, bsec, bcsec,
eyear, emonth, eday, ehour, emin, esec, ecsec)
except:
pass
def __str__(self):
""" returns a string representation of a MadrigalCatalogRecord """
retStr = 'Catalog Record:\n'
retStr += 'kinst = %i (%s)\n' % (self._kinst, self._instrumentName)
retStr += 'modexp = %i\n' % (self._modexp)
retStr += 'record start: %04i-%02i-%02i %02i:%02i:%02i.%02i\n' % (self._sYear,
self._sMonth,
self._sDay,
self._sHour,
self._sMin,
self._sSec,
self._sCentisec)
retStr += 'record end: %04i-%02i-%02i %02i:%02i:%02i.%02i\n' % (self._eYear,
self._eMonth,
self._eDay,
self._eHour,
self._eMin,
self._eSec,
self._eCentisec)
for i in range(0, len(self._text) -1, 80):
retStr += '%s\n' % (self._text[i:i+80])
return(retStr)
def __cmp__(self, other):
    """cmpRecords compares two cedar records to allow them to be sorted.
    Note: Python 3 does not call __cmp__ directly; wrap it with functools.cmp_to_key when sorting.
    """
    def _cmp(a, b):
        # local replacement for the removed Python 2 cmp builtin
        return (a > b) - (a < b)
    if other is None:
        return(1)
    # compare record start times
    fList = self.getStartTimeList()
    sList = other.getStartTimeList()
    fDT = datetime.datetime(*fList)
    sDT = datetime.datetime(*sList)
    result = _cmp(fDT, sDT)
    if result:
        return(result)
    # compare record type
    typeList = ('catalog', 'header', 'data')
    fType = self.getType()
    sType = other.getType()
    result = _cmp(typeList.index(fType), typeList.index(sType))
    if result:
        return(result)
    # compare record stop times
    fList = self.getEndTimeList()
    sList = other.getEndTimeList()
    fDT = datetime.datetime(*fList)
    sDT = datetime.datetime(*sList)
    result = _cmp(fDT, sDT)
    if result:
        return(result)
    # compare kindat if both data
    if fType == 'data' and sType == 'data':
        result = _cmp(self.getKindat(), other.getKindat())
        if result:
            return(result)
    return(0)
class MadrigalHeaderRecord:
"""MadrigalHeaderRecord holds all the information in a Cedar header record."""
def __init__(self, kinst = None,
kindat = None,
sYear = None, sMonth = None, sDay = None,
sHour = None, sMin = None, sSec = None, sCentisec = None,
eYear = None, eMonth = None, eDay = None,
eHour = None, eMin = None, eSec = None, eCentisec = None,
jpar = None, mpar = None,
text = None,
madInstObj = None, madKindatObj = None,
expNotesLines=None):
"""__init__ creates a MadrigalCatalogRecord.
Note: all inputs have default values because there are two ways to populate this structure:
1) with all inputs from kinst to text when new data is being created, or
2) with header line list from existing Hdf5 file Experiment Notes metadata, plus non-default inputs
Inputs:
kinst - the kind of instrument code. A warning will be raised if not in instTab.txt.
kindat - kind of data code. Must be a non-negative integer.
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - experiment start time. sCentisec must be 0-99
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - experiment end time. eCentisec must be 0-99
jpar - the number of 1d parameters in the following data records
mpar - the number of 2d parameters in the following data records
text - string containing text in header record. Length must be divisible by 80. No linefeeds
allowed.
madInstObj - a madrigal.metadata.MadrigalInstrument object. If None, one will be created.
Used to verify kinst.
madKindatObj - a madrigal.metadata.MadrigalKindat object. If None, one will be created.
Used to verify kindat.
expNotesLines - a string of 80-character lines from an existing header section of the "Experiment Notes"
metadata table. All the above attributes are parsed from this text.
Outputs: None
Returns: None
"""
# create any needed Madrigal objects, if not passed in
if madInstObj is None:
self._madInstObj = madrigal.metadata.MadrigalInstrument()
else:
self._madInstObj = madInstObj
if madKindatObj is None:
self._madKindatObj = madrigal.metadata.MadrigalKindat()
else:
self._madKindatObj = madKindatObj
if expNotesLines is not None:
    # get all information from this dataset
    self._parseExpNotesLines(expNotesLines)
if kinst is not None:
    # kinst set via header record overrides kinst argument
    try:
        self.getKinst()
    except AttributeError:
        self.setKinst(kinst)
# verify kinst set, or raise error
try:
    self.getKinst()
except AttributeError:
    raise ValueError('kinst not set when MadrigalHeaderRecord created - required')
if kindat is not None:
    # kindat set via header record overrides kindat argument
    try:
        self.getKindat()
    except AttributeError:
        self.setKindat(kindat)
try:
    self.setTimeLists(sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec,
                      eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec)
except (TypeError, ValueError):
    # times are optional at construction; leave unset if arguments are incomplete
    pass
if jpar is not None:
    self.setJpar(jpar)
if mpar is not None:
    self.setMpar(mpar)
if text is not None:
    self.setText(text)
def getType(self):
""" returns the type 'header'"""
return 'header'
def getKinst(self):
"""getKinst returns the kind of instrument code (int) for a given header record.
Inputs: None
Outputs: the kind of instrument code (int) for a given header record.
"""
return(self._kinst)
def setKinst(self, kinst):
"""setKinst sets the kind of instrument code (int) for a given header record.
Inputs: kind of instrument code (integer)
Outputs: None
Affects: sets the kind of instrument code (int) (self._kinst) for a given header record.
Prints warning if kinst not found in instTab.txt
"""
kinst = int(kinst)
# verify and set kinst
instList = self._madInstObj.getInstrumentList()
found = False
for inst in instList:
if inst[2] == kinst:
self._instrumentName = inst[0]
found = True
break
if not found:
self._instrumentName = 'Unknown instrument'
sys.stderr.write('Warning: kinst %i not found in instTab.txt\n' % (kinst))
self._kinst = kinst
def getKindat(self):
"""getKindat returns the kind of data code (int) for a given header record.
Inputs: None
Outputs: the kind of data code (int) for a given header record.
"""
return(self._kindat)
def setKindat(self, kindat):
"""setKindat sets the mode of kind of data code (int) for a given header record.
Inputs: the kind of data code (int)
Outputs: None
Affects: sets the kind of data code (int) (self._kindat)
Exceptions: Raises ValueError if kindat less than 0
"""
self._kindat = int(kindat)
if self._kindat < 0:
raise ValueError('kindat must not be less than 0, not %i' % (self._kindat))
def getText(self):
"""getText returns the header text.
Inputs: None
Outputs: the header text.
"""
return(self._text)
def getTextLineCount(self):
"""getTextLineCount returns the number of 80 character lines in self._text
"""
if len(self._text) % 80 == 0:
return(int(len(self._text) / 80))
else:
return(int(1 + int(len(self._text) / 80)))
def setText(self, text):
"""setText sets the header text.
Inputs: text: text to be set. Must be length divisible by 80, and not contain line feeds.
For now, must not exceed 2^16 - 80 bytes to be able to be handled by Cedar format.
Outputs: None.
Affects: sets self._text
Raises TypeError if problem with text
"""
if not isinstance(text, str):
    raise TypeError('text must be of type string')
if len(text) % 80 != 0:
raise TypeError('text length must be divisible by 80: len is %i' % (len(text)))
if text.find('\n') != -1:
raise TypeError('text must not contain linefeed character')
if len(text) > 65536 - 80:
raise TypeError('text exceeds ability of Cedar format to store')
self._text = text
def getJpar(self):
"""returns the number of one-dimensional parameters in the associated data records.
"""
return self._jpar
def setJpar(self, jpar):
""" set the number of one-dimensional parameters in the associated data records.
Must not be negative.
"""
self._jpar = int(jpar)
if self._jpar < 0:
raise TypeError('jpar must not be less than 0')
def getMpar(self):
"""returns the number of two-dimensional parameters in the associated data records.
"""
return self._mpar
def setMpar(self, mpar):
""" set the number of two-dimensional parameters in the associated data records.
Must not be negative.
"""
self._mpar = int(mpar)
if self._mpar < 0:
raise TypeError('mpar must not be less than 0')
def getStartTimeList(self):
"""getStartTimeList returns a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec
Inputs: None
Outputs: a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec.
"""
return((self._sYear,
self._sMonth,
self._sDay,
self._sHour,
self._sMin,
self._sSec,
self._sCentisec))
def getEndTimeList(self):
"""getEndTimeList returns a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec
Inputs: None
Outputs: a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec.
"""
return((self._eYear,
self._eMonth,
self._eDay,
self._eHour,
self._eMin,
self._eSec,
self._eCentisec))
def getLines(self):
"""getLines returns a numpy recarray of the format expected by the "Experiment Notes" dataset
"""
# templates
krechStr = 'KRECH 3002 Header Record, Version 3'
kinstTempStr = 'KINST %i %s'
kindatTempStr = 'KINDAT %i %s'
byearTempStr = 'IBYRT %04i Beginning year'
bmdTempStr = 'IBDTT %04i Beginning month and day'
bhmTempStr = 'IBHMT %04i Beginning UT hour and minute'
bcsTempStr = 'IBCST %04i Beginning centisecond'
eyearTempStr = 'IEYRT %04i Ending year'
emdTempStr = 'IEDTT %04i Ending month and day'
ehmTempStr = 'IEHMT %04i Ending UT hour and minute'
ecsTempStr = 'IECST %04i Ending centisecond'
numLines = int(self.getTextLineCount() + 12) # 8 times lines, KRECH, KINST, KINDAT, and final blank
textArr = numpy.recarray((numLines,), dtype=[('File Notes', h5py.special_dtype(vlen=str))])
for i in range(numLines-9):
if i == 0:
textArr[i]['File Notes'] = krechStr + ' ' * (80 - len(krechStr))
elif i == 1:
kinstName = self._madInstObj.getInstrumentName(self.getKinst())
kinstStr = kinstTempStr % (self.getKinst(), kinstName)
if len(kinstStr) > 80:
kinstStr = kinstStr[:80]
textArr[i]['File Notes'] = kinstStr + ' ' * (80 - len(kinstStr))
elif i == 2:
kindatStr = kindatTempStr % (self.getKindat(),
self._madKindatObj.getKindatDescription(self.getKindat(),
self.getKinst()))
if len(kindatStr) > 80:
kindatStr = kindatStr[:80]
textArr[i]['File Notes'] = kindatStr + ' ' * (80 - len(kindatStr))
else:
textArr[i]['File Notes'] = self.getText()[(i-3)*80:(i-2)*80]
# finally add time lines
sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec = self.getStartTimeList()
eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec = self.getEndTimeList()
ibdtt = sMonth*100 + sDay
ibhmt = sHour*100 + sMin
ibcst = sSec*100 + sCentisec
iedtt = eMonth*100 + eDay
iehmt = eHour*100 + eMin
iecst = eSec*100 + eCentisec
sYearStr = byearTempStr % (sYear)
textArr[i+1]['File Notes'] = sYearStr + ' ' * (80 - len(sYearStr))
sMDStr = bmdTempStr % (ibdtt)
textArr[i+2]['File Notes'] = sMDStr + ' ' * (80 - len(sMDStr))
sHMStr = bhmTempStr % (ibhmt)
textArr[i+3]['File Notes'] = sHMStr + ' ' * (80 - len(sHMStr))
sCSStr = bcsTempStr % (ibcst)
textArr[i+4]['File Notes'] = sCSStr + ' ' * (80 - len(sCSStr))
eYearStr = eyearTempStr % (eYear)
textArr[i+5]['File Notes'] = eYearStr + ' ' * (80 - len(eYearStr))
eMDStr = emdTempStr % (iedtt)
textArr[i+6]['File Notes'] = eMDStr + ' ' * (80 - len(eMDStr))
eHMStr = ehmTempStr % (iehmt)
textArr[i+7]['File Notes'] = eHMStr + ' ' * (80 - len(eHMStr))
eCSStr = ecsTempStr % (iecst)
textArr[i+8]['File Notes'] = eCSStr + ' ' * (80 - len(eCSStr))
textArr[i+9]['File Notes'] = ' ' * 80
return(textArr)
def setTimeLists(self, sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec):
"""setTimeList resets start and end times
Inputs:
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - experiment start time. sCentisec must be 0-99
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - experiment end time. eCentisec must be 0-99
Outputs: None
Affects: sets all time attributes (see code).
Exceptions: Raises ValueError if startTime > endTime
"""
# verify times
sTime = datetime.datetime(sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec*10000)
eTime = datetime.datetime(eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec*10000)
if eTime < sTime:
raise ValueError('Starting time cannot be after ending time')
self._sTime = madrigal.metadata.getMadrigalUTFromDT(sTime)
self._eTime = madrigal.metadata.getMadrigalUTFromDT(eTime)
self._sYear = sYear
self._sMonth = sMonth
self._sDay = sDay
self._sHour = sHour
self._sMin = sMin
self._sSec = sSec
self._sCentisec = sCentisec
self._eYear = eYear
self._eMonth = eMonth
self._eDay = eDay
self._eHour = eHour
self._eMin = eMin
self._eSec = eSec
self._eCentisec = eCentisec
def _parseExpNotesLines(self, expNotesLines):
"""_parseExpNotesLines populates all attributes in MadrigalHeaderRecord
from text from metadata table "Experiment Notes"
"""
if len(expNotesLines) % 80 != 0:
raise ValueError('Len of expNotesLines must be divisible by 80, len %i is not' % (len(expNotesLines)))
self._text = '' # init to empty
delimiter = ' '
# default times
byear = None # to verify lines found
addItem = 0 # check for the case where there is an additional item in front of the date field
bsec = 0
bcsec = 0
esec = 0
ecsec = 0
for i in range(int(len(expNotesLines) / 80)):
line = expNotesLines[i*80:(i+1)*80]
items = line.split()
if len(items) == 0:
# blank line
self.setText(self.getText() + line)
continue
elif items[0].upper() == 'KRECH':
# ignore
continue
elif items[0].upper() == 'KINST':
if int(items[1]) != 3:
self.setKinst(int(items[1]))
else:
self.setKinst(int(items[2]))
elif items[0].upper() == 'KINDAT':
try:
if int(items[1]) != 4:
self.setKindat(int(items[1]))
else:
self.setKindat(int(items[2]))
except:
self.setKindat(0)
# start time
elif items[0].upper() == 'IBYRT':
byear = int(items[1+addItem])
if byear < 1950:
# wrong column parsed
addItem = 1
byear = int(items[1+addItem])
elif items[0].upper() in ('IBDTT', 'IBDT'):
    ibdte = int(items[1+addItem])
    bmonth = ibdte // 100
    bday = ibdte % 100
elif items[0].upper() == 'IBHMT':
    ibhme = int(items[1+addItem])
    bhour = ibhme // 100
    bmin = ibhme % 100
elif items[0].upper() == 'IBCST':
    ibcse = int(float(items[1+addItem]))
    bsec = ibcse // 100
    bcsec = ibcse % 100
# end time
elif items[0].upper() == 'IEYRT':
eyear = int(items[1+addItem])
elif items[0].upper() in ('IEDTT', 'IEDT'):
    iedte = int(items[1+addItem])
    emonth = iedte // 100
    eday = iedte % 100
elif items[0].upper() == 'IEHMT':
    iehme = int(items[1+addItem])
    ehour = iehme // 100
    emin = iehme % 100
elif items[0].upper() == 'IECST':
    iecse = int(float(items[1+addItem]))
    esec = iecse // 100
    ecsec = iecse % 100
else:
self.setText(self.getText() + line)
try:
# set times
self.setTimeLists(byear, bmonth, bday, bhour, bmin, bsec, bcsec,
eyear, emonth, eday, ehour, emin, esec, ecsec)
except:
pass
def __str__(self):
""" returns a string representation of a MadrigalHeaderRecord """
retStr = 'Header Record:\n'
retStr += 'kinst = %i (%s)\n' % (self._kinst, self._instrumentName)
retStr += 'kindat = %i\n' % (self._kindat)
retStr += 'record start: %04i-%02i-%02i %02i:%02i:%02i.%02i\n' % (self._sYear,
self._sMonth,
self._sDay,
self._sHour,
self._sMin,
self._sSec,
self._sCentisec)
retStr += 'record end: %04i-%02i-%02i %02i:%02i:%02i.%02i\n' % (self._eYear,
self._eMonth,
self._eDay,
self._eHour,
self._eMin,
self._eSec,
self._eCentisec)
retStr += 'jpar = %i, mpar = %i\n' % (self._jpar, self._mpar)
for i in range(0, len(self._text) -1, 80):
retStr += '%s\n' % (self._text[i:i+80])
return(retStr)
def __cmp__(self, other):
    """cmpRecords compares two cedar records to allow them to be sorted.
    Note: Python 3 does not call __cmp__ directly; wrap it with functools.cmp_to_key when sorting.
    """
    def _cmp(a, b):
        # local replacement for the removed Python 2 cmp builtin
        return (a > b) - (a < b)
    if other is None:
        return(1)
    # compare record start times
    fList = self.getStartTimeList()
    sList = other.getStartTimeList()
    fDT = datetime.datetime(*fList)
    sDT = datetime.datetime(*sList)
    result = _cmp(fDT, sDT)
    if result:
        return(result)
    # compare record type
    typeList = ('catalog', 'header', 'data')
    fType = self.getType()
    sType = other.getType()
    result = _cmp(typeList.index(fType), typeList.index(sType))
    if result:
        return(result)
    # compare record stop times
    fList = self.getEndTimeList()
    sList = other.getEndTimeList()
    fDT = datetime.datetime(*fList)
    sDT = datetime.datetime(*sList)
    result = _cmp(fDT, sDT)
    if result:
        return(result)
    # compare kindat if both data
    if fType == 'data' and sType == 'data':
        result = _cmp(self.getKindat(), other.getKindat())
        if result:
            return(result)
    return(0)
class CatalogHeaderCreator:
"""CatalogHeaderCreator is a class that automates the creation of catalog and header records
This class creates and adds catalog and header records that meet the Cedar standards. It does this
by examining the input Cedar file for all summary information possible. The user needs only
add text that describes their experiment. A Cedar file must already be written to disk before
this class is created.
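Example (illustrative sketch; the file name and text values are hypothetical):
    creator = CatalogHeaderCreator('/tmp/mlh980120g.002.hdf5')
    creator.createCatalog(principleInvestigator='J. Smith',
                          expPurpose='World day ionospheric study')
    creator.createHeader(kindatDesc='Fitted ISR parameters', analyst='J. Smith')
    creator.write()   # overwrites the original file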
"""
def __init__(self, madFilename):
"""__init__ reads in all summary information about madFilename using madrigal.data
"""
self._madFilename = madFilename
self._summary = madrigal.data.MadrigalFile(self._madFilename)
self._cedar = MadrigalCedarFile(madFilename, maxRecords=3) # parse small part of file into MadrigalCedarFile object
# create default header and catalog records
self._header = None
self._catalog = None
self._lineLen = 80
def createCatalog(self, principleInvestigator=None,
expPurpose=None,
expMode=None,
cycleTime=None,
correlativeExp=None,
sciRemarks=None,
instRemarks=None):
"""createCatalog will create a catalog record appropriate for this file. The additional
information fields are all optional, and are all simple text strings (except for
cycleTime, which is in minutes). If the text contains line feeds, those will be used
as line breaks in the catalog record.
The descriptions of these fields all come from Barbara Emery's documentation
cedarFormat.pdf
Inputs:
principleInvestigator - Names of responsible Principal Investigator(s) or others knowledgeable
about the experiment.
expPurpose - Brief description of the experiment purpose
expMode - Further elaboration of meaning of MODEXP; e.g. antenna patterns and
pulse sequences.
cycleTime - Minutes for one full measurement cycle
correlativeExp - Correlative experiments (experiments with related data)
sciRemarks - scientific remarks
instRemarks - instrument remarks
Returns: None
Affects: sets self._catalog
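Example (illustrative sketch; all text values are hypothetical):
    creator.createCatalog(expPurpose='Study of the auroral E region',
                          cycleTime=15,
                          sciRemarks='Quiet geomagnetic conditions')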
"""
# the first step is to create the text part
text = ''
# start with parameter summary lines
if cycleTime is not None:
text += 'TIMCY %9i minutes' % (int(cycleTime))
text = self._padStr(text, self._lineLen)
text += self._createMaxMinSummaryLines()
# add the time lines
text += self._createCatalogTimeSection()[0]
# then add any text from input arguments
if expMode is not None:
    text += self._createCedarLines('CMODEXP ', expMode)
if expPurpose is not None:
    text += self._createCedarLines('CPURP ', expPurpose)
if correlativeExp is not None:
    text += self._createCedarLines('CCOREXP ', correlativeExp)
if sciRemarks is not None:
    text += self._createCedarLines('CSREM ', sciRemarks)
if instRemarks is not None:
    text += self._createCedarLines('CIREM ', instRemarks)
if principleInvestigator is not None:
    text += self._createCedarLines('CPI ', principleInvestigator)
# get some other metadata
kinst = self._summary.getKinstList()[0]
modexp = self._summary.getKindatList()[0]  # the first kindat doubles as the default modexp
sYear,sMonth,sDay,sHour,sMin,sSec = self._summary.getEarliestTime()
sCentisec = 0
eYear,eMonth,eDay,eHour,eMin,eSec = self._summary.getLatestTime()
eCentisec = 0
# now create the catalog record
self._catalog = MadrigalCatalogRecord(kinst, modexp,
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec,
text)
def createHeader(self, kindatDesc=None,
analyst=None,
comments=None,
history=None):
"""createHeader will create a header record appropriate for this file. The additional
information fields are all optional, and are all simple text strings. If the text contains
line feeds, those will be used as line breaks in the header record.
Inputs:
kindatDesc - description of how this data was analyzed (the kind of data)
analyst - name of person who analyzed this data
comments - additional comments about data (describe any instrument-specific parameters)
history - a description of the history of the processing of this file
Returns: None
Affects: sets self._header
"""
# the first step is to create the text part
text = ''
if kindatDesc != None:
text += self._createCedarLines('CKINDAT', kindatDesc)
if history != None:
text += self._createCedarLines('CHIST ', history)
# add the time lines
text += self._createHeaderTimeSection()[0]
# add the KOD lines from the last record of the file (must be a data record)
text += self._cedar[-1].getHeaderKodLines()
if comments != None:
text += self._createCedarLines('C ', comments)
if analyst != None:
text += self._createCedarLines('CANALYST', analyst)
# last - time of analysis line
now = datetime.datetime.utcnow()
nowStr = now.strftime('%a %b %d %H:%M:%S %Y')
text += 'CANDATE %s UT' % (nowStr)
text = self._padStr(text, self._lineLen)
# get some other metadata
kinst = self._summary.getKinstList()[0]
kindat = self._summary.getKindatList()[0]
sYear,sMonth,sDay,sHour,sMin,sSec = self._summary.getEarliestTime()
sCentisec = 0
eYear,eMonth,eDay,eHour,eMin,eSec = self._summary.getLatestTime()
eCentisec = 0
jpar = len(self._cedar[-1].get1DParms())
mpar = len(self._cedar[-1].get2DParms())
self._header = MadrigalHeaderRecord(kinst, kindat,
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec,
jpar, mpar, text)
def write(self, newFilename=None):
"""write will output the new file with prepended catalog and header records
Raises an IOError if no new catalog or header records to prepend
Inputs:
newFilename - if None, overwrite original file
"""
if self._catalog is None and self._header is None:
raise IOError('Does not make sense to save a new file if no catalog or header has been added')
if self._header != None:
self._cedar.insert(0, self._header)
if self._catalog != None:
self._cedar.insert(0, self._catalog)
if newFilename is None:
newFilename = self._madFilename
else:
shutil.copy(self._madFilename, newFilename)
# open file for appending
with h5py.File(newFilename, 'a') as f:
metadata = f['Metadata']
self._cedar.writeExperimentNotes(metadata, False)
def _createCatalogTimeSection(self):
"""_createCatalogTimeSection will return all the lines in the catalog record that
describe the start and end time of the data records.
Inputs: None
Returns: a tuple with three items 1) a string in the format of the time section of a
catalog record, 2) earliest datetime, 3) latest datetime
"""
earliestStartTimeList = self._summary.getEarliestTime()
earliestStartTime = datetime.datetime(*earliestStartTimeList)
latestEndTimeList = self._summary.getLatestTime()
latestEndTime = datetime.datetime(*latestEndTimeList)
sy = 'IBYRE %4s Beginning year' % (str(earliestStartTime.year))
sd = 'IBDTE %4s Beginning month and day' % (str(earliestStartTime.month*100 + \
earliestStartTime.day))
sh = 'IBHME %4s Beginning UT hour and minute' % (str(earliestStartTime.hour*100 + \
earliestStartTime.minute))
totalCS = earliestStartTime.second*100 + (earliestStartTime.microsecond//10000)
ss = 'IBCSE %4s Beginning centisecond' % (str(totalCS))
ey = 'IEYRE %4s Ending year' % (str(latestEndTime.year))
ed = 'IEDTE %4s Ending month and day' % (str(latestEndTime.month*100 + \
latestEndTime.day))
eh = 'IEHME %4s Ending UT hour and minute' % (str(latestEndTime.hour*100 + \
latestEndTime.minute))
totalCS = latestEndTime.second*100 + (latestEndTime.microsecond//10000)
es = 'IECSE %4s Ending centisecond' % (str(totalCS))
retStr = ''
retStr += sy + (80-len(sy))*' '
retStr += sd + (80-len(sd))*' '
retStr += sh + (80-len(sh))*' '
retStr += ss + (80-len(ss))*' '
retStr += ey + (80-len(ey))*' '
retStr += ed + (80-len(ed))*' '
retStr += eh + (80-len(eh))*' '
retStr += es + (80-len(es))*' '
return((retStr, earliestStartTime, latestEndTime))
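As a concrete illustration of the encoding used above (month*100 + day, hour*100 + minute, second*100 + centiseconds), a hypothetical time renders as:
import datetime

t = datetime.datetime(1998, 1, 20, 13, 5, 30, 250000)   # hypothetical example time
print('IBDTE', t.month * 100 + t.day)                    # IBDTE 120
print('IBHME', t.hour * 100 + t.minute)                  # IBHME 1305
print('IBCSE', t.second * 100 + t.microsecond // 10000)  # IBCSE 3025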
def _createHeaderTimeSection(self, dataRecList=None):
"""_createHeaderTimeSection will return all the lines in the header record that
describe the start and end time of the data records.
Inputs:
dataRecList - if given, examine only those MadrigalDataRecords in dataRecList.
If None (the default), examine all MadrigalDataRecords in this
MadrigalCedarFile
Returns: a tuple with three items 1) a string in the format of the time section of a
header record, 2) earliest datetime, 3) latest datetime
"""
earliestStartTimeList = self._summary.getEarliestTime()
earliestStartTime = datetime.datetime(*earliestStartTimeList)
latestEndTimeList = self._summary.getLatestTime()
latestEndTime = datetime.datetime(*latestEndTimeList)
sy = 'IBYRT %4s Beginning year' % (str(earliestStartTime.year))
sd = 'IBDTT %4s Beginning month and day' % (str(earliestStartTime.month*100 + \
earliestStartTime.day))
sh = 'IBHMT %4s Beginning UT hour and minute' % (str(earliestStartTime.hour*100 + \
earliestStartTime.minute))
totalCS = earliestStartTime.second*100 + (earliestStartTime.microsecond//10000)
ss = 'IBCST %4s Beginning centisecond' % (str(totalCS))
ey = 'IEYRT %4s Ending year' % (str(latestEndTime.year))
ed = 'IEDTT %4s Ending month and day' % (str(latestEndTime.month*100 + \
latestEndTime.day))
eh = 'IEHMT %4s Ending UT hour and minute' % (str(latestEndTime.hour*100 + \
latestEndTime.minute))
totalCS = latestEndTime.second*100 + (latestEndTime.microsecond//10000)
es = 'IECST %4s Ending centisecond' % (str(totalCS))
retStr = ''
retStr += sy + (80-len(sy))*' '
retStr += sd + (80-len(sd))*' '
retStr += sh + (80-len(sh))*' '
retStr += ss + (80-len(ss))*' '
retStr += ey + (80-len(ey))*' '
retStr += ed + (80-len(ed))*' '
retStr += eh + (80-len(eh))*' '
retStr += es + (80-len(es))*' '
return((retStr, earliestStartTime, latestEndTime))
def _createMaxMinSummaryLines(self):
"""_createMaxMinSummaryLines is a private method that creates the max and min summary
lines (e.g., alt, gdlat, etc)
"""
alt1 = 'ALT1 %11i km. Lowest altitude measured'
alt2 = 'ALT2 %11i km. Highest altitude measured'
lat1 = 'GGLAT1 %6i degrees. Lowest geographic latitude measured'
lat2 = 'GGLAT2 %6i degrees. Highest geographic latitude measured'
lon1 = 'GGLON1 %6i degrees. Westmost geographic longitude measured'
lon2 = 'GGLON2 %6i degrees. Eastmost geographic longitude measured'
pl1 = 'PL1 %11i Shortest radar pulse length'
pl2 = 'PL2 %11i Longest radar pulse length'
retStr = ''
minAlt = self._summary.getMinValidAltitude()
maxAlt = self._summary.getMaxValidAltitude()
if minAlt > 0 and minAlt < 1E9:
retStr += alt1 % (int(minAlt))
retStr = self._padStr(retStr, self._lineLen)
retStr += alt2 % (int(maxAlt))
retStr = self._padStr(retStr, self._lineLen)
minLat = self._summary.getMinLatitude()
maxLat = self._summary.getMaxLatitude()
if minLat > -91 and minLat < 91:
retStr += lat1 % (int(minLat))
retStr = self._padStr(retStr, self._lineLen)
retStr += lat2 % (int(maxLat))
retStr = self._padStr(retStr, self._lineLen)
minLon = self._summary.getMinLongitude()
maxLon = self._summary.getMaxLongitude()
if minLon > -181 and minLon < 360:
retStr += lon1 % (int(minLon))
retStr = self._padStr(retStr, self._lineLen)
retStr += lon2 % (int(maxLon))
retStr = self._padStr(retStr, self._lineLen)
minPl = self._summary.getMinPulseLength()
maxPl = self._summary.getMaxPulseLength()
if minPl > 0.001 and minPl < 10E9:
retStr += pl1 % (int(minPl))
retStr = self._padStr(retStr, self._lineLen)
retStr += pl2 % (int(maxPl))
retStr = self._padStr(retStr, self._lineLen)
return(retStr)
def _createCedarLines(self, prefix, text):
"""_createCedarLines is a private method that returns a string which is a multiple of
80 characters (no line feeds) where each 80 character block starts with prefix, then a space,
and then the next part of text that fits on line, padded with spaces
"""
lineLen = self._lineLen
if len(prefix) > self._lineLen/2:
raise IOError('Too long prefix %s' % (str(prefix)))
retStr = ''
# first check for line feeds
lines = text.split('\n')
for line in lines:
# now split by words
words = line.split()
for word in words:
# see if this word can fit on one line
if len(word) + 1 > lineLen - len(prefix) + 1:
raise IOError('Can not fit the word <%s> in a Cedar text record' % (word))
# see if there's room for this word
if (lineLen - (len(retStr) % lineLen) <= len(word) + 1) or \
(len(retStr) % lineLen == 0):
retStr = self._padStr(retStr, lineLen)
retStr += '%s ' % (prefix)
retStr += '%s ' % (word)
# at line break, we always pad
retStr = self._padStr(retStr, lineLen)
return(retStr)
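For example, assuming a CatalogHeaderCreator instance named creator (hypothetical, as is the text), the wrapping contract can be checked like this:
s = creator._createCedarLines('CPI     ', 'Jane Doe and John Smith')
assert len(s) % 80 == 0 and '\n' not in s
print(repr(s[:80]))  # 'CPI      Jane Doe and John Smith' space-padded to 80 characters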
def _padStr(self, thisStr, lineLen):
"""_padStr is a private method that pads a string with spaces so its length is module lineLen
"""
spacesToPad = lineLen - (len(thisStr) % lineLen)
if spacesToPad == lineLen:
return(thisStr)
thisStr += ' ' * spacesToPad
return(thisStr)
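A typical use of this class, with hypothetical file paths and text, is a sketch like:
import madrigal.cedar

creator = madrigal.cedar.CatalogHeaderCreator('/tmp/mlh980120g.001.hdf5')  # existing Madrigal HDF5 file
creator.createCatalog(principleInvestigator='J. Doe',
                      expPurpose='Ionospheric measurements')
creator.createHeader(kindatDesc='Fitted ISR parameters', analyst='J. Doe')
creator.write('/tmp/mlh980120g.annotated.hdf5')  # or creator.write() to overwrite in place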
class CedarParameter:
"""CedarParameter is a class with attributes code, mnemonic, and description, and isInt"""
def __init__(self, code, mnemonic, description, isInt):
self.code = int(code)
self.mnemonic = str(mnemonic)
self.description = str(description)
self.isInt = bool(isInt)
def __str__(self):
return('%6i: %20s: %s, isInt=%s' % (self.code, self.mnemonic, self.description, str(self.isInt)))
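A brief usage sketch (the code, mnemonic, and description below are illustrative values, not taken from the parameter tables):
p = CedarParameter(110, 'gdalt', 'Geodetic altitude (km)', False)
print(p)  # e.g. '   110:                gdalt: Geodetic altitude (km), isInt=False'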
class convertToNetCDF4:
def __init__(self, inputHdf5, outputNC):
"""convertToNetCDF4 converts a Madrigal HDF5 file to netCDF4 using Array Layout
rather than using Table Layout as cedar module does. Can handle large Hdf5 file
without large memory footprint, and is much faster than reading in using
madrigal.cedar.MadrigalCedarFile
Inputs:
inputHdf5 - filename of input Madrigal Hdf5 file
outputNC - output netCDF4 file
"""
madParmObj = madrigal.data.MadrigalParameters()
self._fi = h5py.File(inputHdf5, 'r')
if 'Array Layout' not in self._fi['Data']:
if os.path.getsize(inputHdf5) < 50000000:
# for smaller files we simply go through the slower full cedar conversion
cedarObj = MadrigalCedarFile(inputHdf5)
cedarObj.write('netCDF4', outputNC)
return
else:
# file is too big to load into memory at once; read only 10 records at a time and write to file
# parmIndexDict is a dictionary with key = timestamps and ind spatial parm names,
# value = dictionary of keys = unique values, value = index
# temp only
total = 0
t = time.time()
parmIndexDict = self._getParmIndexDict()
self._fi.close()
madCedarObj = madrigal.cedar.MadrigalCedarFile(inputHdf5, maxRecords=10)
madCedarObj.dump('netCDF4', outputNC, parmIndexDict)
total += 10
while (True):
# temp only
print('%i done so far in %f secs' % (total, time.time()-t))
newRecs, isComplete = madCedarObj.loadNextRecords(10)
if isComplete:
break
madCedarObj.dump('netCDF4', outputNC, parmIndexDict)
if newRecs < 10:
break
total += newRecs
# compress
filename, file_extension = os.path.splitext(outputNC)
# tmp file name to use to run h5repack
tmpFile = filename + '_tmp' + file_extension
cmd = 'h5repack -i %s -o %s --filter=GZIP=4' % (outputNC, tmpFile)
try:
subprocess.check_call(shlex.split(cmd))
except:
traceback.print_exc()
return
shutil.move(tmpFile, outputNC)
return
self._fo = netCDF4.Dataset(outputNC, 'w', format='NETCDF4')
self._fo.catalog_text = self.getCatalogText()
self._fo.header_text = self.getHeaderText()
# write Experiment Parameters
experimentParameters = self._fi['Metadata']['Experiment Parameters']
for i in range(len(experimentParameters)):
name = experimentParameters['name'][i]
if type(name) in (bytes, numpy.bytes_):
name = name.decode("utf8")
# make text acceptable attribute names
name = name.replace(' ', '_')
name = name.replace('(s)', '')
self._fo.setncattr(name, experimentParameters['value'][i])
indParmList = [parm[0].lower() for parm in self._fi['Metadata']['Independent Spatial Parameters']]
# split parms - if any
has_split = 'Parameters Used to Split Array Data' in list(self._fi['Metadata'].keys())
arraySplittingMnemonics = []
if has_split:
arraySplittingParms = self._fi['Metadata']['Parameters Used to Split Array Data']
arrSplitParmDesc = ''
for i in range(len(arraySplittingParms)):
arrSplitParmDesc += '%s: ' % (arraySplittingParms[i]['mnemonic'].lower())
arrSplitParmDesc += '%s' % (arraySplittingParms[i]['description'].lower())
arraySplittingMnemonics.append(arraySplittingParms[i]['mnemonic'].lower())
if i < len(arraySplittingParms) - 1:
arrSplitParmDesc += ' -- '
self._fo.parameters_used_to_split_data = arrSplitParmDesc
if has_split:
names = list(self._fi['Data']['Array Layout'].keys())
groups = [self._fi['Data']['Array Layout'][name] for name in names]
else:
names = [None]
groups = [self._fi['Data']['Array Layout']]
# loop through each split array (or just the top level, if none)
for i in range(len(groups)):
name = names[i]
if not name is None:
nc_name = name.strip().replace(' ', '_')
thisGroup = self._fo.createGroup(nc_name)
hdf5Group = self._fi['Data']['Array Layout'][name]
else:
thisGroup = self._fo
hdf5Group = self._fi['Data']['Array Layout']
times = hdf5Group['timestamps']
# next step - create dimensions
dims = []
# first time dim
thisGroup.createDimension("timestamps", len(times))
timeVar = thisGroup.createVariable("timestamps", 'f8', ("timestamps",),
zlib=True)
timeVar.units = 'Unix seconds'
timeVar.description = 'Number of seconds since UT midnight 1970-01-01'
timeVar[:] = times
dims.append("timestamps")
# next ind parms, because works well with ncview that way
for indParm in indParmList:
if type(indParm) == bytes:
indParmString = indParm.decode('utf8')
else:
indParmString = indParm
if indParmString in arraySplittingMnemonics:
continue
thisGroup.createDimension(indParmString, len(hdf5Group[indParmString]))
if madParmObj.isInteger(indParmString):
thisVar = thisGroup.createVariable(indParmString, 'i8', (indParmString,),
zlib=True)
thisVar[:] = hdf5Group[indParmString]
elif madParmObj.isString(indParmString):
slen = len(hdf5Group[indParmString][0])
dtype = 'S%i' % (slen)
thisVar = thisGroup.createVariable(indParmString, dtype, (indParmString,),
zlib=True)
for i in range(len(hdf5Group[indParmString])):
thisVar[i] = str(hdf5Group[indParmString][i])
else:
thisVar = thisGroup.createVariable(indParmString, 'f8', (indParmString,),
zlib=True)
thisVar[:] = hdf5Group[indParmString]
thisVar.units = madParmObj.getParmUnits(indParmString)
thisVar.description = madParmObj.getSimpleParmDescription(indParmString)
dims.append(indParmString)
# get all one d data
oneDParms = list(hdf5Group['1D Parameters'].keys())
for oneDParm in oneDParms:
if oneDParm in indParmList:
if oneDParm not in arraySplittingMnemonics:
continue
if oneDParm.find('Data Parameters') != -1:
continue
if madParmObj.isInteger(oneDParm):
oneDVar = thisGroup.createVariable(oneDParm, 'i8', (dims[0],),
zlib=True)
elif madParmObj.isString(oneDParm):
slen = len(hdf5Group['1D Parameters'][oneDParm][0])
dtype = 'S%i' % (slen)
oneDVar = thisGroup.createVariable(oneDParm, dtype, (dims[0],),
zlib=True)
else:
oneDVar = thisGroup.createVariable(oneDParm, 'f8', (dims[0],),
zlib=True)
oneDVar.units = madParmObj.getParmUnits(oneDParm)
oneDVar.description = madParmObj.getSimpleParmDescription(oneDParm)
try:
oneDVar[:] = hdf5Group['1D Parameters'][oneDParm]
except:
oneDVar[:] = hdf5Group['1D Parameters'][oneDParm][()]
# get all two d data
twoDParms = list(hdf5Group['2D Parameters'].keys())
for twoDParm in twoDParms:
if twoDParm.find('Data Parameters') != -1:
continue
if twoDParm in indParmList:
if twoDParm not in arraySplittingMnemonics:
continue
if madParmObj.isInteger(twoDParm):
twoDVar = thisGroup.createVariable(twoDParm, 'i8', dims,
zlib=True)
elif madParmObj.isString(twoDParm):
slen = len(hdf5Group['2D Parameters'][twoDParm][0])
dtype = 'S%i' % (slen)
twoDVar = thisGroup.createVariable(twoDParm, dtype, dims,
zlib=True)
else:
twoDVar = thisGroup.createVariable(twoDParm, 'f8', dims,
zlib=True)
twoDVar.units = madParmObj.getParmUnits(twoDParm)
twoDVar.description = madParmObj.getSimpleParmDescription(twoDParm)
# move the last dim in Hdf5 (time) to be the first now
reshape = list(range(len(dims)))
newShape = reshape[-1:] + reshape[0:-1]
data = numpy.transpose(hdf5Group['2D Parameters'][twoDParm], newShape)
twoDVar[:] = data
data = None
self._fo.close()
self._fi.close()
def getCatalogText(self):
"""getCatalogText returns the catalog record text as a string
"""
if not 'Experiment Notes' in list(self._fi['Metadata'].keys()):
return('')
notes = self._fi['Metadata']['Experiment Notes']
retStr = ''
for substr in notes:
if substr[0].find(b'Header information') != -1:
break
retStr += substr[0].decode('utf-8')
return(retStr)
def getHeaderText(self):
"""getHeaderText returns the header record text as a string
"""
if not 'Experiment Notes' in list(self._fi['Metadata'].keys()):
return('')
notes = self._fi['Metadata']['Experiment Notes']
retStr = ''
headerFound = False
for substr in notes:
if substr[0].find(b'Header information') != -1:
headerFound = True
if headerFound:
retStr += substr[0].decode('utf-8')
return(retStr)
def _getParmIndexDict(self):
"""_getParmIndexDict returns a dictionary with key = timestamps and ind spatial parm names,
value = dictionary of keys = unique values, value = index of that value
"""
retDict = {}
parmList = ['ut1_unix'] + [parm[0].lower() for parm in self._fi['Metadata']['Independent Spatial Parameters']]
for parm in parmList:
if type(parm) == bytes:
parm = parm.decode('utf-8')
values = self._fi['Data']['Table Layout'][parm]
unique_values = numpy.unique(values)
sorted_values = numpy.sort(unique_values)
retDict[parm] = collections.OrderedDict()
for index, key in numpy.ndenumerate(sorted_values):
if type(key) in (numpy.bytes_, bytes):
key = key.decode('utf-8')
retDict[parm][key] = index[0]
return(retDict)
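Note that the entire conversion runs inside __init__, so simply instantiating the class performs it. A minimal sketch with hypothetical paths:
import madrigal.cedar

madrigal.cedar.convertToNetCDF4('/tmp/mlh980120g.001.hdf5', '/tmp/mlh980120g.001.nc')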
class convertToText:
def __init__(self, inputHdf5, outputTxt, summary='plain', showHeaders=False,
filterList=None, missing=None, assumed=None, knownbad=None):
"""convertToText converts a Madrigal HDF5 file to a text file. Designed to be able
to handle large files without a large memory footprint
Inputs:
inputHdf5 - filename of input Madrigal Hdf5 file
outputTxt - output text file
summary - type of summary line to print at top. Allowed values are:
'plain' - text only mnemonic names, but only if not showHeaders
'html' - mnemonic names wrapped in standard javascript code to allow descriptive popups
'summary' - print overview of file and filters used. Also text only mnemonic names,
but only if not showHeaders
None - no summary line
showHeaders - if True, print header in format for each record. If False, the default,
do not.
filterList - a list of madrigal.derivation.MadrigalFilter objects to be described in the
summary. Default is None, in which case not described in summary. Ignored if summary
is not 'summary'
missing, assumed, knownbad - how to print Cedar special values. Default is None for
all, so that the value printed is the value in the numpy table, as per the spec.
"""
madCedarObj = madrigal.cedar.MadrigalCedarFile(inputHdf5, maxRecords=10)
madCedarObj.writeText(outputTxt, summary=summary, showHeaders=showHeaders, filterList=filterList,
missing=missing, assumed=assumed, knownbad=knownbad, append=True,
firstWrite=True)
while (True):
newRecs, isComplete = madCedarObj.loadNextRecords(10)
if isComplete:
break
madCedarObj.writeText(outputTxt, summary=summary, showHeaders=showHeaders,
missing=missing, assumed=assumed, knownbad=knownbad,
append=True, firstWrite=False)
if newRecs < 10:
break
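As with convertToNetCDF4, the conversion runs inside __init__. A minimal sketch with hypothetical paths:
import madrigal.cedar

madrigal.cedar.convertToText('/tmp/mlh980120g.001.hdf5', '/tmp/mlh980120g.001.txt',
                             summary='summary', showHeaders=True)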
class _TableSubset:
"""_TableSubset is a private class which defines a subset of a Table Layout created by
one combination of array splitting parameter values
"""
def __init__(self, arraySplittingParms,
arraySplittingValues,
fullTable):
"""TableSubset creates a TableSubset based on input parameters
Inputs:
arraySplittingParms - ordered list of mnemonics used to split table
arraySplittingValues - values used for this subset
fullTable - the full table to take a subset of
Creates attributes:
self.arraySplittingParms - input parm
self.arraySplittingValues - input parm
self.table - table subset
self.oneDIndices - a numpy int array of indices of one D values (one index per time/record)
"""
self.arraySplittingValues = arraySplittingValues
self.arraySplittingParms = []
for parm in arraySplittingParms:
if type(parm) == bytes:
self.arraySplittingParms.append(parm.decode('utf8'))
else:
self.arraySplittingParms.append(parm)
if len(self.arraySplittingParms) != len(self.arraySplittingValues):
raise ValueError('Two input lists must have equal lengths, not %i and %i' % \
(len(self.arraySplittingParms), len(self.arraySplittingValues)))
if len(self.arraySplittingParms) == 0:
self.table = fullTable
# get oneDIndices
a1 = numpy.concatenate(([-1], self.table['recno']))
a2 = numpy.concatenate((self.table['recno'], [-1]))
self.oneDIndices = numpy.where(a1 != a2)
self.oneDIndices = self.oneDIndices[0][:-1]
return
indices = None
trueArr = numpy.ones((len(fullTable),), dtype=bool)
falseArr = numpy.zeros((len(fullTable),), dtype=bool)
if len(self.arraySplittingParms) == 1:
indices = numpy.where(fullTable[arraySplittingParms[0]]==arraySplittingValues[0],
trueArr, falseArr)
self.table = fullTable[indices]
# I can only figure out how to get numpy to AND two conditions at once, so do this as a loop
else:
for i in range(len(self.arraySplittingParms) - 1):
if indices is None:
indices = numpy.where(numpy.logical_and(fullTable[self.arraySplittingParms[i]]==arraySplittingValues[i],
fullTable[self.arraySplittingParms[i+1]]==arraySplittingValues[i+1]),
trueArr, falseArr)
else:
indices = numpy.where(numpy.logical_and(indices,
fullTable[self.arraySplittingParms[i+1]]==arraySplittingValues[i+1]),
trueArr, falseArr)
self.table = fullTable[indices]
# get oneDIndices
a1 = numpy.concatenate(([-1], self.table['recno']))
a2 = numpy.concatenate((self.table['recno'], [-1]))
self.oneDIndices = numpy.where(a1 != a2)
self.oneDIndices = self.oneDIndices[0][:-1]
# verify the number of records == number of times
if len(numpy.unique(self.table['ut1_unix'])) != len(self.oneDIndices):
raise ValueError('Number of times %i not equal to number of records %i' % (len(numpy.unique(self.table['ut1_unix'])), len(self.oneDIndices)))
def getGroupName(self):
"""getGroupName returns the group name in the form
If no arraySplittingParms, returns None
"""
arraySplittingParms = []
for parm in self.arraySplittingParms:
if type(parm) != str:
arraySplittingParms.append(parm.decode('utf8'))
else:
arraySplittingParms.append(parm)
if len(arraySplittingParms) == 0:
return(None)
groupName = 'Array with '
for i, parm in enumerate(arraySplittingParms):
groupName += '%s=%s ' % (parm, str(self.arraySplittingValues[i]))
if i < len(self.arraySplittingValues)-1:
groupName += 'and '
return(groupName)
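The boolean-mask subsetting above can be seen on a toy structured array. A minimal sketch with hypothetical field values (field names follow the Table Layout conventions):
import numpy

table = numpy.array([(0, 45.0, 100.0), (0, 45.0, 200.0), (1, 90.0, 100.0), (1, 90.0, 200.0)],
                    dtype=[('recno', int), ('elm', float), ('range', float)])
mask = table['elm'] == 45.0   # one splitting parameter, one value
subset = table[mask]
print(subset['range'])        # [100. 200.] -- only rows from the elm=45.0 records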
def compareParms(parm):
"""compareParms is used internally by getHeaderKodLines to order the parameters
Inputs:
parm - tuple of (code, parameter description, units)
Returns float(abs(code)) + 0.5 if the code is less than zero; else returns float(code).
This results in a positive code sorting just before the negative code of the same absolute value.
"""
if parm[0] < 0:
return(float(abs(parm[0])) + 0.5)
else:
return(float(abs(parm[0])))
if __name__ == '__main__':
cedarObj = MadrigalCedarFile('/home/grail/brideout/madroot/experiments/1998/mlh/20jan98/mlh980120g.001')
print('len of cedarObj is %i' % (len(cedarObj)))
for i in range(2):
print(cedarObj[i])
cedarObj.write('Madrigal', '/home/grail/brideout/junk.001')
newCedarObj = MadrigalCedarFile('/home/grail/brideout/madroot/experiments/1998/mlh/20jan98/mlh980120g.002', True)
dataObj = MadrigalDataRecord(31,
1000,
2001,1,1,0,0,0,0,
2001,1,1,0,2,59,99,
['azm', 'elm', 'rgate'],
['range', 'ti', 'drange'],
4)
dataObj.set1D('azm',45.0)
dataObj.set1D('elm',85.0)
dataObj.set2D(-120,2,'assumed')
newCedarObj.append(dataObj)
print(len(newCedarObj))
print(newCedarObj)
print(dataObj.get1D('azm'))
print(dataObj.get1D('elm'))
print(dataObj.get1D('rgate'))
print(dataObj.get2D('range', 2))
print(dataObj.get2D('drange', 2))
dataObj.set2D('range', 0, 100)
dataObj.set2D('range', 1, 150)
dataObj.set2D('range', 2, 200)
dataObj.set2D('range', 3, 250)
print('kinst is %i' % (dataObj.getKinst()))
print('kindat is %i' % (dataObj.getKindat()))
oneDList = dataObj.get1DParms()
print('The following are 1D parms:')
for parm in oneDList:
print(parm)
print('now removing 1d rgate')
dataObj.delete1D('rgate')
oneDList = dataObj.get1DParms()
print('The following are now the 1D parms:')
for parm in oneDList:
print(parm)
twoDList = dataObj.get2DParms()
print('\nThe following are 2D parms:')
for parm in twoDList:
print(parm)
print(dataObj)
print('now deleting drange')
dataObj.delete2DParm('drange')
print(dataObj)
print('now deleting 2nd and 3rd row')
dataObj.delete2DRows((1,2))
print(dataObj)
Module variables
var assumed
var knownbad
var missing
var timeParms
Functions
def compareParms(
parm)
compareParms is used internally by getHeaderKodLines to order the parameters
Inputs:
parm - tuple of (code, parameter description, units)
Returns float(abs(code)) + 0.5 if the code is less than zero; else returns float(code), which results in a positive code sorting just before the negative code of the same absolute value.
def compareParms(parm):
"""compareParms is used internally by getHeaderKodLines to order the parameters
Inputs:
parm - tuple of (code, parameter description, units)
Returns float(abs(code)) + 0.5 if the code is less than zero; else returns float(code).
This results in a positive code sorting just before the negative code of the same absolute value.
"""
if parm[0] < 0:
return(float(abs(parm[0])) + 0.5)
else:
return(float(abs(parm[0])))
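For example, applied as a sort key to hypothetical (code, description, units) tuples:
parms = [(-120, 'error', ''), (120, 'value', ''), (110, 'gdalt', '')]
print([p[0] for p in sorted(parms, key=compareParms)])
# [110, 120, -120] -- each positive code sorts just before its negative (error) code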
def getLowerCaseList(
l)
getLowerCaseList returns a new list that is all lowercase given an input list that may have upper case strings
def getLowerCaseList(l):
"""getLowerCaseList returns a new list that is all lowercase given an input list
that may have upper case strings
"""
return([s.lower() for s in l])
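For example:
print(getLowerCaseList(['GDALT', 'Ti', 'azm']))  # ['gdalt', 'ti', 'azm']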
def listRecords(
hdf5Filename, newFilename=None, addedLinkFormat=None)
listRecords outputs a summary of the records in hdf5Filename. It has a lower memory footprint than loading the full file into memory and then calling loadNextRecords. However, if the file is already fully in memory, calling the MadrigalCedarFile method loadNextRecords is faster.
Inputs:
hdf5Filename - input Madrigal hdf5 file to use
newFilename - name of new file to create and write to. If None, the default, write to stdout
addedLinkFormat - if not None, add link to end of each record with value addedLinkFormat % (recno).
If None (the default) no added link. Must contain one and only one integer format to be
filled in by recno.
def listRecords(hdf5Filename, newFilename=None, addedLinkFormat=None):
"""listRecords outputs a summary of records in the hdf5Filename. Is lower memory footprint than
loading full file into memory, then calling loadNextRecords. However, if the file is already fully
in memory, than classing the MadrigalCedarFile method loadNextRecords is faster.
Inputs:
hdf5Filename - input Madrigal hdf5 file to use
newFilename - name of new file to create and write to. If None, the default, write to stdout
addedLinkFormat - if not None, add link to end of each record with value addedLinkFormat % (recno).
If None (the default) no added link. Must contain one and only one integer format to be
filled in by recno.
"""
madParmObj = madrigal.data.MadrigalParameters()
madDataObj = madrigal.data.MadrigalFile(hdf5Filename)
formatStr = '%6i: %s %s'
headerStr = ' record start_time end_time'
parms = []
kinstList = madDataObj.getKinstList()
if len(kinstList) > 1:
formatStr += ' %i'
headerStr += ' kinst'
parms.append('kinst')
kindatList = madDataObj.getKindatList()
if len(kindatList) > 1:
formatStr += ' %i'
headerStr += ' kindat'
parms.append('kindat')
if newFilename is not None:
f = open(newFilename, 'w')
else:
f = sys.stdout
if not addedLinkFormat is None:
formatStr += ' ' + addedLinkFormat
headerStr += ' record_plot'
formatStr += '\n'
f.write('%s\n' % headerStr)
# read in data from file
with h5py.File(hdf5Filename, 'r') as fi:
table = fi['Data']['Table Layout']
recno = table['recno']
ut1_unix = table['ut1_unix']
ut2_unix = table['ut2_unix']
if 'kinst' in parms:
kinst = table['kinst']
if 'kindat' in parms:
kindat = table['kindat']
max_recno = int(recno[-1])
for index in range(max_recno + 1):
i = numpy.searchsorted(recno, index)
this_ut1_unix = ut1_unix[i]
this_ut2_unix = ut2_unix[i]
if 'kinst' in parms:
this_kinst = kinst[i]
if 'kindat' in parms:
this_kindat = kindat[i]
sDT = datetime.datetime.utcfromtimestamp(this_ut1_unix)
sDTStr = sDT.strftime('%Y-%m-%d %H:%M:%S')
eDT = datetime.datetime.utcfromtimestamp(this_ut2_unix)
eDTStr = eDT.strftime('%Y-%m-%d %H:%M:%S')
data = [index, sDTStr, eDTStr]
if 'kinst' in parms:
data.append(this_kinst)
if 'kindat' in parms:
data.append(this_kindat)
if not addedLinkFormat is None:
data += [index]
f.write(formatStr % tuple(data))
if f != sys.stdout:
f.close()
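A usage sketch with a hypothetical filename and link format (the format must contain exactly one integer conversion, filled in by recno):
import madrigal.cedar

madrigal.cedar.listRecords('/tmp/mlh980120g.001.hdf5',
                           addedLinkFormat='/plots/rec%06i.png')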
def parseArraySplittingParms(
hdf5Filename)
parseArraySplittingParms returns a (possibly empty) list of the parameter mnemonics used to split the array layout for a Madrigal Hdf5 file
Input: hdf5Filename - Madrigal Hdf5 filename
Raise IOError if not a valid Madrigal Hdf5 file
def parseArraySplittingParms(hdf5Filename):
"""parseArraySplittingParms returns a (possibly empty) list of parameter mnemonic used to split
the array layout for a Madrigal Hdf5 file
Input: hdf5Filename - Madrigal Hdf5 filename
Raise IOError if not a valid Madrigal Hdf5 file
"""
with h5py.File(hdf5Filename, 'r') as f:
try:
dataGroup = f['Data']
except:
raise IOError('Hdf5 file %s does not have required top level Data group' % (hdf5Filename))
retList = []
if 'Array Layout' not in list(dataGroup.keys()):
return(retList) # no array layout in this file
arrGroup = dataGroup['Array Layout']
for key in list(arrGroup.keys()):
if key.find('Array with') != -1:
items = key.split()
for item in items:
if item.find('=') != -1:
subitems = item.split('=')
parm = subitems[0].lower()
parm = parm.encode('ascii','ignore')
retList.append(parm)
return(retList)
# None found
return(retList)
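A usage sketch with a hypothetical filename; note that the mnemonics come back ascii-encoded as bytes:
import madrigal.cedar

print(madrigal.cedar.parseArraySplittingParms('/tmp/mlh980120g.001.hdf5'))
# e.g. [b'pl'] if the array layout was split by pulse length, or [] if not split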
Classes
class CatalogHeaderCreator
CatalogHeaderCreator is a class that automates the creation of catalog and header records
This class creates and adds catalog and header records that meet the Cedar standards. It does this by examining the input Cedar file for all summary information possible. The user needs only add text that describes their experiment. A Cedar file must already be written to disk before this class is created.
class CatalogHeaderCreator:
"""CatalogHeaderCreator is a class that automates the creation of catalog and header records
This class creates and adds catalog and header records that meet the Cedar standards. It does this
by examining the input Cedar file for all summary information possible. The user needs only
add text that describes their experiment. A Cedar file must already be written to disk before
this class is created.
"""
def __init__(self, madFilename):
"""__init__ reads in all summary information about madFilename using madrigal.data
"""
self._madFilename = madFilename
self._summary = madrigal.data.MadrigalFile(self._madFilename)
self._cedar = MadrigalCedarFile(madFilename, maxRecords=3) # parse small part of file into MadrigalCedarFile object
# create default header and catalog records
self._header = None
self._catalog = None
self._lineLen = 80
def createCatalog(self, principleInvestigator=None,
expPurpose=None,
expMode=None,
cycleTime=None,
correlativeExp=None,
sciRemarks=None,
instRemarks=None):
"""createCatalog will create a catalog record appropriate for this file. The additional
information fields are all optional, and are all simple text strings (except for
cycleTime, which is in minutes). If the text contains line feeds, those will be used
as line breaks in the catalog record.
The descriptions of these fields all come from Barbara Emery's documentation
cedarFormat.pdf
Inputs:
principleInvestigator - Names of responsible Principal Investigator(s) or others knowledgeable
about the experiment.
expPurpose - Brief description of the experiment purpose
expMode - Further elaboration of meaning of MODEXP; e.g. antenna patterns and
pulse sequences.
cycleTime - Minutes for one full measurement cycle
correlativeExp - Correlative experiments (experiments with related data)
sciRemarks - scientific remarks
instRemarks - instrument remarks
Returns: None
Affects: sets self._catalog
"""
# the first step is to create the text part
text = ''
# start with parameter summary lines
if cycleTime != None:
text += 'TIMCY %9i minutes' % (int(cycleTime))
text = self._padStr(text, self._lineLen)
text += self._createMaxMinSummaryLines()
# add the time lines
text += self._createCatalogTimeSection()[0]
# then add any text from input arguments
if expMode != None:
text += self._createCedarLines('CMODEXP ', expMode)
if expPurpose != None:
text += self._createCedarLines('CPURP ', expPurpose)
if correlativeExp != None:
text += self._createCedarLines('CCOREXP ', correlativeExp)
if sciRemarks != None:
text += self._createCedarLines('CSREM ', sciRemarks)
if instRemarks != None:
text += self._createCedarLines('CIREM ', instRemarks)
if principleInvestigator != None:
text += self._createCedarLines('CPI ', principleInvestigator)
# get some other metadata
kinst = self._summary.getKinstList()[0]
modexp = self._summary.getKindatList()[0]
sYear,sMonth,sDay,sHour,sMin,sSec = self._summary.getEarliestTime()
sCentisec = 0
eYear,eMonth,eDay,eHour,eMin,eSec = self._summary.getLatestTime()
eCentisec = 0
# now create the catalog record
self._catalog = MadrigalCatalogRecord(kinst, modexp,
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec,
text)
def createHeader(self, kindatDesc=None,
analyst=None,
comments=None,
history=None):
"""createHeader will create a header record appropriate for this file. The additional
information fields are all optional, and are all simple text strings. If the text contains
line feeds, those will be used as line breaks in the header record.
Inputs:
kindatDesc - description of how this data was analyzed (the kind of data)
analyst - name of person who analyzed this data
comments - additional comments about data (describe any instrument-specific parameters)
history - a description of the history of the processing of this file
Returns: None
Affects: sets self._header
"""
# the first step is to create the text part
text = ''
if kindatDesc != None:
text += self._createCedarLines('CKINDAT', kindatDesc)
if history != None:
text += self._createCedarLines('CHIST ', history)
# add the time lines
text += self._createHeaderTimeSection()[0]
# add the KOD lines from the last record of the file (must be a data record)
text += self._cedar[-1].getHeaderKodLines()
if comments != None:
text += self._createCedarLines('C ', comments)
if analyst != None:
text += self._createCedarLines('CANALYST', analyst)
# last - time of analysis line
now = datetime.datetime.utcnow()
nowStr = now.strftime('%a %b %d %H:%M:%S %Y')
text += 'CANDATE %s UT' % (nowStr)
text = self._padStr(text, self._lineLen)
# get some other metadata
kinst = self._summary.getKinstList()[0]
kindat = self._summary.getKindatList()[0]
sYear,sMonth,sDay,sHour,sMin,sSec = self._summary.getEarliestTime()
sCentisec = 0
eYear,eMonth,eDay,eHour,eMin,eSec = self._summary.getLatestTime()
eCentisec = 0
jpar = len(self._cedar[-1].get1DParms())
mpar = len(self._cedar[-1].get2DParms())
self._header = MadrigalHeaderRecord(kinst, kindat,
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec,
jpar, mpar, text)
def write(self, newFilename=None):
"""write will output the new file with prepended catalog and header records
Raises an IOError if no new catalog or header records to prepend
Inputs:
newFilename - if None, overwrite original file
"""
if self._catalog is None and self._header is None:
raise IOError('Does not make sense to save a new file if no catalog or header has been added')
if self._header != None:
self._cedar.insert(0, self._header)
if self._catalog != None:
self._cedar.insert(0, self._catalog)
if newFilename is None:
newFilename = self._madFilename
else:
shutil.copy(self._madFilename, newFilename)
# open file for appending
with h5py.File(newFilename, 'a') as f:
metadata = f['Metadata']
self._cedar.writeExperimentNotes(metadata, False)
def _createCatalogTimeSection(self):
"""_createCatalogTimeSection will return all the lines in the catalog record that
describe the start and end time of the data records.
Inputs: None
Returns: a tuple with three items 1) a string in the format of the time section of a
catalog record, 2) earliest datetime, 3) latest datetime
"""
earliestStartTimeList = self._summary.getEarliestTime()
earliestStartTime = datetime.datetime(*earliestStartTimeList)
latestEndTimeList = self._summary.getLatestTime()
latestEndTime = datetime.datetime(*latestEndTimeList)
sy = 'IBYRE %4s Beginning year' % (str(earliestStartTime.year))
sd = 'IBDTE %4s Beginning month and day' % (str(earliestStartTime.month*100 + \
earliestStartTime.day))
sh = 'IBHME %4s Beginning UT hour and minute' % (str(earliestStartTime.hour*100 + \
earliestStartTime.minute))
totalCS = earliestStartTime.second*100 + (earliestStartTime.microsecond//10000)
ss = 'IBCSE %4s Beginning centisecond' % (str(totalCS))
ey = 'IEYRE %4s Ending year' % (str(latestEndTime.year))
ed = 'IEDTE %4s Ending month and day' % (str(latestEndTime.month*100 + \
latestEndTime.day))
eh = 'IEHME %4s Ending UT hour and minute' % (str(latestEndTime.hour*100 + \
latestEndTime.minute))
totalCS = latestEndTime.second*100 + (latestEndTime.microsecond//10000)
es = 'IECSE %4s Ending centisecond' % (str(totalCS))
retStr = ''
retStr += sy + (80-len(sy))*' '
retStr += sd + (80-len(sd))*' '
retStr += sh + (80-len(sh))*' '
retStr += ss + (80-len(ss))*' '
retStr += ey + (80-len(ey))*' '
retStr += ed + (80-len(ed))*' '
retStr += eh + (80-len(eh))*' '
retStr += es + (80-len(es))*' '
return((retStr, earliestStartTime, latestEndTime))
def _createHeaderTimeSection(self, dataRecList=None):
"""_createHeaderTimeSection will return all the lines in the header record that
describe the start and end time of the data records.
Inputs:
dataRecList - if given, examine only those MadrigalDataRecords in dataRecList.
If None (the default), examine all MadrigalDataRecords in this
MadrigalCedarFile
Returns: a tuple with three items 1) a string in the format of the time section of a
header record, 2) earliest datetime, 3) latest datetime
"""
earliestStartTimeList = self._summary.getEarliestTime()
earliestStartTime = datetime.datetime(*earliestStartTimeList)
latestEndTimeList = self._summary.getLatestTime()
latestEndTime = datetime.datetime(*latestEndTimeList)
sy = 'IBYRT %4s Beginning year' % (str(earliestStartTime.year))
sd = 'IBDTT %4s Beginning month and day' % (str(earliestStartTime.month*100 + \
earliestStartTime.day))
sh = 'IBHMT %4s Beginning UT hour and minute' % (str(earliestStartTime.hour*100 + \
earliestStartTime.minute))
totalCS = earliestStartTime.second*100 + (earliestStartTime.microsecond//10000)
ss = 'IBCST %4s Beginning centisecond' % (str(totalCS))
ey = 'IEYRT %4s Ending year' % (str(latestEndTime.year))
ed = 'IEDTT %4s Ending month and day' % (str(latestEndTime.month*100 + \
latestEndTime.day))
eh = 'IEHMT %4s Ending UT hour and minute' % (str(latestEndTime.hour*100 + \
latestEndTime.minute))
totalCS = latestEndTime.second*100 + (latestEndTime.microsecond//10000)
es = 'IECST %4s Ending centisecond' % (str(totalCS))
retStr = ''
retStr += sy + (80-len(sy))*' '
retStr += sd + (80-len(sd))*' '
retStr += sh + (80-len(sh))*' '
retStr += ss + (80-len(ss))*' '
retStr += ey + (80-len(ey))*' '
retStr += ed + (80-len(ed))*' '
retStr += eh + (80-len(eh))*' '
retStr += es + (80-len(es))*' '
return((retStr, earliestStartTime, latestEndTime))
def _createMaxMinSummaryLines(self):
"""_createMaxMinSummaryLines is a private method that creates the max and min summary
lines (e.g., alt, gdlat, etc)
"""
alt1 = 'ALT1 %11i km. Lowest altitude measured'
alt2 = 'ALT2 %11i km. Highest altitude measured'
lat1 = 'GGLAT1 %6i degrees. Lowest geographic latitude measured'
lat2 = 'GGLAT2 %6i degrees. Highest geographic latitude measured'
lon1 = 'GGLON1 %6i degrees. Westmost geographic longitude measured'
lon2 = 'GGLON2 %6i degrees. Eastmost geographic longitude measured'
pl1 = 'PL1 %11i Shortest radar pulse length'
pl2 = 'PL2 %11i Longest radar pulse length'
retStr = ''
minAlt = self._summary.getMinValidAltitude()
maxAlt = self._summary.getMaxValidAltitude()
if minAlt > 0 and minAlt < 1E9:
retStr += alt1 % (int(minAlt))
retStr = self._padStr(retStr, self._lineLen)
retStr += alt2 % (int(maxAlt))
retStr = self._padStr(retStr, self._lineLen)
minLat = self._summary.getMinLatitude()
maxLat = self._summary.getMaxLatitude()
if minLat > -91 and minLat < 91:
retStr += lat1 % (int(minLat))
retStr = self._padStr(retStr, self._lineLen)
retStr += lat2 % (int(maxLat))
retStr = self._padStr(retStr, self._lineLen)
minLon = self._summary.getMinLongitude()
maxLon = self._summary.getMaxLongitude()
if minLon > -181 and minLon < 360:
retStr += lon1 % (int(minLon))
retStr = self._padStr(retStr, self._lineLen)
retStr += lon2 % (int(maxLon))
retStr = self._padStr(retStr, self._lineLen)
minPl = self._summary.getMinPulseLength()
maxPl = self._summary.getMaxPulseLength()
if minPl > 0.001 and minPl < 10E9:
retStr += pl1 % (int(minPl))
retStr = self._padStr(retStr, self._lineLen)
retStr += pl2 % (int(maxPl))
retStr = self._padStr(retStr, self._lineLen)
return(retStr)
def _createCedarLines(self, prefix, text):
"""_createCedarLines is a private method that returns a string which is a multiple of
80 characters (no line feeds) where each 80 character block starts with prefix, then a space,
and then the next part of text that fits on line, padded with spaces
"""
lineLen = self._lineLen
if len(prefix) > self._lineLen/2:
raise IOError('Too long prefix %s' % (str(prefix)))
retStr = ''
# first check for line feeds
lines = text.split('\n')
for line in lines:
# now split by words
words = line.split()
for word in words:
# see if this word can fit on one line
if len(word) + 1 > lineLen - len(prefix) + 1:
raise IOError('Can not fit the word <%s> in a Cedar text record' % (word))
# see if there's room for this word
if (lineLen - (len(retStr) % lineLen) <= len(word) + 1) or \
(len(retStr) % lineLen == 0):
retStr = self._padStr(retStr, lineLen)
retStr += '%s ' % (prefix)
retStr += '%s ' % (word)
# at line break, we always pad
retStr = self._padStr(retStr, lineLen)
return(retStr)
def _padStr(self, thisStr, lineLen):
"""_padStr is a private method that pads a string with spaces so its length is module lineLen
"""
spacesToPad = lineLen - (len(thisStr) % lineLen)
if spacesToPad == lineLen:
return(thisStr)
thisStr += ' ' * spacesToPad
return(thisStr)
Ancestors (in MRO)
- CatalogHeaderCreator
- builtins.object
Static methods
def __init__(
self, madFilename)
__init__ reads in all summary information about madFilename using madrigal.data
def __init__(self, madFilename):
"""__init__ reads in all summary information about madFilename using madrigal.data
"""
self._madFilename = madFilename
self._summary = madrigal.data.MadrigalFile(self._madFilename)
self._cedar = MadrigalCedarFile(madFilename, maxRecords=3) # parse small part of file into MadrigalCedarFile object
# create default header and catalog records
self._header = None
self._catalog = None
self._lineLen = 80
def createCatalog(
self, principleInvestigator=None, expPurpose=None, expMode=None, cycleTime=None, correlativeExp=None, sciRemarks=None, instRemarks=None)
createCatalog will create a catalog record appropriate for this file. The additional information fields are all optional, and are all simple text strings (except for cycleTime, which is in minutes). If the text contains line feeds, those will be used as line breaks in the catalog record.
The descriptions of these fields all come from Barbara Emery's documentation cedarFormat.pdf
Inputs:
principleInvestigator - Names of responsible Principal Investigator(s) or others knowledgeable
about the experiment.
expPurpose - Brief description of the experiment purpose
expMode - Further elaboration of meaning of MODEXP; e.g. antenna patterns and
pulse sequences.
cycleTime - Minutes for one full measurement cycle
correlativeExp - Correlative experiments (experiments with related data)
sciRemarks - scientific remarks
instRemarks - instrument remarks
Returns: None
Affects: sets self._catalog
def createCatalog(self, principleInvestigator=None,
expPurpose=None,
expMode=None,
cycleTime=None,
correlativeExp=None,
sciRemarks=None,
instRemarks=None):
"""createCatalog will create a catalog record appropriate for this file. The additional
information fields are all optional, and are all simple text strings (except for
cycleTime, which is in minutes). If the text contains line feeds, those will be used
as line breaks in the catalog record.
The descriptions of these fields all come from Barbara Emery's documentation
cedarFormat.pdf
Inputs:
principleInvestigator - Names of responsible Principal Investigator(s) or others knowledgeable
about the experiment.
expPurpose - Brief description of the experiment purpose
expMode - Further elaboration of meaning of MODEXP; e.g. antenna patterns and
pulse sequences.
cycleTime - Minutes for one full measurement cycle
correlativeExp - Correlative experiments (experiments with related data)
sciRemarks - scientific remarks
instRemarks - instrument remarks
Returns: None
Affects: sets self._catalog
"""
# the first step is to create the text part
text = ''
# start with parameter summary lines
if cycleTime != None:
text += 'TIMCY %9i minutes' % (int(cycleTime))
text = self._padStr(text, self._lineLen)
text += self._createMaxMinSummaryLines()
# add the time lines
text += self._createCatalogTimeSection()[0]
# then add any text from input arguments
if expMode != None:
text += self._createCedarLines('CMODEXP ', expMode)
if expPurpose != None:
text += self._createCedarLines('CPURP ', expPurpose)
if correlativeExp != None:
text += self._createCedarLines('CCOREXP ', correlativeExp)
if sciRemarks != None:
text += self._createCedarLines('CSREM ', sciRemarks)
if instRemarks != None:
text += self._createCedarLines('CIREM ', instRemarks)
if principleInvestigator != None:
text += self._createCedarLines('CPI ', principleInvestigator)
# get some other metadata
kinst = self._summary.getKinstList()[0]
modexp = self._summary.getKindatList()[0]
sYear,sMonth,sDay,sHour,sMin,sSec = self._summary.getEarliestTime()
sCentisec = 0
eYear,eMonth,eDay,eHour,eMin,eSec = self._summary.getLatestTime()
eCentisec = 0
# now create the catalog record
self._catalog = MadrigalCatalogRecord(kinst, modexp,
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec,
text)
def createHeader(
self, kindatDesc=None, analyst=None, comments=None, history=None)
createHeader will create a header record appropriate for this file. The additional information fields are all optional, and are all simple text strings. If the text contains line feeds, those will be used as line breaks in the header record.
Inputs:
kindatDesc - description of how this data was analyzed (the kind of data)
analyst - name of person who analyzed this data
comments - additional comments about data (describe any instrument-specific parameters)
history - a description of the history of the processing of this file
Returns: None
Affects: sets self._header
def createHeader(self, kindatDesc=None,
analyst=None,
comments=None,
history=None):
"""createHeader will create a header record appropriate for this file. The additional
information fields are all optional, and are all simple text strings. If the text contains
line feeds, those will be used as line breaks in the header record.
Inputs:
kindatDesc - description of how this data was analyzed (the kind of data)
analyst - name of person who analyzed this data
comments - additional comments about data (describe any instrument-specific parameters)
history - a description of the history of the processing of this file
Returns: None
Affects: sets self._header
"""
# the first step is to create the text part
text = ''
if kindatDesc != None:
text += self._createCedarLines('CKINDAT', kindatDesc)
if history != None:
text += self._createCedarLines('CHIST ', history)
# add the time lines
text += self._createHeaderTimeSection()[0]
# add the KOD lines from the last record of the file (must be a data record)
text += self._cedar[-1].getHeaderKodLines()
if comments != None:
text += self._createCedarLines('C ', comments)
if analyst != None:
text += self._createCedarLines('CANALYST', analyst)
# last - time of analysis line
now = datetime.datetime.utcnow()
nowStr = now.strftime('%a %b %d %H:%M:%S %Y')
text += 'CANDATE %s UT' % (nowStr)
text = self._padStr(text, self._lineLen)
# get some other metadata
kinst = self._summary.getKinstList()[0]
kindat = self._summary.getKindatList()[0]
sYear,sMonth,sDay,sHour,sMin,sSec = self._summary.getEarliestTime()
sCentisec = 0
eYear,eMonth,eDay,eHour,eMin,eSec = self._summary.getLatestTime()
eCentisec = 0
jpar = len(self._cedar[-1].get1DParms())
mpar = len(self._cedar[-1].get2DParms())
self._header = MadrigalHeaderRecord(kinst, kindat,
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec,
jpar, mpar, text)
def write(
self, newFilename=None)
write will output the new file with prepended catalog and header records
Raises an IOError if no new catalog or header records to prepend
Inputs:
newFilename - if None, overwrite original file
def write(self, newFilename=None):
"""write will output the new file with prepended catalog and header records
Raises an IOError if no new catalog or header records to prepend
Inputs:
newFilename - if None, overwrite original file
"""
if self._catalog is None and self._header is None:
raise IOError('Does not make sense to save a new file if no catalog or header has been added')
if self._header != None:
self._cedar.insert(0, self._header)
if self._catalog != None:
self._cedar.insert(0, self._catalog)
if newFilename is None:
newFilename = self._madFilename
else:
shutil.copy(self._madFilename, newFilename)
# open file for appending
with h5py.File(newFilename, 'a') as f:
metadata = f['Metadata']
self._cedar.writeExperimentNotes(metadata, False)
class CedarParameter
CedarParameter is a class with attributes code, mnemonic, and description, and isInt
class CedarParameter:
"""CedarParameter is a class with attributes code, mnemonic, and description, and isInt"""
def __init__(self, code, mnemonic, description, isInt):
self.code = int(code)
self.mnemonic = str(mnemonic)
self.description = str(description)
self.isInt = bool(isInt)
def __str__(self):
return('%6i: %20s: %s, isInt=%s' % (self.code, self.mnemonic, self.description, str(self.isInt)))
Ancestors (in MRO)
- CedarParameter
- builtins.object
Static methods
def __init__(
self, code, mnemonic, description, isInt)
Initialize self. See help(type(self)) for accurate signature.
def __init__(self, code, mnemonic, description, isInt):
self.code = int(code)
self.mnemonic = str(mnemonic)
self.description = str(description)
self.isInt = bool(isInt)
Instance variables
var code
var description
var isInt
var mnemonic
class MadrigalCatalogRecord
MadrigalCatalogRecord holds all the information in a Cedar catalog record.
class MadrigalCatalogRecord:
"""MadrigalCatalogRecord holds all the information in a Cedar catalog record."""
def __init__(self,kinst = None,
modexp = None,
sYear = None, sMonth = None, sDay = None,
sHour = None, sMin = None, sSec = None, sCentisec = None,
eYear = None, eMonth = None, eDay = None,
eHour = None, eMin = None, eSec = None, eCentisec = None,
text = None,
madInstObj = None, modexpDesc = '',
expNotesLines=None):
"""__init__ creates a MadrigalCatalogRecord.
Note: all inputs have default values because there are two ways to populate this structure:
1) with all inputs from kinst to text when new data is being created, or
2) with catalog line list from existing Hdf5 file Experiment Notes metadata, plus non-default inputs
Inputs:
kinst - the kind of instrument code. A warning will be raised if not in instTab.txt.
modexp - Code to indicate experimental mode employed. Must be a non-negative integer.
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - experiment start time. sCentisec must be 0-99
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - experiment end time. eCentisec must be 0-99
text - string containing text in catalog record. Length must be divisible by 80. No linefeeds
allowed.
madInstObj - a madrigal.metadata.MadrigalInstrument object. If None, one will be created.
Used to verify kinst.
modexpDesc - string describing the modexp
expNotesLines - a list of all lines in an existing catalog section "Experiment Notes"
metadata table. All the above attributes are parsed from these lines.
Outputs: None
Returns: None
"""
# create any needed Madrigal objects, if not passed in
if madInstObj is None:
self._madInstObj = madrigal.metadata.MadrigalInstrument()
else:
self._madInstObj = madInstObj
if expNotesLines != None:
# get all information from this dataset
self._parseExpNotesLines(expNotesLines)
if not kinst is None:
# kinst set via catalog record overrides kinst argument
try:
self.getKinst()
except AttributeError:
self.setKinst(kinst)
# verify kinst set, or raise error
try:
self.getKinst()
except AttributeError:
raise ValueError('kinst not set when MadrigalCatalogRecord created - required')
if not modexp is None:
self.setModexp(modexp)
if len(modexpDesc) > 0:
self.setModexpDesc(modexpDesc)
try:
self.setTimeLists(sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec)
except:
pass
if not text is None:
self.setText(text)
def getType(self):
""" returns the type 'catalog'"""
return 'catalog'
def getKinst(self):
"""getKinst returns the kind of instrument code (int) for a given catalog record.
Inputs: None
Outputs: the kind of instrument code (int) for a given catalog record.
"""
return(self._kinst)
def setKinst(self, kinst):
"""setKinst sets the kind of instrument code (int) for a given catalog record.
Inputs: kind of instrument code (integer)
Outputs: None
Affects: sets the kind of instrument code (int) (self._kinst) for a given catalog record.
Prints warning if kinst not found in instTab.txt
"""
kinst = int(kinst)
# verify and set kinst
instList = self._madInstObj.getInstrumentList()
found = False
for inst in instList:
if inst[2] == kinst:
self._instrumentName = inst[0]
found = True
break
if found == False:
self._instrumentName = 'Unknown instrument'
sys.stderr.write('Warning: kinst %i not found in instTab.txt\n' % (kinst))
self._kinst = kinst
def getModexp(self):
"""getModexp returns the mode of experiment code (int) for a given catalog record.
Inputs: None
Outputs: the mode of experiment code (int) for a given catalog record. Returns -1 if not set
"""
try:
return(self._modexp)
except AttributeError:
return(-1)
def setModexp(self, modexp):
"""setModexp sets the mode of experiment code (int) for a given catalog record.
Inputs: the mode of experiment code (int)
Outputs: None
Affects: sets the mode of experiment code (int) (self._modexp)
"""
self._modexp = int(modexp)
def getModexpDesc(self):
"""getModexp returns the description of the mode of experiment code for a given catalog record.
Inputs: None
Outputs: the description of the mode of experiment code for a given catalog record (string).
Returns empty string if not set
"""
try:
return(self._modexpDesc)
except AttributeError:
return('')
def setModexpDesc(self, modexpDesc):
"""setModexpDesc sets the description of the mode of experiment code for a given catalog record.
Inputs: the description mode of experiment code (string)
Outputs: None
Affects: sets the description of the mode of experiment code (string) (self._modexpDesc)
"""
self._modexpDesc = str(modexpDesc)
def getText(self):
"""getText returns the catalog text.
Inputs: None
Outputs: the catalog text.
"""
return(self._text)
def getTextLineCount(self):
"""getTextLineCount returns the number of 80 character lines in self._text
"""
return(len(self._text) // 80)  # integer division; text length is always a multiple of 80
def setText(self, text):
"""setText sets the catalog text.
Inputs: text: text to be set. Must be length divisible by 80, and not contain line feeds.
Outputs: None.
Affects: sets self._text
Raises TypeError if there is a problem with text
"""
if type(text) != str:
raise TypeError('text must be of type string')
if len(text) % 80 != 0:
raise TypeError('text length must be divisible by 80: len is %i' % (len(text)))
if text.find('\n') != -1:
raise TypeError('text must not contain linefeed character')
self._text = text
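Because setText rejects linefeeds and any length not divisible by 80, callers typically pad each logical line to exactly 80 characters first; a small helper sketch, reusing the catObj from the sketch above::

    def padTo80(lines):
        # truncate or right-pad every line to exactly 80 characters, then concatenate
        return ''.join(line[:80].ljust(80) for line in lines)

    catObj.setText(padTo80(['First note line', 'Second note line']))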
def getStartTimeList(self):
"""getStartTimeList returns a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec
Inputs: None
Outputs: a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec.
"""
return((self._sYear,
self._sMonth,
self._sDay,
self._sHour,
self._sMin,
self._sSec,
self._sCentisec))
def getEndTimeList(self):
"""getEndTimeList returns a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec
Inputs: None
Outputs: a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec.
"""
return((self._eYear,
self._eMonth,
self._eDay,
self._eHour,
self._eMin,
self._eSec,
self._eCentisec))
def setTimeLists(self, sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec):
"""setTimeList resets start and end times
Inputs:
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - experiment start time. sCentisec must be 0-99
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - experiment end time. eCentisec must be 0-99
Outputs: None
Affects: sets all time attributes (see code).
Exceptions: Raises ValueError if startTime > endTime
"""
# verify times
sTime = datetime.datetime(sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec*10000)
eTime = datetime.datetime(eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec*10000)
if eTime < sTime:
raise ValueError('Starting time cannot be after ending time')
self._sTime = madrigal.metadata.getMadrigalUTFromDT(sTime)
self._eTime = madrigal.metadata.getMadrigalUTFromDT(eTime)
self._sYear = sYear
self._sMonth = sMonth
self._sDay = sDay
self._sHour = sHour
self._sMin = sMin
self._sSec = sSec
self._sCentisec = sCentisec
self._eYear = eYear
self._eMonth = eMonth
self._eDay = eDay
self._eHour = eHour
self._eMin = eMin
self._eSec = eSec
self._eCentisec = eCentisec
def getLines(self):
"""getLines returns a numpy recarray of the format expected by the "Experiment Notes" dataset
"""
# templates
kreccStr = 'KRECC 2001 Catalogue Record, Version 1'
kinstTempStr = 'KINSTE %i %s'
modExpTempStr = 'MODEXP %i %s'
byearTempStr = 'IBYRT %04i Beginning year'
bmdTempStr = 'IBDTT %04i Beginning month and day'
bhmTempStr = 'IBHMT %04i Beginning UT hour and minute'
bcsTempStr = 'IBCST %04i Beginning centisecond'
eyearTempStr = 'IEYRT %04i Ending year'
emdTempStr = 'IEDTT %04i Ending month and day'
ehmTempStr = 'IEHMT %04i Ending UT hour and minute'
ecsTempStr = 'IECST %04i Ending centisecond'
numLines = int(self.getTextLineCount() + 12) # 8 time lines, KRECC, KINSTE, MODEXP, and final blank
textArr = numpy.recarray((numLines,), dtype=[('File Notes', h5py.special_dtype(vlen=str))])
for i in range(numLines-9):
if i == 0:
textArr[i]['File Notes'] = kreccStr + ' ' * (80 - len(kreccStr))
elif i == 1:
kinstName = self._madInstObj.getInstrumentName(self.getKinst())
kinstStr = kinstTempStr % (self.getKinst(), kinstName)
if len(kinstStr) > 80:
kinstStr = kinstStr[:80]
textArr[i]['File Notes'] = kinstStr + ' ' * (80 - len(kinstStr))
elif i == 2:
modExpStr = modExpTempStr % (self.getModexp(), self.getModexpDesc())
if len(modExpStr) > 80:
modExpStr = modExpStr[:80]
textArr[i]['File Notes'] = modExpStr + ' ' * (80 - len(modExpStr))
else:
textArr[i]['File Notes'] = self.getText()[(i-3)*80:(i-2)*80]
# finally add time lines
sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec = self.getStartTimeList()
eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec = self.getEndTimeList()
ibdtt = sMonth*100 + sDay
ibhmt = sHour*100 + sMin
ibcst = sSec*100 + sCentisec
iedtt = eMonth*100 + eDay
iehmt = eHour*100 + eMin
iecst = eSec*100 + eCentisec
sYearStr = byearTempStr % (sYear)
textArr[i+1]['File Notes'] = sYearStr + ' ' * (80 - len(sYearStr))
sMDStr = bmdTempStr % (ibdtt)
textArr[i+2]['File Notes'] = sMDStr + ' ' * (80 - len(sMDStr))
sHMStr = bhmTempStr % (ibhmt)
textArr[i+3]['File Notes'] = sHMStr + ' ' * (80 - len(sHMStr))
sCSStr = bcsTempStr % (ibcst)
textArr[i+4]['File Notes'] = sCSStr + ' ' * (80 - len(sCSStr))
eYearStr = eyearTempStr % (eYear)
textArr[i+5]['File Notes'] = eYearStr + ' ' * (80 - len(eYearStr))
eMDStr = emdTempStr % (iedtt)
textArr[i+6]['File Notes'] = eMDStr + ' ' * (80 - len(eMDStr))
eHMStr = ehmTempStr % (iehmt)
textArr[i+7]['File Notes'] = eHMStr + ' ' * (80 - len(eHMStr))
eCSStr = ecsTempStr % (iecst)
textArr[i+8]['File Notes'] = eCSStr + ' ' * (80 - len(eCSStr))
textArr[i+9]['File Notes'] = ' ' * 80
return(textArr)
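The prolog time lines pack month and day, hour and minute, and second and centisecond into single MMDD, HHMM, and SSCC integers, zero-padded to four digits; for example::

    sMonth, sDay = 1, 20
    ibdtt = sMonth*100 + sDay      # 120  -> rendered as 'IBDTT 0120 Beginning month and day'
    sHour, sMin = 23, 5
    ibhmt = sHour*100 + sMin       # 2305 -> rendered as 'IBHMT 2305 Beginning UT hour and minute'
    sSec, sCentisec = 59, 99
    ibcst = sSec*100 + sCentisec   # 5999 -> rendered as 'IBCST 5999 Beginning centisecond'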
def _parseExpNotesLines(self, expNotesLines):
"""_parseExpNotesLines populates all attributes in MadrigalCatalogRecord
from text from metadata table "Experiment Notes"
"""
if len(expNotesLines) % 80 != 0:
raise ValueError('Len of expNotesLines must be divisible by 80, len %i is not' % (len(expNotesLines)))
self._text = '' # init to empty
self._modexpDesc = ''
self._modexp = 0
delimiter = ' '
# default times
bsec = 0
bcsec = 0
esec = 0
ecsec = 0
for i in range(int(len(expNotesLines) / 80)):
line = expNotesLines[i*80:(i+1)*80]
items = line.split()
if len(items) == 0:
# blank line
self.setText(self.getText() + line)
continue
elif items[0].upper() == 'KRECC':
# ignore
continue
elif items[0].upper() == 'KINSTE':
self.setKinst(int(items[1]))
elif items[0].upper() == 'MODEXP':
try:
self.setModexp(int(items[1]))
except:
self.setModexp(0)
if len(items) > 2:
self.setModexpDesc(delimiter.join(items[2:]))
# start time
elif items[0].upper() == 'IBYRE':
byear = int(items[1])
elif items[0].upper() == 'IBDTE':
ibdte = int(items[1])
bmonth = ibdte // 100
bday = ibdte % 100
elif items[0].upper() == 'IBHME':
ibhme = int(items[1])
bhour = ibhme // 100
bmin = ibhme % 100
elif items[0].upper() == 'IBCSE':
ibcse = int(float(items[1]))
bsec = ibcse // 100
bcsec = ibcse % 100
# end time
elif items[0].upper() == 'IEYRE':
eyear = int(items[1])
elif items[0].upper() == 'IEDTE':
iedte = int(items[1])
emonth = iedte // 100
eday = iedte % 100
elif items[0].upper() == 'IEHME':
iehme = int(items[1])
ehour = iehme // 100
emin = iehme % 100
elif items[0].upper() == 'IECSE':
iecse = int(float(items[1]))
esec = iecse // 100
ecsec = iecse % 100
else:
self.setText(self.getText() + line)
try:
# set times
self.setTimeLists(byear, bmonth, bday, bhour, bmin, bsec, bcsec,
eyear, emonth, eday, ehour, emin, esec, ecsec)
except:
# one or more time mnemonics missing from the notes - leave times unset
pass
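A parsing sketch with illustrative notes text. Note that _parseExpNotesLines matches the *E-suffixed time mnemonics (IBYRE, IBDTE, ...), so the *T-suffixed lines emitted by getLines (IBYRT, IBDTT, ...) are treated as free text::

    import madrigal.cedar
    notes = ('KINSTE 31 Example instrument name'.ljust(80) +
             'IBYRE 1998 Beginning year'.ljust(80))
    catObj = madrigal.cedar.MadrigalCatalogRecord(expNotesLines=notes)
    catObj.getKinst()   # -> 31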
def __str__(self):
""" returns a string representation of a MadrigalCatalogRecord """
retStr = 'Catalog Record:\n'
retStr += 'kinst = %i (%s)\n' % (self._kinst, self._instrumentName)
retStr += 'modexp = %i\n' % (self._modexp)
retStr += 'record start: %04i-%02i-%02i %02i:%02i:%02i.%02i\n' % (self._sYear,
self._sMonth,
self._sDay,
self._sHour,
self._sMin,
self._sSec,
self._sCentisec)
retStr += 'record end: %04i-%02i-%02i %02i:%02i:%02i.%02i\n' % (self._eYear,
self._eMonth,
self._eDay,
self._eHour,
self._eMin,
self._eSec,
self._eCentisec)
for i in range(0, len(self._text) -1, 80):
retStr += '%s\n' % (self._text[i:i+80])
return(retStr)
def __cmp__(self, other):
"""cmpRecords compares two cedar records to allow them to be sorted
"""
def _cmp(a, b):
# local stand-in for the python 2 built-in cmp, removed in python 3
return((a > b) - (a < b))
if other is None:
return(1)
# compare record start times
fList = self.getStartTimeList()
sList = other.getStartTimeList()
fDT = datetime.datetime(*fList)
sDT = datetime.datetime(*sList)
result = _cmp(fDT, sDT)
if result:
return(result)
# compare record type
typeList = ('catalog', 'header', 'data')
fType = self.getType()
sType = other.getType()
result = _cmp(typeList.index(fType), typeList.index(sType))
if result:
return(result)
# compare record stop times
fList = self.getEndTimeList()
sList = other.getEndTimeList()
fDT = datetime.datetime(*fList)
sDT = datetime.datetime(*sList)
result = _cmp(fDT, sDT)
if result:
return(result)
# compare kindat if both data
if fType == 'data' and sType == 'data':
result = _cmp(self.getKindat(), other.getKindat())
if result:
return(result)
return(0)
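Since Python 3 never calls __cmp__ implicitly, a list of catalog, header, and data records (here the hypothetical recordList) can still be sorted with this ordering via functools.cmp_to_key::

    import functools
    sortedRecs = sorted(recordList, key=functools.cmp_to_key(lambda a, b: a.__cmp__(b)))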
Ancestors (in MRO)
- MadrigalCatalogRecord
- builtins.object
Methods
def __init__(
self, kinst=None, modexp=None, sYear=None, sMonth=None, sDay=None, sHour=None, sMin=None, sSec=None, sCentisec=None, eYear=None, eMonth=None, eDay=None, eHour=None, eMin=None, eSec=None, eCentisec=None, text=None, madInstObj=None, modexpDesc='', expNotesLines=None)
init creates a MadrigalCatalogRecord.
Note: all inputs have default values because there are two ways to populate this structure: 1) with all inputs from kinst to text when new data is being created, or 2) with catalog line list from existing Hdf5 file Experiment Notes metadata, plus non-default inputs
Inputs:
kinst - the kind of instrument code. A warning will be raised if not in instTab.txt.
modexp - Code to indicate experimental mode employed. Must be a non-negative integer.
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - experiment start time. sCentisec must be 0-99
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - experiment end time. eCentisec must be 0-99
text - string containing text in catalog record. Length must be divisible by 80. No linefeeds
allowed.
madInstObj - a madrigal.metadata.MadrigalInstrument object. If None, one will be created.
Used to verify kinst.
modexpDesc - string describing the modexp
expNotesLines - a list of all lines in an existing catalog section "Experiment Notes"
metadata table. All the above attributes are parsed from these lines.
Outputs: None
Returns: None
def __init__(self,kinst = None,
modexp = None,
sYear = None, sMonth = None, sDay = None,
sHour = None, sMin = None, sSec = None, sCentisec = None,
eYear = None, eMonth = None, eDay = None,
eHour = None, eMin = None, eSec = None, eCentisec = None,
text = None,
madInstObj = None, modexpDesc = '',
expNotesLines=None):
"""__init__ creates a MadrigalCatalogRecord.
Note: all inputs have default values because there are two ways to populate this structure:
1) with all inputs from kinst to text when new data is being created, or
2) with catalog line list from existing Hdf5 file Experiment Notes metadata, plus non-default inputs
Inputs:
kinst - the kind of instrument code. A warning will be raised if not in instTab.txt.
modexp - Code to indicate experimental mode employed. Must be a non-negative integer.
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - experiment start time. sCentisec must be 0-99
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - experiment end time. eCentisec must be 0-99
text - string containing text in catalog record. Length must be divisible by 80. No linefeeds
allowed.
madInstObj - a madrigal.metadata.MadrigalInstrument object. If None, one will be created.
Used to verify kinst.
modexpDesc - string describing the modexp
expNotesLines - a list of all lines in an existing catalog section "Experiment Notes"
metadata table. All the above attributes are parsed from these lines.
Outputs: None
Returns: None
"""
# create any needed Madrigal objects, if not passed in
if madInstObj is None:
self._madInstObj = madrigal.metadata.MadrigalInstrument()
else:
self._madInstObj = madInstObj
if expNotesLines is not None:
# get all information from this dataset
self._parseExpNotesLines(expNotesLines)
if not kinst is None:
# kinst set via catalog record overrides kinst argument
try:
self.getKinst()
except AttributeError:
self.setKinst(kinst)
# verify kinst set, or raise error
try:
self.getKinst()
except AttributeError:
raise ValueError('kinst not set when MadrigalCatalogRecord created - required')
if not modexp is None:
self.setModexp(modexp)
if len(modexpDesc) > 0:
self.setModexpDesc(modexpDesc)
try:
self.setTimeLists(sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec)
except:
# times not supplied directly - they may already have been set via expNotesLines
pass
if not text is None:
self.setText(text)
def getEndTimeList(
self)
getEndTimeList returns a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec
Inputs: None
Outputs: a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec.
def getEndTimeList(self):
"""getEndTimeList returns a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec
Inputs: None
Outputs: a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec.
"""
return((self._eYear,
self._eMonth,
self._eDay,
self._eHour,
self._eMin,
self._eSec,
self._eCentisec))
def getKinst(
self)
getKinst returns the kind of instrument code (int) for a given catalog record.
Inputs: None
Outputs: the kind of instrument code (int) for a given catalog record.
def getKinst(self):
"""getKinst returns the kind of instrument code (int) for a given catalog record.
Inputs: None
Outputs: the kind of instrument code (int) for a given catalog record.
"""
return(self._kinst)
def getLines(
self)
getLines returns a numpy recarray of the format expected by the "Experiment Notes" dataset
def getLines(self):
"""getLines returns a numpy recarray of the format expected by the "Experiment Notes" dataset
"""
# templates
kreccStr = 'KRECC 2001 Catalogue Record, Version 1'
kinstTempStr = 'KINSTE %i %s'
modExpTempStr = 'MODEXP %i %s'
byearTempStr = 'IBYRT %04i Beginning year'
bmdTempStr = 'IBDTT %04i Beginning month and day'
bhmTempStr = 'IBHMT %04i Beginning UT hour and minute'
bcsTempStr = 'IBCST %04i Beginning centisecond'
eyearTempStr = 'IEYRT %04i Ending year'
emdTempStr = 'IEDTT %04i Ending month and day'
ehmTempStr = 'IEHMT %04i Ending UT hour and minute'
ecsTempStr = 'IECST %04i Ending centisecond'
numLines = int(self.getTextLineCount() + 12) # 8 time lines, KRECC, KINSTE, MODEXP, and final blank
textArr = numpy.recarray((numLines,), dtype=[('File Notes', h5py.special_dtype(vlen=str))])
for i in range(numLines-9):
if i == 0:
textArr[i]['File Notes'] = kreccStr + ' ' * (80 - len(kreccStr))
elif i == 1:
kinstName = self._madInstObj.getInstrumentName(self.getKinst())
kinstStr = kinstTempStr % (self.getKinst(), kinstName)
if len(kinstStr) > 80:
kinstStr = kinstStr[:80]
textArr[i]['File Notes'] = kinstStr + ' ' * (80 - len(kinstStr))
elif i == 2:
modExpStr = modExpTempStr % (self.getModexp(), self.getModexpDesc())
if len(modExpStr) > 80:
modExpStr = modExpStr[:80]
textArr[i]['File Notes'] = modExpStr + ' ' * (80 - len(modExpStr))
else:
textArr[i]['File Notes'] = self.getText()[(i-3)*80:(i-2)*80]
# finally add time lines
sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec = self.getStartTimeList()
eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec = self.getEndTimeList()
ibdtt = sMonth*100 + sDay
ibhmt = sHour*100 + sMin
ibcst = sSec*100 + sCentisec
iedtt = eMonth*100 + eDay
iehmt = eHour*100 + eMin
iecst = eSec*100 + eCentisec
sYearStr = byearTempStr % (sYear)
textArr[i+1]['File Notes'] = sYearStr + ' ' * (80 - len(sYearStr))
sMDStr = bmdTempStr % (ibdtt)
textArr[i+2]['File Notes'] = sMDStr + ' ' * (80 - len(sMDStr))
sHMStr = bhmTempStr % (ibhmt)
textArr[i+3]['File Notes'] = sHMStr + ' ' * (80 - len(sHMStr))
sCSStr = bcsTempStr % (ibcst)
textArr[i+4]['File Notes'] = sCSStr + ' ' * (80 - len(sCSStr))
eYearStr = eyearTempStr % (eYear)
textArr[i+5]['File Notes'] = eYearStr + ' ' * (80 - len(eYearStr))
eMDStr = emdTempStr % (iedtt)
textArr[i+6]['File Notes'] = eMDStr + ' ' * (80 - len(eMDStr))
eHMStr = ehmTempStr % (iehmt)
textArr[i+7]['File Notes'] = eHMStr + ' ' * (80 - len(eHMStr))
eCSStr = ecsTempStr % (iecst)
textArr[i+8]['File Notes'] = eCSStr + ' ' * (80 - len(eCSStr))
textArr[i+9]['File Notes'] = ' ' * 80
return(textArr)
def getModexp(
self)
getModexp returns the mode of experiment code (int) for a given catalog record.
Inputs: None
Outputs: the mode of experiment code (int) for a given catalog record. Returns -1 if not set
def getModexp(self):
"""getModexp returns the mode of experiment code (int) for a given catalog record.
Inputs: None
Outputs: the mode of experiment code (int) for a given catalog record. Returns -1 if not set
"""
try:
return(self._modexp)
except AttributeError:
return(-1)
def getModexpDesc(
self)
getModexpDesc returns the description of the mode of experiment code for a given catalog record.
Inputs: None
Outputs: the description of the mode of experiment code for a given catalog record (string). Returns empty string if not set
def getModexpDesc(self):
"""getModexp returns the description of the mode of experiment code for a given catalog record.
Inputs: None
Outputs: the description of the mode of experiment code for a given catalog record (string).
Returns empty string if not set
"""
try:
return(self._modexpDesc)
except AttributeError:
return('')
def getStartTimeList(
self)
getStartTimeList returns a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec
Inputs: None
Outputs: a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec.
def getStartTimeList(self):
"""getStartTimeList returns a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec
Inputs: None
Outputs: a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec.
"""
return((self._sYear,
self._sMonth,
self._sDay,
self._sHour,
self._sMin,
self._sSec,
self._sCentisec))
def getText(
self)
getText returns the catalog text.
Inputs: None
Outputs: the catalog text.
def getText(self):
"""getText returns the catalog text.
Inputs: None
Outputs: the catalog text.
"""
return(self._text)
def getTextLineCount(
self)
getTextLineCount returns the number of 80 character lines in self._text
def getTextLineCount(self):
"""getTextLineCount returns the number of 80 character lines in self._text
"""
return(len(self._text) // 80)
def getType(
self)
returns the type 'catalog'
def getType(self):
""" returns the type 'catalog'"""
return 'catalog'
def setKinst(
self, kinst)
setKinst sets the kind of instrument code (int) for a given catalog record.
Inputs: kind of instrument code (integer)
Outputs: None
Affects: sets the kind of instrument code (int) (self._kinst) for a given catalog record. Prints warning if kinst not found in instTab.txt
def setKinst(self, kinst):
"""setKinst sets the kind of instrument code (int) for a given catalog record.
Inputs: kind of instrument code (integer)
Outputs: None
Affects: sets the kind of instrument code (int) (self._kinst) for a given catalog record.
Prints warning if kinst not found in instTab.txt
"""
kinst = int(kinst)
# verify and set kinst
instList = self._madInstObj.getInstrumentList()
found = False
for inst in instList:
if inst[2] == kinst:
self._instrumentName = inst[0]
found = True
break
if found == False:
self._instrumentName = 'Unknown instrument'
sys.stderr.write('Warning: kinst %i not found in instTab.txt\n' % (kinst))
self._kinst = kinst
def setModexp(
self, modexp)
setModexp sets the mode of experiment code (int) for a given catalog record.
Inputs: the mode of experiment code (int)
Outputs: None
Affects: sets the mode of experiment code (int) (self._modexp)
def setModexp(self, modexp):
"""setModexp sets the mode of experiment code (int) for a given catalog record.
Inputs: the mode of experiment code (int)
Outputs: None
Affects: sets the mode of experiment code (int) (self._modexp)
"""
self._modexp = int(modexp)
def setModexpDesc(
self, modexpDesc)
setModexpDesc sets the description of the mode of experiment code for a given catalog record.
Inputs: the description of the mode of experiment code (string)
Outputs: None
Affects: sets the description of the mode of experiment code (string) (self._modexpDesc)
def setModexpDesc(self, modexpDesc):
"""setModexpDesc sets the description of the mode of experiment code for a given catalog record.
Inputs: the description of the mode of experiment code (string)
Outputs: None
Affects: sets the description of the mode of experiment code (string) (self._modexpDesc)
"""
self._modexpDesc = str(modexpDesc)
def setText(
self, text)
setText sets the catalog text.
Inputs: text: text to be set. Must be length divisible by 80, and not contain line feeds.
Outputs: None.
Affects: sets self._text
Raises TypeError if there is a problem with text
def setText(self, text):
"""setText sets the catalog text.
Inputs: text: text to be set. Must be length divisible by 80, and not contain line feeds.
Outputs: None.
Affects: sets self._text
Raises TypeError if there is a problem with text
"""
if type(text) != str:
raise TypeError('text must be of type string')
if len(text) % 80 != 0:
raise TypeError('text length must be divisible by 80: len is %i' % (len(text)))
if text.find('\n') != -1:
raise TypeError('text must not contain linefeed character')
self._text = text
def setTimeLists(
self, sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec, eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec)
setTimeLists resets start and end times
Inputs:
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - experiment start time. sCentisec must be 0-99
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - experiment end time. eCentisec must be 0-99
Outputs: None
Affects: sets all time attributes (see code).
Exceptions: Raises ValueError if startTime > endTime
def setTimeLists(self, sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec):
"""setTimeList resets start and end times
Inputs:
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - experiment start time. sCentisec must be 0-99
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - experiment end time. eCentisec must be 0-99
Outputs: None
Affects: sets all time attributes (see code).
Exceptions: Raises ValueError if startTime > endTime
"""
# verify times
sTime = datetime.datetime(sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec*10000)
eTime = datetime.datetime(eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec*10000)
if eTime < sTime:
raise ValueError('Starting time cannot be after ending time')
self._sTime = madrigal.metadata.getMadrigalUTFromDT(sTime)
self._eTime = madrigal.metadata.getMadrigalUTFromDT(eTime)
self._sYear = sYear
self._sMonth = sMonth
self._sDay = sDay
self._sHour = sHour
self._sMin = sMin
self._sSec = sSec
self._sCentisec = sCentisec
self._eYear = eYear
self._eMonth = eMonth
self._eDay = eDay
self._eHour = eHour
self._eMin = eMin
self._eSec = eSec
self._eCentisec = eCentisec
class MadrigalCedarFile
MadrigalCedarFile is an object that allows the creation and editing of Cedar files.
This class emulates a python list, and so users may treat it just like a python list. The restriction enforced is that all items in the list must be either MadrigalCatalogRecords, MadrigalHeaderRecords, or MadrigalDataRecords (all also defined in the madrigal.cedar module). Each of these three classes supports the method getType(), which returns 'catalog', 'header', and 'data', respectively.
There are two programming patterns to choose from when using this module. For smaller input or output files, read in files using the default maxRecords=None, so that the entire file is read into memory. Output files using write for hdf5 or netCDF4 output, or writeText with the default append=False for text files. This will be somewhat faster than using the larger file pattern below.
For larger files, init the reading with maxRecords = some value, and then read in the rest using loadNextRecords. Write hdf5 file with a series of dumps, and then close with close(). Write text files using writeText with append = True. Write large netCDF4 files by first creating a large Hdf5 file as above, and then use convertToNetCDF4 to create the large netCDF4 file. This approach is somewhat slower, but has a limited memory footprint.
Usage example::
# the following example inserts a catalog record at the beginning of an existing file
import madrigal.cedar, time
cedarObj = madrigal.cedar.MadrigalCedarFile('/opt/madrigal/experiments/1998/mlh/20jan98/mil980120g.003.hdf5')
startTime = time.mktime((1998,1,20,0,0,0,0,0,0)) - time.timezone
endTime = time.mktime((1998,1,21,23,59,59,0,0,0)) - time.timezone
# catLines is a string of concatenated 80 character lines (no linefeeds) to be included in catalog record
catObj = madrigal.cedar.MadrigalCatalogRecord(31, 1000, 1998,1,20,0,0,0,0,
1998,1,21,23,59,59,99, catLines)
cedarObj.insert(0, catObj)
cedarObj.write()
Non-standard Python modules used: None
Change history:
Major rewrite in Jan 2013 as moved to Hdf5 and Madrigal 3.0
Written by "Bill Rideout":mailto:wrideout@haystack.mit.edu April. 6, 2005
class MadrigalCedarFile:
"""MadrigalCedarFile is an object that allows the creation and editing of Cedar files.
This class emulates a python list, and so users may treat it just like a python list. The
restriction enforced is that all items in the list must be either MadrigalCatalogRecords,
MadrigalHeaderRecords, or MadrigalDataRecords (all also defined in the madrigal.cedar module).
Each of these three classes supports the method getType(), which returns 'catalog', 'header',
and 'data', respectively.
There are two programming patterns to choose from when using this module. For smaller input or output files,
read in files using the default maxRecords=None, so that the entire file is read into memory. Output files
using write for hdf5 or netCDF4 output, or writeText with the default append=False for text files. This
will be somewhat faster than using the larger file pattern below.
For larger files, init the reading with maxRecords = some value, and then read in the rest using loadNextRecords.
Write hdf5 file with a series of dumps, and then close with close(). Write text files using writeText with append
= True. Write large netCDF4 files by first creating a large Hdf5 file as above, and then use convertToNetCDF4
to create the large netCDF4 file. This approach is somewhat slower, but has a limited memory footprint.
Usage example::
# the following example inserts a catalog record at the beginning of an existing file
import madrigal.cedar, time
cedarObj = madrigal.cedar.MadrigalCedarFile('/opt/madrigal/experiments/1998/mlh/20jan98/mil980120g.003.hdf5')
startTime = time.mktime((1998,1,20,0,0,0,0,0,0)) - time.timezone
endTime = time.mktime((1998,1,21,23,59,59,0,0,0)) - time.timezone
# catLines is a string of concatenated 80 character lines (no linefeeds) to be included in catalog record
catObj = madrigal.cedar.MadrigalCatalogRecord(31, 1000, 1998,1,20,0,0,0,0,
1998,1,21,23,59,59,99, catLines)
cedarObj.insert(0, catObj)
cedarObj.write()
Non-standard Python modules used: None
Change history:
Major rewrite in Jan 2013 as moved to Hdf5 and Madrigal 3.0
Written by "Bill Rideout":mailto:wrideout@haystack.mit.edu April. 6, 2005
"""
# cedar special values
missing = numpy.nan
missing_int = numpy.iinfo(numpy.int64).min
requiredFields = ('year', 'month', 'day', 'hour', 'min', 'sec', 'recno', 'kindat', 'kinst',
'ut1_unix', 'ut2_unix')
def __init__(self, fullFilename,
createFlag=False,
startDatetime=None,
endDatetime=None,
maxRecords=None,
recDset=None,
arraySplitParms=None,
skipArray=False):
"""__init__ initializes MadrigalCedarFile by reading in existing file, if any.
Inputs:
fullFilename - either the existing Cedar file in Hdf5 format,
or a file to be created. May also be None if this
data is simply derived parameters that will be written to stdout.
createFlag - tells whether this is a file to be created. If False and
fullFilename cannot be read, an error is raised. If True and
fullFilename already exists, or fullFilename cannot be created,
an error is raised.
startDatetime - if not None (the default), reject all input records where
record end time < startDatetime (datetime.datetime object).
Ignored if createFlag == True
endDatetime - if not None (the default), reject all input records where
record start time > endDatetime (datetime.datetime object)
Ignored if createFlag == True
maxRecords - the maximum number of records to read into memory
Ignored if createFlag == True
recDset - a numpy recarray with column names the names of all parameters, starting
with requiredFields. Values are 1 for 1D (all the required parms are 1D), 2 for
dependent 2D, and 3 for independent spatial 2D parameters. If None, self._recDset
not set until first data record appended.
arraySplitParms - if None (the default), read in arraySplitParms from the existing file.
Otherwise set self._arraySplitParms to arraySplitParms, which is a list of 1D or 2D parms
where each unique set of values of the parms in this list will be used to split the full
data into separate arrays in Hdf5 or netCDF4 files. For example arraySplitParms=['beamid']
would split the data into separate arrays for each beamid. If None and new file is being
created, no array splitting (self._arraySplitParms = []).
skipArray - if False and any 2D parms, create array layout (the default). If True, skip array
layout (typically used when there are too many ind parm value combinations - generally not recommended).
Affects: populates self._privList if file exists. self._privList is the underlying
list of MadrigalDataRecords, MadrigalCatalogRecords, and MadrigalHeaderRecords.
Also populates:
self._tableDType - the numpy dtype to use to build the table layout
self._nextRecord - the index of the next record to read from the input file. Not used if
createFlag = True
(The following are the input arguments described above)
self._fullFilename
self._createFlag
self._startDatetime
self._endDatetime
self._maxRecords
self._totalDataRecords - number of data records appended (may differ from len(self._privList)
if dump called).
self._minMaxParmDict - a dictionary with key = parm mnems, value = tuple of
min, max values (may be nan)
self._arrDict - a dictionary with key = list of array split parm values found in file,
('' if no spliting), and values = dict of key = 'ut1_unix' and ind 2d parm names (excluding
array splitting parms, if also ind 2D parm), and values
= python set of all unique values. Populated only if createFlag=True. Used to create
Array Layout
self._recIndexList - a list of (startIndex, endIndex) for each data record added. Used to slice out
data records from Table Layout
self._num2DSplit - Number of arraySplitParms that are 2D
self._closed - a boolean used to determine if the file being created was already closed
Returns: void
"""
self._privList = []
self._fullFilename = fullFilename
self._startDatetime = startDatetime
self._endDatetime = endDatetime
self._maxRecords = maxRecords
self._totalDataRecords = 0
self._nextRecord = 0
self._tableDType = None # will be set to the dtype of Table Layout
self._oneDList = None # will be set when first data record appended
self._twoDList = None # will be set when first data record appended
self._ind2DList = None # will be set when first data record appended
self._arraySplitParms = arraySplitParms
self._skipArray = bool(skipArray)
if createFlag:
self._closed = False
else:
self._closed = True # no need to close file only being read
self._hdf5Extensions = ('.hdf5', '.h5', '.hdf')
# keep track of earliest and latest record times
self._earliestDT = None
self._latestDT = None
# summary info
self._experimentParameters = None
self._kinstList = [] # a list of all kinsts integers in file
self._kindatList = [] # a list of all kindats integers in file
self._status = 'Unknown' # can be externally set
self._format = None # used to check that partial writes via dump are consistent
if createFlag not in (True, False):
raise ValueError('in MadrigalCedarFile, createFlag must be either True or False')
self._createFlag = createFlag
if createFlag == False:
if not os.access(fullFilename, os.R_OK):
raise ValueError('in MadrigalCedarFile, fullFilename %s does not exist' % (str(fullFilename)))
if not fullFilename.endswith(self._hdf5Extensions):
raise IOError('MadrigalCedarFile can only read in CEDAR Hdf5 files, not %s' % (fullFilename))
if createFlag == True:
if fullFilename is not None: # a None filename means the data is only written to stdout, never persisted
if os.access(fullFilename, os.R_OK):
raise ValueError('in MadrigalCedarFile, fullFilename %s already exists' % (str(fullFilename)))
if not os.access(os.path.dirname(fullFilename), os.W_OK):
raise ValueError('in MadrigalCedarFile, fullFilename %s cannot be created' % (str(fullFilename)))
if not fullFilename.endswith(self._hdf5Extensions):
raise IOError('All Madrigal files must end with hdf5 extension, <%s> does not' % (str(fullFilename)))
if self._arraySplitParms is None:
self._arraySplitParms = []
# create needed Madrigal objects
self._madDBObj = madrigal.metadata.MadrigalDB()
self._madInstObj = madrigal.metadata.MadrigalInstrument(self._madDBObj)
self._madParmObj = madrigal.data.MadrigalParameters(self._madDBObj)
self._madKindatObj = madrigal.metadata.MadrigalKindat(self._madDBObj)
if not self._arraySplitParms is None:
self._arraySplitParms = [self._madParmObj.getParmMnemonic(p).lower() for p in self._arraySplitParms]
self._minMaxParmDict = {}
self._arrDict = {}
self._num2DSplit = None # will be set to an int (number of 2D arraySplitParms) when first record added
self._recIndexList = []
self._recDset = recDset
if createFlag == False:
self.loadNextRecords(self._maxRecords)
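Two construction sketches with illustrative paths: reading an existing CEDAR Hdf5 file a chunk at a time, and creating a new file split on a hypothetical 'beamid' parameter::

    import madrigal.cedar
    # read: load at most 200 records now; fetch the rest with loadNextRecords
    readObj = madrigal.cedar.MadrigalCedarFile('/tmp/existing.hdf5', maxRecords=200)
    # create: records are appended and persisted later via write() or dump()/close()
    newObj = madrigal.cedar.MadrigalCedarFile('/tmp/new.hdf5', createFlag=True,
                                              arraySplitParms=['beamid'])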
def loadNextRecords(self, numRecords=None, removeExisting=True):
"""loadNextRecords loads a maximum of numRecords. Returns tuple of the the number of records loaded, and boolean of whether complete.
May be less than numRecords if not enough records in the input file. Returns 0 if no records left.
Inputs:
numRecords - number of records to try to load. If None, load all remaining records
removeExisting - if True (the default), remove existing records before loading new
ones. If False, append new records to existing records.
Returns:
tuple of the number of records loaded, and a boolean of whether complete.
May be less than numRecords if not enough records.
Raises error if file opened with createFlag = True
"""
if self._createFlag:
raise IOError('Cannot call loadNextRecords when creating a new MadrigalCedarFile')
if removeExisting:
self._privList = []
isComplete = False
hdfFile = h5py.File(self._fullFilename, 'r')
tableDset = hdfFile["Data"]["Table Layout"]
metadataGroup = hdfFile["Metadata"]
recDset = metadataGroup["_record_layout"]
if self._nextRecord == 0:
if self._recDset is None:
self._recDset = recDset[()]
elif not numpy.array_equal(self._recDset, recDset[()]):
# note: an elementwise != comparison of arrays is ambiguous in a boolean context
raise IOError('recDset in first record <%s> does not match expected recDset <%s>' % \
(str(recDset), str(self._recDset)))
self._verifyFormat(tableDset, recDset)
self._tableDType = tableDset.dtype
self._experimentParameters = numpy.array(hdfFile["Metadata"]['Experiment Parameters'])
self._kinstList = self._getKinstList(self._experimentParameters)
self._kindatList = self._getKindatList(self._experimentParameters)
if self._arraySplitParms is None:
self._arraySplitParms = self._getArraySplitParms(hdfFile["Metadata"])
if 'Experiment Notes' in list(hdfFile["Metadata"].keys()):
self._appendCatalogRecs(hdfFile["Metadata"]['Experiment Notes'])
self._appendHeaderRecs(hdfFile["Metadata"]['Experiment Notes'])
if self._ind2DList is not None:
parmObjList = (self._oneDList, self._twoDList, self._ind2DList) # used for performance in load
else:
parmObjList = None
# get indices for each record
recLoaded = 0
recTested = 0
if not hasattr(self, 'recnoArr'):
self.recnoArr = tableDset['recno']
# read all the records in at once for performance
if not numRecords is None:
indices = numpy.searchsorted(self.recnoArr, numpy.array([self._nextRecord, self._nextRecord + numRecords]))
tableIndices = numpy.arange(indices[0], indices[1])
if len(tableIndices) > 0:
fullTableSlice = tableDset[tableIndices[0]:tableIndices[-1]+1]
fullRecnoArr = fullTableSlice['recno']
else:
fullTableSlice = tableDset
fullRecnoArr = self.recnoArr
while(True):
if not numRecords is None:
if len(tableIndices) == 0:
isComplete = True
break
if numRecords:
if recTested >= numRecords:
break
# get slices of tableDset and recDset to create next MadrigalDataRecord
indices = numpy.searchsorted(fullRecnoArr, numpy.array([self._nextRecord, self._nextRecord + 1]))
tableIndices = numpy.arange(indices[0], indices[1])
if len(tableIndices) == 0:
isComplete = True
break
tableSlice = fullTableSlice[tableIndices[0]:tableIndices[-1]+1]
self._recIndexList.append((tableIndices[0],tableIndices[-1]+1))
self._nextRecord += 1
firstRow = tableSlice[0]
startDT = datetime.datetime.utcfromtimestamp(firstRow['ut1_unix'])
stopDT = datetime.datetime.utcfromtimestamp(firstRow['ut2_unix'])
if firstRow['kinst'] not in self._kinstList:
self._kinstList.append(firstRow['kinst'])
if firstRow['kindat'] not in self._kindatList:
self._kindatList.append(firstRow['kindat'])
# find earliest and latest times
if self._earliestDT is None:
self._earliestDT = startDT
self._latestDT = stopDT
else:
if startDT < self._earliestDT:
self._earliestDT = startDT
if stopDT > self._latestDT:
self._latestDT = stopDT
recTested += 1 # increment here because the next step may reject it
# check if datetime filter should be applied
if not self._startDatetime is None or not self._endDatetime is None:
if not self._startDatetime is None:
if stopDT < self._startDatetime:
continue
if not self._endDatetime is None:
if startDT > self._endDatetime:
isComplete = True
break
if self._ind2DList is None:
try:
indParmList = metadataGroup['Independent Spatial Parameters']['mnemonic']
indParms = [item.decode('utf-8') for item in indParmList]
except:
indParms = []
else:
indParms = self._ind2DList
newMadDataRec = MadrigalDataRecord(madInstObj=self._madInstObj, madParmObj=self._madParmObj,
dataset=tableSlice, recordSet=self._recDset,
parmObjList=parmObjList, ind2DList=indParms)
if self._ind2DList is None:
self._oneDList = newMadDataRec.get1DParms()
self._twoDList = newMadDataRec.get2DParms()
self._ind2DList = newMadDataRec.getInd2DParms()
parmObjList = (self._oneDList, self._twoDList, self._ind2DList) # used for performance in load
# set self._num2DSplit
twoDSet = set([o.mnemonic for o in self._twoDList])
arraySplitSet = set(self._arraySplitParms)
self._num2DSplit = len(twoDSet.intersection(arraySplitSet))
self._privList.append(newMadDataRec)
recLoaded += 1
hdfFile.close()
# update minmax
if self._totalDataRecords > 0:
self.updateMinMaxParmDict()
return((recLoaded, isComplete))
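A chunked-read sketch, continuing the readObj example above; each call reports how many records were loaded and whether the input file is exhausted::

    while True:
        numLoaded, isComplete = readObj.loadNextRecords(500)
        # ... process the numLoaded records now held by readObj ...
        if isComplete:
            break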
def write(self, format='hdf5', newFilename=None, refreshCatHeadTimes=True,
arraySplittingParms=None, skipArrayLayout=False, overwrite=False):
"""write persists a MadrigalCedarFile to file.
Note: There are two ways to write to a MadrigalCedarFile. Either this method (write) is called after all the
records have been appended to the MadrigalCedarFile, or dump is called after a certain number of records are appended,
and then at the end dump is called a final time if there were any records not yet dumped, followed by close.
The __del__ method will automatically call close if needed, and print a warning that the user should add it to
their code.
write has the advantage of being simpler, but has the disadvantage for larger files of keeping all those records
in memory. dump/close has the advantage of significantly reducing the memory footprint, but is somewhat more complex.
Inputs:
format - a format to save the file in. For now, the allowed values are
'hdf5' and 'netCDF4'. Defaults to 'hdf5'. Use writeText method to get text output.
newFilename - a filename to save to. Defaults to self._fullFilename passed into initializer if not given.
refreshCatHeadTimes - if True (the default), update start and end times in the catalog and header
records to represent the times in the data. If False, use existing times in those records.
skipArrayLayout - if True, do not include Array Layout even if there are independent spatial
parameters. If False (the default) write Array Layout if there are independent spatial
parameters and format = 'hdf5'
arraySplittingParms - a list of parameters as mnemonics used to split
arrays into subarrays. For example, beamcode would split data with separate beamcodes
into separate arrays. The number of separate arrays will be up to the product of the number of
unique values found for each parameter, with the restriction that combinations with no records will
not create a separate array. If default None passed in, then set to self._arraySplitParms,
set when CEDAR file read in.
overwrite - if False (the default) do not overwrite existing file. If True, overwrite the file if it already exists.
Outputs: None
Affects: writes a MadrigalCedarFile to file
"""
if self._format != None:
raise ValueError('Cannot call write method after calling dump method')
if newFilename is None:
newFilename = self._fullFilename
if format not in ('hdf5', 'netCDF4'):
raise ValueError('Illegal format <%s> - must be hdf5 or netCDF4' % (format))
if os.access(newFilename, os.R_OK) and not overwrite:
raise IOError('newFilename <%s> already exists' % (newFilename))
self._format = format
if arraySplittingParms is None:
arraySplittingParms = self._arraySplitParms
if arraySplittingParms is None:
arraySplittingParms = []
if self._format == 'hdf5':
if not newFilename.endswith(self._hdf5Extensions):
raise IOError('filename must end with %s, <%s> does not' % (str(self._hdf5Extensions), newFilename))
try:
# we need to make sure this file is closed and then deleted if an error
f = None # used if next line fails
f = h5py.File(newFilename, 'w')
self._writeHdf5Metadata(f, refreshCatHeadTimes)
self._writeHdf5Data(f)
if len(self.getIndSpatialParms()) > 0 and not skipArrayLayout:
self._createArrayLayout(f, arraySplittingParms)
f.close()
except:
# on any error, close and delete file, then reraise error
if f:
f.close()
if os.access(newFilename, os.R_OK):
os.remove(newFilename)
raise
elif self._format == 'netCDF4':
try:
# we need to make sure this file is closed and then deleted if an error
f = None # used if next line fails
f = netCDF4.Dataset(newFilename, 'w', format='NETCDF4')
self._writeNetCDF4(f, arraySplittingParms)
f.close()
except:
# on any error, close and delete file, then reraise error
if f:
f.close()
if os.access(newFilename, os.R_OK):
os.remove(newFilename)
raise
self._closed = True # write ends with closed file
def dump(self, format='hdf5', newFilename=None, parmIndexDict=None):
"""dump appends all the present records in MadrigalCedarFile to file, and removes present data records from MadrigalCedarFile.
Can be used to append records to a file. Catalog and header records are maintained.
Typically close is called after all calls to dump. The __del__ method will automatically call
close if needed, and print a warning that the user should add it to their code.
Inputs:
format - a format to save the file in. The format argument only exists for backwards
compatibility - only hdf5 and netCDF4 are allowed. ValueError raised if any other argument given.
newFilename - a filename to save to. Defaults to self._fullFilename passed into initializer if not given.
parmIndexDict - used only for dumping netCDF4
Outputs: None
Affects: writes a MadrigalCedarFile to file
"""
if self._format != None:
if self._format != format:
raise ValueError('Previous dump format was %s, cannot now use %s' % (str(self._format), str(format)))
if format not in ('hdf5', 'netCDF4'):
raise ValueError('Format must be hdf5 or netCDF4 for dump, not %s' % (str(format)))
if newFilename is None:
newFilename = self._fullFilename
if self._format is None:
# first write - run checks, and create all possible metadata and data
if os.access(newFilename, os.R_OK):
raise IOError('newFilename <%s> already exists' % (newFilename))
if format == 'hdf5':
if not newFilename.endswith(tuple(list(self._hdf5Extensions) + ['.nc'])):
raise IOError('filename must end with %s, <%s> does not' % (str(tuple(list(self._hdf5Extensions) + ['.nc'])), newFilename))
elif format == 'netCDF4':
if not newFilename.endswith('.nc'):
raise IOError('filename must end with %s, <%s> does not' % ('.nc', newFilename))
if len(self._privList) == 0:
# nothing to dump
return
if format == 'hdf5':
try:
# we need to make sure this file is closed and then deleted if an error
f = None # used if next line fails
f = h5py.File(newFilename, 'a')
self._closed = False
if self.hasArray(f):
raise IOError('Cannot call dump for hdf5 after write or close')
self._writeHdf5Data(f)
f.close()
except:
# on any error, close and delete file, then reraise error
if f:
f.close()
if os.access(newFilename, os.R_OK):
os.remove(newFilename)
raise
elif format == 'netCDF4':
if len(self._arraySplitParms) != 0:
raise IOError('Cannot dump netCDF4 files with arraySplitParms - write to Hdf5 and then convert')
if self._format is None:
# first write
f = netCDF4.Dataset(newFilename, 'w', format='NETCDF4')
self._firstDumpNetCDF4(f, parmIndexDict)
f.close()
else:
f = netCDF4.Dataset(newFilename, 'a', format='NETCDF4')
self._appendNetCDF4(f, parmIndexDict)
f.close()
self._format = format
# dump data records out of memory
self._privList = [rec for rec in self._privList if not rec.getType() == 'data']
def close(self):
"""close closes an open MadrigalCedarFile. It calls _writeHdf5Metadata and _addArray if ind parms.
Most be called directly when dump used.
"""
if self._closed:
# nothing to do
return
with h5py.File(self._fullFilename, 'a') as f:
self._writeHdf5Metadata(f, refreshCatHeadTimes=True)
if len(self.getIndSpatialParms()) > 0:
if not self._skipArray:
self._addArrayDump()
self._closed = True
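The large-file write pattern sketched below appends records in batches with dump and finishes with close; newObj is from the creation sketch above, and makeDataRecord stands in for whatever builds each MadrigalDataRecord::

    pending = 0
    for i in range(10000):
        newObj.append(makeDataRecord(i))  # makeDataRecord is hypothetical
        pending += 1
        if pending >= 100:
            newObj.dump()    # write pending data records to hdf5 and free memory
            pending = 0
    newObj.dump()            # flush any remainder
    newObj.close()           # writes metadata, and the array layout if ind parms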
def writeText(self, newFilename=None, summary='plain', showHeaders=False, selectParms=None,
filterList=None, missing=None, assumed=None, knownbad=None, append=False,
firstWrite=False):
"""writeText writes text to new filename
Inputs:
newFilename - name of new file to create and write to. If None, the default, write to stdout
summary - type of summary line to print at top. Allowed values are:
'plain' - text only mnemonic names, but only if not showHeaders
'html' - mnemonic names wrapped in standard javascript code to allow descriptive popups
'summary' - print overview of file and filters used. Also text only mnemonic names,
but only if not showHeaders
None - no summary line
showHeaders - if True, print header in format for each record. If False, the default,
do not.
selectParms - If None, simply print all parms that are in the file. If a list
of parm mnemonics, print only those parms in the order specified.
filterList - a list of madrigal.derivation.MadrigalFilter objects to be described in the
summary. Default is None, in which case not described in summary. Ignored if summary
is not 'summary'
missing, assumed, knownbad - how to print Cedar special values. Default is None for
all, so that value printed in value in numpy table as per spec.
append - if True, open newFilename in append mode, and dump records after writing. If False,
open in write mode. Used to allow writing in conjunction with loadNextRecords.
firstWrite - True if this is the first group of records added, and append mode is True.
Used to know whether to write summary lines. If False and append is True, no summary
lines are added; if True and append is True, summary lines are added. If append is not
True, this argument ignored.
"""
# constants
_underscore = 95 # used to indicate missing character
if newFilename is not None:
if append:
f = open(newFilename, 'a')
else:
f = open(newFilename, 'w')
else:
f = sys.stdout
if summary not in ('plain', 'summary', 'html', None):
raise ValueError('Illegal summary value <%s>' % (str(summary)))
# cache information needed to replace special values if needed
# helps performance when replacing
if missing is not None:
missing = str(missing)
missing_len = len(missing)
missing_search = '\\ ' * max(0, missing_len-3) + 'nan'
if missing_len < 3:
missing = ' ' * (3-missing_len) + missing
if assumed is not None:
assumed = str(assumed)
assumed_len = len(assumed)
assumed_search = '\\ ' * max(0, assumed_len-3) + 'inf'
if assumed_len < 3:
assumed = ' ' * (3-assumed_len) + assumed
if knownbad is not None:
knownbad = str(knownbad)
knownbad_len = len(knownbad)
knownbad_search = '\\ ' * max(0, knownbad_len-4) + '-inf'
if knownbad_len < 4:
knownbad = ' ' * (4-knownbad_len) + knownbad
# create format string and header strings
formatStr = ''
parmStr = ''
if not selectParms is None:
names = selectParms
else:
names = self._tableDType.names
for parm in names:
parm = parm.upper()
format = self._madParmObj.getParmFormat(parm)
try:
# first handle float formats
dataWidth = int(format[1:format.find('.')])
# make sure width is big enough for special values
newDataWidth = dataWidth
if missing is not None:
newDataWidth = max(newDataWidth, len(missing)+1)
if self._madParmObj.isError(parm):
if assumed is not None:
newDataWidth = max(newDataWidth, dataWidth, len(assumed)+1)
if knownbad is not None:
newDataWidth = max(newDataWidth, dataWidth, len(knownbad)+1)
if newDataWidth > dataWidth:
# we need to expand format
format = '%%%i%s' % (newDataWidth, format[format.find('.'):])
dataWidth = newDataWidth
except ValueError:
# now handle integer or string formats - assumed to never be error values
if format.find('i') != -1:
if len(format) == 2:
# we need to insert a length
format = '%%%ii' % (self._madParmObj.getParmWidth(parm)-1)
dataWidth = self._madParmObj.getParmWidth(parm)
else:
dataWidth = int(format[1:-1])
elif format.find('S') != -1 or format.find('s') != -1:
dataWidth = int(format[1:-1])
else:
raise
width = max(self._madParmObj.getParmWidth(parm), dataWidth)
formatStr += '%s' % (format)
formatStr += ' ' * (max(1, width-dataWidth)) # sets spacing between numbers
if len(parm) >= width-1:
# need to truncate name
if summary != 'html':
parmStr += parm[:width-1] + ' '
else:
parmStr += "%s " % (parm[:width-1].upper(),
self._madParmObj.getSimpleParmDescription(parm),
self._madParmObj.getParmUnits(parm),
parm[:width-1].upper())
else:
# pad evenly on both sides
firstHalfSpace = int((width-len(parm))/2)
secHalfSpace = int((width-len(parm)) - firstHalfSpace)
if summary != 'html':
parmStr += ' ' * firstHalfSpace + parm.upper() + ' ' * secHalfSpace
else:
parmStr += ' ' * firstHalfSpace
parmStr += "%s" % (parm[:width-1].upper(),
self._madParmObj.getSimpleParmDescription(parm),
self._madParmObj.getParmUnits(parm),
parm[:width-1].upper())
parmStr += ' ' * secHalfSpace
formatStr += '\n'
firstHeaderPrinted = False # state variable for adding extra space between lines
if summary == 'summary':
if not append or (append and firstWrite):
self._printSummary(f, filterList)
if summary in ('plain', 'summary', 'html') and not showHeaders:
if not append or (append and firstWrite):
# print single header at top
f.write('%s\n' % (parmStr))
if summary == 'html':
f.write('<br>\n') # approximate reconstruction - the original markup was lost in doc extraction
if len(self._privList) == 0:
# nothing more to write
if f != sys.stdout:
f.close()
return
# see if only 1D parms are selected, which implies printing only a single line per record
is1D = False
if not selectParms is None:
#make sure its a lowercase list
selectParms = list(selectParms)
selectParms = getLowerCaseList(selectParms)
# see if only 1D parameters are being printed, so that we should only print the first row
is1D = True
recordset = self.getRecordset()
for parm in selectParms:
if recordset[parm][0] != 1:
is1D = False
break
for rec in self._privList:
if rec.getType() != 'data':
continue
if showHeaders:
kinst = rec.getKinst()
instDesc = self._madInstObj.getInstrumentName(kinst)
sDT = rec.getStartDatetime()
sDTStr = sDT.strftime('%Y-%m-%d %H%M:%S')
eDT = rec.getEndDatetime()
eDTStr = eDT.strftime('%H%M:%S')
headerStr = '%s: %s-%s\n' % (instDesc, sDTStr, eDTStr)
if firstHeaderPrinted or summary is None:
f.write('\n%s' % (headerStr))
else:
f.write('%s' % (headerStr))
firstHeaderPrinted = True
f.write('%s\n' % (parmStr))
dataset = rec.getDataset()
if not selectParms is None:
recnoSet = dataset['recno'].copy() # used to see if we are at a new record
dataset_view = dataset[selectParms].copy()
else:
dataset_view = dataset
# modify special values if required
if assumed is not None or knownbad is not None:
for name in dataset_view.dtype.names:
if self._madParmObj.isError(name) and not self.parmIsInt(name):
if assumed is not None:
# set all -1 values to inf
assumedIndices = numpy.where(dataset_view[name] == -1.0)
if len(assumedIndices[0]): # numpy.where returns a tuple of index arrays
dataset_view[name][assumedIndices] = numpy.Inf
if knownbad is not None:
# set all -2 values to ninf
knownbadIndices = numpy.where(dataset_view[name] == -2.0)
if len(knownbadIndices[0]):
dataset_view[name][knownbadIndices] = numpy.NINF
lastRecno = None
for i in range(len(dataset_view)):
if not selectParms is None:
thisRecno = recnoSet[i]
if is1D and (thisRecno == lastRecno):
continue
data = tuple(list(dataset_view[i]))
try:
text = formatStr % data
except:
# something bad happened - give up and just convert data to a string
textList = [str(item) for item in data]
delimiter = ' '
text = delimiter.join(textList) + '\n'
# modify special values if required
if missing is not None:
if text.find('nan') != -1:
text = re.sub(missing_search, missing, text)
if knownbad is not None:
if text.find('-inf') != -1:
text = re.sub(knownbad_search, knownbad, text)
if assumed is not None:
if text.find('inf') != -1:
text = re.sub(assumed_search, assumed, text)
if summary != 'html':
f.write(text)
else:
f.write(text.replace(' ', '&nbsp;')) # approximate reconstruction - the replacement string was lost in doc extraction
if summary == 'html':
f.write('<br>\n') # approximate reconstruction - the original markup was lost in doc extraction
if not selectParms is None:
lastRecno = thisRecno
if f != sys.stdout:
f.close()
if append:
# remove all records
self._privList = []
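A text-export sketch; the mnemonics are illustrative, and the three optional strings replace NaN, assumed (-1), and knownbad (-2) error values in the output::

    readObj.writeText('/tmp/out.txt', summary='summary',
                      selectParms=['gdalt', 'ne'],
                      missing='missing', assumed='assumed', knownbad='knownbad')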
def getDType(self):
"""getDType returns the dtype of the table array in this file
"""
return(self._tableDType)
def setDType(self, dtype):
"""setDType sets the dtype of the table array
"""
self._tableDType = dtype
def getRecDType(self):
"""getRecDType returns the dtype of _record_layout
"""
return(self._recDset.dtype)
def getRecordset(self):
"""getRecordset returns the recordset array from the first data record.
Raises IOError if None
"""
if self._recDset is None:
raise IOError('self._recDset is None')
return(self._recDset)
def get1DParms(self):
"""get1DParms returns a list of mnemonics of 1D parms
in file. May be empty if none.
Raises ValueError if self._oneDList is None, since parameters unknown
"""
if self._oneDList is None:
raise ValueError('get1DParms cannot be called before any data records added to this file')
retList = []
for parm in self._oneDList:
retList.append(parm.mnemonic)
return(retList)
def get2DParms(self):
"""get2DParms returns a list of mnemonics of dependent 2D parms
in file. May be empty if none.
Raises ValueError if self._twoDList is None, since parameters unknown
"""
if self._twoDList is None:
raise ValueError('get2DParms cannot be called before any data records added to this file')
retList = []
for parm in self._twoDList:
retList.append(parm.mnemonic)
return(retList)
def getIndSpatialParms(self):
"""getIndSpatialParms returns a list of mnemonics of independent spatial parameters
in file. May be empty if none.
Raises ValueError if self._ind2DList is None, since parameters unknown
"""
if self._ind2DList is None:
raise ValueError('getIndSpatialParms cannot be called before any data records added to this file')
retList = []
for parm in self._ind2DList:
retList.append(parm.mnemonic)
return(retList)
def getArraySplitParms(self):
"""getArraySplitParms returns a list of mnemonics of parameters used to split array. May be empty or None.
"""
return(self._arraySplitParms)
def getParmDim(self, parm):
"""getParmDim returns the dimension (1,2, or 3 for independent spatial parms) of input parm
Raises ValueError if no data records yet.
Raise KeyError if that parameter not found in file
"""
if self._ind2DList is None:
raise ValueError('getParmDim cannot be called before any data records added to this file')
for obj in self._oneDList:
if obj.mnemonic.lower() == parm.lower():
return(1)
# do ind 2D next since they are in both lists
for obj in self._ind2DList:
if obj.mnemonic.lower() == parm.lower():
return(3)
for obj in self._twoDList:
if obj.mnemonic.lower() == parm.lower():
return(2)
raise KeyError('Parm <%s> not found in data' % (str(parm)))
def getStatus(self):
"""getStatus returns the status string
"""
return(self._status)
def setStatus(self, status):
"""setStatus sets the status string
"""
self._status = str(status)
def getEarliestDT(self):
"""getEarliestDT returns the earliest datetime found in file, or None if no data
"""
return(self._earliestDT)
def getLatestDT(self):
"""getLatestDT returns the latest datetime found in file, or None if no data
"""
return(self._latestDT)
def getKinstList(self):
"""getKinstList returns the list of kinst integers in the file
"""
return(self._kinstList)
def getKindatList(self):
"""getKindatList returns the list of kindat integers in the file
"""
return(self._kindatList)
def getRecIndexList(self):
"""getRecIndexList returns a list of record indexes into Table Layout
"""
return(self._recIndexList)
def parmIsInt(self, parm):
"""parmIsInt returns True if this parm (mnemonic) is integer type, False if not
Raises ValueError if parm not in record. or table dtype not yet set
"""
if self._tableDType is None:
raise ValueError('Cannot call parmIsInt until a data record is added')
try:
typeStr = str(self._tableDType[parm.lower()])
except KeyError:
raise ValueError('Parm <%s> not found in file' % (str(parm)))
if typeStr.find('int') != -1:
return(True)
else:
return(False)
def parmIsString(self, parm):
"""parmIsString returns True if this parm (mnemonic) is string type, False if not
Raises ValueError if parm not in record. or table dtype not yet set
"""
if self._tableDType is None:
raise ValueError('Cannot call parmIsString until a data record is added')
try:
typeStr = str(self._tableDType[parm.lower()])
except KeyError:
raise ValueError('Parm <%s> not found in file' % (str(parm)))
if typeStr.lower().find('s') == -1:
return(False)
else:
return(True)
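# Hedged sketch of the dtype test parmIsInt/parmIsString rely on: a numpy
# structured dtype renders integer fields as e.g. 'int64' and fixed-length
# byte strings as e.g. '|S8', which is what the substring checks above look
# for. The field names below are illustrative only.
#
#   import numpy
#   dt = numpy.dtype([('kinst', 'int64'), ('gdalt', 'float64'), ('pl', '|S8')])
#   str(dt['kinst'])   # -> 'int64'  (parmIsInt would return True)
#   str(dt['pl'])      # -> '|S8'    (parmIsString would return True)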
def getStringFormat(self, parm):
"""getStringFormat returns string format string. Raises error if not string type,
or parm not in record. or table dtype not yet set
"""
if not self.parmIsString(parm):
raise ValueError('parm %s not a string, cannot call getStringFormat' % (str(parm)))
return(str(self._tableDType[parm.lower()]))
def hasArray(self, f):
"""hasArray returns True in f['Data']['Array Layout'] exists, False otherwise
"""
if 'Data' in list(f.keys()):
if 'Array Layout' in list(f['Data'].keys()):
return(True)
return(False)
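# Hedged usage sketch: hasArray expects an already-open h5py.File object, so
# a caller might probe for an Array Layout like this (file name illustrative):
#
#   import h5py
#   with h5py.File('/tmp/example.hdf5', 'r') as f:
#       if cedarFile.hasArray(f):
#           arrGroup = f['Data']['Array Layout']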
def getMaxMinValues(self, mnemonic, verifyValid=False):
"""getMaxMinValues returns a tuple of (minimum value, maximum value) of the value
of parm in this file. If verifyValid is True, then only lines with valid 2D data
are included. If no valid values, returns (NaN, NaN). Also updates self._minMaxParmDict
Raise IOError if parm not found
"""
parm = mnemonic.lower()
# for string data, always return (NaN, NaN)
if self._madParmObj.isString(parm):
self._minMaxParmDict[parm] = [numpy.nan, numpy.nan]
return((numpy.nan, numpy.nan))
# create a merged dataset
datasetList = []
for rec in self._privList:
if rec.getType() == 'data':
datasetList.append(rec._dataset)
if len(datasetList) == 0:
if parm in self._minMaxParmDict:
return(self._minMaxParmDict[parm])
else:
raise IOError('No data records in file')
merged_dataset = numpy.concatenate(datasetList)
if not verifyValid:
# very simple - just use numpy methods
try:
data = merged_dataset[parm]
except:
raise IOError('parm %s not found in file' % (parm))
with warnings.catch_warnings():
warnings.simplefilter("ignore")
minValue = numpy.nanmin(data)
maxValue = numpy.nanmax(data)
if parm not in self._minMaxParmDict:
self._minMaxParmDict[parm] = [minValue, maxValue]
else:
orgMin, orgMax = self._minMaxParmDict[parm]
self._minMaxParmDict[parm] = [min(minValue, orgMin), max(maxValue, orgMax)]
return((minValue, maxValue))
# we need to find the minimum and maximum for only valid data
# first sort by parm so we just need to walk until we find a valid row starting at the top and bottom
sorted_indices = numpy.argsort(merged_dataset[parm])
# find min
minValue = None
for i in sorted_indices:
if numpy.isnan(merged_dataset[parm][i]):
continue
for twoDParm in self.get2DParms():
if self._madParmObj.isString(twoDParm):
continue
if numpy.isnan(merged_dataset[twoDParm][i]):
continue
# make sure it's not a special error value
if self._madParmObj.isError(twoDParm) and merged_dataset[twoDParm][i] < 0:
continue
# minimum found
minValue = merged_dataset[parm][i]
break
if minValue is not None:
break
# find max
maxValue = None
for i in reversed(sorted_indices):
if numpy.isnan(merged_dataset[parm][i]):
continue
for twoDParm in self.get2DParms():
if self._madParmObj.isString(twoDParm):
continue
if numpy.isnan(merged_dataset[twoDParm][i]):
continue
# make sure it's not a special error value
if self._madParmObj.isError(twoDParm) and merged_dataset[twoDParm][i] < 0:
continue
# maximum found
maxValue = merged_dataset[parm][i]
break
if maxValue is not None:
break
if minValue is None:
minValue = numpy.nan
if maxValue is None:
maxValue = numpy.nan
if parm not in self._minMaxParmDict:
self._minMaxParmDict[parm] = [minValue, maxValue]
else:
orgMin, orgMax = self._minMaxParmDict[parm]
self._minMaxParmDict[parm] = [min(minValue, orgMin), max(maxValue, orgMax)]
return((minValue, maxValue))
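# Hedged usage sketch: getMaxMinValues returns a (min, max) tuple and caches
# the result in self._minMaxParmDict, so later calls can only widen the cached
# range. The mnemonic below is illustrative.
#
#   minAlt, maxAlt = cedarFile.getMaxMinValues('gdalt', verifyValid=True)
#   if not numpy.isnan(minAlt):
#       print('gdalt spans %f to %f' % (minAlt, maxAlt))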
def refreshSummary(self):
"""refreshSummary rebuilds the recarray self._experimentParameters
"""
inst = int(self.getKinstList()[0])
delimiter = ','
kinstCodes = []
kinstNames = []
for code in self.getKinstList():
kinstCodes.append(str(int(code)))
kinstNames.append(str(self._madInstObj.getInstrumentName(int(code))))
instrumentCodes = delimiter.join(kinstCodes)
instrumentName = delimiter.join(kinstNames)
categoryStr = self._madInstObj.getCategory(inst)
piStr = self._madInstObj.getContactName(inst)
piEmailStr = self._madInstObj.getContactEmail(inst)
startDateStr = self.getEarliestDT().strftime('%Y-%m-%d %H:%M:%S UT')
endDateStr = self.getLatestDT().strftime('%Y-%m-%d %H:%M:%S UT')
cedarFileName = str(os.path.basename(self._fullFilename))
statusDesc = self._status
instLat = self._madInstObj.getLatitude(inst)
instLon = self._madInstObj.getLongitude(inst)
instAlt = self._madInstObj.getAltitude(inst)
# create kindat description based on all kindats
kindatList = self.getKindatList()
kindatDesc = ''
kindatListStr = ''
if len(kindatList) > 1:
kindatDesc = 'This experiment has %i kinds of data. They are:' % (len(kindatList))
for i, kindat in enumerate(kindatList):
thisKindatDesc = self._madKindatObj.getKindatDescription(kindat, inst)
if not thisKindatDesc:
raise IOError('kindat %i undefined - please add to typeTab.txt' % (kindat))
thisKindatDesc = thisKindatDesc.strip()
kindatDesc += ' %i) %s (code %i)' % (i+1, thisKindatDesc, kindat)
kindatListStr += '%i' % (kindat)
if i < len(kindatList) - 1:
kindatDesc += ', '
kindatListStr += ', '
else:
kindatDesc = self._madKindatObj.getKindatDescription(kindatList[0], inst)
if not kindatDesc:
raise IOError('kindat for %s undefined - please add to typeTab.txt' % (str((kindatList[0], inst))))
kindatDesc = kindatDesc.strip()
kindatListStr += '%i' % (kindatList[0])
# create an expSummary numpy recarray
summArr = numpy.recarray((14,), dtype = [('name', h5py.special_dtype(vlen=str) ),
('value', h5py.special_dtype(vlen=str) )])
summArr['name'][0] = 'instrument'
summArr['name'][1] = 'instrument code(s)'
summArr['name'][2] = 'kind of data file'
summArr['name'][3] = 'kindat code(s)'
summArr['name'][4] = 'start time'
summArr['name'][5] = 'end time'
summArr['name'][6] = 'Cedar file name'
summArr['name'][7] = 'status description'
summArr['name'][8] = 'instrument latitude'
summArr['name'][9] = 'instrument longitude'
summArr['name'][10] = 'instrument altitude'
summArr['name'][11] = 'instrument category'
summArr['name'][12] = 'instrument PI'
summArr['name'][13] = 'instrument PI email'
summArr['value'][0] = instrumentName
summArr['value'][1] = instrumentCodes
summArr['value'][2] = kindatDesc
summArr['value'][3] = kindatListStr
summArr['value'][4] = startDateStr
summArr['value'][5] = endDateStr
summArr['value'][6] = cedarFileName
summArr['value'][7] = statusDesc
summArr['value'][8] = str(instLat)
summArr['value'][9] = str(instLon)
summArr['value'][10] = str(instAlt)
summArr['value'][11] = categoryStr
summArr['value'][12] = piStr
summArr['value'][13] = piEmailStr
self._experimentParameters = summArr
def createCatalogTimeSection(self):
"""createCatalogTimeSection will return all the lines in the catalog record that
describe the start and end time of the data records.
Inputs: None
Returns: a tuple with three items 1) a string in the format of the time section of a
catalog record, 2) earliest datetime, 3) latest datetime
"""
earliestStartTime = self.getEarliestDT()
latestEndTime = self.getLatestDT()
sy = 'IBYRE %4s Beginning year' % (str(earliestStartTime.year))
sd = 'IBDTE %4s Beginning month and day' % (str(earliestStartTime.month*100 + \
earliestStartTime.day))
sh = 'IBHME %4s Beginning UT hour and minute' % (str(earliestStartTime.hour*100 + \
earliestStartTime.minute))
totalCS = earliestStartTime.second*100 + (earliestStartTime.microsecond//10000)
ss = 'IBCSE %4s Beginning centisecond' % (str(totalCS))
ey = 'IEYRE %4s Ending year' % (str(latestEndTime.year))
ed = 'IEDTE %4s Ending month and day' % (str(latestEndTime.month*100 + \
latestEndTime.day))
eh = 'IEHME %4s Ending UT hour and minute' % (str(latestEndTime.hour*100 + \
latestEndTime.minute))
totalCS = latestEndTime.second*100 + (latestEndTime.microsecond//10000)
es = 'IECSE %4s Ending centisecond' % (str(totalCS))
retStr = ''
retStr += sy + (80-len(sy))*' '
retStr += sd + (80-len(sd))*' '
retStr += sh + (80-len(sh))*' '
retStr += ss + (80-len(ss))*' '
retStr += ey + (80-len(ey))*' '
retStr += ed + (80-len(ed))*' '
retStr += eh + (80-len(eh))*' '
retStr += es + (80-len(es))*' '
return((retStr, earliestStartTime, latestEndTime))
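# Worked example of the encoding above (illustrative times): for a start time
# of 2019-10-07 13:05:02.500000 UT, IBDTE = 10*100 + 7 = 1007,
# IBHME = 13*100 + 5 = 1305, and IBCSE = 2*100 + 500000//10000 = 250
# centiseconds. Each line is then right-padded with spaces to the fixed
# 80-character Cedar record width.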
def createHeaderTimeSection(self, dataRecList=None):
"""createHeaderTimeSection will return all the lines in the header record that
describe the start and end time of the data records.
Inputs:
dataRecList - if given, examine only those MadrigalDataRecords in dataRecList.
If None (the default), examine all MadrigalDataRecords in this
MadrigalCedarFile
Returns: a tuple with three items 1) a string in the format of the time section of a
header record, 2) earliest datetime, 3) latest datetime
"""
if dataRecList is None:
earliestStartTime = self.getEarliestDT()
latestEndTime = self.getLatestDT()
else:
earliestStartTime = None
latestEndTime = None
for rec in dataRecList:
if rec.getType() != 'data':
continue
#earliest time
thisTime = rec.getStartDatetime()
if earliestStartTime is None:
earliestStartTime = thisTime
if earliestStartTime > thisTime:
earliestStartTime = thisTime
#latest time
thisTime = rec.getEndDatetime()
if latestEndTime is None:
latestEndTime = thisTime
if latestEndTime < thisTime:
latestEndTime = thisTime
sy = 'IBYRT %4s Beginning year' % (str(earliestStartTime.year))
sd = 'IBDTT %4s Beginning month and day' % (str(earliestStartTime.month*100 + \
earliestStartTime.day))
sh = 'IBHMT %4s Beginning UT hour and minute' % (str(earliestStartTime.hour*100 + \
earliestStartTime.minute))
totalCS = earliestStartTime.second*100 + (earliestStartTime.microsecond//10000)
ss = 'IBCST %4s Beginning centisecond' % (str(totalCS))
ey = 'IEYRT %4s Ending year' % (str(latestEndTime.year))
ed = 'IEDTT %4s Ending month and day' % (str(latestEndTime.month*100 + \
latestEndTime.day))
eh = 'IEHMT %4s Ending UT hour and minute' % (str(latestEndTime.hour*100 + \
latestEndTime.minute))
totalCS = latestEndTime.second*100 + (latestEndTime.microsecond//10000)
es = 'IECST %4s Ending centisecond' % (str(totalCS))
retStr = ''
retStr += sy + (80-len(sy))*' '
retStr += sd + (80-len(sd))*' '
retStr += sh + (80-len(sh))*' '
retStr += ss + (80-len(ss))*' '
retStr += ey + (80-len(ey))*' '
retStr += ed + (80-len(ed))*' '
retStr += eh + (80-len(eh))*' '
retStr += es + (80-len(es))*' '
return((retStr, earliestStartTime, latestEndTime))
def updateMinMaxParmDict(self):
"""updateMinMaxParmDict updates self._minMaxParmDict
"""
for parm in self.get1DParms() + self.get2DParms():
self.getMaxMinValues(parm, True)
def _writeHdf5Metadata(self, f, refreshCatHeadTimes):
"""_writeHdf5Metadata is responsible for writing Metadata group in Hdf5 file
Can be called multiple times, but will only write "Experiment Notes" if any catalog or header
records found
Inputs:
f - the open h5py.File object
refreshCatHeadTimes - if True, update start and end times in the catalog and header
records to represent the times in the data. If False, use existing times in those records.
"""
if "Metadata" not in list(f.keys()):
# metadata tables that are only updated once
metadataGroup = f.create_group("Metadata")
self._addDataParametersTable(metadataGroup)
if self._experimentParameters is None:
self.refreshSummary()
metadataGroup.create_dataset('Experiment Parameters', data=self._experimentParameters)
# create Independent Spatial Parameters recordset
indParmList = self.getIndSpatialParms()
indParmDesc = []
longestMnemStr = 1
longestDescStr = 1
for indParm in indParmList:
if len(indParm) > longestMnemStr:
longestMnemStr = len(indParm)
indParmDesc.append(self._madParmObj.getSimpleParmDescription(indParm))
if len(indParmDesc[-1]) > longestDescStr:
longestDescStr = len(indParmDesc[-1])
indSpatialArr = numpy.recarray((len(indParmList),),
dtype = [('mnemonic', '|S%i' % (longestMnemStr)),
('description', '|S%i' % (longestDescStr))])
for i, indParm in enumerate(indParmList):
indSpatialArr[i]['mnemonic'] = indParm
indSpatialArr[i]['description'] = indParmDesc[i]
metadataGroup.create_dataset('Independent Spatial Parameters', data=indSpatialArr)
else:
metadataGroup = f["Metadata"]
self._writeRecordLayout(metadataGroup)
self.writeExperimentNotes(metadataGroup, refreshCatHeadTimes)
def _addArrayDump(self):
"""_addArrayDump adds Array Layout to an Hdf5 file created by dump.
Inputs: None
Outputs: None
Affects: adds "Array Layout" group to f['Data']
"""
if self._format != 'hdf5':
raise ValueError('Can only call _addArrayDump for Hdf5 files written using dump')
self._createArrayLayout2()
# try to gzip 2d array data
filename, file_extension = os.path.splitext(self._fullFilename)
# tmp file name to use to run h5repack
tmpFile = filename + '_tmp' + file_extension
cmd = 'h5repack -i %s -o %s --filter=\"Data/Array Layout\":GZIP=4' % (self._fullFilename, tmpFile)
try:
subprocess.check_call(shlex.split(cmd))
except:
traceback.print_exc()
return
shutil.move(tmpFile, self._fullFilename)
def _writeHdf5Data(self, f):
"""_writeHdf5Data is responsible for writing Data group in Hdf5 file
Input: f - the open h5py.File object
"""
tableName = 'Table Layout'
# create a merged dataset
datasetList = []
nrows = None
for rec in self._privList:
if rec.getType() == 'data':
datasetList.append(rec._dataset)
if nrows is None:
nrows = rec.getNrow()
if len(datasetList) == 0:
raise IOError('No data records in file')
merged_dataset = numpy.concatenate(datasetList)
if "Data" not in list(f.keys()):
dataGroup = f.create_group("Data")
dset = dataGroup.create_dataset(tableName, data=merged_dataset, compression='gzip', maxshape=(None,),
chunks=True)
else:
# append
dataGroup = f["Data"]
added_len = merged_dataset.shape[0]
dset = dataGroup[tableName]
dset.resize((dset.shape[0] + added_len,))
dset.write_direct(merged_dataset,None,numpy.s_[-1*added_len:])
del(merged_dataset)
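# Hedged standalone sketch of the resizable-dataset append pattern used above
# (simplified to slice assignment instead of write_direct): a dataset created
# with maxshape=(None,) can be grown in place, so the whole table never has to
# be held in memory. File name and data are illustrative.
#
#   import h5py, numpy
#   with h5py.File('/tmp/append_demo.hdf5', 'a') as f:
#       new_rows = numpy.arange(10.0)
#       if 'table' not in f:
#           f.create_dataset('table', data=new_rows, maxshape=(None,), chunks=True)
#       else:
#           dset = f['table']
#           dset.resize((dset.shape[0] + len(new_rows),))
#           dset[-len(new_rows):] = new_rows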
def _createArrayLayout(self, f, arraySplittingParms):
"""_createArrayLayout will append an Array Layout to the open Hdf5 file f
arraySplittingParms - a list of parameters as mnemonics used to split
arrays into subarrays. For example, beamcode would split data with separate beamcodes
into separate arrays. The number of separate arrays will be up to the product of the number of
unique values found for each parameter, with the restriction that combinations with no records will
not create a separate array.
IOError raised if Array Layout already exists - this can only be called once
"""
if self._skipArray:
return
# get info from recarrays that already exist
table = f['Data']['Table Layout']
recLayout = f['Metadata']['_record_layout']
metadataGroup = f['Metadata']
# inputs
indParmList = self.getIndSpatialParms()
if "Array Layout" in list(f['Data'].keys()):
raise IOError('Array Layout already created - this can only be created once.')
# add "Parameters Used to Split Array Data" to Metadata
if len(arraySplittingParms) > 0:
arrSplitParmDesc = []
longestMnemStr = 0
longestDescStr = 0
for arrSplitParm in arraySplittingParms:
if len(arrSplitParm) > longestMnemStr:
longestMnemStr = len(arrSplitParm)
arrSplitParmDesc.append(self._madParmObj.getSimpleParmDescription(arrSplitParm))
if len(arrSplitParmDesc[-1]) > longestDescStr:
longestDescStr = len(arrSplitParmDesc[-1])
arrSplitArr = numpy.recarray((len(arraySplittingParms),),
dtype = [('mnemonic', '|S%i' % (longestMnemStr)),
('description', '|S%i' % (longestDescStr))])
for i, arrSplitParm in enumerate(arraySplittingParms):
arrSplitArr[i]['mnemonic'] = arrSplitParm
arrSplitArr[i]['description'] = arrSplitParmDesc[i]
metadataGroup.create_dataset('Parameters Used to Split Array Data', data=arrSplitArr)
arrGroup = f['Data'].create_group("Array Layout")
arraySplittingList = [] # list of lists of all existing values for each array splitting parm
for parm in arraySplittingParms:
arraySplittingList.append(numpy.unique(table[parm]))
tableSubsets = []
for combo in itertools.product(*arraySplittingList):
tableSubsets.append(_TableSubset(arraySplittingParms, combo, table))
for tableSubset in tableSubsets:
uniqueIndValueDict = {}
for indParm in indParmList:
uniqueIndValueDict[indParm] = numpy.unique(tableSubset.table[indParm])
unique_times = numpy.unique(tableSubset.table['ut1_unix'])
group_name = tableSubset.getGroupName()
if group_name is not None:
thisGroup = arrGroup.create_group(tableSubset.getGroupName())
else:
thisGroup = arrGroup # no splitting, so no subgroup needed
self._addLayoutDescription(thisGroup)
ts_dset = thisGroup.create_dataset("timestamps", data=unique_times)
for indParm in indParmList:
thisGroup.create_dataset(indParm, data=uniqueIndValueDict[indParm])
# one D parm arrays
oneDGroup = thisGroup.create_group('1D Parameters')
self._addDataParametersTable(oneDGroup, 1)
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 1:
dset = tableSubset.table[parm][tableSubset.oneDIndices]
oneDGroup.create_dataset(parm, data=dset)
# two D parm arrays
twoDGroup = thisGroup.create_group('2D Parameters')
self._addDataParametersTable(twoDGroup, 2)
# get shape of 2D data (number of dimensions dynamic)
twoDShape = []
for indParm in indParmList:
twoDShape.append(len(uniqueIndValueDict[indParm]))
twoDShape.append(len(unique_times))
dsetDict = {} # key = parm, value 2D dataset
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
dsetDict[parm] = numpy.zeros(twoDShape, dtype=table.dtype[parm])
if self.parmIsInt(parm):
dsetDict[parm][:] = self.missing_int
else:
dsetDict[parm][:] = self.missing
# precalculate the indices
# time index
time_indices = numpy.zeros((1, len(tableSubset.table)), int)
times = tableSubset.table['ut1_unix']
for i in range(len(unique_times)):
t = unique_times[i]
indices = numpy.argwhere(times == t)
time_indices[0, indices] = i
# ind parm indexes
indParmIndexDict = {}
for indParm in indParmList:
values = tableSubset.table[indParm]
indParmIndexDict[indParm] = numpy.zeros((1, len(tableSubset.table)), int)
for i in range(len(uniqueIndValueDict[indParm])):
v = uniqueIndValueDict[indParm][i]
indices = numpy.argwhere(values == v)
indParmIndexDict[indParm][0, indices] = i
# concatenate
tableIndex = None
for indParm in indParmList:
if tableIndex is None:
tableIndex = indParmIndexDict[indParm]
else:
tableIndex = numpy.concatenate((tableIndex, indParmIndexDict[indParm]), 0)
tableIndex = numpy.concatenate((tableIndex, time_indices), 0)
# set 2D parms
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
if len(indParmList) == 1:
dsetDict[parm][tableIndex[0], tableIndex[1]] = tableSubset.table[parm]
elif len(indParmList) == 2:
dsetDict[parm][tableIndex[0], tableIndex[1], tableIndex[2]] = tableSubset.table[parm]
elif len(indParmList) == 3:
dsetDict[parm][tableIndex[0], tableIndex[1], tableIndex[2], tableIndex[3]] = tableSubset.table[parm]
else:
raise ValueError('Can not handle more than 3 independent spatial parms - there are %i' % (len(indParmList)))
# write the datasets out
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
twoDGroup.create_dataset(parm, data=dsetDict[parm], compression='gzip')
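# Hedged standalone sketch (1D simplification) of the index precalculation
# used above: each row's value is mapped to its position in the sorted
# unique-value array, yielding a fancy index that scatters table rows into
# the (ind parms x time) arrays. Values are illustrative.
#
#   import numpy
#   values = numpy.array([30.0, 90.0, 30.0, 150.0])
#   unique_vals = numpy.unique(values)        # [30., 90., 150.]
#   row_index = numpy.zeros(len(values), int)
#   for i, v in enumerate(unique_vals):
#       row_index[numpy.argwhere(values == v)] = i
#   # row_index is now [0, 1, 0, 2]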
def _createArrayLayout2(self):
"""_createArrayLayout2 will append an Array Layout to the Hdf5 file. Called after dump.
IOError raised if Array Layout already exists - this can only be called once
"""
f = h5py.File(self._fullFilename, 'a')
# get info from recarrays that already exist
table = f['Data']['Table Layout']
recLayout = f['Metadata']['_record_layout']
metadataGroup = f['Metadata']
# inputs
indParmList = self.getIndSpatialParms()
if "Array Layout" in list(f['Data'].keys()):
raise IOError('Array Layout already created - this can only be created once.')
# update metadata now that file finished
self._writeHdf5Metadata(f, refreshCatHeadTimes=True)
if self._skipArray:
f.close()
return
# now that self._arrDict is completely filled out, create a similar dict, except that
# all sets are replaced by ordered python arrays
total_allowed_records = 0 # make sure all ind parameters declared by checking that the product
# of all the ind parm value lengths times the number of times is equal
# to or greater than the number of total records
arrDict = {}
for key in list(self._arrDict.keys()):
total_ind_parm_lens = []
thisDict = self._arrDict[key]
arrDict[key] = {}
for key2 in list(thisDict.keys()):
thisSet = thisDict[key2]
# convert to ordered numpy array
thisList = list(thisSet)
thisList.sort()
total_ind_parm_lens.append(len(thisList))
if self._madParmObj.isInteger(key2):
data = numpy.array(thisList, dtype=numpy.int64)
elif self._madParmObj.isString(key2):
strLen = self._madParmObj.getStringLen(key2)
data = numpy.array(thisList, dtype=numpy.dtype((str, strLen)))
else:
data = numpy.array(thisList, dtype=numpy.float64)
arrDict[key][key2] = data
# add the max number of records for this group
tmp = total_ind_parm_lens[0]
for v in total_ind_parm_lens[1:]:
tmp *= v
total_allowed_records += tmp
# protect against too many ind parm combinations (too sparse an array)
total_ind_combos = total_ind_parm_lens[1]
for v in total_ind_parm_lens[2:]:
total_ind_combos *= v
if total_ind_combos > 1000000:
print('Skipping array creation since %i independent parm combinations would create too big an array' % (total_ind_combos))
f.close()
return
if len(table) > total_allowed_records:
raise ValueError('Found %i lines in table, but values of times and ind parms %s allow maximum of %i values in file %s' % \
(len(table), str(indParmList), total_allowed_records, self._fullFilename))
# add "Parameters Used to Split Array Data" to Metadata
if self._arraySplitParms != []:
arrSplitParmDesc = []
longestMnemStr = 0
longestDescStr = 0
for arrSplitParm in self._arraySplitParms:
if len(arrSplitParm) > longestMnemStr:
longestMnemStr = len(arrSplitParm)
arrSplitParmDesc.append(self._madParmObj.getSimpleParmDescription(arrSplitParm))
if len(arrSplitParmDesc[-1]) > longestDescStr:
longestDescStr = len(arrSplitParmDesc[-1])
arrSplitArr = numpy.recarray((len(self._arraySplitParms),),
dtype = [('mnemonic', '|S%i' % (longestMnemStr)),
('description', '|S%i' % (longestDescStr))])
for i, arrSplitParm in enumerate(self._arraySplitParms):
arrSplitArr[i]['mnemonic'] = arrSplitParm
arrSplitArr[i]['description'] = arrSplitParmDesc[i]
metadataGroup.create_dataset('Parameters Used to Split Array Data', data=arrSplitArr)
arrGroup = f['Data'].create_group("Array Layout")
# stage 1 - write all needed tables with nan values
for key in list(arrDict.keys()):
if key != '':
groupName = self._getGroupName(key)
thisGroup = arrGroup.create_group(groupName)
else:
thisGroup = arrGroup # no subgroups needed
self._addLayoutDescription(thisGroup)
"""thisDict is dict of key = 'ut1_unix' and ind 2d parm names (possibly minus arraySplitParms), values
= ordered numpy array of all unique values"""
thisDict = arrDict[key]
unique_times = thisDict['ut1_unix']
ts_dset = thisGroup.create_dataset("timestamps", data=unique_times)
for indParm in indParmList:
if indParm in self._arraySplitParms:
# not needed
continue
dataset = thisDict[indParm]
thisGroup.create_dataset(indParm, data=dataset)
# one D parm arrays
oneDGroup = thisGroup.create_group('1D Parameters')
self._addDataParametersTable(oneDGroup, 1)
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 1:
if self._madParmObj.isInteger(parm):
dset = numpy.zeros((len(unique_times),), dtype=numpy.int64)
dset[:] = numpy.iinfo(numpy.int64).min
elif self._madParmObj.isString(parm):
strLen = self._madParmObj.getStringLen(parm)
dset = numpy.zeros((len(unique_times),), dtype=numpy.dtype((str, strLen)))
else:
dset = numpy.zeros((len(unique_times),), dtype=numpy.float64)
dset[:] = numpy.nan
oneDGroup.create_dataset(parm, data=dset)
# two D parm arrays
twoDGroup = thisGroup.create_group('2D Parameters')
self._addDataParametersTable(twoDGroup, 2)
# get shape of 2D data (number of dimensions dynamic)
twoDShape = []
for indParm in indParmList:
if indParm in self._arraySplitParms:
# not needed
continue
twoDShape.append(len(thisDict[indParm]))
twoDShape.append(len(unique_times))
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
if self._madParmObj.isInteger(parm):
dset = numpy.zeros(twoDShape, dtype=numpy.int64)
dset[:] = numpy.iinfo(numpy.int64).min
elif self._madParmObj.isString(parm):
strLen = self._madParmObj.getStringLen(parm)
dset = numpy.zeros(twoDShape, dtype=numpy.dtype((str, strLen)))
else:
dset = numpy.zeros(twoDShape, dtype=numpy.float64)
dset[:] = numpy.nan
twoDGroup.create_dataset(parm, data=dset)
# flush file
f.close()
f = h5py.File(self._fullFilename, 'a')
table = f['Data']['Table Layout']
recLayout = f['Metadata']['_record_layout']
# now loop through Table Layout and populate all the 1 and 2 d arrays
step = 10 # number of records to load at once
total_steps = int(len(self._recIndexList) / step)
if total_steps * step < len(self._recIndexList):
total_steps += 1
for i in range(total_steps):
startTimeIndex = i*step
if (i+1)*step < len(self._recIndexList) - 1:
endTimeIndex = (i+1)*step - 1
else:
endTimeIndex = len(self._recIndexList) - 1
table_data = table[self._recIndexList[startTimeIndex][0]:self._recIndexList[endTimeIndex][1]]
# loop through all groups
for key in list(arrDict.keys()):
tableSubset = _TableSubset(self._arraySplitParms, key, table_data)
# it's possible there is no data in this slice for this subset
if len(tableSubset.table) == 0:
continue
timestamps = arrDict[key]['ut1_unix']
# get index of first and last time found
first_ut1_unix = tableSubset.table[0]['ut1_unix']
last_ut1_unix = tableSubset.table[-1]['ut1_unix']
time_index_1 = numpy.searchsorted(timestamps, first_ut1_unix)
time_index_2 = numpy.searchsorted(timestamps, last_ut1_unix) + 1
groupName = tableSubset.getGroupName()
# get shape of 2D data (number of dimensions dynamic)
twoDShape = []
for indParm in indParmList:
if indParm in self._arraySplitParms:
# not needed
continue
twoDShape.append(len(arrDict[key][indParm]))
twoDShape.append(time_index_2 - time_index_1)
# ind parm indexes
indParmIndexDict = {}
for indParm in indParmList:
if indParm in self._arraySplitParms:
# not needed
continue
values = tableSubset.table[indParm]
indParmIndexDict[indParm] = numpy.zeros((len(tableSubset.table),), int)
for i in range(len(arrDict[key][indParm])):
v = arrDict[key][indParm][i]
indices = numpy.argwhere(values == v)
indParmIndexDict[indParm][indices] = i
# finally time dimension
values = tableSubset.table['ut1_unix']
timeIndices = numpy.zeros((len(tableSubset.table),), int)
thisTimestampArr = numpy.unique(tableSubset.table['ut1_unix'])
for i in range(len(thisTimestampArr)):
v = thisTimestampArr[i]
indices = numpy.argwhere(values == v)
timeIndices[indices] = i
# concatenate
tableIndex = []
for indParm in indParmList:
if indParm in self._arraySplitParms:
# not needed
continue
tableIndex.append(indParmIndexDict[indParm])
tableIndex.append(timeIndices)
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 1:
dset = tableSubset.table[parm][tableSubset.oneDIndices]
if groupName is not None:
f['Data']['Array Layout'][groupName]['1D Parameters'][parm][time_index_1:time_index_2] = dset
else:
f['Data']['Array Layout']['1D Parameters'][parm][time_index_1:time_index_2] = dset
elif recLayout[parm][0] == 2:
if self._madParmObj.isInteger(parm):
dset2 = numpy.zeros(tuple(twoDShape), dtype=numpy.int64)
dset2[:] = numpy.iinfo(numpy.int64).min
elif self._madParmObj.isString(parm):
strLen = self._madParmObj.getStringLen(parm)
dset2 = numpy.zeros(tuple(twoDShape), dtype=numpy.dtype((str, strLen)))
dset2[:] = ''
else:
dset2 = numpy.zeros(tuple(twoDShape), dtype=numpy.float64)
dset2[:] = numpy.nan
dset2[tuple(tableIndex)] = tableSubset.table[parm]
if groupName is not None:
fdata = f['Data']['Array Layout'][groupName]['2D Parameters'][parm]
else:
fdata = f['Data']['Array Layout']['2D Parameters'][parm]
if len(indParmList) - self._num2DSplit == 1:
fdata[:,time_index_1:time_index_2] = dset2
elif len(indParmList) - self._num2DSplit == 2:
fdata[:,:,time_index_1:time_index_2] = dset2
elif len(indParmList) - self._num2DSplit == 3:
fdata[:,:,:,time_index_1:time_index_2] = dset2
elif len(indParmList) - self._num2DSplit == 4:
fdata[:,:,:,:,time_index_1:time_index_2] = dset2
elif len(indParmList) - self._num2DSplit == 5:
fdata[:,:,:,:,:,time_index_1:time_index_2] = dset2
else:
raise ValueError('Cannot handle more than 5 non-split independent spatial parms - there are %i' % (len(indParmList) - self._num2DSplit))
f.close()
def _writeNetCDF4(self, f, arraySplittingParms):
"""_writeNetCDF4 will write to a netCDF4 file f
arraySplittingParms - a list of parameters as mnemonics used to split
arrays into subarrays. For example, beamcode would split data with separate beamcodes
into separate arrays. The number of separate arrays will be up to the product of the number of
unique values found for each parameter, with the restriction that combinations with no records will
not create a separate array.
"""
# create merged datasets table and recLayout
datasetList = []
recordList = []
for rec in self._privList:
if rec.getType() == 'data':
datasetList.append(rec._dataset)
if len(recordList) == 0:
recordList.append(rec._recordSet)
elif rec.getType() == 'catalog':
f.catalog_text = rec.getText()
elif rec.getType() == 'header':
f.header_text = rec.getText()
if len(datasetList) == 0:
raise IOError('No data records in file')
table = numpy.concatenate(datasetList)
recLayout = numpy.concatenate(recordList)
if self._experimentParameters is None:
self.refreshSummary()
# write Experiment Parameters
for i in range(len(self._experimentParameters)):
name = self._experimentParameters['name'][i]
# make text acceptable attribute names
if type(name) in (bytes, numpy.bytes_):
name = name.replace(b' ', b'_')
name = name.replace(b'(s)', b'')
else:
name = name.replace(' ', '_')
name = name.replace('(s)', '')
f.setncattr(name, self._experimentParameters['value'][i])
indParmList = self.getIndSpatialParms()
# add "Parameters Used to Split Array Data" to Metadata
if len(arraySplittingParms) > 0:
arrSplitParmDesc = ''
for arrSplitParm in arraySplittingParms:
arrSplitParmDesc += '%s: ' % (arrSplitParm)
arrSplitParmDesc += '%s' % (self._madParmObj.getSimpleParmDescription(arrSplitParm))
if arrSplitParm != arraySplittingParms[-1]:
arrSplitParmDesc += ' -- '
f.parameters_used_to_split_data = arrSplitParmDesc
arraySplittingList = [] # list of lists of all existing values for each array splitting parm
for parm in arraySplittingParms:
if type(parm) in (bytes, numpy.bytes_):
parm = parm.decode('utf-8')
arraySplittingList.append(numpy.unique(table[parm]))
tableSubsets = []
for combo in itertools.product(*arraySplittingList):
tableSubsets.append(_TableSubset(arraySplittingParms, combo, table))
for tableSubset in tableSubsets:
uniqueIndValueDict = {}
for indParm in indParmList:
uniqueIndValueDict[indParm] = numpy.unique(tableSubset.table[indParm])
unique_times = numpy.unique(tableSubset.table['ut1_unix'])
group_name = tableSubset.getGroupName()
if group_name is not None:
group_name = group_name.strip().replace(' ', '_')
thisGroup = f.createGroup(group_name)
else:
thisGroup = f # no splitting, so no subgroup needed
# next step - create dimensions
dims = []
thisGroup.createDimension("timestamps", len(unique_times))
timeVar = thisGroup.createVariable("timestamps", 'f8', ("timestamps",),
zlib=True)
timeVar.units = 'Unix seconds'
timeVar.description = 'Number of seconds since UT midnight 1970-01-01'
timeVar[:] = unique_times
dims.append("timestamps")
for indParm in indParmList:
thisGroup.createDimension(indParm, len(uniqueIndValueDict[indParm]))
if self._madParmObj.isInteger(indParm):
thisVar = thisGroup.createVariable(indParm, 'i8', (indParm,), zlib=True)
elif self._madParmObj.isString(indParm):
thisVar = thisGroup.createVariable(indParm, self.getStringFormat(indParm), (indParm,), zlib=True)
else:
thisVar = thisGroup.createVariable(indParm, 'f8', (indParm,), zlib=True)
thisVar[:] = uniqueIndValueDict[indParm]
thisVar.units = self._madParmObj.getParmUnits(indParm)
thisVar.description = self._madParmObj.getSimpleParmDescription(indParm)
dims.append(indParm)
# create one and two D parm arrays, set 1D
twoDVarDict = {} # key = parm name, value = netCDF4 variable
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 1:
dset = tableSubset.table[parm][tableSubset.oneDIndices]
if self.parmIsInt(parm):
oneDVar = thisGroup.createVariable(parm, 'i8', (dims[0],), zlib=True)
elif self.parmIsString(parm):
oneDVar = thisGroup.createVariable(parm, self.getStringFormat(parm), (dims[0],),
zlib=True)
else: # float
oneDVar = thisGroup.createVariable(parm, 'f8', (dims[0],), zlib=True)
oneDVar.units = self._madParmObj.getParmUnits(parm)
oneDVar.description = self._madParmObj.getSimpleParmDescription(parm)
try:
oneDVar[:] = dset
except:
raise ValueError('There may be an issue with array splitting because there are more records than times')
elif recLayout[parm][0] == 2:
if self.parmIsInt(parm):
twoDVarDict[parm] = thisGroup.createVariable(parm, 'i8', dims, zlib=True)
elif self.parmIsString(parm):
twoDVarDict[parm] = thisGroup.createVariable(parm, self.getStringFormat(parm), dims, zlib=True)
else:
twoDVarDict[parm] = thisGroup.createVariable(parm, 'f8', dims, zlib=True)
twoDVarDict[parm].units = self._madParmObj.getParmUnits(parm)
twoDVarDict[parm].description = self._madParmObj.getSimpleParmDescription(parm)
# two D parm arrays
# get shape of 2D data (number of dimensions dynamic)
twoDShape = []
twoDShape.append(len(unique_times))
for indParm in indParmList:
twoDShape.append(len(uniqueIndValueDict[indParm]))
dsetDict = {} # key = parm, value 2D dataset
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
dsetDict[parm] = numpy.zeros(twoDShape, dtype=table.dtype[parm])
if self.parmIsInt(parm):
dsetDict[parm][:] = self.missing_int
else:
dsetDict[parm][:] = self.missing
# precalculate the indices
# time index
time_indices = numpy.zeros((1, len(tableSubset.table)), int)
times = tableSubset.table['ut1_unix']
for i in range(len(unique_times)):
t = unique_times[i]
indices = numpy.argwhere(times == t)
time_indices[0, indices] = i
# ind parm indexes
indParmIndexDict = {}
for indParm in indParmList:
values = tableSubset.table[indParm]
indParmIndexDict[indParm] = numpy.zeros((1, len(tableSubset.table)), int)
for i in range(len(uniqueIndValueDict[indParm])):
v = uniqueIndValueDict[indParm][i]
indices = numpy.argwhere(values == v)
indParmIndexDict[indParm][0, indices] = i
# concatenate
tableIndex = time_indices
for indParm in indParmList:
tableIndex = numpy.concatenate((tableIndex, indParmIndexDict[indParm]), 0)
# set 2D parms
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
if len(indParmList) == 1:
dsetDict[parm][tableIndex[0], tableIndex[1]] = tableSubset.table[parm]
elif len(indParmList) == 2:
dsetDict[parm][tableIndex[0], tableIndex[1], tableIndex[2]] = tableSubset.table[parm]
elif len(indParmList) == 3:
dsetDict[parm][tableIndex[0], tableIndex[1], tableIndex[2], tableIndex[3]] = tableSubset.table[parm]
elif len(indParmList) == 0:
continue
else:
raise ValueError('Can not handle more than 3 independent spatial parms - there are %i' % (len(indParmList)))
# write the datasets out
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
if len(indParmList) == 1:
twoDVarDict[parm][:,:] = dsetDict[parm]
elif len(indParmList) == 2:
twoDVarDict[parm][:,:,:] = dsetDict[parm]
elif len(indParmList) == 3:
twoDVarDict[parm][:,:,:,:] = dsetDict[parm]
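# Hedged standalone sketch of the netCDF4 pattern used above: dimensions are
# declared first, variables are bound to those dimensions, then data and
# attributes are assigned. File name and sizes are illustrative.
#
#   import netCDF4, numpy
#   with netCDF4.Dataset('/tmp/demo.nc', 'w') as nc:
#       nc.createDimension('timestamps', 3)
#       tvar = nc.createVariable('timestamps', 'f8', ('timestamps',), zlib=True)
#       tvar.units = 'Unix seconds'
#       tvar[:] = numpy.array([0.0, 60.0, 120.0])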
def _firstDumpNetCDF4(self, f, parmIndexDict):
"""_firstDumpNetCDF4 will dump initial data to a netCDF4 file f. Called via dump
parmIndexDict - is a dictionary with key = timestamps and ind spatial parm names,
value = dictionary of keys = unique values, value = index of that value. Can only be used when
arraySplittingParms == []
"""
# create merged datasets table and recLayout
datasetList = []
recordList = []
for rec in self._privList:
if rec.getType() == 'data':
datasetList.append(rec._dataset)
if len(recordList) == 0:
recordList.append(rec._recordSet)
elif rec.getType() == 'catalog':
f.catalog_text = rec.getText()
elif rec.getType() == 'header':
f.header_text = rec.getText()
if len(datasetList) == 0:
raise IOError('No data records in file')
table = numpy.concatenate(datasetList)
recLayout = numpy.concatenate(recordList)
if self._experimentParameters is None:
self.refreshSummary()
# write Experiment Parameters
for i in range(len(self._experimentParameters)):
name = self._experimentParameters['name'][i]
# make text acceptable attribute names
if type(name) in (bytes, numpy.bytes_):
name = name.replace(b' ', b'_')
name = name.replace(b'(s)', b'')
else:
name = name.replace(' ', '_')
name = name.replace('(s)', '')
f.setncattr(name, self._experimentParameters['value'][i])
indParmList = self.getIndSpatialParms()
uniqueIndValueDict = {}
for indParm in indParmList:
uniqueIndValueDict[indParm] = numpy.array(list(parmIndexDict[indParm].keys()))
unique_times = numpy.array(list(parmIndexDict['ut1_unix'].keys()))
thisGroup = f # no splitting, so no subgroup needed
# next step - create dimensions
dims = []
thisGroup.createDimension("timestamps", len(unique_times))
timeVar = thisGroup.createVariable("timestamps", 'f8', ("timestamps",),
zlib=True)
timeVar.units = 'Unix seconds'
timeVar.description = 'Number of seconds since UT midnight 1970-01-01'
timeVar[:] = unique_times
dims.append("timestamps")
for indParm in indParmList:
thisGroup.createDimension(indParm, len(uniqueIndValueDict[indParm]))
if self._madParmObj.isInteger(indParm):
thisVar = thisGroup.createVariable(indParm, 'i8', (indParm,), zlib=False)
elif self._madParmObj.isString(indParm):
thisVar = thisGroup.createVariable(indParm, self.getStringFormat(indParm), (indParm,), zlib=False)
else:
thisVar = thisGroup.createVariable(indParm, 'f8', (indParm,), zlib=False)
thisVar[:] = uniqueIndValueDict[indParm]
thisVar.units = self._madParmObj.getParmUnits(indParm)
thisVar.description = self._madParmObj.getSimpleParmDescription(indParm)
dims.append(indParm)
# create one and two D parm arrays, set 1D
twoDVarDict = {} # key = parm name, value = netCDF4 variable
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 1:
dset = table[parm][:]
if self.parmIsInt(parm):
oneDVar = thisGroup.createVariable(parm, 'i8', (dims[0],), zlib=False)
elif self.parmIsString(parm):
oneDVar = thisGroup.createVariable(parm, self.getStringFormat(parm), (dims[0],),
zlib=False)
else: # float
oneDVar = thisGroup.createVariable(parm, 'f8', (dims[0],), zlib=False)
oneDVar.units = self._madParmObj.getParmUnits(parm)
oneDVar.description = self._madParmObj.getSimpleParmDescription(parm)
lastTS = 0.0
for i, ts in enumerate(table['ut1_unix']):
if ts != lastTS:
# set it
oneDVar[parmIndexDict['ut1_unix'][ts]] = dset[i]
lastTS = ts
elif recLayout[parm][0] == 2:
if self.parmIsInt(parm):
twoDVarDict[parm] = thisGroup.createVariable(parm, 'i8', dims, zlib=False)
elif self.parmIsString(parm):
twoDVarDict[parm] = thisGroup.createVariable(parm, self.getStringFormat(parm), dims, zlib=False)
else:
twoDVarDict[parm] = thisGroup.createVariable(parm, 'f8', dims, zlib=False, fill_value=numpy.nan)
twoDVarDict[parm].units = self._madParmObj.getParmUnits(parm)
twoDVarDict[parm].description = self._madParmObj.getSimpleParmDescription(parm)
# set 2D parms
for i in range(len(table)):
parmIndices = [parmIndexDict['ut1_unix'][table['ut1_unix'][i]]]
for indParm in indParmList:
item = table[indParm][i]
if type(item) in (bytes, numpy.bytes_):
item = item.decode('utf-8')
parmIndices.append(parmIndexDict[indParm][item])
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
if len(indParmList) == 1:
twoDVarDict[parm][parmIndices[0], parmIndices[1]] = table[parm][i]
elif len(indParmList) == 2:
twoDVarDict[parm][parmIndices[0], parmIndices[1], parmIndices[2]] = table[parm][i]
elif len(indParmList) == 3:
twoDVarDict[parm][parmIndices[0], parmIndices[1], parmIndices[2], parmIndices[3]] = table[parm][i]
elif len(indParmList) == 0:
continue
else:
raise ValueError('Can not handle more than 3 independent spatial parms - there are %i' % (len(indParmList)))
def _appendNetCDF4(self, f, parmIndexDict):
"""_appendNetCDF4 will dump appended data to a netCDF4 file f. Called via dump
parmIndexDict - is a dictionary with key = timestamps and ind spatial parm names,
value = dictionary of keys = unique values, value = index of that value. Can only be used when
arraySplittingParms == []
"""
# create merged datasets table and recLayout
datasetList = []
recordList = []
for rec in self._privList:
if rec.getType() == 'data':
datasetList.append(rec._dataset)
if len(recordList) == 0:
recordList.append(rec._recordSet)
elif rec.getType() == 'catalog':
f.catalog_text = rec.getText()
elif rec.getType() == 'header':
f.header_text = rec.getText()
if len(datasetList) == 0:
raise IOError('No data records in file')
table = numpy.concatenate(datasetList)
recLayout = numpy.concatenate(recordList)
indParmList = self.getIndSpatialParms()
uniqueIndValueDict = {}
for indParm in indParmList:
uniqueIndValueDict[indParm] = numpy.array(list(parmIndexDict[indParm].keys()))
unique_times = numpy.array(list(parmIndexDict['ut1_unix'].keys()))
thisGroup = f # no splitting, so no subgroup needed
# next step - create dimensions
dims = []
timeVar = thisGroup.variables["timestamps"]
dims.append("timestamps")
for indParm in indParmList:
thisVar = thisGroup.variables[indParm]
dims.append(indParm)
# create one and two D parm arrays, set 1D
twoDVarDict = {} # key = parm name, value = netCDF4 variable
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 1:
dset = table[parm][:]
oneDVar = thisGroup.variables[parm]
lastTS = 0.0
for i, ts in enumerate(table['ut1_unix']):
if ts != lastTS:
# set it
oneDVar[parmIndexDict['ut1_unix'][ts]] = dset[i]
lastTS = ts
elif recLayout[parm][0] == 2:
twoDVarDict[parm] = thisGroup.variables[parm]
# set 2D parms
for parm in recLayout.dtype.names[len(self.requiredFields):]:
if recLayout[parm][0] == 2:
for i in range(len(table)):
parmIndices = [parmIndexDict['ut1_unix'][table['ut1_unix'][i]]]
for indParm in indParmList:
item = table[indParm][i]
if type(item) in (bytes, numpy.bytes_):
item = item.decode('utf-8')
parmIndices.append(parmIndexDict[indParm][item])
if len(indParmList) == 1:
twoDVarDict[parm][parmIndices[0], parmIndices[1]] = table[parm][i]
elif len(indParmList) == 2:
twoDVarDict[parm][parmIndices[0], parmIndices[1], parmIndices[2]] = table[parm][i]
elif len(indParmList) == 3:
twoDVarDict[parm][parmIndices[0], parmIndices[1], parmIndices[2], parmIndices[3]] = table[parm][i]
elif len(indParmList) == 0:
continue
else:
raise ValueError('Can not handle more than 3 independent spatial parms - there are %i' % (len(indParmList)))
def _addLayoutDescription(self, group):
"""_addLayoutDescription added a Layout Description dataset to h5py group
"""
indSpatialParms = self.getIndSpatialParms()
layoutDesc = self._getLayoutDescription() % (str(indSpatialParms),
str(indSpatialParms),
1+len(indSpatialParms),
len(indSpatialParms),
str(indSpatialParms),
str(indSpatialParms))
LayoutDescription = layoutDesc.split('\n')
# create a recarray to hold this text
textArr = numpy.recarray((len(LayoutDescription),), dtype=[('Layout Description', h5py.special_dtype(vlen=str))])
for i in range(len(LayoutDescription)):
textArr['Layout Description'][i] = LayoutDescription[i]
group.create_dataset('Layout Description', data=textArr)
def _getLayoutDescription(self):
"""_getLayoutDescription returns a description of the layout selected.
Returns:
LayoutDescription: A list of strings summarizing the Layout Description
Affects: Nothing
Exceptions: None
"""
LayoutDescription = """
This data layout contains reshaped data from the Table Layout. The reshaped data is stored as an array, with time and
the independent spatial parameters (%s) in different dimensions. It creates an array for each parameter found in file.
This layout contains:
- "1D parameters" group: contains one 1D-array for each 1d parameter
stored in the file. Time-dependent only parameters.
- "2D parameters" group: contains one 2D-array for each 2d parameter
stored in the file. Time and %s are independent parameters.
Every 2D array has %i dimensions - one for time, and %i for
the independent spatial parameters (%s).
- timestamps: Time vector in seconds from 1/1/1970.
- %s : The independent spatial parameters for this file"""
return(LayoutDescription)
def _addDataParametersTable(self, group, dim=None):
"""_addDataParametersTable adds the "Data Parameters" table to the h5py Group group if any parameters found
Inputs:
group - the h5py Group to add the dataset to
dim - if None, include all parameters. If 1 or 2, just include non-required 1 or 2 D parms
"""
if dim not in (None, 1, 2):
raise ValueError('dim must be in (None, 1, 2), not <%s>' % (str(dim)))
# this first pass is just to set the maximum length of all the strings
longestMnemStr = 0
longestDescStr = 0
longestUnitsStr = 0
longestCategoryStr = 0
count = 0
for i, parm in enumerate(self._tableDType.names):
if dim in (1,2) and i < len(self.requiredFields):
# skipping default parms
continue
if dim in (1,2):
if self.getParmDim(parm) != dim:
continue
count += 1
if len(self._madParmObj.getParmMnemonic(parm)) > longestMnemStr:
longestMnemStr = len(self._madParmObj.getParmMnemonic(parm))
if len(self._madParmObj.getSimpleParmDescription(parm)) > longestDescStr:
longestDescStr = len(self._madParmObj.getSimpleParmDescription(parm))
if len(self._madParmObj.getParmUnits(parm)) > longestUnitsStr:
longestUnitsStr = len(self._madParmObj.getParmUnits(parm))
if len(self._madParmObj.getParmCategory(parm)) > longestCategoryStr:
longestCategoryStr = len(self._madParmObj.getParmCategory(parm))
if count == 0: # no parms to add
return
parmArr = numpy.recarray((count,),
dtype = [('mnemonic', '|S%i' % (longestMnemStr)),
('description', '|S%i' % (longestDescStr)),
('isError', 'int'),
('units', '|S%i' % (longestUnitsStr)),
('category', '|S%i' % (longestCategoryStr))])
# set all the values
count = 0
for i, parm in enumerate(self._tableDType.names):
if dim in (1,2) and i < len(self.requiredFields):
# skipping default parms
continue
if dim in (1,2):
if self.getParmDim(parm) != dim:
continue
parmArr['mnemonic'][count] = self._madParmObj.getParmMnemonic(parm)
parmArr['description'][count] = self._madParmObj.getSimpleParmDescription(parm)
parmArr['isError'][count] = self._madParmObj.isError(parm)
parmArr['units'][count] = self._madParmObj.getParmUnits(parm)
parmArr['category'][count] = self._madParmObj.getParmCategory(parm)
count += 1
group.create_dataset("Data Parameters", data=parmArr)
def _writeRecordLayout(self, metadataGroup):
"""_writeRecordLayout adds the "_record_layout" table to the Metadata group metadataGroup if needed
"""
tableName = '_record_layout'
if self._recDset is None:
raise IOError('self._recDset not yet specified')
if tableName not in list(metadataGroup.keys()):
dset = metadataGroup.create_dataset(tableName, data=self._recDset)
dset.attrs['description'] = 'This is meant to be internal data. For each Madrigal record and parameter, it has a 2 if it is a 2D parameter, 1 if it is a 1D parameter, and 0 if there is no data.'
def _verifyFormat(self, tableDset, recDset):
"""verifyFormat raises an exception if any problem with the Hdf5 input file.
Inputs:
tableDset - dataset from hdfFile["Data"]["Table Layout"]
recDset - dataset from hdfFile["Metadata"]["_record_layout"]
Rules: 1. self._tableDset must start with int columns 'year', 'month', 'day', 'hour', 'min', 'sec',
'recno', 'kindat', 'kinst' and float columns 'ut1_unix', 'ut2_unix'
2. The len of self._recDset must be 1 and must start with the same columns
"""
for i, requiredField in enumerate(self.requiredFields):
if requiredField != tableDset.dtype.names[i]:
raise IOError('Field %s not found in table dset of Hdf5 file %s' % (requiredField, str(self._fullFilename)))
if requiredField != self._recDset.dtype.names[i]:
raise IOError('Field %s not found in record dset of Hdf5 file %s' % (requiredField, str(self._fullFilename)))
if len(recDset) != 1:
raise IOError('recDset must have len 1, not %i' % (len(recDset)))
for field in self._recDset.dtype.names:
if self._recDset[0][field] not in (1,2,3):
raise IOError('For field %s, got illegal recordset value %s' % (field, str(self._recDset[0][field])))
def _appendCatalogRecs(self, expNotesDataset):
"""_appendCatalogRecs will append 0 or more MadrigalCatalogRecords to self._privList based
on the contents in h5py dataset expNotesDataset
"""
start_delimiter = 'Catalog information from record'
end_delimiter = 'Header information from record'
in_catalog = False
catalog_text = ''
start_found = False
if len(self._kinstList) > 0:
kinst = self._kinstList[0]
else:
kinst = None
for line in expNotesDataset:
line = line[0]
if type(line) in (bytes, numpy.bytes_):
line = line.decode('utf8')
if not in_catalog and line.find(start_delimiter) != -1:
in_catalog = True
continue
if in_catalog and (line.find(end_delimiter) != -1 or line.find(start_delimiter) != -1):
start_found = False
if len(catalog_text) > 0:
self._privList.append(MadrigalCatalogRecord(kinst,None,None,None,None,
None,None,None,None,None,
None,None,None,None,None,
None,None,self._madInstObj, '',
catalog_text))
catalog_text = ''
if line.find(start_delimiter) == -1:
in_catalog = False
continue
if in_catalog and not start_found and len(line.split()) == 0:
continue
if in_catalog:
start_found = True
catalog_text += line + ' ' * (80 - len(line))
# see if last part was a catalog
if len(catalog_text) > 0 and len(catalog_text.split()) > 0:
self._privList.append(MadrigalCatalogRecord(kinst,None,None,None,None,
None,None,None,None,None,
None,None,None,None,None,
None,None,self._madInstObj,
catalog_text))
catalog_text = ''
def _appendHeaderRecs(self, expNotesDataset):
"""_appendHeaderRecs will append 0 or more MadrigalHeaderRecords to self._privList based
on the contents in h5py dataset expNotesDataset
"""
start_delimiter = 'Header information from record'
end_delimiter = 'Catalog information from record'
in_header = False
header_text = ''
start_found = False
if len(self._kinstList) > 0:
kinst = self._kinstList[0]
else:
kinst = None
if len(self._kindatList) > 0:
kindat = self._kindatList[0]
else:
kindat = None
for line in expNotesDataset:
line = line[0]
if type(line) in (bytes, numpy.bytes_):
line = line.decode('utf8')
if not in_header and line.find(start_delimiter) != -1:
in_header = True
continue
if in_header and (line.find(end_delimiter) != -1 or line.find(start_delimiter) != -1):
start_found = False
if len(header_text) > 0:
self._privList.append(MadrigalHeaderRecord(kinst,kindat,None,None,None,None,
None,None,None,None,None,None,
None,None,None,None,None,None,
None,self._madInstObj,
self._madKindatObj, header_text))
header_text = ''
if line.find(start_delimiter) == -1:
in_header = False
continue
if in_header and not start_found and len(line.split()) == 0:
continue
if in_header:
header_text += line + ' ' * (80 - len(line))
# see if last part was a header
if len(header_text) > 0 and len(header_text.split()) > 0:
self._privList.append(MadrigalHeaderRecord(kinst,kindat,None,None,None,None,
None,None,None,None,None,None,
None,None,None,None,None,None,
None,self._madInstObj,
self._madKindatObj, header_text))
header_text = ''
def _getArraySplitParms(self, metadataGroup):
"""_getArraySplitParms appends a list of parameters used to split arrays (if any) from
metadataGroup["Parameters Used to Split Array Data"]. If no such table or empty, returns
empty list
"""
retList2 = []
try:
dset = metadataGroup["Parameters Used to Split Array Data"]
retList2 = [mnem.lower() for mnem in dset['mnemonic']]
except:
return(retList2)
# verify ascii
retList = []
for mnem in retList2:
if type(mnem) in (bytes, numpy.bytes_):
retList.append(mnem.decode("ascii"))
else:
retList.append(mnem)
return(retList)
def writeExperimentNotes(self, metadataGroup, refreshCatHeadTimes):
"""writeExperimentNotes writes the "Experiment Notes" dataset to the h5py group metadataGroup
if any catalog or header records found.
refreshCatHeadTimes - if True, update start and end times in the catalog and header
records to represent the times in the data. If False, use existing times in those records.
"""
# templates
cat_template = 'Catalog information from record %i:'
head_template = 'Header information from record %i:'
if "Experiment Notes" in list(metadataGroup.keys()):
# already exists
return
recDict = {} # key = rec number, value = tuple of recarray of lines, 'Catalog' or 'Header' str)
for i, rec in enumerate(self._privList):
if rec.getType() == 'catalog':
if refreshCatHeadTimes:
sDT = self.getEarliestDT()
eDT = self.getLatestDT()
rec.setTimeLists(sDT.year, sDT.month, sDT.day,
sDT.hour, sDT.minute, sDT.second, int(sDT.microsecond/10000),
eDT.year, eDT.month, eDT.day,
eDT.hour, eDT.minute, eDT.second, int(eDT.microsecond/10000))
recarray = rec.getLines()
recDict[i] = (recarray, 'Catalog')
elif rec.getType() == 'header':
if refreshCatHeadTimes:
sDT = self.getEarliestDT()
eDT = self.getLatestDT()
rec.setTimeLists(sDT.year, sDT.month, sDT.day,
sDT.hour, sDT.minute, sDT.second, int(sDT.microsecond/10000),
eDT.year, eDT.month, eDT.day,
eDT.hour, eDT.minute, eDT.second, int(eDT.microsecond/10000))
recarray = rec.getLines()
recDict[i] = (recarray, 'Header')
keys = list(recDict.keys())
keys.sort()
if len(keys) == 0:
return
recarray = None
for key in keys:
new_recarray = numpy.recarray((2,), dtype=[('File Notes', h5py.special_dtype(vlen=str))])
if recDict[key][1] == 'Catalog':
topStr = cat_template % (key)
else:
topStr = head_template % (key)
new_recarray[0]['File Notes'] = topStr + ' ' * (80 - len(topStr))
new_recarray[1]['File Notes'] = ' ' * 80
if recarray is None:
recarray = new_recarray
else:
recarray = numpy.concatenate((recarray, new_recarray))
recarray = numpy.concatenate((recarray, recDict[key][0]))
metadataGroup.create_dataset('Experiment Notes', data=recarray)
def _getKinstList(self, recarr):
"""_getKinstList returns an array of instrument code ints by parsing a numpy recarray
with columns name and value. If name is "instrument code(s)", then parse comma separated
kinst int values in values column. Returns empty list if not found
"""
retList = []
for i in range(len(recarr)):
try:
if recarr[i]['name'].decode('utf8').find('instrument code(s)') != -1:
retList = [int(float(kinst)) for kinst in recarr[i]['value'].decode('utf8').split(',')]
break
except AttributeError:
# not binary
if recarr[i]['name'].find('instrument code(s)') != -1:
retList = [int(float(kinst)) for kinst in recarr[i]['value'].split(',')]
break
return(retList)
def _getKindatList(self, recarr):
"""_getKindatList returns an array of kind of data code ints by parsing a numpy recarray
with columns name and value. If name is "kindat code(s)", then parse comma separated
kindat int values in values column. Returns empty list if not found
"""
retList = []
for i in range(len(recarr)):
try:
if recarr[i]['name'].decode('utf8').find('kindat code(s)') != -1:
retList = [int(float(kindat)) for kindat in recarr[i]['value'].decode('utf8').split(',')]
break
except AttributeError:
# not binary
if recarr[i]['name'].find('kindat code(s)') != -1:
retList = [int(float(kindat)) for kindat in recarr[i]['value'].split(',')]
break
return(retList)
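# Hedged example of the parsing above (codes are illustrative): given an
# Experiment Parameters row with name 'kindat code(s)' and value '3408, 3410',
# the comma-separated values are coerced via int(float(...)), yielding
# [3408, 3410].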
def _printSummary(self, f, filterList):
"""_printSummary prints an overview of the original filename and filters used if any
to open file f (may be stdout)
Inputs:
f - open file to write to
filterList - a list of madrigal.derivation.MadrigalFilter objects to be described in the
summary. Default is None, in which case not described in summary. Ignored if summary
is not 'summary'
"""
if self._fullFilename is not None:
f.write('Data derived from file %s:\n' % (self._fullFilename))
if filterList is None:
return
if len(filterList) == 0:
return
f.write('Filters used:\n')
for i in range(len(filterList)):
f.write('Filter %i:\n' % (i+1))
f.write('%s\n' % (str(filterList[i])))
def _getGroupName(self, indValues):
"""_getGroupName returns the name of a array group when split
Input:
indValues - a list of values, one for each array splitting parameter
"""
groupName = 'Array with '
for i, parm in enumerate(self._arraySplitParms):
if type(parm) == bytes:
parmString = parm.decode('utf8')
else:
parmString = parm
groupName += '%s=%s ' % (parmString, str(indValues[i]))
if i < len(indValues)-1:
groupName += 'and '
return(groupName)
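# Hedged example (values illustrative): with self._arraySplitParms == ['beamid']
# and indValues == [64157.0], _getGroupName returns 'Array with beamid=64157.0 '.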
""" the following methods are added to allow this class to emulate a list."""
def __len__(self):
return len(self._privList)
def __getitem__(self, key):
return self._privList[key]
def __setitem__(self, key, value):
# check that value in (MadrigalCatalogRecord, MadrigalHeaderRecord, MadrigalDataRecord)
if not isinstance(value, MadrigalCatalogRecord) and \
not isinstance(value, MadrigalHeaderRecord) and \
not isinstance(value, MadrigalDataRecord):
# check that its not an empty list (used to delete records)
okay = False
if type(value) == list:
if len(value) == 0:
okay = True
if not okay:
raise ValueError('In MadrigalCedarFile, can only add MadrigalCatalogRecord, MadrigalHeaderRecord, or MadrigalDataRecord')
self._privList[key] = value
def __getslice__(self, i, j):
return self._privList[i:j]
def __setslice__(self,i,j,seq):
# check every item in seq
for item in seq:
if not isinstance(item, MadrigalCatalogRecord) and \
not isinstance(item, MadrigalHeaderRecord) and \
not isinstance(item, MadrigalDataRecord):
raise ValueError('In MadrigalCedarFile, can only add MadrigalCatalogRecord, MadrigalHeaderRecord, or MadrigalDataRecord')
self._privList[max(0, i):max(0, j):] = seq
def __delslice__(self, i, j):
del self._privList[max(0, i):max(0, j):]
def __delitem__(self, key):
del self._privList[key]
def __iter__(self):
return iter(self._privList)
def __contains__(self, other):
for item in self._privList:
if item == other:
return 1
# not found
return 0
def __str__(self):
retStr = ''
for item in self._privList:
retStr += '%s\n' % (str(item))
return retStr
def append(self, item):
# check that value in (MadrigalCatalogRecord, MadrigalHeaderRecord, MadrigalDataRecord)
if not isinstance(item, MadrigalCatalogRecord) and \
not isinstance(item, MadrigalHeaderRecord) and \
not isinstance(item, MadrigalDataRecord):
raise ValueError('In MadrigalCedarFile, can only add MadrigalCatalogRecord, MadrigalHeaderRecord, or MadrigalDataRecord')
if isinstance(item, MadrigalDataRecord):
if self._tableDType != None:
if item.getDType() != self._tableDType:
raise ValueError('Varying dtypes found: %s versus %s' % (str(item.getDType()), str(self._tableDType)))
else:
self.setDType(item.getDType())
if self._recDset is not None:
if item.getRecordset() != self._recDset:
raise ValueError('Varying recordsets found: %s versus %s' % (str(item.getRecordset()), str(self._recDset)))
else:
self._recDset = item.getRecordset()
if self._oneDList is None:
# set all internal data structures set by data records only if not yet set
self._oneDList = item.get1DParms()
self._twoDList = item.get2DParms()
self._ind2DList = item.getInd2DParms()
# set self._num2DSplit
twoDSet = set([o.mnemonic for o in self._twoDList])
arraySplitSet = set(self._arraySplitParms)
self._num2DSplit = len(twoDSet.intersection(arraySplitSet))
if self._earliestDT is None:
self._earliestDT = item.getStartDatetime()
self._latestDT = item.getEndDatetime()
else:
if item.getStartDatetime() < self._earliestDT:
self._earliestDT = item.getStartDatetime()
if item.getEndDatetime() > self._latestDT:
self._latestDT = item.getEndDatetime()
if item.getKinst() not in self._kinstList:
self._kinstList.append(item.getKinst())
if item.getKindat() not in self._kindatList:
self._kindatList.append(item.getKindat())
item.setRecno(self._totalDataRecords)
self._totalDataRecords += 1
# update self._recIndexList
if len(self._recIndexList) > 0:
lastIndex = self._recIndexList[-1][1]
else:
lastIndex = 0
self._recIndexList.append((lastIndex, lastIndex + len(item.getDataset())))
if len(self._ind2DList) > 0:
dataset = item.getDataset()
rowsToCheck = dataset
for i, thisRow in enumerate(rowsToCheck):
# update self._arrDict
if not self._arraySplitParms == []:
arraySplitParms = []
for parm in self._arraySplitParms:
if type(parm) == bytes:
arraySplitParms.append(parm.decode('utf8'))
else:
arraySplitParms.append(parm)
key = tuple([thisRow[parm] for parm in arraySplitParms])
# array splitting parameters can never be nan
for this_value in key:
if not this_value.dtype.type is numpy.string_:
if numpy.isnan(this_value):
raise ValueError('parm %s is an array splitting parameter, so its illegal to have a nan value for it anywhere in the file' % (str(parm)))
else:
key = '' # no splitting
if key not in list(self._arrDict.keys()):
self._arrDict[key] = {}
# first add ut1_unix if needed
if 'ut1_unix' in self._arrDict[key]:
if thisRow['ut1_unix'] not in self._arrDict[key]['ut1_unix']:
self._arrDict[key]['ut1_unix'] = self._arrDict[key]['ut1_unix'].union([thisRow['ut1_unix']])
else:
self._arrDict[key]['ut1_unix'] = set([thisRow['ut1_unix']])
# now deal with all ind parms
for parm in self._ind2DList:
mnem = parm.mnemonic
if mnem in self._arraySplitParms:
# no need to create separate dimension since already split out
continue
if mnem in self._arrDict[key]:
if thisRow[mnem] not in self._arrDict[key][mnem]:
self._arrDict[key][mnem] = self._arrDict[key][mnem].union([thisRow[mnem]])
else:
self._arrDict[key][mnem] = set([thisRow[mnem]])
# enforce nan rule for ind2DList
thisList = list(self._arrDict[key][mnem])
if len(thisList) > 0:
skip = False
if type(thisList[0]) != bytes:
skip = True
if type(thisList[0]) == numpy.ndarray:
if thisList[0].dtype.type is numpy.string_:
skip = True
if not skip:
if numpy.any(numpy.isnan(thisList)):
raise ValueError('Cannot have nan in ind parm %s: %s' % (mnem, str(self._arrDict[key][mnem])))
self._privList.append(item)
def count(self, other):
return self._privList.count(other)
def index(self, other):
return self._privList.index(other)
def insert(self, i, x):
self._privList.insert(i, x)
def pop(self, i):
return self._privList.pop(i)
def remove(self, x):
self._privList.remove(x)
def reverse(self):
self._privList.reverse()
def sort(self):
self._privList.sort()
def __del__(self):
if not self._closed and self._createFlag and self._format == 'hdf5':
print(('Warning - created file %s being closed by __del__. Best practice is to call close() directly, to avoid this warning' % (str(self._fullFilename))))
self.close()
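Taken together, the emulation methods above let a MadrigalCedarFile instance be used like a Python list of records. A usage sketch, assuming an existing CEDAR Hdf5 file (the path is hypothetical):
import madrigal.cedar
cedarObj = madrigal.cedar.MadrigalCedarFile('/tmp/mlh980120g.002.hdf5')
print(len(cedarObj))                 # number of records read in
for rec in cedarObj:                 # iterate over the records
    if rec.getType() == 'data':
        print(rec.getStartDatetime(), rec.getEndDatetime())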
Ancestors (in MRO)
- MadrigalCedarFile
- builtins.object
Class variables
var missing
var missing_int
var requiredFields
Methods
def __init__(
self, fullFilename, createFlag=False, startDatetime=None, endDatetime=None, maxRecords=None, recDset=None, arraySplitParms=None, skipArray=False)
__init__ initializes MadrigalCedarFile by reading in existing file, if any.
Inputs:
fullFilename - either the existing Cedar file in Hdf5 format,
or a file to be created. May also be None if this
data is simply derived parameters that will be written to stdout.
createFlag - tells whether this is a file to be created. If False and
fullFilename cannot be read, an error is raised. If True and
fullFilename already exists, or fullFilename cannot be created,
an error is raised.
startDatetime - if not None (the default), reject all input records where
record end time < startDatetime (datetime.datetime object).
Ignored if createFlag == True
endDatetime - if not None (the default), reject all input records where
record start time > endDatetime (datetime.datetime object)
Ignored if createFlag == True
maxRecords - the maximum number of records to read into memory
Ignored if createFlag == True
recDset - a numpy recarray with column names the names of all parameters, starting
with requiredFields. Values are 1 for 1D (all the required parms are 1D), 2 for
dependent 2D, and 3 for independent spatial 2D parameters. If None, self._recDset
not set until first data record appended.
arraySplitParms - if None (the default), read in arraySplitParms from the existing file.
Otherwise set self._arraySplitParms to arraySplitParms, which is a list of 1D or 2D parms
where each unique set of value of the parms in this list will be used to split the full
data into separate arrays in Hdf5 or netCDF4 files. For example arraySplitParms=['beamid']
would split the data into separate arrays for each beamid. If None and new file is being
created, no array splitting (self._arraySplitParms = []).
skipArray - if False and any 2D parms, create array layout (the default). If True, skip array
layout (typically used when there are too many ind parm value combinations - generally not recommended).
Affects: populates self._privList if file exists. self._privList is the underlying
list of MadrigalDataRecords, MadrigalCatalogRecords, and MadrigalHeaderRecords.
Also populates:
self._tableDType - the numpy dtype to use to build the table layout
self._nextRecord - the index of the next record to read from the input file. Not used if
createFlag = True
(The following are the input arguments described above)
self._fullFilename
self._createFlag
self._startDatetime
self._endDatetime
self._maxRecords
self._totalDataRecords - number of data records appended (may differ from len(self._privList)
if dump called).
self._minMaxParmDict - a dictionary with key = parm mnems, value = tuple of
min, max values (may be nan)
self._arrDict - a dictionary with key = list of array split parm values found in file,
('' if no splitting), and values = dict of key = 'ut1_unix' and ind 2d parm names (excluding
array splitting parms, if also ind 2D parm), and values
= python set of all unique values. Populated only if createFlag=True. Used to create
Array Layout
self._recIndexList - a list of (startIndex, endIndex) for each data record added. Used to slice out
data records from Table Layout
self._num2DSplit - Number of arraySplitParms that are 2D
self._closed - a boolean used to determine if the file being created was already closed
Returns: void
def __init__(self, fullFilename,
createFlag=False,
startDatetime=None,
endDatetime=None,
maxRecords=None,
recDset=None,
arraySplitParms=None,
skipArray=False):
"""__init__ initializes MadrigalCedarFile by reading in existing file, if any.
Inputs:
fullFilename - either the existing Cedar file in Hdf5 format,
or a file to be created. May also be None if this
data is simply derived parameters that will be written to stdout.
createFlag - tells whether this is a file to be created. If False and
fullFilename cannot be read, an error is raised. If True and
fullFilename already exists, or fullFilename cannot be created,
an error is raised.
startDatetime - if not None (the default), reject all input records where
record end time < startDatetime (datetime.datetime object).
Ignored if createFlag == True
endDatetime - if not None (the default), reject all input records where
record start time > endDatetime (datetime.datetime object)
Ignored if createFlag == True
maxRecords - the maximum number of records to read into memory
Ignored if createFlag == True
recDset - a numpy recarray with column names the names of all parameters, starting
with requiredFields. Values are 1 for 1D (all the required parms are 1D), 2 for
dependent 2D, and 3 for independent spatial 2D parameters. If None, self._recDset
not set until first data record appended.
arraySplitParms - if None (the default), read in arraySplitParms from the existing file.
Otherwise set self._arraySplitParms to arraySplitParms, which is a list of 1D or 2D parms
where each unique set of value of the parms in this list will be used to split the full
data into separate arrays in Hdf5 or netCDF4 files. For example arraySplitParms=['beamid']
would split the data into separate arrays for each beamid. If None and new file is being
created, no array splitting (self._arraySplitParms = []).
skipArray - if False and any 2D parms, create array layout (the default). If True, skip array
layout (typically used when there are too many ind parm value combinations - generally not recommended).
Affects: populates self._privList if file exists. self._privList is the underlying
list of MadrigalDataRecords, MadrigalCatalogRecords, and MadrigalHeaderRecords.
Also populates:
self._tableDType - the numpy dtype to use to build the table layout
self._nextRecord - the index of the next record to read from the input file. Not used if
createFlag = True
(The following are the input arguments described above)
self._fullFilename
self._createFlag
self._startDatetime
self._endDatetime
self._maxRecords
self._totalDataRecords - number of data records appended (may differ from len(self._privList)
if dump called).
self._minMaxParmDict - a dictionary with key = parm mnems, value = tuple of
min, max values (may be nan)
self._arrDict - a dictionary with key = list of array split parm values found in file,
('' if no splitting), and values = dict of key = 'ut1_unix' and ind 2d parm names (excluding
array splitting parms, if also ind 2D parm), and values
= python set of all unique values. Populated only if createFlag=True. Used to create
Array Layout
self._recIndexList - a list of (startIndex, endIndex) for each data record added. Used to slice out
data records from Table Layout
self._num2DSplit - Number of arraySplitParms that are 2D
self._closed - a boolean used to determine if the file being created was already closed
Returns: void
"""
self._privList = []
self._fullFilename = fullFilename
self._startDatetime = startDatetime
self._endDatetime = endDatetime
self._maxRecords = maxRecords
self._totalDataRecords = 0
self._nextRecord = 0
self._tableDType = None # will be set to the dtype of Table Layout
self._oneDList = None # will be set when first data record appended
self._twoDList = None # will be set when first data record appended
self._ind2DList = None # will be set when first data record appended
self._arraySplitParms = arraySplitParms
self._skipArray = bool(skipArray)
if createFlag:
self._closed = False
else:
self._closed = True # no need to close file only being read
self._hdf5Extensions = ('.hdf5', '.h5', '.hdf')
# keep track of earliest and latest record times
self._earliestDT = None
self._latestDT = None
# summary info
self._experimentParameters = None
self._kinstList = [] # a list of all kinsts integers in file
self._kindatList = [] # a list of all kindats integers in file
self._status = 'Unknown' # can be externally set
self._format = None # used to check that partial writes via dump are consistent
if createFlag not in (True, False):
raise ValueError('in MadrigalCedarFile, createFlag must be either True or False')
self._createFlag = createFlag
if createFlag == False:
if not os.access(fullFilename, os.R_OK):
raise ValueError('in MadrigalCedarFile, fullFilename %s does not exist' % (str(fullFilename)))
if not fullFilename.endswith(self._hdf5Extensions):
raise IOError('MadrigalCedarFile can only read in CEDAR Hdf5 files, not %s' % (fullFilename))
if createFlag == True:
if fullFilename != None: # then this data will never be persisted - only written to stdout
if os.access(fullFilename, os.R_OK):
raise ValueError('in MadrigalCedarFile, fullFilename %s already exists' % (str(fullFilename)))
if not os.access(os.path.dirname(fullFilename), os.W_OK):
raise ValueError('in MadrigalCedarFile, fullFilename %s cannot be created' % (str(fullFilename)))
if not fullFilename.endswith(self._hdf5Extensions):
raise IOError('All Madrigal files must end with hdf5 extension, <%s> does not' % (str(fullFilename)))
if self._arraySplitParms is None:
self._arraySplitParms = []
# create needed Madrigal objects
self._madDBObj = madrigal.metadata.MadrigalDB()
self._madInstObj = madrigal.metadata.MadrigalInstrument(self._madDBObj)
self._madParmObj = madrigal.data.MadrigalParameters(self._madDBObj)
self._madKindatObj = madrigal.metadata.MadrigalKindat(self._madDBObj)
if not self._arraySplitParms is None:
self._arraySplitParms = [self._madParmObj.getParmMnemonic(p).lower() for p in self._arraySplitParms]
self._minMaxParmDict = {}
self._arrDict = {}
self._num2DSplit = None # will be set to bool when first record added
self._recIndexList = []
if recDset is not None:
self._recDset = recDset
else:
self._recDset = None
if createFlag == False:
self.loadNextRecords(self._maxRecords)
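A read-side sketch of the initializer, assuming an existing CEDAR Hdf5 file (the path and time window are hypothetical); only records overlapping the window are loaded:
import datetime
import madrigal.cedar
startDT = datetime.datetime(1998, 1, 20, 12, 0, 0)
endDT = datetime.datetime(1998, 1, 20, 14, 0, 0)
# records ending before startDT or starting after endDT are rejected
cedarObj = madrigal.cedar.MadrigalCedarFile('/tmp/mlh980120g.002.hdf5',
                                            startDatetime=startDT,
                                            endDatetime=endDT)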
def append(
self, item)
def append(self, item):
# check that value in (MadrigalCatalogRecord, MadrigalHeaderRecord, MadrigalDataRecord)
if not isinstance(item, MadrigalCatalogRecord) and \
not isinstance(item, MadrigalHeaderRecord) and \
not isinstance(item, MadrigalDataRecord):
raise ValueError('In MadrigalCedarFile, can only add MadrigalCatalogRecord, MadrigalHeaderRecord, or MadrigalDataRecord')
if isinstance(item, MadrigalDataRecord):
if self._tableDType != None:
if item.getDType() != self._tableDType:
raise ValueError('Varying dtypes found: %s versus %s' % (str(item.getDType()), str(self._tableDType)))
else:
self.setDType(item.getDType())
if self._recDset is not None:
if item.getRecordset() != self._recDset:
raise ValueError('Varying recordsets found: %s versus %s' % (str(item.getRecordset()), str(self._recDset)))
else:
self._recDset = item.getRecordset()
if self._oneDList is None:
# set all internal data structures set by data records only if not yet set
self._oneDList = item.get1DParms()
self._twoDList = item.get2DParms()
self._ind2DList = item.getInd2DParms()
# set self._num2DSplit
twoDSet = set([o.mnemonic for o in self._twoDList])
arraySplitSet = set(self._arraySplitParms)
self._num2DSplit = len(twoDSet.intersection(arraySplitSet))
if self._earliestDT is None:
self._earliestDT = item.getStartDatetime()
self._latestDT = item.getEndDatetime()
else:
if item.getStartDatetime() < self._earliestDT:
self._earliestDT = item.getStartDatetime()
if item.getEndDatetime() > self._latestDT:
self._latestDT = item.getEndDatetime()
if item.getKinst() not in self._kinstList:
self._kinstList.append(item.getKinst())
if item.getKindat() not in self._kindatList:
self._kindatList.append(item.getKindat())
item.setRecno(self._totalDataRecords)
self._totalDataRecords += 1
# update self._recIndexList
if len(self._recIndexList) > 0:
lastIndex = self._recIndexList[-1][1]
else:
lastIndex = 0
self._recIndexList.append((lastIndex, lastIndex + len(item.getDataset())))
if len(self._ind2DList) > 0:
dataset = item.getDataset()
rowsToCheck = dataset
for i, thisRow in enumerate(rowsToCheck):
# update self._arrDict
if not self._arraySplitParms == []:
arraySplitParms = []
for parm in self._arraySplitParms:
if type(parm) == bytes:
arraySplitParms.append(parm.decode('utf8'))
else:
arraySplitParms.append(parm)
key = tuple([thisRow[parm] for parm in arraySplitParms])
# array splitting parameters can never be nan
for this_value in key:
if not this_value.dtype.type is numpy.string_:
if numpy.isnan(this_value):
raise ValueError('parm %s is an array splitting parameter, so its illegal to have a nan value for it anywhere in the file' % (str(parm)))
else:
key = '' # no splitting
if key not in list(self._arrDict.keys()):
self._arrDict[key] = {}
# first add ut1_unix if needed
if 'ut1_unix' in self._arrDict[key]:
if thisRow['ut1_unix'] not in self._arrDict[key]['ut1_unix']:
self._arrDict[key]['ut1_unix'] = self._arrDict[key]['ut1_unix'].union([thisRow['ut1_unix']])
else:
self._arrDict[key]['ut1_unix'] = set([thisRow['ut1_unix']])
# now deal with all ind parms
for parm in self._ind2DList:
mnem = parm.mnemonic
if mnem in self._arraySplitParms:
# no need to create separate dimension since already split out
continue
if mnem in self._arrDict[key]:
if thisRow[mnem] not in self._arrDict[key][mnem]:
self._arrDict[key][mnem] = self._arrDict[key][mnem].union([thisRow[mnem]])
else:
self._arrDict[key][mnem] = set([thisRow[mnem]])
# enforce nan rule for ind2DList
thisList = list(self._arrDict[key][mnem])
if len(thisList) > 0:
skip = False
if type(thisList[0]) != bytes:
skip = True
if type(thisList[0]) == numpy.ndarray:
if thisList[0].dtype.type is numpy.string_:
skip = True
if not skip:
if numpy.any(numpy.isnan(thisList)):
raise ValueError('Cannot have nan in ind parm %s: %s' % (mnem, str(self._arrDict[key][mnem])))
self._privList.append(item)
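For reference, a sketch of the self._arrDict bookkeeping that append maintains (values hypothetical): keys are tuples of array-split parameter values ('' when there is no splitting), and each value maps 'ut1_unix' and every non-split independent 2D parameter to the set of unique values seen so far.
# hypothetical state after appending records split on beamid,
# with range as the remaining independent 2D parameter
_arrDict = {
    (64157.0,): {'ut1_unix': {885301200.0, 885301260.0},
                 'range': {150.0, 300.0, 450.0}},
    (64016.0,): {'ut1_unix': {885301200.0},
                 'range': {150.0, 300.0}},
}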
def close(
self)
close closes an open MadrigalCedarFile. It calls _writeHdf5Metadata and _addArray if ind parms.
Must be called directly when dump is used.
def close(self):
"""close closes an open MadrigalCedarFile. It calls _writeHdf5Metadata and _addArray if ind parms.
Must be called directly when dump is used.
"""
if self._closed:
# nothing to do
return
with h5py.File(self._fullFilename, 'a') as f:
self._writeHdf5Metadata(f, refreshCatHeadTimes=True)
if len(self.getIndSpatialParms()) > 0:
if not self._skipArray:
self._addArrayDump()
self._closed = True
def count(
self, other)
def count(self, other):
return self._privList.count(other)
def createCatalogTimeSection(
self)
createCatalogTimeSection will return all the lines in the catalog record that describe the start and end time of the data records.
Inputs: None
Returns: a tuple with three items 1) a string in the format of the time section of a catalog record, 2) earliest datetime, 3) latest datetime
def createCatalogTimeSection(self):
"""createCatalogTimeSection will return all the lines in the catalog record that
describe the start and end time of the data records.
Inputs: None
Returns: a tuple with three items 1) a string in the format of the time section of a
catalog record, 2) earliest datetime, 3) latest datetime
"""
earliestStartTime = self.getEarliestDT()
latestEndTime = self.getLatestDT()
sy = 'IBYRE %4s Beginning year' % (str(earliestStartTime.year))
sd = 'IBDTE %4s Beginning month and day' % (str(earliestStartTime.month*100 + \
earliestStartTime.day))
sh = 'IBHME %4s Beginning UT hour and minute' % (str(earliestStartTime.hour*100 + \
earliestStartTime.minute))
totalCS = earliestStartTime.second*100 + (earliestStartTime.microsecond/10000)
ss = 'IBCSE %4s Beginning centisecond' % (str(totalCS))
ey = 'IEYRE %4s Ending year' % (str(latestEndTime.year))
ed = 'IEDTE %4s Ending month and day' % (str(latestEndTime.month*100 + \
latestEndTime.day))
eh = 'IEHME %4s Ending UT hour and minute' % (str(latestEndTime.hour*100 + \
latestEndTime.minute))
totalCS = latestEndTime.second*100 + (latestEndTime.microsecond/10000)
es = 'IECSE %4s Ending centisecond' % (str(totalCS))
retStr = ''
retStr += sy + (80-len(sy))*' '
retStr += sd + (80-len(sd))*' '
retStr += sh + (80-len(sh))*' '
retStr += ss + (80-len(ss))*' '
retStr += ey + (80-len(ey))*' '
retStr += ed + (80-len(ed))*' '
retStr += eh + (80-len(eh))*' '
retStr += es + (80-len(es))*' '
return((retStr, earliestStartTime, latestEndTime))
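For example, with hypothetical data times starting 1998-01-20 13:30:00 UT, the returned string is a sequence of 80-character padded lines beginning:
IBYRE 1998 Beginning year
IBDTE  120 Beginning month and day
IBHME 1330 Beginning UT hour and minute
...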
def createHeaderTimeSection(
self, dataRecList=None)
createHeaderTimeSection will return all the lines in the header record that describe the start and end time of the data records.
Inputs:
dataRecList - if given, examine only those MadrigalDataRecords in dataRecList.
If None (the default), examine all MadrigalDataRecords in this
MadrigalCedarFile
Returns: a tuple with three items 1) a string in the format of the time section of a header record, 2) earliest datetime, 3) latest datetime
def createHeaderTimeSection(self, dataRecList=None):
"""createHeaderTimeSection will return all the lines in the header record that
describe the start and end time of the data records.
Inputs:
dataRecList - if given, examine only those MadrigalDataRecords in dataRecList.
If None (the default), examine all MadrigalDataRecords in this
MadrigalCedarFile
Returns: a tuple with three items 1) a string in the format of the time section of a
header record, 2) earliest datetime, 3) latest datetime
"""
if dataRecList is None:
earliestStartTime = self.getEarliestDT()
latestEndTime = self.getLatestDT()
else:
earliestStartTime = None
latestEndTime = None
for rec in dataRecList:
if rec.getType() != 'data':
continue
#earliest time
thisTime = rec.getStartDatetime()
if earliestStartTime is None:
earliestStartTime = thisTime
if earliestStartTime > thisTime:
earliestStartTime = thisTime
#latest time
thisTime = rec.getEndDatetime()
if latestEndTime is None:
latestEndTime = thisTime
if latestEndTime < thisTime:
latestEndTime = thisTime
sy = 'IBYRT %4s Beginning year' % (str(earliestStartTime.year))
sd = 'IBDTT %4s Beginning month and day' % (str(earliestStartTime.month*100 + \
earliestStartTime.day))
sh = 'IBHMT %4s Beginning UT hour and minute' % (str(earliestStartTime.hour*100 + \
earliestStartTime.minute))
totalCS = earliestStartTime.second*100 + (earliestStartTime.microsecond/10000)
ss = 'IBCST %4s Beginning centisecond' % (str(totalCS))
ey = 'IEYRT %4s Ending year' % (str(latestEndTime.year))
ed = 'IEDTT %4s Ending month and day' % (str(latestEndTime.month*100 + \
latestEndTime.day))
eh = 'IEHMT %4s Ending UT hour and minute' % (str(latestEndTime.hour*100 + \
latestEndTime.minute))
totalCS = latestEndTime.second*100 + (latestEndTime.microsecond/10000)
es = 'IECST %4s Ending centisecond' % (str(totalCS))
retStr = ''
retStr += sy + (80-len(sy))*' '
retStr += sd + (80-len(sd))*' '
retStr += sh + (80-len(sh))*' '
retStr += ss + (80-len(ss))*' '
retStr += ey + (80-len(ey))*' '
retStr += ed + (80-len(ed))*' '
retStr += eh + (80-len(eh))*' '
retStr += es + (80-len(es))*' '
return((retStr, earliestStartTime, latestEndTime))
def dump(
self, format='hdf5', newFilename=None, parmIndexDict=None)
dump appends all the present records in MadrigalCedarFile to file, and removes present data records from MadrigalCedarFile.
Can be used to append records to a file. Catalog and header records are maintained.
Typically close is called after all calls to dump. The __del__ method will automatically call close if needed, and print a warning that the user should add it to their code.
Inputs:
format - a format to save the file in. The format argument only exists for backwards
compatibility - only hdf5 is allowed. IOError raised if any other argument is given.
newFilename - a filename to save to. Defaults to self._fullFilename passed into initializer if not given.
parmIndexDict - used only for dumping netCDF4
Outputs: None
Affects: writes a MadrigalCedarFile to file
def dump(self, format='hdf5', newFilename=None, parmIndexDict=None):
"""dump appends all the present records in MadrigalCedarFile to file, and removes present data records from MadrigalCedarFile.
Can be used to append records to a file. Catalog and header records are maintained.
Typically close is called after all calls to dump. The __del__ method will automatically call
close if needed, and print a warning that the user should add it to their code.
Inputs:
format - a format to save the file in. The format argument only exists for backwards
compatibility - only hdf5 is allowed. IOError raised if any other argument is given.
newFilename - a filename to save to. Defaults to self._fullFilename passed into initializer if not given.
parmIndexDict - used only for dumping netCDF4
Outputs: None
Affects: writes a MadrigalCedarFile to file
"""
if self._format != None:
if self._format != format:
raise ValueError('Previous dump format was %s, cannot now use %s' % (str(self._format), str(format)))
if format not in ('hdf5', 'netCDF4'):
raise ValueError('Format must be hdf5 or netCDF4 for dump, not %s' % (str(format)))
if newFilename is None:
newFilename = self._fullFilename
if self._format is None:
# first write - run checks, and create all possible metadata and data
if os.access(newFilename, os.R_OK):
raise IOError('newFilename <%s> already exists' % (newFilename))
if format == 'hdf5':
if not newFilename.endswith(tuple(list(self._hdf5Extensions) + ['.nc'])):
raise IOError('filename must end with %s, <%s> does not' % (str(tuple(list(self._hdf5Extensions) + ['.nc'])), newFilename))
elif format == 'netCDF4':
if not newFilename.endswith('.nc'):
raise IOError('filename must end with %s, <%s> does not' % ('.nc', newFilename))
if len(self._privList) == 0:
# nothing to dump
return
if format == 'hdf5':
try:
# we need to make sure this file is closed and then deleted if an error
f = None # used if next line fails
f = h5py.File(newFilename, 'a')
self._closed = False
if self.hasArray(f):
raise IOError('Cannot call dump for hdf5 after write or close')
self._writeHdf5Data(f)
f.close()
except:
# on any error, close and delete file, then reraise error
if f:
f.close()
if os.access(newFilename, os.R_OK):
os.remove(newFilename)
raise
elif format == 'netCDF4':
if len(self._arraySplitParms) != 0:
raise IOError('Cannot dump netCDF4 files with arraySplitParms - write to Hdf5 and then convert')
if self._format is None:
# first write
f = netCDF4.Dataset(newFilename, 'w', format='NETCDF4')
self._firstDumpNetCDF4(f, parmIndexDict)
f.close()
else:
f = netCDF4.Dataset(newFilename, 'a', format='NETCDF4')
self._appendNetCDF4(f, parmIndexDict)
f.close()
self._format = format
# dump data records out of memory
self._privList = [rec for rec in self._privList if not rec.getType() == 'data']
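A sketch of the incremental dump pattern (the record generator here is hypothetical; close must be called at the end):
import madrigal.cedar
cedarObj = madrigal.cedar.MadrigalCedarFile('/tmp/new_file.hdf5', createFlag=True)
for rec in generate_data_records():    # hypothetical MadrigalDataRecord source
    cedarObj.append(rec)
    if len(cedarObj) >= 100:
        cedarObj.dump()                # write current records, free memory
cedarObj.dump()                        # flush any remaining records
cedarObj.close()                       # writes metadata, and array layout if ind parms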
def get1DParms(
self)
get1DParms returns a list of mnemonics of 1D parms in file. May be empty if none.
Raises ValueError if self._oneDList is None, since parameters unknown
def get1DParms(self):
"""get1DParms returns a list of mnemonics of 1D parms
in file. May be empty if none.
Raises ValueError if self._oneDList is None, since parameters unknown
"""
if self._oneDList is None:
raise ValueError('get1DParms cannot be called before any data records added to this file')
retList = []
for parm in self._oneDList:
retList.append(parm.mnemonic)
return(retList)
def get2DParms(
self)
get2DParms returns a list of mnemonics of dependent 2D parms in file. May be empty if none.
Raises ValueError if self._twoDList is None, since parameters unknown
def get2DParms(self):
"""get2DParms returns a list of mnemonics of dependent 2D parms
in file. May be empty if none.
Raises ValueError if self._twoDList is None, since parameters unknown
"""
if self._twoDList is None:
raise ValueError('get2DParms cannot be called before any data records added to this file')
retList = []
for parm in self._twoDList:
retList.append(parm.mnemonic)
return(retList)
def getArraySplitParms(
self)
getArraySplitParms returns a list of mnemonics of parameters used to split array. May be empty or None.
def getArraySplitParms(self):
"""getArraySplitParms returns a list of mnemonics of parameters used to split array. May be empty or None.
"""
return(self._arraySplitParms)
def getDType(
self)
getDType returns the dtype of the table array in this file
def getDType(self):
"""getDType returns the dtype of the table array in this file
"""
return(self._tableDType)
def getEarliestDT(
self)
getEarliestDT returns the earliest datetime found in file, or None if no data
def getEarliestDT(self):
"""getEarliestDT returns the earliest datetime found in file, or None if no data
"""
return(self._earliestDT)
def getIndSpatialParms(
self)
getIndSpatialParms returns a list of mnemonics of independent spatial parameters in file. May be empty if none.
Raises ValueError if self._ind2DList is None, since parameters unknown
def getIndSpatialParms(self):
"""getIndSpatialParms returns a list of mnemonics of independent spatial parameters
in file. May be empty if none.
Raises ValueError if self._ind2DList is None, since parameters unknown
"""
if self._ind2DList is None:
raise ValueError('getIndSpatialParms cannot be called before any data records added to this file')
retList = []
for parm in self._ind2DList:
retList.append(parm.mnemonic)
return(retList)
def getKindatList(
self)
getKindatList returns the list of kindat integers in the file
def getKindatList(self):
"""getKindatList returns the list of kindat integers in the file
"""
return(self._kindatList)
def getKinstList(
self)
getKinstList returns the list of kinst integers in the file
def getKinstList(self):
"""getKinstList returns the list of kinst integers in the file
"""
return(self._kinstList)
def getLatestDT(
self)
getLatestDT returns the latest datetime found in file, or None if no data
def getLatestDT(self):
"""getLatestDT returns the latest datetime found in file, or None if no data
"""
return(self._latestDT)
def getMaxMinValues(
self, mnemonic, verifyValid=False)
getMaxMinValues returns a tuple of (minimum value, maximum value) of parm in this file. If verifyValid is True, then only lines with valid 2D data are included. If no valid values, returns (NaN, NaN). Also updates self._minMaxParmDict
Raises IOError if parm is not found
def getMaxMinValues(self, mnemonic, verifyValid=False):
"""getMaxMinValues returns a tuple of (minimum value, maximum value) of the value
of parm in this file. If verifyValid is True, then only lines with valid 2D data
are included. If no valid values, returns (NaN, NaN). Also updates self._minMaxParmDict
Raise IOError if parm not found
"""
parm = mnemonic.lower()
# for string data, always return (Nan, Nan)
if self._madParmObj.isString(parm):
self._minMaxParmDict[parm] = [numpy.NaN, numpy.NaN]
return((numpy.NaN, numpy.NaN))
# create a merged dataset
datasetList = []
for rec in self._privList:
if rec.getType() == 'data':
datasetList.append(rec._dataset)
if len(datasetList) == 0:
if parm in self._minMaxParmDict:
return(self._minMaxParmDict[parm])
else:
raise IOError('No data records in file')
merged_dataset = numpy.concatenate(datasetList)
if not verifyValid:
# very simple - just use numpy methods
try:
data = merged_dataset[parm]
except:
raise IOError('parm %s not found in file' % (parm))
with warnings.catch_warnings():
warnings.simplefilter("ignore")
minValue = numpy.nanmin(data)
maxValue = numpy.nanmax(data)
if parm not in self._minMaxParmDict:
self._minMaxParmDict[parm] = [minValue, maxValue]
else:
orgMin, orgMax = self._minMaxParmDict[parm]
self._minMaxParmDict[parm] = [min(minValue, orgMin), max(maxValue, orgMax)]
return((minValue, maxValue))
# we need to find the minimum and maximum for only valid data
# first sort by parm so we just need to walk until we find a valid row starting at the top and bottom
sorted_indices = numpy.argsort(merged_dataset[parm])
# find min
minValue = None
for i in sorted_indices:
if numpy.isnan(merged_dataset[parm][i]):
continue
for twoDParm in self.get2DParms():
if self._madParmObj.isString(twoDParm):
continue
if numpy.isnan(merged_dataset[twoDParm][i]):
continue
# make sure its not a special error value
if self._madParmObj.isError(twoDParm) and merged_dataset[twoDParm][i] < 0:
continue
# minimum found
minValue = merged_dataset[parm][i]
break
if not minValue is None:
break
# find max
maxValue = None
for i in reversed(sorted_indices):
if numpy.isnan(merged_dataset[parm][i]):
continue
for twoDParm in self.get2DParms():
if self._madParmObj.isString(twoDParm):
continue
if numpy.isnan(merged_dataset[twoDParm][i]):
continue
# make sure its not a special error value
if self._madParmObj.isError(twoDParm) and merged_dataset[twoDParm][i] < 0:
continue
# maximum found
maxValue = merged_dataset[parm][i]
break
if not maxValue is None:
break
if minValue is None:
minValue = numpy.nan
if maxValue is None:
maxValue = numpy.nan
if parm not in self._minMaxParmDict:
self._minMaxParmDict[parm] = [minValue, maxValue]
else:
orgMin, orgMax = self._minMaxParmDict[parm]
self._minMaxParmDict[parm] = [min(minValue, orgMin), max(maxValue, orgMax)]
return((minValue, maxValue))
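Usage sketch (the mnemonic is hypothetical):
minVal, maxVal = cedarObj.getMaxMinValues('gdalt')
# consider only rows where all non-string 2D parameters are valid
minValid, maxValid = cedarObj.getMaxMinValues('gdalt', verifyValid=True)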
def getParmDim(
self, parm)
getParmDim returns the dimension (1,2, or 3 for independent spatial parms) of input parm
Raises ValueError if no data records yet. Raise KeyError if that parameter not found in file
def getParmDim(self, parm):
"""getParmDim returns the dimension (1,2, or 3 for independent spatial parms) of input parm
Raises ValueError if no data records yet.
Raise KeyError if that parameter not found in file
"""
if self._ind2DList is None:
raise ValueError('getParmDim cannot be called before any data records added to this file')
for obj in self._oneDList:
if obj.mnemonic.lower() == parm.lower():
return(1)
# do ind 2D next since they are in both lists
for obj in self._ind2DList:
if obj.mnemonic.lower() == parm.lower():
return(3)
for obj in self._twoDList:
if obj.mnemonic.lower() == parm.lower():
return(2)
raise KeyError('Parm <%s> not found in data' % (str(parm)))
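Usage sketch (mnemonics hypothetical):
cedarObj.getParmDim('kinst')   # -> 1 (required parms are 1D)
cedarObj.getParmDim('range')   # -> 3 if range is an independent spatial parm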
def getRecDType(
self)
getRecDType returns the dtype of _record_layout
def getRecDType(self):
"""getRecDType returns the dtype of _record_layout
"""
return(self._recDset.dtype)
def getRecIndexList(
self)
getRecIndexList returns a list of record indexes into Table Layout
def getRecIndexList(self):
"""getRecIndexList returns a list of record indexes into Table Layout
"""
return(self._recIndexList)
def getRecordset(
self)
getRecordset returns the recordset array from the first data record.
Raises IOError if None
def getRecordset(self):
"""getRecordset returns the recordset array from the first data record.
Raises IOError if None
"""
if self._recDset is None:
raise IOError('self._recDset is None')
return(self._recDset)
def getStatus(
self)
getStatus returns the status string
def getStatus(self):
"""getStatus returns the status string
"""
return(self._status)
def getStringFormat(
self, parm)
getStringFormat returns string format string. Raises error if not string type, parm not in record, or table dtype not yet set
def getStringFormat(self, parm):
"""getStringFormat returns string format string. Raises error if not string type,
or parm not in record. or table dtype not yet set
"""
if not self.parmIsString(parm):
raise ValueError('parm %s not a string, cannot call getStringFormat' % (str(parm)))
return(str(self._tableDType[parm.lower()]))
def hasArray(
self, f)
hasArray returns True if f['Data']['Array Layout'] exists, False otherwise
def hasArray(self, f):
"""hasArray returns True in f['Data']['Array Layout'] exists, False otherwise
"""
if 'Data' in list(f.keys()):
if 'Array Layout' in list(f['Data'].keys()):
return(True)
return(False)
def index(
self, other)
def index(self, other):
return self._privList.index(other)
def insert(
self, i, x)
def insert(self, i, x):
self._privList.insert(i, x)
def loadNextRecords(
self, numRecords=None, removeExisting=True)
loadNextRecords loads a maximum of numRecords. Returns a tuple of the number of records loaded and a boolean of whether reading is complete. May be less than numRecords if not enough records remain in the input file. Returns 0 records loaded if none are left.
Inputs:
numRecords - number of records to try to load. If None, load all remaining records
removeExisting - if True (the default), remove existing records before loading new
ones. If False, append new records to existing records.
Returns:
a tuple of the number of records loaded and a boolean of whether reading is complete.
May be less than numRecords if not enough records remain.
Raises error if file opened with createFlag = True
def loadNextRecords(self, numRecords=None, removeExisting=True):
"""loadNextRecords loads a maximum of numRecords. Returns tuple of the the number of records loaded, and boolean of whether complete.
May be less than numRecords if not enough records in the input file. Returns 0 if no records left.
Inputs:
numRecords - number of records to try to load. If None, load all remaining records
removeExisting - if True (the default), remove existing records before loading new
ones. If False, append new records to existing records.
Returns:
a tuple of the number of records loaded and a boolean of whether reading is complete.
May be less than numRecords if not enough records remain.
Raises error if file opened with createFlag = True
"""
if self._createFlag:
raise IOError('Cannot call loadNextRecords when creating a new MadrigalCedarFile')
if removeExisting:
self._privList = []
isComplete = False
hdfFile = h5py.File(self._fullFilename, 'r')
tableDset = hdfFile["Data"]["Table Layout"]
metadataGroup = hdfFile["Metadata"]
recDset = metadataGroup["_record_layout"]
if self._nextRecord == 0:
if self._recDset is None:
self._recDset = recDset[()]
elif self._recDset != recDset:
raise IOError('recDset in first record <%s> does not match expected recDset <%s>' % \
(str(recDset), str(self._recDset)))
self._verifyFormat(tableDset, recDset)
self._tableDType = tableDset.dtype
self._experimentParameters = numpy.array(hdfFile["Metadata"]['Experiment Parameters'])
self._kinstList = self._getKinstList(self._experimentParameters)
self._kindatList = self._getKindatList(self._experimentParameters)
if self._arraySplitParms is None:
self._arraySplitParms = self._getArraySplitParms(hdfFile["Metadata"])
if 'Experiment Notes' in list(hdfFile["Metadata"].keys()):
self._appendCatalogRecs(hdfFile["Metadata"]['Experiment Notes'])
self._appendHeaderRecs(hdfFile["Metadata"]['Experiment Notes'])
if self._ind2DList is not None:
parmObjList = (self._oneDList, self._twoDList, self._ind2DList) # used for performance in load
else:
parmObjList = None
# get indices for each record
recLoaded = 0
recTested = 0
if not hasattr(self, 'recnoArr'):
self.recnoArr = tableDset['recno']
# read all the records in at once for performance
if not numRecords is None:
indices = numpy.searchsorted(self.recnoArr, numpy.array([self._nextRecord, self._nextRecord + numRecords]))
tableIndices = numpy.arange(indices[0], indices[1])
if len(tableIndices) > 0:
fullTableSlice = tableDset[tableIndices[0]:tableIndices[-1]+1]
fullRecnoArr = fullTableSlice['recno']
else:
fullTableSlice = tableDset
fullRecnoArr = self.recnoArr
while(True):
if not numRecords is None:
if len(tableIndices) == 0:
isComplete = True
break
if numRecords:
if recTested >= numRecords:
break
# get slices of tableDset and recDset to create next MadrigalDataRecord
indices = numpy.searchsorted(fullRecnoArr, numpy.array([self._nextRecord, self._nextRecord + 1]))
tableIndices = numpy.arange(indices[0], indices[1])
if len(tableIndices) == 0:
isComplete = True
break
tableSlice = fullTableSlice[tableIndices[0]:tableIndices[-1]+1]
self._recIndexList.append((tableIndices[0],tableIndices[-1]+1))
self._nextRecord += 1
firstRow = tableSlice[0]
startDT = datetime.datetime.utcfromtimestamp(firstRow['ut1_unix'])
stopDT = datetime.datetime.utcfromtimestamp(firstRow['ut2_unix'])
if firstRow['kinst'] not in self._kinstList:
self._kinstList.append(firstRow['kinst'])
if firstRow['kindat'] not in self._kindatList:
self._kindatList.append(firstRow['kindat'])
# find earliest and latest times
if self._earliestDT is None:
self._earliestDT = startDT
self._latestDT = stopDT
else:
if startDT < self._earliestDT:
self._earliestDT = startDT
if stopDT > self._latestDT:
self._latestDT = stopDT
recTested += 1 # increment here because the next step may reject it
# check if datetime filter should be applied
if not self._startDatetime is None or not self._endDatetime is None:
if not self._startDatetime is None:
if stopDT < self._startDatetime:
continue
if not self._endDatetime is None:
if startDT > self._endDatetime:
isComplete = True
break
if self._ind2DList is None:
try:
indParmList = metadataGroup['Independent Spatial Parameters']['mnemonic']
indParms = [item.decode('utf-8') for item in indParmList]
except:
indParms = []
else:
indParms = self._ind2DList
newMadDataRec = MadrigalDataRecord(madInstObj=self._madInstObj, madParmObj=self._madParmObj,
dataset=tableSlice, recordSet=self._recDset,
parmObjList=parmObjList, ind2DList=indParms)
if self._ind2DList is None:
self._oneDList = newMadDataRec.get1DParms()
self._twoDList = newMadDataRec.get2DParms()
self._ind2DList = newMadDataRec.getInd2DParms()
parmObjList = (self._oneDList, self._twoDList, self._ind2DList) # used for performance in load
# set self._num2DSplit
twoDSet = set([o.mnemonic for o in self._twoDList])
arraySplitSet = set(self._arraySplitParms)
self._num2DSplit = len(twoDSet.intersection(arraySplitSet))
self._privList.append(newMadDataRec)
recLoaded += 1
hdfFile.close()
# update minmax
if self._totalDataRecords > 0:
self.updateMinMaxParmDict()
return((recLoaded, isComplete))
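A chunked-read sketch combining the initializer, loadNextRecords, and writeText (filenames and chunk size hypothetical):
import madrigal.cedar
# __init__ loads the first chunk when createFlag is False
cedarObj = madrigal.cedar.MadrigalCedarFile('/tmp/big_file.hdf5', maxRecords=1000)
firstWrite = True
while True:
    cedarObj.writeText('/tmp/big_file.txt', summary='plain',
                       append=True, firstWrite=firstWrite)
    firstWrite = False
    numLoaded, isComplete = cedarObj.loadNextRecords(1000)
    if numLoaded == 0:
        break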
def parmIsInt(
self, parm)
parmIsInt returns True if this parm (mnemonic) is integer type, False if not
Raises ValueError if parm not in record, or table dtype not yet set
def parmIsInt(self, parm):
"""parmIsInt returns True if this parm (mnemonic) is integer type, False if not
Raises ValueError if parm not in record, or table dtype not yet set
"""
if self._tableDType is None:
raise ValueError('Cannot call parmIsInt until a data record is added')
try:
typeStr = str(self._tableDType[parm.lower()])
except KeyError:
raise ValueError('Parm <%s> not found in file' % (str(parm)))
if typeStr.find('int') != -1:
return(True)
else:
return(False)
def parmIsString(
self, parm)
parmIsString returns True if this parm (mnemonic) is string type, False if not
Raises ValueError if parm not in record, or table dtype not yet set
def parmIsString(self, parm):
"""parmIsString returns True if this parm (mnemonic) is string type, False if not
Raises ValueError if parm not in record, or table dtype not yet set
"""
if self._tableDType is None:
raise ValueError('Cannot call parmIsString until a data record is added')
try:
typeStr = str(self._tableDType[parm.lower()])
except KeyError:
raise ValueError('Parm <%s> not found in file' % (str(parm)))
if typeStr.lower().find('s') == -1:
return(False)
else:
return(True)
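Usage sketch (the mnemonic is hypothetical):
if cedarObj.parmIsString('gdalt'):
    print(cedarObj.getStringFormat('gdalt'))
else:
    print('int' if cedarObj.parmIsInt('gdalt') else 'float')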
def pop(
self, i)
def pop(self, i):
return self._privList.pop(i)
def refreshSummary(
self)
refreshSummary rebuilds the recarray self._experimentParameters
def refreshSummary(self):
"""refreshSummary rebuilds the recarray self._experimentParameters
"""
inst = int(self.getKinstList()[0])
delimiter = ','
kinstCodes = []
kinstNames = []
for code in self.getKinstList():
kinstCodes.append(str(int(code)))
kinstNames.append(str(self._madInstObj.getInstrumentName(int(code))))
instrumentCodes = delimiter.join(kinstCodes)
instrumentName = delimiter.join(kinstNames)
categoryStr = self._madInstObj.getCategory(inst)
piStr = self._madInstObj.getContactName(inst)
piEmailStr = self._madInstObj.getContactEmail(inst)
startDateStr = self.getEarliestDT().strftime('%Y-%m-%d %H:%M:%S UT')
endDateStr = self.getLatestDT().strftime('%Y-%m-%d %H:%M:%S UT')
cedarFileName = str(os.path.basename(self._fullFilename))
statusDesc = self._status
instLat = self._madInstObj.getLatitude(inst)
instLon = self._madInstObj.getLongitude(inst)
instAlt = self._madInstObj.getAltitude(inst)
# create kindat description based on all kindats
kindatList = self.getKindatList()
kindatDesc = ''
kindatListStr = ''
if len(kindatList) > 1:
kindatDesc = 'This experiment has %i kinds of data. They are:' % (len(kindatList))
for i, kindat in enumerate(kindatList):
thisKindatDesc = self._madKindatObj.getKindatDescription(kindat, inst)
if not thisKindatDesc:
raise IOError('kindat %i undefined - please add to typeTab.txt' % (kindat))
thisKindatDesc = thisKindatDesc.strip()
kindatDesc += ' %i) %s (code %i)' % (i+1, thisKindatDesc, kindat)
kindatListStr += '%i' % (kindat)
if i < len(kindatList) - 1:
kindatDesc += ', '
kindatListStr += ', '
else:
kindatDesc = self._madKindatObj.getKindatDescription(kindatList[0], inst)
if not kindatDesc:
raise IOError('kindat for %s undefined - please add to typeTab.txt' % (str((kindatList[0], inst))))
kindatDesc = kindatDesc.strip()
kindatListStr += '%i' % (kindatList[0])
# create an expSummary numpy recarray
summArr = numpy.recarray((14,), dtype = [('name', h5py.special_dtype(vlen=str) ),
('value', h5py.special_dtype(vlen=str) )])
summArr['name'][0] = 'instrument'
summArr['name'][1] = 'instrument code(s)'
summArr['name'][2] = 'kind of data file'
summArr['name'][3] = 'kindat code(s)'
summArr['name'][4] = 'start time'
summArr['name'][5] = 'end time'
summArr['name'][6] = 'Cedar file name'
summArr['name'][7] = 'status description'
summArr['name'][8] = 'instrument latitude'
summArr['name'][9] = 'instrument longitude'
summArr['name'][10] = 'instrument altitude'
summArr['name'][11] = 'instrument category'
summArr['name'][12] = 'instrument PI'
summArr['name'][13] = 'instrument PI email'
summArr['value'][0] = instrumentName
summArr['value'][1] = instrumentCodes
summArr['value'][2] = kindatDesc
summArr['value'][3] = kindatListStr
summArr['value'][4] = startDateStr
summArr['value'][5] = endDateStr
summArr['value'][6] = cedarFileName
summArr['value'][7] = statusDesc
summArr['value'][8] = str(instLat)
summArr['value'][9] = str(instLon)
summArr['value'][10] = str(instAlt)
summArr['value'][11] = categoryStr
summArr['value'][12] = piStr
summArr['value'][13] = piEmailStr
self._experimentParameters = summArr
def remove(
self, x)
def remove(self, x):
self._privList.remove(x)
def reverse(
self)
def reverse(self):
self._privList.reverse()
def setDType(
self, dtype)
setDType sets the dtype of the table array
def setDType(self, dtype):
"""setDType sets the dtype of the table array
"""
self._tableDType = dtype
def setStatus(
self, status)
setStatus sets the status string
def setStatus(self, status):
"""setStatus sets the status string
"""
self._status = str(status)
def sort(
self)
def sort(self):
self._privList.sort()
def updateMinMaxParmDict(
self)
updateMinMaxParmDict updates self._minMaxParmDict
def updateMinMaxParmDict(self):
"""updateMinMaxParmDict updates self._minMaxParmDict
"""
for parm in self.get1DParms() + self.get2DParms():
self.getMaxMinValues(parm, True)
def write(
self, format='hdf5', newFilename=None, refreshCatHeadTimes=True, arraySplittingParms=None, skipArrayLayout=False, overwrite=False)
write persists a MadrigalCedarFile to file.
Note: There are two ways to write to a MadrigalCedarFile. Either this method (write) is called after all the records have been appended to the MadrigalCedarFile, or dump is called after a certain number of records are appended, and then at the end dump is called a final time if there were any records not yet dumped, followed by close. The __del__ method will automatically call close if needed, and print a warning that the user should add it to their code.
write has the advantage of being simpler, but has the disadvantage for larger files of keeping all those records in memory. dump/close has the advantage of significantly reducing the memory footprint, but is somewhat more complex.
Inputs:
format - a format to save the file in. For now, the allowed values are
'hdf5' and 'netCDF4'. Defaults to 'hdf5'. Use writeText method to get text output.
newFilename - a filename to save to. Defaults to self._fullFilename passed into initializer if not given.
refreshCatHeadTimes - if True (the default), update start and end times in the catalog and header
records to represent the times in the data. If False, use existing times in those records.
skipArrayLayout - if True, do not include Array Layout even if there are independent spatial
parameters. If False (the default) write Array Layout if there are independent spatial
parameters and format = 'hdf5'
arraySplittingParms - a list of parameters as mnemonics used to split
arrays into subarrays. For example, beamcode would split data with separate beamcodes
into separate arrays. The number of separate arrays will be up to the product of the number of
unique values found for each parameter, with the restriction that combinations with no records will
not create a separate array. If default None passed in, then set to self._arraySplitParms,
set when CEDAR file read in.
overwrite - if False (the default) do not overwrite existing file. If True, overwrite the file if it already exists.
Outputs: None
Affects: writes a MadrigalCedarFile to file
def write(self, format='hdf5', newFilename=None, refreshCatHeadTimes=True,
arraySplittingParms=None, skipArrayLayout=False, overwrite=False):
"""write persists a MadrigalCedarFile to file.
Note: There are two ways to write to a MadrigalCedarFile. Either this method (write) is called after all the
records have been appended to the MadrigalCedarFile, or dump is called after a certain number of records are appended,
and then at the end dump is called a final time if there were any records not yet dumped, followed by close.
The __del__ method will automatically call close if needed, and print a warning that the user should add it to
their code.
write has the advantage of being simpler, but has the disadvantage for larger files of keeping all those records
in memory. dump/close has the advantage of significantly reducing the memory footprint, but is somewhat more complex.
Inputs:
format - a format to save the file in. For now, the allowed values are
'hdf5' and 'netCDF4'. Defaults to 'hdf5'. Use writeText method to get text output.
newFilename - a filename to save to. Defaults to self._fullFilename passed into initializer if not given.
refreshCatHeadTimes - if True (the default), update start and end times in the catalog and header
records to represent the times in the data. If False, use existing times in those records.
skipArrayLayout - if True, do not include Array Layout even if there are independent spatial
parameters. If False (the default) write Array Layout if there are independent spatial
parameters and format = 'hdf5'
arraySplittingParms - a list of parameters as mnemonics used to split
arrays into subarrays. For example, beamcode would split data with separate beamcodes
into separate arrays. The number of separate arrays will be up to the product of the number of
unique values found for each parameter, with the restriction that combinations with no records will
not create a separate array. If default None passed in, then set to self._arraySplitParms,
set when CEDAR file read in.
overwrite - if False (the default) do not overwrite existing file. If True, overwrite the file if it already exists.
Outputs: None
Affects: writes a MadrigalCedarFile to file
"""
if self._format != None:
raise ValueError('Cannot call write method after calling dump method')
if newFilename is None:
newFilename = self._fullFilename
if format not in ('hdf5', 'netCDF4'):
raise ValueError('Illegal format <%s> - must be hdf5 or netCDF4' % (format))
if os.access(newFilename, os.R_OK) and not overwrite:
raise IOError('newFilename <%s> already exists' % (newFilename))
self._format = format
if arraySplittingParms is None:
arraySplittingParms = self._arraySplitParms
if arraySplittingParms is None:
arraySplittingParms = []
if self._format == 'hdf5':
if not newFilename.endswith(self._hdf5Extensions):
raise IOError('filename must end with %s, <%s> does not' % (str(self._hdf5Extensions), newFilename))
try:
# we need to make sure this file is closed and then deleted if an error
f = None # used if next line fails
f = h5py.File(newFilename, 'w')
self._writeHdf5Metadata(f, refreshCatHeadTimes)
self._writeHdf5Data(f)
if len(self.getIndSpatialParms()) > 0:
self._createArrayLayout(f, arraySplittingParms)
f.close()
except:
# on any error, close and delete file, then reraise error
if f:
f.close()
if os.access(newFilename, os.R_OK):
os.remove(newFilename)
raise
elif self._format == 'netCDF4':
try:
# we need to make sure this file is closed and then deleted if an error
f = None # used if next line fails
f = netCDF4.Dataset(newFilename, 'w', format='NETCDF4')
self._writeNetCDF4(f, arraySplittingParms)
f.close()
except:
# on any error, close and delete file, then reraise error
if f:
f.close()
if os.access(newFilename, os.R_OK):
os.remove(newFilename)
raise
self._closed = True # write ends with closed file
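A write sketch (paths and split parameter hypothetical):
# write all records held in memory to a new CEDAR Hdf5 file,
# splitting the array layout by beamid
cedarObj.write('hdf5', '/tmp/out.hdf5', arraySplittingParms=['beamid'])
# write sets the format, so a second write on the same instance raises
# ValueError; use a fresh MadrigalCedarFile object to export netCDF4:
# otherObj.write('netCDF4', '/tmp/out.nc', overwrite=True)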
def writeExperimentNotes(
self, metadataGroup, refreshCatHeadTimes)
writeExperimentNotes writes the "Experiment Notes" dataset to the h5py group metadataGroup if any catalog or header records found.
refreshCatHeadTimes - if True, update start and end times in the catalog and header
records to represent the times in the data. If False, use existing times in those records.
def writeExperimentNotes(self, metadataGroup, refreshCatHeadTimes):
"""writeExperimentNotes writes the "Experiment Notes" dataset to the h5py group metadataGroup
if any catalog or header records found.
refreshCatHeadTimes - if True, update start and end times in the catalog and header
records to represent the times in the data. If False, use existing times in those records.
"""
# templates
cat_template = 'Catalog information from record %i:'
head_template = 'Header information from record %i:'
if "Experiment Notes" in list(metadataGroup.keys()):
# already exists
return
recDict = {} # key = rec number, value = tuple of recarray of lines, 'Catalog' or 'Header' str)
for i, rec in enumerate(self._privList):
if rec.getType() == 'catalog':
if refreshCatHeadTimes:
sDT = self.getEarliestDT()
eDT = self.getLatestDT()
rec.setTimeLists(sDT.year, sDT.month, sDT.day,
sDT.hour, sDT.minute, sDT.second, int(sDT.microsecond/10000),
eDT.year, eDT.month, eDT.day,
eDT.hour, eDT.minute, eDT.second, int(eDT.microsecond/10000))
recarray = rec.getLines()
recDict[i] = (recarray, 'Catalog')
elif rec.getType() == 'header':
if refreshCatHeadTimes:
sDT = self.getEarliestDT()
eDT = self.getLatestDT()
rec.setTimeLists(sDT.year, sDT.month, sDT.day,
sDT.hour, sDT.minute, sDT.second, int(sDT.microsecond/10000),
eDT.year, eDT.month, eDT.day,
eDT.hour, eDT.minute, eDT.second, int(eDT.microsecond/10000))
recarray = rec.getLines()
recDict[i] = (recarray, 'Header')
keys = list(recDict.keys())
keys.sort()
if len(keys) == 0:
return
recarray = None
for key in keys:
new_recarray = numpy.recarray((2,), dtype=[('File Notes', h5py.special_dtype(vlen=str))])
if recDict[key][1] == 'Catalog':
topStr = cat_template % (key)
else:
topStr = head_template % (key)
new_recarray[0]['File Notes'] = topStr + ' ' * (80 - len(topStr))
new_recarray[1]['File Notes'] = ' ' * 80
if recarray is None:
recarray = new_recarray
else:
recarray = numpy.concatenate((recarray, new_recarray))
recarray = numpy.concatenate((recarray, recDict[key][0]))
metadataGroup.create_dataset('Experiment Notes', data=recarray)
def writeText(
self, newFilename=None, summary='plain', showHeaders=False, selectParms=None, filterList=None, missing=None, assumed=None, knownbad=None, append=False, firstWrite=False)
writeText writes text to new filename
Inputs:
newFilename - name of new file to create and write to. If None, the default, write to stdout
summary - type of summary line to print at top. Allowed values are:
'plain' - text only mnemonic names, but only if not showHeaders
'html' - mnemonic names wrapped in standard javascript code to allow descriptive popups
'summary' - print overview of file and filters used. Also text only mnemonic names,
but only if not showHeaders
None - no summary line
showHeaders - if True, print header in format for each record. If False, the default,
do not.
selectParms - If None, simply print all parms that are in the file. If a list
of parm mnemonics, print only those parms in the order specified.
filterList - a list of madrigal.derivation.MadrigalFilter objects to be described in the
summary. Default is None, in which case not described in summary. Ignored if summary
is not 'summary'
missing, assumed, knownbad - how to print Cedar special values. Default is None for
all, so that the value printed is the value in the numpy table, as per the spec.
append - if True, open newFilename in append mode, and dump records after writing. If False,
open in write mode. Used to allow writing in conjunction with loadNextRecords.
firstWrite - True if this is the first group of records added, and append mode is True.
Used to know whether to write summary lines. If False and append is True, no summary
lines are added; if True and append is True, summary lines are added. If append is not
True, this argument ignored.
def writeText(self, newFilename=None, summary='plain', showHeaders=False, selectParms=None,
filterList=None, missing=None, assumed=None, knownbad=None, append=False,
firstWrite=False):
"""writeText writes text to new filename
Inputs:
newFilename - name of new file to create and write to. If None, the default, write to stdout
summary - type of summary line to print at top. Allowed values are:
'plain' - text only mnemonic names, but only if not showHeaders
'html' - mnemonic names wrapped in standard javascript code to allow descriptive popups
'summary' - print overview of file and filters used. Also text only mnemonic names,
but only if not showHeaders
None - no summary line
showHeaders - if True, print header in format for each record. If False, the default,
do not.
selectParms - If None, simply print all parms that are in the file. If a list
of parm mnemonics, print only those parms in the order specified.
filterList - a list of madrigal.derivation.MadrigalFilter objects to be described in the
summary. Default is None, in which case not described in summary. Ignored if summary
is not 'summary'
missing, assumed, knownbad - how to print Cedar special values. Default is None for
all, so that the value printed is the value in the numpy table, as per the spec.
append - if True, open newFilename in append mode, and dump records after writing. If False,
open in write mode. Used to allow writing in conjunction with loadNextRecords.
firstWrite - True if this is the first group of records added, and append mode is True.
Used to know whether to write summary lines. If False and append is True, no summary
lines are added; if True and append is True, summary lines are added. If append is not
True, this argument ignored.
"""
# constants
_underscore = 95 # used to indicate missing character
if newFilename is not None:
if append:
f = open(newFilename, 'a')
else:
f = open(newFilename, 'w')
else:
f = sys.stdout
if summary not in ('plain', 'summary', 'html', None):
raise ValueError('Illegal summary value <%s>' % (str(summary)))
# cache information needed to replace special values if needed
# helps performance when replacing
if missing is not None:
missing = str(missing)
missing_len = len(missing)
missing_search = '\\ ' * max(0, missing_len-3) + 'nan'  # regex: leading spaces then 'nan'
if missing_len < 3:
missing = ' ' * (3-missing_len) + missing
if assumed is not None:
assumed = str(assumed)
assumed_len = len(assumed)
assumed_search = '\\ ' * max(0, assumed_len-3) + 'inf'
if assumed_len < 3:
assumed = ' ' * (3-assumed_len) + assumed
if knownbad is not None:
knownbad = str(knownbad)
knownbad_len = len(knownbad)
knownbad_search = '\\ ' * max(0, knownbad_len-4) + '-inf'
if knownbad_len < 4:
knownbad = ' ' * (4-knownbad_len) + knownbad
# create format string and header strings
formatStr = ''
parmStr = ''
if not selectParms is None:
names = selectParms
else:
names = self._tableDType.names
for parm in names:
parm = parm.upper()
format = self._madParmObj.getParmFormat(parm)
try:
# first handle float formats
dataWidth = int(format[1:format.find('.')])
# make sure width is big enough for special values
newDataWidth = dataWidth
if missing is not None:
newDataWidth = max(newDataWidth, len(missing)+1)
if self._madParmObj.isError(parm):
if assumed is not None:
newDataWidth = max(newDataWidth, dataWidth, len(assumed)+1)
if knownbad is not None:
newDataWidth = max(newDataWidth, dataWidth, len(knownbad)+1)
if newDataWidth > dataWidth:
# we need to expand format
format = '%%%i%s' % (newDataWidth, format[format.find('.'):])
dataWidth = newDataWidth
except ValueError:
# now handle integer or string formats - assumed to never be error values
if format.find('i') != -1:
if len(format) == 2:
# we need to insert a length
format = '%%%ii' % (self._madParmObj.getParmWidth(parm)-1)
dataWidth = self._madParmObj.getParmWidth(parm)
else:
dataWidth = int(format[1:-1])
elif format.find('S') != -1 or format.find('s') != -1:
dataWidth = int(format[1:-1])
else:
raise
width = max(self._madParmObj.getParmWidth(parm), dataWidth)
formatStr += '%s' % (format)
formatStr += ' ' * (max(1, width-dataWidth)) # sets spacing between numbers
if len(parm) >= width-1:
# need to truncate name
if summary != 'html':
parmStr += parm[:width-1] + ' '
else:
parmStr += "%s " % (parm[:width-1].upper(),
self._madParmObj.getSimpleParmDescription(parm),
self._madParmObj.getParmUnits(parm),
parm[:width-1].upper())
else:
# pad evenly on both sides
firstHalfSpace = int((width-len(parm))/2)
secHalfSpace = int((width-len(parm)) - firstHalfSpace)
if summary != 'html':
parmStr += ' ' * firstHalfSpace + parm.upper() + ' ' * secHalfSpace
else:
parmStr += ' ' * firstHalfSpace
parmStr += "%s" % (parm[:width-1].upper(),
self._madParmObj.getSimpleParmDescription(parm),
self._madParmObj.getParmUnits(parm),
parm[:width-1].upper())
parmStr += ' ' * secHalfSpace
formatStr += '\n'
firstHeaderPrinted = False # state variable for adding extra space between lines
if summary == 'summary':
if not append or (append and firstWrite):
self._printSummary(f, filterList)
if summary in ('plain', 'summary', 'html') and not showHeaders:
if not append or (append and firstWrite):
# print single header at top
f.write('%s\n' % (parmStr))
if summary == 'html':
f.write('<br>\n')  # html line break (markup reconstructed)
if len(self._privList) == 0:
# nothing more to write
if f != sys.stdout:
f.close()
return
# see if only 1D parms are selected, which implies printing only a single line per record
is1D = False
if not selectParms is None:
#make sure its a lowercase list
selectParms = list(selectParms)
selectParms = getLowerCaseList(selectParms)
# see if only 1D parameters are being printed, so that we should only print the first row
is1D = True
recordset = self.getRecordset()
for parm in selectParms:
if recordset[parm][0] != 1:
is1D = False
break
for rec in self._privList:
if rec.getType() != 'data':
continue
if showHeaders:
kinst = rec.getKinst()
instDesc = self._madInstObj.getInstrumentName(kinst)
sDT = rec.getStartDatetime()
sDTStr = sDT.strftime('%Y-%m-%d %H:%M:%S')
eDT = rec.getEndDatetime()
eDTStr = eDT.strftime('%H:%M:%S')
headerStr = '%s: %s-%s\n' % (instDesc, sDTStr, eDTStr)
if firstHeaderPrinted or summary is None:
f.write('\n%s' % (headerStr))
else:
f.write('%s' % (headerStr))
firstHeaderPrinted = True
f.write('%s\n' % (parmStr))
dataset = rec.getDataset()
if not selectParms is None:
recnoSet = dataset['recno'].copy() # used to see if we are at a new record
dataset_view = dataset[selectParms].copy()
else:
dataset_view = dataset
# modify special values if required
if assumed is not None or knownbad is not None:
for name in dataset_view.dtype.names:
if self._madParmObj.isError(name) and not self.parmIsInt(name):
if assumed is not None:
    # set all -1 values to inf
    assumedIndices = numpy.where(dataset_view[name] == -1.0)
    if len(assumedIndices[0]):
        dataset_view[name][assumedIndices] = numpy.inf
if knownbad is not None:
    # set all -2 values to -inf
    knownbadIndices = numpy.where(dataset_view[name] == -2.0)
    if len(knownbadIndices[0]):
        dataset_view[name][knownbadIndices] = -numpy.inf
lastRecno = None
for i in range(len(dataset_view)):
if not selectParms is None:
thisRecno = recnoSet[i]
if is1D and (thisRecno == lastRecno):
continue
data = tuple(list(dataset_view[i]))
try:
text = formatStr % data
except:
# something bad happened - give up and just convert data to a string
textList = [str(item) for item in data]
delimiter = ' '
text = delimiter.join(textList) + '\n'
# modify special values if required
if missing is not None:
if text.find('nan') != -1:
text = re.sub(missing_search, missing, text)
if knownbad is not None:
if text.find('-inf') != -1:
text = re.sub(knownbad_search, knownbad, text)
if assumed is not None:
if text.find('inf') != -1:
text = re.sub(assumed_search, assumed, text)
if summary != 'html':
f.write(text)
else:
f.write(text.replace(' ', '&nbsp;'))  # preserve spacing in html (entity reconstructed)
if summary == 'html':
f.write('<br>\n')  # html line break (markup reconstructed)
if not selectParms is None:
lastRecno = thisRecno
if f != sys.stdout:
f.close()
if append:
# remove all records
self._privList = []
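A usage sketch of writeText (the file path is hypothetical and the one-argument MadrigalCedarFile constructor call is an assumption): dump two parameters as plain text to stdout, spelling out the Cedar special values rather than printing nan/inf.

import madrigal.cedar

cedarObj = madrigal.cedar.MadrigalCedarFile('/tmp/example.hdf5')  # assumed constructor call
cedarObj.writeText(newFilename=None, summary='plain',
                   selectParms=['gdalt', 'ti'],
                   missing='missing', assumed='assumed', knownbad='knownbad')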
class MadrigalDataRecord
MadrigalDataRecord holds all the information in a Cedar data record.
class MadrigalDataRecord:
"""MadrigalDataRecord holds all the information in a Cedar data record."""
# cedar special values
missing = numpy.nan
assumed = -1.0
knownbad = -2.0
missing_int = numpy.iinfo(numpy.int64).min
assumed_int = -1
knownbad_int = -2
# standard parms
_stdParms = ['year', 'month', 'day', 'hour', 'min', 'sec',
'recno', 'kindat', 'kinst', 'ut1_unix', 'ut2_unix']
def __init__(self,kinst=None,
kindat=None,
sYear=None,sMonth=None,sDay=None,sHour=None,sMin=None,sSec=None,sCentisec=None,
eYear=None,eMonth=None,eDay=None,eHour=None,eMin=None,eSec=None,eCentisec=None,
oneDList=None,
twoDList=None,
nrow=None,
madInstObj=None,
madParmObj=None,
ind2DList=None,
dataset=None, recordSet=None, verbose=True,
parmObjList=None,
madDB=None):
"""__init__ creates a MadrigalDataRecord with all missing data.
Note: all inputs have default values because there are two ways to populate this structure:
1) with all inputs from kinst to nrow when new data is being created, or
2) with numpy arrays dataset and recordSet from existing Hdf5 file.
Inputs:
kinst - the kind of instrument code. A warning will be raised if not in instTab.txt.
Default is None, in which case recno, dataset, and recordSet must be given.
kindat - kind of data code. Must be a non-negative integer.
Default is None, in which case recno, dataset, and recordSet must be given.
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - record start time. sCentisec must be 0-99
Default is None, in which case recno, dataset, and recordSet must be given.
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - record end time. eCentisec must be 0-99
Default is None, in which case recno, dataset, and recordSet must be given.
oneDList - list of one-dimensional parameters in record. Parameters can be defined as codes
(integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or CedarParameter objects.
Default is None, in which case recno, dataset, and recordSet must be given.
twoDList - list of two-dimensional parameters in record. Parameters can be defined as codes
(integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or CedarParameter objects.
Default is None, in which case recno, dataset, and recordSet must be given.
nrow - number of rows of 2D data to create. Until set, all values default to missing.
Default is None, in which case recno, dataset, and recordSet must be given.
madInstObj - a madrigal.metadata.MadrigalInstrument object. If None, one will be created.
Used to verify kinst.
madParmObj - a madrigal.data.MadrigalParameter object. If None, one will be created.
Used to verify and convert parameters to codes.
ind2DList - list of independent spatial two-dimensional parameters in record.
Parameters can be defined as codes. Each must also be listed in twoDList.
(integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or CedarParameter objects.
Default is None, in which case dataset, and recordSet must be given.
dataset - an h5py dataset, as found in the Hdf5 group "Data", dataset "Table Layout".
Set to None if this is a new record.
recordSet - an h5py dataset, as found in the Hdf5 group "Metadata", dataset "_record_layout".
Set to None if this is a new record.
verbose - if True (the default), print all warnings. If False, suppress warnings
parmObjList - a list or tuple of three lists: self._oneDList, self._twoDList, and
self._ind2DList described below. Used only to speed performance. Default is
None, in which case new copies are created.
madDB - madrigal.metadata.MadrigalDB object. If None, one will be created.
Outputs: None
Returns: None
Affects:
Creates attributes:
self._madInstObj - madrigal.metadata.MadrigalInstrument object
self._madParmObj - madrigal.data.MadrigalParameters object
self._dataset - h5py dataset in form of Table Layout numpy recarray
self._recordSet - h5py dataset in form of _record_layout numpy recarray
self._verbose - bool indicating verbose or not
self._oneDList - a list of 1D CedarParameter objects in this MadrigalDataRecord
self._twoDList - a list of 2D CedarParameter objects in this MadrigalDataRecord
self._ind2DList - a list of independent spatial parameters in self._twoDList
"""
if madDB is None:
self._madDB = madrigal.metadata.MadrigalDB()
else:
self._madDB = madDB
# create any needed Madrigal objects, if not passed in
if madInstObj is None:
self._madInstObj = madrigal.metadata.MadrigalInstrument(self._madDB)
else:
self._madInstObj = madInstObj
if madParmObj is None:
self._madParmObj = madrigal.data.MadrigalParameters(self._madDB)
else:
self._madParmObj = madParmObj
if twoDList is None:
twoDList = []
if ind2DList is None:
ind2DList = []
if dataset is None or recordSet is None:
if ind2DList is None:
# get it from cachedFiles.ini
extraParms, ind2DList, splitParms = self._madDB.getKinstKindatConfig(kinst, kindat)
# verify there are independent spatial parms if there are 2D parms
if not len(twoDList) == 0 and len(ind2DList) == 0:
raise ValueError('Cannot have 2D parms without an independent spatial parm set')
self._createArraysFromArgs(kinst,kindat,sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec,
oneDList,twoDList,nrow,ind2DList)
else:
# verify there are independent spatial parms if there are 2D parms
if not len(twoDList) == 0 and len(ind2DList) == 0:
raise ValueError('Cannot have 2D parms without an independent spatial parm set')
self._dataset = dataset
self._recordSet = recordSet
if parmObjList is not None:
self._oneDList = copy.deepcopy(parmObjList[0])
self._twoDList = copy.deepcopy(parmObjList[1])
self._ind2DList = copy.deepcopy(parmObjList[2])
else:
# populate self._oneDList, self._twoDList, and self._ind2DList
self._oneDList = []
self._twoDList = []
self._ind2DList = []
for parm in self._recordSet.dtype.names[len(self._stdParms):]:
if self.parmIsInt(parm):
isInt = True
else:
isInt = False
newCedarParm = CedarParameter(self._madParmObj.getParmCodeFromMnemonic(parm),
parm, self._madParmObj.getParmDescription(parm),
isInt)
if self._recordSet[parm][0] == 1:
self._oneDList.append(newCedarParm)
if self._recordSet[parm][0] in (2,3):
self._twoDList.append(newCedarParm)
if self._recordSet[parm][0] == 3:
self._ind2DList.append(newCedarParm)
self._verbose = bool(verbose)
def getType(self):
""" returns the type 'data'"""
return 'data'
def getDType(self):
"""getDType returns the dtype of the table array with this data
"""
return(self._dataset.dtype)
def getRecDType(self):
"""getRecDType returns the dtype of _record_array
"""
return(self._recordSet.dtype)
def getDataset(self):
"""getDataset returns the dataset table
"""
return(self._dataset)
def getRecordset(self):
"""getRecordset returns the recordSet table
"""
return(self._recordSet)
def parmIsInt(self, parm):
"""parmIsInt returns True if this parm (mnemonic) is integer type, False if float or string
Raises ValueError if parm not in record
"""
try:
typeStr = str(self._dataset.dtype[parm.lower()].kind)
except KeyError:
raise ValueError('Parm <%s> not found in file' % (str(parm)))
if typeStr.find('i') != -1:
return(True)
else:
return(False)
def parmIsString(self, parm):
"""parmIsString returns True if this parm (mnemonic) is String type, False if float or int
Raises ValueError if parm not in record
"""
try:
typeStr = str(self._dataset.dtype[parm.lower()].kind)
except KeyError:
raise ValueError('Parm <%s> not found in file' % (str(parm)))
if typeStr.find('S') != -1:
return(True)
else:
return(False)
def getStrLen(self, parm):
"""getStrLen returns True if this parm (mnemonic) is integer type, False if float or string
Raises ValueError is parm not in record, or is not String
"""
if not self.parmIsString(parm):
raise ValueError('Parm <%s> not string type' % (str(parm)))
return(self._dataset.dtype[parm.lower()].itemsize)
def add1D(self, oneDParm):
"""add1D adds a new one-dim parameter to a MadrigalDataRecord
Input: oneDParm - Parameter can be defined as codes (integer) or case-insensitive
mnemonic string (eg, "Gdalt")
Affects: 1) adds new column to self._dataset with all values Nan, and 2) adds
value to end of self._recordSet with value = 1 since 1D parm
If this addition makes self._dataset.dtype differ from that in MadrigalCedarFile, appending this
MadrigalDataRecord to MadrigalCedarFile will raise an IOError. Also raises error if parm already exists.
"""
self.addParm(oneDParm, 1)
def add2D(self, twoDParm):
"""add2D adds a new two-dim parameter to a MadrigalDataRecord
Input: twoDParm - Parameter can be defined as codes (integer) or case-insensitive
mnemonic string (eg, "Gdalt")
Affects: 1) adds new column to self._dataset with all values Nan, and 2) adds
value to end of self._recordSet with value = 2 since 2D parm
If this addition makes self._dataset.dtype differ from that in MadrigalCedarFile, appending this
MadrigalDataRecord to MadrigalCedarFile will raise an IOError. Also raises error if parm already exists.
"""
self.addParm(twoDParm, 2)
def addParm(self, newParm, dim):
"""addParm adds a new one or two-dim parameter to a MadrigalDataRecord
Input: newParm - Parameter can be defined as codes (integer) or case-insensitive
mnemonic string (eg, "Gdalt")
dim - either 1 for scalar, or 2 for vector parm
Affects: 1) adds new column to self._dataset with all values Nan, and 2) adds
value to end of self._recordSet with value = dim
If this addition makes self._dataset.dtype differ from that in MadrigalCedarFile, appending this
MadrigalDataRecord to MadrigalCedarFile will raise an IOError. Also raises error if parm already exists.
"""
if dim not in (1,2):
raise ValueError('dim must be 1 or 2, not %s' % (str(dim)))
# see if its an integer
try:
code = int(newParm)
isInt = True
except:
isInt = False
if isInt:
# try to look up mnemonic
mnem = self._madParmObj.getParmMnemonic(int(newParm)).lower()
if mnem == str(newParm):
raise IOError('Cannot use unknown parm %i' % (int(newParm)))
else:
# this must succeed or an exception raised
try:
code = self._madParmObj.getParmCodeFromMnemonic(newParm.lower())
except ValueError:
raise IOError('Mnem %s not found' % (newParm))
mnem = newParm.lower()
# issue warning if an unneeded time parameter being added
if self._verbose and abs(code) < timeParms:
sys.stderr.write('WARNING: Parameter %s is a time parameter that potentially conflicts with prolog times\n' % (mnem))
# figure out dtype
format = self._madParmObj.getParmFormat(mnem)
if format[-1] == 'i':
dtype = numpy.int64
else:
dtype = numpy.float64
data = numpy.zeros((len(self._dataset),), dtype)
if dtype == numpy.int64:
    data[:] = self.missing_int  # nan cannot be stored in an integer column
else:
    data[:] = numpy.nan
self._dataset = numpy.lib.recfunctions.append_fields(self._dataset, mnem, data, usemask=False)
data = numpy.array([dim], numpy.int64)
self._recordSet = numpy.lib.recfunctions.append_fields(self._recordSet, mnem, data, usemask=False)
newCedarParm = CedarParameter(self._madParmObj.getParmCodeFromMnemonic(mnem),
mnem, self._madParmObj.getParmDescription(mnem),
isInt)
if dim == 1:
self._oneDList.append(newCedarParm)
else:
self._twoDList.append(newCedarParm)
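A short sketch of adding parameters to an existing record (rec is assumed to be a MadrigalDataRecord; the mnemonics and code are illustrative):

rec.add1D('systmp')   # new 1D column, all values missing until set
rec.add2D('gdalt')    # new 2D column, one value per row
rec.add2D(120)        # parameters may also be given as Cedar codes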
def set1D(self, parm, value):
"""set1D sets a 1D value for a given 1D parameter
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
value - double (or string convertable to double) value to set 1D parameter to. To set special Cedar values, the global values
missing, assumed, or knownbad may be used, or the strings "missing", "assumed", or "knownbad"
May also be int or string if that type
Outputs: None
"""
parm = self._madParmObj.getParmMnemonic(parm).lower()
if value == 'missing':
if self.parmIsInt(parm):
value = self.missing_int
elif self.parmIsString(parm):
value = ' ' * self.getStrLen(parm)
else:
value = self.missing
if self._madParmObj.isError(parm):
if value == 'assumed':
if self.parmIsInt(parm):
value = self.assumed_int
else:
value = self.assumed
elif value == 'knownbad':
if self.parmIsInt(parm):
value = self.knownbad_int
else:
value = self.knownbad
elif value in ('assumed', 'knownbad'):
raise ValueError('It is illegal to set the non-error parm %s to %s' % (parm, value))
# make sure this is a one-d parm
try:
dim = self._recordSet[parm]
except ValueError:
raise ValueError('parm %s does not exist' % (str(parm)))
if dim != 1:
raise ValueError('parm %s is 2D, not 1D' % (str(parm)))
# set it
self._dataset[parm] = value
def set2D(self, parm, row, value):
"""set2D sets a 2D value for a given 2D parameter and row
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
row - row number to set data. Starts at 0.
value - double (or string convertable to double) value to set 2D parameter to. To set special Cedar values, the global values
missing, assumed, or knownbad may be used, or the strings "missing", "assumed", or "knownbad"
May also be int or string if that type
Outputs: None
"""
if row >= len(self._dataset) or row < 0:
raise ValueError('Illegal value of row %i with nrow = %i' % (row,len(self._dataset)))
parm = self._madParmObj.getParmMnemonic(parm).lower()
isString = self._madParmObj.isString(parm)
if value == 'missing':
if self.parmIsInt(parm):
value = self.missing_int
elif self.parmIsString(parm):
value = ' ' * self.getStrLen(parm)
else:
value = self.missing
if self._madParmObj.isError(parm):
if value == 'assumed':
if self.parmIsInt(parm):
value = self.assumed_int
else:
value = self.assumed
elif value == 'knownbad':
if self.parmIsInt(parm):
value = self.knownbad_int
else:
value = self.knownbad
elif value in ('assumed', 'knownbad'):
raise ValueError('It is illegal to set the non-error parm %s to %s' % (parm, value))
# make sure this is a two-d parm
try:
dim = self._recordSet[parm]
except ValueError:
raise ValueError('parm %s does not exist' % (str(parm)))
if dim not in (2, 3):
raise ValueError('parm %s is 1D, not 2D' % (str(parm)))
# if its ind parm, make sure its not nan
if parm in self._ind2DList:
if not isString:
if numpy.isnan(value):
raise ValueError('Cannot set ind parm %s to nan at row %i' % (parm, row))
# set it
self._dataset[parm][row] = value
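A sketch of setting values, including the special-value strings (rec and the parameter names are assumed; 'dti' stands in for an error parameter):

rec.set1D('systmp', 300.0)      # ordinary 1D value
rec.set2D('gdalt', 0, 100.0)    # row 0 of a 2D parameter
rec.set2D('ti', 1, 'missing')   # stored as nan for a float parm
rec.set2D('dti', 1, 'assumed')  # error parms only: stored as -1.0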
def set2DParmValues(self, parm, values):
"""set2DParmValues sets all 2D value in all rows for a given 2D parameter
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
value - list, tuple, or numpy array of int64 or float64 type. Must match len of nrows. User is responsible
for having set all special values to missing, assumed and knownbad as defined at top
of this class for either ints or floats
Outputs: None
"""
# make sure this is a two-d parm
parm = self._madParmObj.getParmMnemonic(parm).lower()
isString = self._madParmObj.isString(parm)
try:
dim = self._recordSet[parm]
except ValueError:
raise ValueError('parm %s does not exist' % (str(parm)))
if dim not in (2, 3):
raise ValueError('parm %s is 1D, not 2D' % (str(parm)))
if parm in self._ind2DList:
if not isString:
if numpy.any(numpy.isnan(values)):
raise ValueError('Cannot set any ind parm %s value to nan: %s' % (parm, str(values)))
# set it
self._dataset[parm] = values
def get1D(self, parm):
"""get1D returns the 1D value for a given 1D parameter
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
Outputs: value
"""
# make sure this is a one-d parm
parm = self._madParmObj.getParmMnemonic(parm).lower()
isString = self._madParmObj.isString(parm)
try:
dim = self._recordSet[parm]
except ValueError:
raise ValueError('parm %s does not exist' % (str(parm)))
if dim != 1:
raise ValueError('parm %s is 2D, not 1D' % (str(parm)))
value = self._dataset[parm][0]
# check for special values
if not isString:
if numpy.isnan(value):
return('missing')
# if its an error parameter, allow assumed or knownbad
if self._madParmObj.isError(parm):
if int(value) == self.assumed_int:
return('assumed')
if int(value) == self.knownbad_int:
return('knownbad')
return value
def get2D(self, parm, row):
"""get2D returns the 2D value for a given 2D parameter
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
row - row number to get data. Starts at 0.
Outputs: double value, or the strings "missing", "assumed", or "knownbad"
"""
if row >= len(self._dataset) or row < 0:
raise ValueError('Illegal value of row %i with nrow = %i' % (row,len(self._dataset)))
# make sure this is a two-d parm
parm = self._madParmObj.getParmMnemonic(parm).lower()
isString = self._madParmObj.isString(parm)
try:
dim = self._recordSet[parm]
except ValueError:
raise ValueError('parm %s does not exist' % (str(parm)))
if dim not in (2,3):
raise ValueError('parm %s is 1D, not 2D' % (str(parm)))
value = self._dataset[parm][row]
# check for special values
if not isString:
if numpy.isnan(value):
return('missing')
# if its an error parameter, allow assumed or knownbad
if self._madParmObj.isError(parm):
if int(value) == self.assumed_int:
return('assumed')
if int(value) == self.knownbad_int:
return('knownbad')
return value
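Because get1D and get2D return either a number or one of the special-value strings, callers typically branch on the string cases first (sketch; rec and the parameter name are assumed):

val = rec.get2D('ti', 0)
if val in ('missing', 'assumed', 'knownbad'):
    print('special value:', val)
else:
    print('ti =', float(val))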
def getRow(self, row):
"""getRow returns the row of data in order defined in self._dataset.dtype
Input: row number
IndexError raised if not a valid row index
"""
return(self._dataset[row])
def setRow(self, row, values):
"""setRow sets an entire row of data at once
Inputs:
row - row number to set
values - a tuple of values in the right format to match self._dataset.dtype
"""
self._dataset[row] = values
def delete1D(self, parm):
"""delete1D removes the given 1D parameter from the record
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
Outputs: None
Raise exception if 1D parm does not exist. If this deletion makes self._dataset.dtype differ from that in
MadrigalCedarFile, appending this MadrigalDataRecord to MadrigalCedarFile will raise an IOError.
"""
# make sure this is a one-d parm
parm = self._madParmObj.getParmMnemonic(parm).lower()
try:
dim = self._recordSet[parm]
except ValueError:
raise ValueError('parm %s does not exist' % (str(parm)))
if dim != 1:
raise ValueError('parm %s is 2D, not 1D' % (str(parm)))
self._dataset = numpy.lib.recfunctions.drop_fields(self._dataset, parm)
self._recordSet = numpy.lib.recfunctions.drop_fields(self._recordSet, parm)
# find index to delete from self._oneDList
index = None
for i, parmObj in enumerate(self._oneDList):
if parmObj.mnemonic == parm:
index = i
break
if index is None:
raise ValueError('Did not find parm %s in self._oneDList' % (str(parm)))
del self._oneDList[index]
def delete2DParm(self, parm):
"""delete2DParm removes the given 2D parameter from every row in the record
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
Outputs: None
Raise exception if 2D parm does not exist. If this deletion makes self._dataset.dtype differ from that in
MadrigalCedarFile, appending this MadrigalDataRecord to MadrigalCedarFile will raise an IOError.
"""
# make sure this is a two-d parm
parm = self._madParmObj.getParmMnemonic(parm).lower()
try:
dim = self._recordSet[parm]
except ValueError:
raise ValueError('parm %s does not exist' % (str(parm)))
if dim not in (2,3):
raise ValueError('parm %s is 1D, not 2D' % (str(parm)))
self._dataset = numpy.lib.recfunctions.drop_fields(self._dataset, parm)
self._recordSet = numpy.lib.recfunctions.drop_fields(self._recordSet, parm)
# find index to delete from self._twoDList
index = None
for i, parmObj in enumerate(self._twoDList):
if parmObj.mnemonic == parm:
index = i
break
if index is None:
raise ValueError('Did not find parm %s in self._twoDList' % (str(parm)))
del self._twoDList[index]
def delete2DRows(self, rows):
"""delete2DRows removes the given 2D row or rows in the record (first is row 0)
Inputs:
row number (integer) or list of row numbers to delete (first is row 0)
Outputs: None
Raise exception if row does not exist
"""
# make sure row is a list
if isinstance(rows, int):
rows = [rows]
keepIndices = []
count = 0 # make sure all rows actually exist
for i in range(self.getNrow()):
if i not in rows:
keepIndices.append(i)
else:
count += 1
if count != len(rows):
raise ValueError('Some row in %s out of range, total number of rows is %i' % (str(rows), self.getNrow()))
self._dataset = self._dataset[keepIndices]
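A one-line usage sketch (rec assumed): drop rows 0 and 2; the remaining rows keep their original order.

rec.delete2DRows([0, 2])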
def getKinst(self):
"""getKinst returns the kind of instrument code (int) for a given data record.
Inputs: None
Outputs: the kind of instrument code (int) for a given data record.
"""
return(self._dataset['kinst'][0])
def setKinst(self, newKinst):
"""setKinst sets the kind of instrument code (int) for a given data record.
Inputs: newKinst - new instrument code (integer)
Outputs: None
Affects: sets self._dataset['kinst']
"""
newKinst = int(newKinst)
if newKinst < 0:
raise ValueError('Kinst must not be less than 0, not %i' % (newKinst))
# verify and set kinst
instList = self._madInstObj.getInstrumentList()
found = False
for inst in instList:
if inst[2] == newKinst:
self._instrumentName = inst[0]
found = True
break
if found == False:
self._instrumentName = 'Unknown instrument'
sys.stderr.write('Warning: kinst %i not found in instTab.txt\n' % (newKinst))
self._dataset['kinst'] = newKinst
def getKindat(self):
"""getKindat returns the kind of data code (int) for a given data record.
Inputs: None
Outputs: the kind of data code (int) for a given data record.
"""
return(self._dataset['kindat'][0])
def setKindat(self, newKindat):
"""setKindat sets the kind of data code (int) for a given data record.
Inputs: newKindat (integer)
Outputs: None
Affects: sets self._dataset['kindat']
"""
if int(newKindat) < 0:
raise ValueError('kindat cannot be negative: %i' % (int(newKindat)))
self._dataset['kindat'] = int(newKindat)
def getRecno(self):
"""getRecno returns the recno (int) for a given data record.
Inputs: None
Outputs: the recno (int) for a given data record. May be 0 if not yet in a file
"""
return(self._dataset['recno'][0])
def setRecno(self, newRecno):
"""setRecno sets the recno (int) for a given data record.
Inputs: newRecno (integer)
Outputs: None
Affects: sets self._dataset['recno']
"""
if int(newRecno) < 0:
raise ValueError('recno cannot be negative: %i' % (int(newRecno)))
self._dataset['recno'] = int(newRecno)
def getNrow(self):
"""getNrow returns the number of 2D data rows (int) for a given data record.
Inputs: None
Outputs: the number of 2D data rows.
"""
return(len(self._dataset))
def getStartTimeList(self):
"""getStartTimeList returns a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec
Inputs: None
Outputs: a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec.
"""
startDT = self.getStartDatetime()
return((startDT.year, startDT.month, startDT.day,
startDT.hour, startDT.minute, startDT.second, int(startDT.microsecond/1.0E4)))
def getStartDatetime(self):
"""getStartDatetime returns a start record datetime
Inputs: None
Outputs: a datetime.datetime object representing the start time of the record
"""
return(datetime.datetime.utcfromtimestamp(self._dataset['ut1_unix'][0]))
def setStartTimeList(self, sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec=0):
"""setStartTimeList changes the data record start time
Inputs: integers sYear, sMonth, sDay, sHour, sMin, sSec. sCentisec defaults to 0
Outputs: None
Affects: changes self._dataset fields ut1_unix, year, month, day, hour, min,sec
Prints warning if new start time after present end time
"""
# check validity of input time
sCentisec = int(sCentisec)
if sCentisec < 0 or sCentisec > 99:
raise ValueError('Illegal sCentisec %i' % (sCentisec))
try:
sDT = datetime.datetime(sYear, sMonth, sDay, sHour, sMin, sSec, int(sCentisec*1E4))
except:
raise ValueError('Illegal datetime %s' % (str((sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec))))
if sDT > self.getEndDatetime():
sys.stderr.write('Warning: New starting time %s after present ending time %s\n' % (str(sDT),
str(self.getEndDatetime())))
ut1_unix = (sDT - datetime.datetime(1970,1,1)).total_seconds()
self._dataset['ut1_unix'] = ut1_unix
# need to reset average time
aveDT = sDT + (self.getEndDatetime() - sDT)/2
self._dataset['year'] = aveDT.year
self._dataset['month'] = aveDT.month
self._dataset['day'] = aveDT.day
self._dataset['hour'] = aveDT.hour
self._dataset['min'] = aveDT.minute
self._dataset['sec'] = aveDT.second
def getEndTimeList(self):
"""getEndTimeList returns a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec
Inputs: None
Outputs: a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec.
"""
endDT = self.getEndDatetime()
return((endDT.year, endDT.month, endDT.day,
endDT.hour, endDT.minute, endDT.second, int(endDT.microsecond/1.0E4)))
def getEndDatetime(self):
"""getEndDatetime returns a end record datetime
Inputs: None
Outputs: a datetime.datetime object representing the end time of the record
"""
return(datetime.datetime.utcfromtimestamp(self._dataset['ut2_unix'][0]))
def setEndTimeList(self, eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec=0):
"""setEndTimeList changes the data record end time
Inputs: integers eYear, eMonth, eDay, eHour, eMin, eSec. eCentisec defaults to 0
Outputs: None
Affects: changes self._dataset fields ut2_unix, year, month, day, hour, min,sec
Prints warning if new end time before present start time
"""
# check validity of input time
eCentisec = int(eCentisec)
if eCentisec < 0 or eCentisec > 99:
raise ValueError('Illegal eCentisec %i' % (eCentisec))
try:
eDT = datetime.datetime(eYear, eMonth, eDay, eHour, eMin, eSec, int(eCentisec*1E4))
except:
raise ValueError('Illegal datetime %s' % (str((eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec))))
if eDT < self.getStartDatetime():
sys.stderr.write('Warning: New ending time %s before present starting time %s\n' % (str(eDT),
str(self.getStartDatetime())))
ut2_unix = (eDT - datetime.datetime(1970,1,1)).total_seconds()
self._dataset['ut2_unix'] = ut2_unix
# need to reset average time
aveDT = eDT - (eDT - self.getStartDatetime())/2
self._dataset['year'] = aveDT.year
self._dataset['month'] = aveDT.month
self._dataset['day'] = aveDT.day
self._dataset['hour'] = aveDT.hour
self._dataset['min'] = aveDT.minute
self._dataset['sec'] = aveDT.second
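A worked example of the midpoint bookkeeping above, with assumed times: for a record spanning 12:00:00-12:02:00, moving the end to 12:04:00 rewrites the prolog year/month/day/hour/min/sec fields to the new midpoint, 12:02:00.

rec.setEndTimeList(2019, 10, 7, 12, 4, 0)  # ut1_unix unchanged; prolog time fields now hold 12:02:00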
def get1DParms(self):
"""get1DParms returns a list of 1D parameters in the MadrigalDataRecord.
Inputs: None
Outputs: a list of 1D CedarParameter objects in the MadrigalDataRecord.
"""
return(self._oneDList)
def get2DParms(self):
"""get2DParms returns a list of 2D parameters in the MadrigalDataRecord.
Inputs: None
Outputs: a list of 2D CedarParameter objects in the MadrigalDataRecord. Includes
both independent and dependent parms.
"""
return(self._twoDList)
def getInd2DParms(self):
"""getInd2DParms returns a list of the subset 2D parameters ithat are independent parmeters.
Inputs: None
Outputs: a list of independent 2D CedarParameter objects in the MadrigalDataRecord.
"""
return(self._ind2DList)
def getParmDim(self, parm):
"""getParmDim returns the dimension (1, 2, or 3 for independent spatial parm) of a given parm mnemonic
Raise KeyError if that parameter not found in file
"""
for obj in self._oneDList:
if obj.mnemonic.lower() == parm.lower():
return(1)
# do ind 2D next since they are in both lists
for obj in self._ind2DList:
if obj.mnemonic.lower() == parm.lower():
return(3)
for obj in self._twoDList:
if obj.mnemonic.lower() == parm.lower():
return(2)
raise KeyError('Parm <%s> not found in data' % (str(parm)))
def getHeaderKodLines(self):
"""getHeaderKodLines creates the lines in the Madrigal header record that start KOD and describe parms
Inputs: None
Returns: a string of length 80*num parms. Each 80 characters contains a description
of a single parm according to the Cedar Standard
"""
# create a list of oneDCedar codes for the data record.
# Each item has three elements: (code, parameter description, units)
oneDCedarCodes = []
for parm in self._oneDList:
oneDCedarCodes.append((parm.code, self._madParmObj.getSimpleParmDescription(parm.code),
self._madParmObj.getParmUnits(parm.code)))
oneDCedarCodes.sort(key=compareParms)
# create a list of twoDCedar codes for the data record.
# Each item has three elements: (code, parameter description, units)
twoDCedarCodes = []
for parm in self._twoDList:
twoDCedarCodes.append((parm.code, self._madParmObj.getSimpleParmDescription(parm.code),
self._madParmObj.getParmUnits(parm.code)))
twoDCedarCodes.sort(key=compareParms)
# write out lines - one D
retStr = ''
if len(oneDCedarCodes) > 0:
retStr += 'C 1D Parameters:' + (80 - len('C 1D Parameters:'))*' '
for i in range(len(oneDCedarCodes)):
code = oneDCedarCodes[i][0]
desc = oneDCedarCodes[i][1]
units = oneDCedarCodes[i][2]
line = 'KODS(%i)' % (i)
line += (10-len(line))*' '
codeNum = str(code)
codeNum = (10-len(codeNum))* ' ' + codeNum
line += codeNum
if len(desc) > 48:
desc = ' ' + desc[:48] + ' '
else:
desc = ' ' + desc + (49-len(desc))* ' '
line += desc
units = units[:10] + (10-len(units[:10]))*' '
line += units
retStr += line
# two D
if len(twoDCedarCodes) > 0:
retStr += 'C 2D Parameters:' + (80 - len('C 2D Parameters:'))*' '
for i in range(len(twoDCedarCodes)):
code = twoDCedarCodes[i][0]
desc = twoDCedarCodes[i][1]
units = twoDCedarCodes[i][2]
line = 'KODM(%i)' % (i)
line += (10-len(line))*' '
codeNum = str(code)
codeNum = (10-len(codeNum))* ' ' + codeNum
line += codeNum
if len(desc) > 48:
desc = ' ' + desc[:48] + ' '
else:
desc = ' ' + desc + (49-len(desc))* ' '
line += desc
units = units[:10] + (10-len(units[:10]))*' '
line += units
retStr += line
return(retStr)
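Each KOD line is exactly 80 characters wide: a 10-character label, a 10-character right-justified code, a 50-character description field (one leading space plus 49), and 10 characters of units. A sketch with hypothetical values:

label = 'KODS(%i)' % (0)
label += (10 - len(label)) * ' '   # 'KODS(0)   '
codeNum = '%10s' % ('110')         # right-justified code (hypothetical value)
desc = ' ' + 'Mean azimuth angle' + (49 - len('Mean azimuth angle')) * ' '
units = 'deg' + (10 - len('deg')) * ' '
assert len(label + codeNum + desc + units) == 80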
def _createArraysFromArgs(self,kinst,kindat,sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec,
oneDList,twoDList,nrow,ind2DList):
"""_createArraysFromArgs creates a table layout array and record array numpy array based in input arguments.
Inputs:
kinst - the kind of instrument code. A warning will be raised if not in instTab.txt.
Default is None, in which case recno, dataset, and recordSet must be given.
kindat - kind of data code. Must be a non-negative integer.
Default is None, in which case recno, dataset, and recordSet must be given.
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - record start time. sCentisec must be 0-99
Default is None, in which case recno, dataset, and recordSet must be given.
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - record end time. eCentisec must be 0-99
Default is None, in which case recno, dataset, and recordSet must be given.
oneDList - list of one-dimensional parameters in record. Parameters can be defined as codes
(integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or CedarParameter objects.
Default is None, in which case recno, dataset, and recordSet must be given.
twoDList - list of two-dimensional parameters in record. Parameters can be defined as codes
(integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or CedarParameter objects.
Default is None, in which case recno, dataset, and recordSet must be given.
nrow - number of rows of 2D data to create. Until set, all values default to missing.
Default is None, in which case recno, dataset, and recordSet must be given.
ind2DList - list of indepedent spatial two-dimensional parameters in record.
Parameters can be defined as codes. Each must also be listed in twoDList.
(integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or CedarParameter objects.
Default is None, in which case dataset, and recordSet must be given.
"""
defaultParms = self._stdParms
dataDtype = [] # data type for the Table Layout recarray
recDType = [] # data type for _record_layout recarray
recDims = [] # dimension of each parameter (1 for 1D, 2 for dependent 2D, 3 for independent 2D)
parmsAddedSoFar = [] # mnemonics added so far
# the following is simply to ensure that independent 2D parms are also listed in twoDList
twoDParms = []
for parm in twoDList:
if isinstance(parm, CedarParameter):
parm = parm.mnemonic
mnem = self._madParmObj.getParmMnemonic(parm)
if mnem in twoDParms:
raise ValueError('Duplicate parameter %s in twoDList' % (mnem))
twoDParms.append(mnem)
# default parms
for parm in defaultParms:
mnem = self._madParmObj.getParmMnemonic(parm)
if self._madParmObj.isInteger(mnem):
dataDtype.append((mnem.lower(), int))
else: # default parms cannot be strings
dataDtype.append((mnem.lower(), float))
recDType.append((parm.lower(), int))
recDims.append(1)
parmsAddedSoFar.append(mnem)
# one D parms
for parm in oneDList:
if isinstance(parm, CedarParameter):
parm = parm.mnemonic
mnem = self._madParmObj.getParmMnemonic(parm)
if mnem in parmsAddedSoFar:
continue # legal because it may be a default parm
if self._madParmObj.isInteger(mnem):
dataDtype.append((mnem.lower(), int))
elif self._madParmObj.isString(mnem):
strLen = self._madParmObj.getStringLen(mnem)
dataDtype.append((mnem.lower(), numpy.string_, strLen))
else:
dataDtype.append((mnem.lower(), float))
recDType.append((parm.lower(), int))
recDims.append(1)
parmsAddedSoFar.append(mnem)
for parm in ind2DList:
if isinstance(parm, CedarParameter):
parm = parm.mnemonic
mnem = self._madParmObj.getParmMnemonic(parm)
if mnem in parmsAddedSoFar:
raise ValueError('Duplicate parameter %s' % (mnem))
if mnem not in twoDParms:
raise ValueError('Independent 2D parm %s not found in twoDList' % (mnem))
if self._madParmObj.isInteger(mnem):
dataDtype.append((mnem.lower(), int))
elif self._madParmObj.isString(mnem):
strLen = self._madParmObj.getStringLen(mnem)
dataDtype.append((mnem.lower(), numpy.string_, strLen))
else:
dataDtype.append((mnem.lower(), float))
recDType.append((parm.lower(), int))
recDims.append(3)
parmsAddedSoFar.append(mnem)
for parm in twoDList:
if isinstance(parm, CedarParameter):
parm = parm.mnemonic
mnem = self._madParmObj.getParmMnemonic(parm)
if mnem in parmsAddedSoFar:
continue # legal because may be independent parm
if self._madParmObj.isInteger(mnem):
dataDtype.append((mnem.lower(), int))
elif self._madParmObj.isString(mnem):
strLen = self._madParmObj.getStringLen(mnem)
dataDtype.append((mnem.lower(), numpy.string_, strLen))
else:
dataDtype.append((mnem.lower(), float))
recDType.append((parm.lower(), int))
recDims.append(2)
# create two recarrays
self._dataset = numpy.recarray((max(nrow, 1),), dtype = dataDtype)
self._recordSet = numpy.array([tuple(recDims),], dtype = recDType)
# set prolog values
sDT = datetime.datetime(int(sYear),int(sMonth),int(sDay),int(sHour),int(sMin),int(sSec),int(sCentisec)*10000)
eDT = datetime.datetime(int(eYear),int(eMonth),int(eDay),int(eHour),int(eMin),int(eSec),int(eCentisec)*10000)
midDT = sDT + ((eDT-sDT)/2)
self._dataset['year'] = midDT.year
self._dataset['month'] = midDT.month
self._dataset['day'] = midDT.day
self._dataset['hour'] = midDT.hour
self._dataset['min'] = midDT.minute
self._dataset['sec'] = midDT.second
self._dataset['recno'] = 0
self._dataset['kindat'] = kindat
self.setKinst(kinst)
self._dataset['ut1_unix'] = madrigal.metadata.getUnixUTFromDT(sDT)
self._dataset['ut2_unix'] = madrigal.metadata.getUnixUTFromDT(eDT)
# set all other values to default
for i in range(len(defaultParms), len(dataDtype)):
if dataDtype[i][1] == float:
self._dataset[dataDtype[i][0]] = self.missing
elif dataDtype[i][1] == int:
self._dataset[dataDtype[i][0]] = self.missing_int
else:
# string type only one left
strLen = self._madParmObj.getStringLen(dataDtype[i][0])
self._dataset[dataDtype[i][0]] = ' ' * strLen
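A minimal sketch of the two parallel arrays this method builds (parameter names illustrative): the Table Layout recarray has one row per 2D row, while the one-row _record_layout array holds each parameter's dimension code (1 = 1D, 2 = dependent 2D, 3 = independent 2D).

import numpy

dataDtype = [('recno', int), ('gdalt', float), ('ti', float)]
recDType = [('recno', int), ('gdalt', int), ('ti', int)]
dataset = numpy.recarray((4,), dtype=dataDtype)       # nrow = 4
recordSet = numpy.array([(1, 3, 2)], dtype=recDType)  # recno 1D, gdalt independent 2D, ti dependent 2D
dataset['ti'] = numpy.nan                             # float columns default to missing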
def __get2DValueList__(self, parm):
"""__get2DValueList__ returns a list containing all the 2D values of a given parameter.
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
Outputs: a list containing all the 2D values of a given parameter. Special values will
be given the values 'missing', 'assumed', or 'knownbad'
"""
retList = []
nrow = self.getNrow()
for i in range(nrow):
retList.append(self.get2D(parm,i))
return(retList)
def __get2DMainValueList__(self, code, scaleFactor):
"""__get2DMainValueList__ returns a list containing all the 2D values of a given main parameter.
Inputs:
code - parameter code (integer). Must
be a parameter with an additional increment parameter.
Outputs: a list containing all the 2D values of a given main parameter that has an
additional increment parameter. Special values will be given the values 'missing', 'assumed', or 'knownbad'
"""
retList = []
nrow = self.getNrow()
for i in range(nrow):
value = self.get2D(code,i)
if not isinstance(value, (str, bytes)):  # skip string parms and the special-value strings
# subtract off additional increment part
addIncr = value % scaleFactor
if value < 0:
addIncr = -1.0 * (scaleFactor - addIncr)
value = value - addIncr
retList.append(value)
return(retList)
def __get2DIncrValueList__(self, code, scaleFactor):
"""__get2DIncrValueList__ returns a list containing all the additional increment 2D values of a given main parameter.
Inputs:
code - parameter code (integer). Must
be a parameter with an additional increment parameter.
Outputs: a list containing all the additional increment 2D values of a given main parameter.
Special values will be given the values 'missing', 'assumed', or 'knownbad'
"""
retList = []
nrow = self.getNrow()
for i in range(nrow):
value = self.get2D(code,i)
if not isinstance(value, (str, bytes)):  # skip string parms and the special-value strings
# get additional increment part
incr = value % scaleFactor
if value < 0:
incr = -1.0 * (scaleFactor - incr)
value = incr
retList.append(value)
return(retList)
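A worked example of the main/increment split used by the two methods above, with a hypothetical scaleFactor of 1.0; the negative-value correction keeps the main part a whole multiple of the scale factor.

value, scaleFactor = -123.456, 1.0
incr = value % scaleFactor              # ~0.544; Python's % is non-negative for a positive modulus
if value < 0:
    incr = -1.0 * (scaleFactor - incr)  # -0.456
main = value - incr                     # -123.0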
def __str__(self):
""" returns a string representation of a MadrigalDataRecord """
retStr = 'Data record:\n'
retStr += 'kinst = %i (%s)\n' % (self.getKinst(), self._instrumentName)
retStr += 'kindat = %i\n' % (self.getKindat())
startTimeList = self.getStartTimeList()
endTimeList = self.getEndTimeList()
retStr += 'record start: %04i-%02i-%02i %02i:%02i:%02i.%02i\n' % (tuple(startTimeList))
retStr += 'record end: %04i-%02i-%02i %02i:%02i:%02i.%02i\n' % (tuple(endTimeList))
retStr += 'one-dim parameters:\n'
for parm in self._oneDList:
retStr += '\t%s\n' % (str(parm))
try:
retStr += '%s\n' % (str(self._oneDData))
except AttributeError:
pass # there may not be oneDData
retStr += 'two-dim parameters:\n'
for parm in self._twoDList:
retStr += '\t%s\n' % (str(parm))
try:
retStr += '%s\n' % (str(self._twoDData))
except AttributeError:
pass # there may not be twoDData
return(retStr)
def __cmp__(self, other):
"""cmpRecords compares two cedar records to allow them to be sorted
"""
if other is None:
return(1)
# compare record start times
fList = self.getStartTimeList()
sList = other.getStartTimeList()
fDT = datetime.datetime(*fList)
sDT = datetime.datetime(*sList)
result = (fDT > sDT) - (fDT < sDT)  # equivalent of Python 2's cmp(), removed in Python 3
if result:
return(result)
# compare record type
typeList = ('catalog', 'header', 'data')
fType = self.getType()
sType = other.getType()
fIndex, sIndex = typeList.index(fType), typeList.index(sType)
result = (fIndex > sIndex) - (fIndex < sIndex)
if result:
return(result)
# compare record stop times
fList = self.getEndTimeList()
sList = other.getEndTimeList()
fDT = datetime.datetime(*fList)
sDT = datetime.datetime(*sList)
result = (fDT > sDT) - (fDT < sDT)
if result:
return(result)
# compare kindat if both data
if fType == 'data' and sType == 'data':
fKindat, sKindat = self.getKindat(), other.getKindat()
result = (fKindat > sKindat) - (fKindat < sKindat)
if result:
return(result)
return(0)
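Python 3 ignores the __cmp__ protocol, so a list of records can only be sorted with it explicitly, via functools.cmp_to_key (sketch; records is assumed to be a list of data/catalog/header record objects):

import functools

records.sort(key=functools.cmp_to_key(lambda a, b: a.__cmp__(b)))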
Ancestors (in MRO)
- MadrigalDataRecord
- builtins.object
Class variables
var assumed
var assumed_int
var knownbad
var knownbad_int
var missing
var missing_int
Static methods
def __init__(
self, kinst=None, kindat=None, sYear=None, sMonth=None, sDay=None, sHour=None, sMin=None, sSec=None, sCentisec=None, eYear=None, eMonth=None, eDay=None, eHour=None, eMin=None, eSec=None, eCentisec=None, oneDList=None, twoDList=None, nrow=None, madInstObj=None, madParmObj=None, ind2DList=None, dataset=None, recordSet=None, verbose=True, parmObjList=None, madDB=None)
init creates a MadrigalDataRecord with all missing data.
Note: all inputs have default values because there are two ways to populate this structure: 1) with all inputs from kinst to nrow when new data is being created, or 2) with numpy arrays dataset and recordSet from existing Hdf5 file.
Inputs:
kinst - the kind of instrument code. A warning will be raised if not in instTab.txt.
Default is None, in which case recno, dataset, and recordSet must be given.
kindat - kind of data code. Must be a non-negative integer.
Default is None, in which case recno, dataset, and recordSet must be given.
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - record start time. sCentisec must be 0-99
Default is None, in which case recno, dataset, and recordSet must be given.
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - record end time. eCentisec must be 0-99
Default is None, in which case recno, dataset, and recordSet must be given.
oneDList - list of one-dimensional parameters in record. Parameters can be defined as codes
(integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or CedarParameter objects.
Default is None, in which case recno, dataset, and recordSet must be given.
twoDList - list of two-dimensional parameters in record. Parameters can be defined as codes
(integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or CedarParameter objects.
Default is None, in which case recno, dataset, and recordSet must be given.
nrow - number of rows of 2D data to create. Until set, all values default to missing.
Default is None, in which case recno, dataset, and recordSet must be given.
madInstObj - a madrigal.metadata.MadrigalInstrument object. If None, one will be created.
Used to verify kinst.
madParmObj - a madrigal.data.MadrigalParameter object. If None, one will be created.
Used to verify and convert parameters to codes.
ind2DList - list of independent spatial two-dimensional parameters in record.
Parameters can be defined as codes. Each must also be listed in twoDList.
(integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or CedarParameter objects.
Default is None, in which case dataset, and recordSet must be given.
dataset - an h5py dataset, as found in the Hdf5 group "Data", dataset "Table Layout".
Set to None if this is a new record.
recordSet - an h5py dataset, as found in the Hdf5 group "Metadata", dataset "_record_layout".
Set to None if this is a new record.
verbose - if True (the default), print all warnings. If False, suppress warnings
parmObjList - a list or tuple of three lists: self._oneDList, self._twoDList, and
self._ind2DList described below. Used only to speed performance. Default is
None, in which case new copies are created.
madDB - madrigal.metadata.MadrigalDB object. If None, one will be created.
Outputs: None
Returns: None
Affects: Creates attributes: self._madInstObj - madrigal.metadata.MadrigalInstrument object self._madParmObj - madrigal.data.MadrigalParameters object self._dataset - h5py dataset in form of Table Layout numpy recarray self._recordSet - h5py dataset in form of _record_layout numpy recarray self._verbose - bool indicating verbose or not self._oneDList - a list of 1D CedarParameter objects in this MadrigalDataRecord self._twoDList - a list of 2D CedarParameter objects in this MadrigalDataRecord self._ind2DList - a list of independent spatial parameters in self._twoDList
def __init__(self,kinst=None,
kindat=None,
sYear=None,sMonth=None,sDay=None,sHour=None,sMin=None,sSec=None,sCentisec=None,
eYear=None,eMonth=None,eDay=None,eHour=None,eMin=None,eSec=None,eCentisec=None,
oneDList=None,
twoDList=None,
nrow=None,
madInstObj=None,
madParmObj=None,
ind2DList=None,
dataset=None, recordSet=None, verbose=True,
parmObjList=None,
madDB=None):
"""__init__ creates a MadrigalDataRecord with all missing data.
Note: all inputs have default values because there are two ways to populate this structure:
1) with all inputs from kinst to nrow when new data is being created, or
2) with numpy arrays dataset and recordSet from existing Hdf5 file.
Inputs:
kinst - the kind of instrument code. A warning will be raised if not in instTab.txt.
Default is None, in which case recno, dataset, and recordSet must be given.
kindat - kind of data code. Must be a non-negative integer.
Default is None, in which case recno, dataset, and recordSet must be given.
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - record start time. sCentisec must be 0-99
Default is None, in which case recno, dataset, and recordSet must be given.
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - record end time. eCentisec must be 0-99
Default is None, in which case recno, dataset, and recordSet must be given.
oneDList - list of one-dimensional parameters in record. Parameters can be defined as codes
(integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or CedarParameter objects.
Default is None, in which case recno, dataset, and recordSet must be given.
twoDList - list of two-dimensional parameters in record. Parameters can be defined as codes
(integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or CedarParameter objects.
Default is None, in which case recno, dataset, and recordSet must be given.
nrow - number of rows of 2D data to create. Until set, all values default to missing.
Default is None, in which case recno, dataset, and recordSet must be given.
madInstObj - a madrigal.metadata.MadrigalInstrument object. If None, one will be created.
Used to verify kinst.
madParmObj - a madrigal.data.MadrigalParameter object. If None, one will be created.
Used to verify and convert parameters to codes.
ind2DList - list of independent spatial two-dimensional parameters in record.
Parameters can be defined as codes. Each must also be listed in twoDList.
(integers) or case-insensitive mnemonic strings (eg, "Gdalt"), or CedarParameter objects.
Default is None, in which case dataset, and recordSet must be given.
dataset - an h5py dataset, as found in the Hdf5 group "Data", dataset "Table Layout".
Set to None if this is a new record.
recordSet - an h5py dataset, as found in the Hdf5 group "Metadata", dataset "_record_layout".
Set to None if this is a new record.
verbose - if True (the default), print all warnings. If False, suppress warnings
parmObjList - a list or tuple of three lists: self._oneDList, self._twoDList, and
self._ind2DList described below. Used only to speed performance. Default is
None, in which case new copies are created.
madDB - madrigal.metadata.MadrigalDB object. If None, one will be created.
Outputs: None
Returns: None
Affects:
Creates attributes:
self._madInstObj - madrigal.metadata.MadrigalInstrument object
self._madParmObj - madrigal.data.MadrigalParameters object
self._dataset - h5py dataset in form of Table Layout numpy recarray
self._recordSet - h5py dataset in form of _record_layout numpy recarray
self._verbose - bool indicating verbose or not
self._oneDList - a list of 1D CedarParameter objects in this MadrigalDataRecord
self._twoDList - a list of 2D CedarParameter objects in this MadrigalDataRecord
self._ind2DList - a list of independent spatial parameters in self._twoDList
"""
if madDB is None:
self._madDB = madrigal.metadata.MadrigalDB()
else:
self._madDB = madDB
# create any needed Madrigal objects, if not passed in
if madInstObj is None:
self._madInstObj = madrigal.metadata.MadrigalInstrument(self._madDB)
else:
self._madInstObj = madInstObj
if madParmObj is None:
self._madParmObj = madrigal.data.MadrigalParameters(self._madDB)
else:
self._madParmObj = madParmObj
if twoDList is None:
twoDList = []
if ind2DList is None:
ind2DList = []
if dataset is None or recordSet is None:
if ind2DList is None:
# get it from cachedFiles.ini
extraParms, ind2DList, splitParms = self._madDB.getKinstKindatConfig(kinst, kindat)
# verify there are independent spatial parms if there are 2D parms
if not len(twoDList) == 0 and len(ind2DList) == 0:
raise ValueError('Cannot have 2D parms without an independent spatial parm set')
self._createArraysFromArgs(kinst,kindat,sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec,
oneDList,twoDList,nrow,ind2DList)
else:
# verify there are independent spatial parms if there are 2D parms
if not len(twoDList) == 0 and len(ind2DList) == 0:
raise ValueError('Cannot have 2D parms without an independent spatial parm set')
self._dataset = dataset
self._recordSet = recordSet
if parmObjList is not None:
self._oneDList = copy.deepcopy(parmObjList[0])
self._twoDList = copy.deepcopy(parmObjList[1])
self._ind2DList = copy.deepcopy(parmObjList[2])
else:
# populate self._oneDList, self._twoDList, and self._ind2DList
self._oneDList = []
self._twoDList = []
self._ind2DList = []
for parm in self._recordSet.dtype.names[len(self._stdParms):]:
if self.parmIsInt(parm):
isInt = True
else:
isInt = False
newCedarParm = CedarParameter(self._madParmObj.getParmCodeFromMnemonic(parm),
parm, self._madParmObj.getParmDescription(parm),
isInt)
if self._recordSet[parm][0] == 1:
self._oneDList.append(newCedarParm)
if self._recordSet[parm][0] in (2,3):
self._twoDList.append(newCedarParm)
if self._recordSet[parm][0] == 3:
self._ind2DList.append(newCedarParm)
self._verbose = bool(verbose)
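For illustration, a new record might be built as follows. This is a minimal sketch: the kinst and kindat codes and the parameter mnemonics are hypothetical examples, and a configured Madrigal installation is assumed.
import madrigal.cedar
# hypothetical instrument (kinst) and kind-of-data (kindat) codes
rec = madrigal.cedar.MadrigalDataRecord(kinst=30, kindat=3408,
    sYear=2020, sMonth=1, sDay=1, sHour=0, sMin=0, sSec=0, sCentisec=0,
    eYear=2020, eMonth=1, eDay=1, eHour=0, eMin=5, eSec=0, eCentisec=0,
    oneDList=['azm', 'elm'],   # 1D (scalar) parameters
    twoDList=['gdalt', 'ti'],  # 2D (vector) parameters
    nrow=25,                   # 25 rows of 2D data, all initialized to missing
    ind2DList=['gdalt'])       # independent spatial parm; also listed in twoDList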
def add1D(
self, oneDParm)
add1D adds a new one-dim parameter to a MadrigalDataRecord
Input: oneDParm - Parameter can be defined as codes (integer) or case-insensitive mnemonic string (eg, "Gdalt")
Affects: 1) adds new column to self._dataset with all values Nan, and 2) adds value to end of self._recordSet with value = 1 since 1D parm
If this addition makes self._dataset.dtype differ from that in MadrigalCedarFile, appending this MadrigalDataRecord to MadrigalCedarFile will raise an IOError. Also raises an error if the parm already exists.
def add1D(self, oneDParm):
"""add1D adds a new one-dim parameter to a MadrigalDataRecord
Input: oneDParm - Parameter can be defined as codes (integer) or case-insensitive
mnemonic string (eg, "Gdalt")
Affects: 1) adds new column to self._dataset with all values Nan, and 2) adds
value to end of self._recordSet with value = 1 since 1D parm
If this addition makes self._dataset.dtype differ from that in MadrigalCedarFile, appending this
MadrigalDataRecord to MadrigalCedarFile will raise an IOError. Also raises an error if the parm already exists.
"""
self.addParm(oneDParm, 1)
def add2D(
self, twoDParm)
add2D adds a new two-dim parameter to a MadrigalDataRecord
Input: twoDParm - Parameter can be defined as codes (integer) or case-insensitive mnemonic string (eg, "Gdalt")
Affects: 1) adds new column to self._dataset with all values Nan, and 2) adds value to end of self._recordSet with value = 2 since 2D parm
If this addition makes self._dataset.dtype differ from that in MadrigalCedarFile, appending this MadrigalDataRecord to MadrigalCedarFile will raise an IOError. Also raises an error if the parm already exists.
def add2D(self, twoDParm):
"""add2D adds a new two-dim parameter to a MadrigalDataRecord
Input: twoDParm - Parameter can be defined as codes (integer) or case-insensitive
mnemonic string (eg, "Gdalt")
Affects: 1) adds new column to self._dataset with all values Nan, and 2) adds
value to end of self._recordSet with value = 2 since 2D parm
If this addition makes self._dataset.dtype differ from that in MadrigalCedarFile, appending this
MadrigalDataRecord to MadrigalCedarFile will raise an IOError. Also raises an error if the parm already exists.
"""
self.addParm(twoDParm, 2)
def addParm(
self, newParm, dim)
addParm adds a new one or two-dim parameter to a MadrigalDataRecord
Input: newParm - Parameter can be defined as codes (integer) or case-insensitive mnemonic string (eg, "Gdalt")
dim - either 1 for scalar, or 2 for vector parm
Affects: 1) adds new column to self._dataset with all values Nan, and 2) adds value to end of self._recordSet with value = dim
If this addition makes self._dataset.dtype differ from that in MadrigalCedarFile, appending this MadrigalDataRecord to MadrigalCedarFile will raise an IOError. Also raises an error if the parm already exists.
def addParm(self, newParm, dim):
"""addParm adds a new one or two-dim parameter to a MadrigalDataRecord
Input: newParm - Parameter can be defined as codes (integer) or case-insensitive
mnemonic string (eg, "Gdalt")
dim - either 1 for scalar, or 2 for vector parm
Affects: 1) adds new column to self._dataset with all values Nan, and 2) adds
value to end of self._recordSet with value = dim
If this addition makes self._dataset.dtype differ from that in MadrigalCedarFile, appending this
MadrigalDataRecord to MadrigalCedarFile will raise an IOError. Also raises an error if the parm already exists.
"""
if dim not in (1,2):
raise ValueError('dim must be 1 or 2, not %s' % (str(dim)))
# see if it's an integer
try:
code = int(newParm)
isInt = True
except:
isInt = False
if isInt:
# try to look up mnemonic
mnem = self._madParmObj.getParmMnemonic(int(newParm)).lower()
if mnem == str(newParm):
raise IOError('Cannot use unknown parm %i' % (int(newParm)))
else:
# this must succeed or an exception raised
try:
code = self._madParmObj.getParmCodeFromMnemonic(newParm.lower())
except ValueError:
raise IOError('Mnem %s not found' % (newParm))
mnem = newParm.lower()
# issue warning if an unneeded time parameter is being added
if self._verbose and abs(code) < timeParms:
sys.stderr.write('WARNING: Parameter %s is a time parameter that potentially conflicts with prolog times\n' % (mnem))
# figure out dtype
format = self._madParmObj.getParmFormat(mnem)
if format[-1] == 'i':
dtype = numpy.int64
else:
dtype = numpy.float64
data = numpy.zeros((len(self._dataset),), dtype)
data[:] = numpy.nan
self._dataset = numpy.lib.recfunctions.append_fields(self._dataset, mnem, data, usemask=False)
data = numpy.array([dim], numpy.int64)
self._recordSet = numpy.lib.recfunctions.append_fields(self._recordSet, mnem, data, usemask=False)
newCedarParm = CedarParameter(self._madParmObj.getParmCodeFromMnemonic(mnem),
mnem, self._madParmObj.getParmDescription(mnem),
isInt)
if dim == 1:
self._oneDList.append(newCedarParm)
else:
self._twoDList.append(newCedarParm)
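A brief usage sketch, continuing the record rec created above (the mnemonics are hypothetical examples):
rec.add1D('systmp')   # appends a 1D column, all values initialized to missing
rec.add2D('dti')      # appends a 2D column, all values initialized to missing
# appending rec to a MadrigalCedarFile whose dtype lacks these columns will now raise an IOError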
def delete1D(
self, parm)
delete1D removes the given 1D parameter from the record
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
Outputs: None
Raise exception if 1D parm does not exist. If this deletion makes self._dataset.dtype differ from that in MadrigalCedarFile, appending this MadrigalDataRecord to MadrigalCedarFile will raise an IOError.
def delete1D(self, parm):
"""delete1D removes the given 1D parameter from the record
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
Outputs: None
Raise exception if 1D parm does not exist. If this deletion makes self._dataset.dtype differ from that in
MadrigalCedarFile, appending this MadrigalDataRecord to MadrigalCedarFile will raise an IOError.
"""
# make sure this is a one-d parm
parm = self._madParmObj.getParmMnemonic(parm).lower()
try:
dim = self._recordSet[parm]
except ValueError:
raise ValueError('parm %s does not exist' % (str(parm)))
if dim != 1:
raise ValueError('parm %s is 2D, not 1D' % (str(parm)))
self._dataset = numpy.lib.recfunctions.drop_fields(self._dataset, parm)
self._recordSet = numpy.lib.recfunctions.drop_fields(self._recordSet, parm)
# find index to delete from self._oneDList
index = None
for i, parmObj in enumerate(self._oneDList):
if parmObj.mnemonic == parm:
index = i
break
if index is None:
raise ValueError('Did not find parm %s in self._oneDList' % (str(parm)))
del self._oneDList[index]
def delete2DParm(
self, parm)
delete2DParm removes the given 2D parameter from every row in the record
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
Outputs: None
Raise exception if 2D parm does not exist. If this deletion makes self._dataset.dtype differ from that in MadrigalCedarFile, appending this MadrigalDataRecord to MadrigalCedarFile will raise an IOError.
def delete2DParm(self, parm):
"""delete2DParm removes the given 2D parameter from every row in the record
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
Outputs: None
Raise exception if 2D parm does not exist. If this deletion makes self._dataset.dtype differ from that in
MadrigalCedarFile, appending this MadrigalDataRecord to MadrigalCedarFile will raise an IOError.
"""
# make sure this is a two-d parm
parm = self._madParmObj.getParmMnemonic(parm).lower()
try:
dim = self._recordSet[parm]
except ValueError:
raise ValueError('parm %s does not exist' % (str(parm)))
if dim not in (2,3):
raise ValueError('parm %s is 1D, not 2D' % (str(parm)))
self._dataset = numpy.lib.recfunctions.drop_fields(self._dataset, parm)
self._recordSet = numpy.lib.recfunctions.drop_fields(self._recordSet, parm)
# find index to delete from self._twoDList
index = None
for i, parmObj in enumerate(self._twoDList):
if parmObj.mnemonic == parm:
index = i
break
if index is None:
raise ValueError('Did not find parm %s in self._twoDList' % (str(parm)))
del self._twoDList[index]
def delete2DRows(
self, rows)
delete2DRows removes the given 2D row or rows in the record (first is row 0)
Inputs:
row number (integer) or list of row numbers to delete (first is row 0)
Outputs: None
Raise exception if row does not exist
def delete2DRows(self, rows):
"""delete2DRows removes the given 2D row or rows in the record (first is row 0)
Inputs:
row number (integer) or list of row numbers to delete (first is row 0)
Outputs: None
Raise exception if row does not exist
"""
# make sure row is a list
if isinstance(rows, (int, numpy.integer)):
rows = [rows]
keepIndices = []
count = 0 # make sure all rows actually exist
for i in range(self.getNrow()):
if i not in rows:
keepIndices.append(i)
else:
count += 1
if count != len(rows):
raise ValueError('Some row in %s out of range, total number of rows is %i' % (str(rows), self.getNrow()))
self._dataset = self._dataset[keepIndices]
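For example (row indices hypothetical; they refer to the rows currently in the record):
rec.delete2DRows(0)        # remove the first 2D row
rec.delete2DRows([1, 2])   # remove several rows at once; any out-of-range row raises ValueError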
def get1D(
self, parm)
get1D returns the 1D value for a given 1D parameter
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
Outputs: double value, or the strings "missing", "assumed", or "knownbad"
def get1D(self, parm):
"""get1D returns the 1D value for a given 1D parameter
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
Outputs: double value, or the strings "missing", "assumed", or "knownbad"
"""
# make sure this is a one-d parm
parm = self._madParmObj.getParmMnemonic(parm).lower()
isString = self._madParmObj.isString(parm)
try:
dim = self._recordSet[parm]
except ValueError:
raise ValueError('parm %s does not exist' % (str(parm)))
if dim != 1:
raise ValueError('parm %s is 2D, not 1D' % (str(parm)))
value = self._dataset[parm][0]
# check for special values
if not isString:
if numpy.isnan(value):
return('missing')
# if it's an error parameter, allow assumed or knownbad
if self._madParmObj.isError(parm):
if int(value) == self.assumed_int:
return('assumed')
if int(value) == self.knownbad_int:
return('knownbad')
return value
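Because get1D may return either a number or one of the special strings, callers should handle both cases; a sketch using the hypothetical mnemonic from the construction example above:
val = rec.get1D('azm')
if isinstance(val, str):  # 'missing', 'assumed', or 'knownbad'
    print('special value: %s' % (val))
else:
    print('azm = %f' % (val))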
def get1DParms(
self)
get1DParms returns a list of 1D parameters in the MadrigalDataRecord.
Inputs: None
Outputs: a list of 1D CedarParameter objects in the MadrigalDataRecord.
def get1DParms(self):
"""get1DParms returns a list of 1D parameters in the MadrigalDataRecord.
Inputs: None
Outputs: a list of 1D CedarParameter objects in the MadrigalDataRecord.
"""
return(self._oneDList)
def get2D(
self, parm, row)
get2D returns the 2D value for a given 2D parameter
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
row - row number to get data. Starts at 0.
Outputs: double value, or the strings "missing", "assumed", or "knownbad"
def get2D(self, parm, row):
"""get2D returns the 2D value for a given 2D parameter
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
row - row number to get data. Starts at 0.
Outputs: double value, or the strings "missing", "assumed", or "knownbad"
"""
if row >= len(self._dataset) or row < 0:
raise ValueError('Illegal value of row %i with nrow = %i' % (row,len(self._dataset)))
# make sure this is a two-d parm
parm = self._madParmObj.getParmMnemonic(parm).lower()
isString = self._madParmObj.isString(parm)
try:
dim = self._recordSet[parm]
except ValueError:
raise ValueError('parm %s does not exist' % (str(parm)))
if dim not in (2,3):
raise ValueError('parm %s is 1D, not 2D' % (str(parm)))
value = self._dataset[parm][row]
# check for special values
if not isString:
if numpy.isnan(value):
return('missing')
# if it's an error parameter, allow assumed or knownbad
if self._madParmObj.isError(parm):
if int(value) == self.assumed_int:
return('assumed')
if int(value) == self.knownbad_int:
return('knownbad')
return value
def get2DParms(
self)
get2DParms returns a list of 2D parameters in the MadrigalDataRecord.
Inputs: None
Outputs: a list of 2D CedarParameter objects in the MadrigalDataRecord. Includes both independent and dependent parms.
def get2DParms(self):
"""get2DParms returns a list of 2D parameters in the MadrigalDataRecord.
Inputs: None
Outputs: a list of 2D CedarParameter objects in the MadrigalDataRecord. Includes
both independent and dependent parms.
"""
return(self._twoDList)
def getDType(
self)
getDType returns the dtype of the table array with this data
def getDType(self):
"""getDType returns the dtype of the table array with this data
"""
return(self._dataset.dtype)
def getDataset(
self)
getDataset returns the dataset table
def getDataset(self):
"""getDataset returns the dataset table
"""
return(self._dataset)
def getEndDatetime(
self)
getEndDatetime returns an end record datetime
Inputs: None
Outputs: a datetime.datetime object representing the end time of the record
def getEndDatetime(self):
"""getEndDatetime returns a end record datetime
Inputs: None
Outputs: a datetime.datetime object representing the end time of the record
"""
return(datetime.datetime.utcfromtimestamp(self._dataset['ut2_unix'][0]))
def getEndTimeList(
self)
getEndTimeList returns a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec
Inputs: None
Outputs: a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec.
def getEndTimeList(self):
"""getEndTimeList returns a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec
Inputs: None
Outputs: a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec.
"""
endDT = self.getEndDatetime()
return((endDT.year, endDT.month, endDT.day,
endDT.hour, endDT.minute, endDT.second, int(endDT.microsecond/1.0E4)))
def getHeaderKodLines(
self)
getHeaderKodLines creates the lines in the Madrigal header record that start KOD and describe parms
Inputs: None
Returns: a string of length 80*num parms. Each 80 characters contains a description of a single parm according to the Cedar Standard
def getHeaderKodLines(self):
"""getHeaderKodLines creates the lines in the Madrigal header record that start KOD and describe parms
Inputs: None
Returns: a string of length 80*num parms. Each 80 characters contains a description
of a single parm according to the Cedar Standard
"""
# create a list of oneDCedar codes for the data record.
# Each item has three elements: (code, parameter description, units)
oneDCedarCodes = []
for parm in self._oneDList:
oneDCedarCodes.append((parm.code, self._madParmObj.getSimpleParmDescription(parm.code),
self._madParmObj.getParmUnits(parm.code)))
oneDCedarCodes.sort(key=compareParms)
# create a list of twoDCedar codes for the data record.
# Each item has three elements: (code, parameter description, units)
twoDCedarCodes = []
for parm in self._twoDList:
twoDCedarCodes.append((parm.code, self._madParmObj.getSimpleParmDescription(parm.code),
self._madParmObj.getParmUnits(parm.code)))
twoDCedarCodes.sort(key=compareParms)
# write out lines - one D
retStr = ''
if len(oneDCedarCodes) > 0:
retStr += 'C 1D Parameters:' + (80 - len('C 1D Parameters:'))*' '
for i in range(len(oneDCedarCodes)):
code = oneDCedarCodes[i][0]
desc = oneDCedarCodes[i][1]
units = oneDCedarCodes[i][2]
line = 'KODS(%i)' % (i)
line += (10-len(line))*' '
codeNum = str(code)
codeNum = (10-len(codeNum))* ' ' + codeNum
line += codeNum
if len(desc) > 48:
desc = ' ' + desc[:48] + ' '
else:
desc = ' ' + desc + (49-len(desc))* ' '
line += desc
units = units[:10] + (10-len(units[:10]))*' '
line += units
retStr += line
# two D
if len(twoDCedarCodes) > 0:
retStr += 'C 2D Parameters:' + (80 - len('C 2D Parameters:'))*' '
for i in range(len(twoDCedarCodes)):
code = twoDCedarCodes[i][0]
desc = twoDCedarCodes[i][1]
units = twoDCedarCodes[i][2]
line = 'KODM(%i)' % (i)
line += (10-len(line))*' '
codeNum = str(code)
codeNum = (10-len(codeNum))* ' ' + codeNum
line += codeNum
if len(desc) > 48:
desc = ' ' + desc[:48] + ' '
else:
desc = ' ' + desc + (49-len(desc))* ' '
line += desc
units = units[:10] + (10-len(units[:10]))*' '
line += units
retStr += line
return(retStr)
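The fixed 80-character layout built above is label (10 characters) + code (right-justified, 10) + description (50) + units (10); a sketch checking the field widths with hypothetical values:
label = 'KODS(0)' + (10 - len('KODS(0)')) * ' '   # label field, 10 chars
codeNum = '%10s' % (110,)                         # code right-justified, 10 chars
desc = (' ' + 'Geodetic altitude').ljust(50)      # description field, 50 chars
units = 'km'.ljust(10)                            # units field, 10 chars
assert len(label + codeNum + desc + units) == 80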
def getInd2DParms(
self)
getInd2DParms returns a list of the subset of 2D parameters that are independent parameters.
Inputs: None
Outputs: a list of independent 2D CedarParameter objects in the MadrigalDataRecord.
def getInd2DParms(self):
"""getInd2DParms returns a list of the subset 2D parameters ithat are independent parmeters.
Inputs: None
Outputs: a list of independent 2D CedarParameter objects in the MadrigalDataRecord.
"""
return(self._ind2DList)
def getKindat(
self)
getKindat returns the kind of data code (int) for a given data record.
Inputs: None
Outputs: the kind of data code (int) for a given data record.
def getKindat(self):
"""getKindat returns the kind of data code (int) for a given data record.
Inputs: None
Outputs: the kind of data code (int) for a given data record.
"""
return(self._dataset['kindat'][0])
def getKinst(
self)
getKinst returns the kind of instrument code (int) for a given data record.
Inputs: None
Outputs: the kind of instrument code (int) for a given data record.
def getKinst(self):
"""getKinst returns the kind of instrument code (int) for a given data record.
Inputs: None
Outputs: the kind of instrument code (int) for a given data record.
"""
return(self._dataset['kinst'][0])
def getNrow(
self)
getNrow returns the number of 2D data rows (int) for a given data record.
Inputs: None
Outputs: the number of 2D data rows.
def getNrow(self):
"""getNrow returns the number of 2D data rows (int) for a given data record.
Inputs: None
Outputs: the number of 2D data rows.
"""
return(len(self._dataset))
def getParmDim(
self, parm)
getParmDim returns the dimension (1, 2, or 3 for independent spatial parm) of a given parm mnemonic
Raise KeyError if that parameter not found in file
def getParmDim(self, parm):
"""getParmDim returns the dimension (1, 2, or 3 for independent spatial parm) of a given parm mnemonic
Raise KeyError if that parameter not found in file
"""
for obj in self._oneDList:
if obj.mnemonic.lower() == parm.lower():
return(1)
# do ind 2D next since they are in both lists
for obj in self._ind2DList:
if obj.mnemonic.lower() == parm.lower():
return(3)
for obj in self._twoDList:
if obj.mnemonic.lower() == parm.lower():
return(2)
raise KeyError('Parm <%s> not found in data' % (str(parm)))
def getRecDType(
self)
getRecDType returns the dtype of the _record_layout array
def getRecDType(self):
"""getRecDType returns the dtype of _record_array
"""
return(self._recordSet.dtype)
def getRecno(
self)
getRecno returns the recno (int) for a given data record.
Inputs: None
Outputs: the recno (int) for a given data record. May be 0 if not yet in a file
def getRecno(self):
"""getRecno returns the recno (int) for a given data record.
Inputs: None
Outputs: the recno (int) for a given data record. May be 0 if not yet in a file
"""
return(self._dataset['recno'][0])
def getRecordset(
self)
getRecordset returns the recordSet table
def getRecordset(self):
"""getRecordset returns the recordSet table
"""
return(self._recordSet)
def getRow(
self, row)
getRow returns the row of data in order defined in self._dataset.dtype
Input: row number
IndexError raised if not a valid row index
def getRow(self, row):
"""getRow returns the row of data in order defined in self._dataset.dtype
Input: row number
IndexError raised if not a valid row index
"""
return(self._dataset[row])
def getStartDatetime(
self)
getStartDatetime returns a start record datetime
Inputs: None
Outputs: a datetime.datetime object representing the start time of the record
def getStartDatetime(self):
"""getStartDatetime returns a start record datetime
Inputs: None
Outputs: a datetime.datetime object representing the start time of the record
"""
return(datetime.datetime.utcfromtimestamp(self._dataset['ut1_unix'][0]))
def getStartTimeList(
self)
getStartTimeList returns a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec
Inputs: None
Outputs: a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec.
def getStartTimeList(self):
"""getStartTimeList returns a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec
Inputs: None
Outputs: a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec.
"""
startDT = self.getStartDatetime()
return((startDT.year, startDT.month, startDT.day,
startDT.hour, startDT.minute, startDT.second, int(startDT.microsecond/1.0E4)))
def getStrLen(
self, parm)
getStrLen returns the string length (itemsize) of the given string parm (mnemonic)
Raises ValueError if parm not in record, or is not string type
def getStrLen(self, parm):
"""getStrLen returns True if this parm (mnemonic) is integer type, False if float or string
Raises ValueError is parm not in record, or is not String
"""
if not self.parmIsString(parm):
raise ValueError('Parm <%s> not string type' % (str(parm)))
return(self._dataset.dtype[parm.lower()].itemsize)
def getType(
self)
returns the type 'data'
def getType(self):
""" returns the type 'data'"""
return 'data'
def parmIsInt(
self, parm)
parmIsInt returns True if this parm (mnemonic) is integer type, False if float or string
Raises ValueError if parm not in record
def parmIsInt(self, parm):
"""parmIsInt returns True if this parm (mnemonic) is integer type, False if float or string
Raises ValueError if parm not in record
"""
try:
typeStr = str(self._dataset.dtype[parm.lower()].kind)
except KeyError:
raise ValueError('Parm <%s> not found in file' % (str(parm)))
if typeStr.find('i') != -1:
return(True)
else:
return(False)
def parmIsString(
self, parm)
parmIsString returns True if this parm (mnemonic) is String type, False if float or int
Raises ValueError if parm not in record
def parmIsString(self, parm):
"""parmIsString returns True if this parm (mnemonic) is String type, False if float or int
Raises ValueError if parm not in record
"""
try:
typeStr = str(self._dataset.dtype[parm.lower()].kind)
except KeyError:
raise ValueError('Parm <%s> not found in file' % (str(parm)))
if typeStr.find('S') != -1:
return(True)
else:
return(False)
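Together with getStrLen, these allow inspecting a parameter's type before getting or setting values; a sketch (mnemonic hypothetical):
if rec.parmIsString('gdalt'):
    print('string parm of length %i' % (rec.getStrLen('gdalt')))
elif rec.parmIsInt('gdalt'):
    print('integer parm')
else:
    print('float parm')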
def set1D(
self, parm, value)
set1D sets a 1D value for a given 1D parameter
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
value - double (or string convertible to double) value to set 1D parameter to. To set special Cedar values, the global values
missing, assumed, or knownbad may be used, or the strings "missing", "assumed", or "knownbad"
May also be int or string if that type
Outputs: None
def set1D(self, parm, value):
"""set1D sets a 1D value for a given 1D parameter
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
value - double (or string convertible to double) value to set 1D parameter to. To set special Cedar values, the global values
missing, assumed, or knownbad may be used, or the strings "missing", "assumed", or "knownbad"
May also be int or string if that type
Outputs: None
"""
parm = self._madParmObj.getParmMnemonic(parm).lower()
if value == 'missing':
if self.parmIsInt(parm):
value = self.missing_int
elif self.parmIsString(parm):
value = ' ' * self.getStrLen(parm)
else:
value = self.missing
if self._madParmObj.isError(parm):
if value == 'assumed':
if self.parmIsInt(parm):
value = self.assumed_int
else:
value = self.assumed
elif value == 'knownbad':
if self.parmIsInt(parm):
value = self.knownbad_int
else:
value = self.knownbad
elif value in ('assumed', 'knownbad'):
raise ValueError('It is illegal to set the non-error parm %s to %s' % (parm, value))
# make sure this is a one-d parm
try:
dim = self._recordSet[parm]
except ValueError:
raise ValueError('parm %s does not exist' % (str(parm)))
if dim != 1:
raise ValueError('parm %s is 2D, not 1D' % (str(parm)))
# set it
self._dataset[parm] = value
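A brief sketch (mnemonic hypothetical, continuing the rec example above):
rec.set1D('azm', 145.2)      # plain numeric value
rec.set1D('azm', 'missing')  # special values may be set by name
# setting 'assumed' or 'knownbad' on a non-error parm raises ValueError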
def set2D(
self, parm, row, value)
set2D sets a 2D value for a given 2D parameter and row
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
row - row number to set data. Starts at 0.
value - double (or string convertible to double) value to set 2D parameter to. To set special Cedar values, the global values
missing, assumed, or knownbad may be used, or the strings "missing", "assumed", or "knownbad"
May also be int or string if that type
Outputs: None
def set2D(self, parm, row, value):
"""set2D sets a 2D value for a given 2D parameter and row
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
row - row number to set data. Starts at 0.
value - double (or string convertible to double) value to set 2D parameter to. To set special Cedar values, the global values
missing, assumed, or knownbad may be used, or the strings "missing", "assumed", or "knownbad"
May also be int or string if that type
Outputs: None
"""
if row >= len(self._dataset) or row < 0:
raise ValueError('Illegal value of row %i with nrow = %i' % (row,len(self._dataset)))
parm = self._madParmObj.getParmMnemonic(parm).lower()
isString = self._madParmObj.isString(parm)
if value == 'missing':
if self.parmIsInt(parm):
value = self.missing_int
elif self.parmIsString(parm):
value = ' ' * self.getStrLen(parm)
else:
value = self.missing
if self._madParmObj.isError(parm):
if value == 'assumed':
if self.parmIsInt(parm):
value = self.assumed_int
else:
value = self.assumed
elif value == 'knownbad':
if self.parmIsInt(parm):
value = self.knownbad_int
else:
value = self.knownbad
elif value in ('assumed', 'knownbad'):
raise ValueError('It is illegal to set the non-error parm %s to %s' % (parm, value))
# make sure this is a two-d parm
try:
dim = self._recordSet[parm]
except ValueError:
raise ValueError('parm %s does not exist' % (str(parm)))
if dim not in (2, 3):
raise ValueError('parm %s is 1D, not 2D' % (str(parm)))
# if it's an independent parm, make sure it's not nan
if parm in [p.mnemonic.lower() for p in self._ind2DList]:
if not isString:
if numpy.isnan(value):
raise ValueError('Cannot set ind parm %s to nan at row %i' % (parm, row))
# set it
self._dataset[parm][row] = value
def set2DParmValues(
self, parm, values)
set2DParmValues sets all 2D value in all rows for a given 2D parameter
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
values - list, tuple, or numpy array of int64 or float64 type. Must match the number of rows (nrow). User is responsible
for having set all special values to missing, assumed and knownbad as defined at top
of this class for either ints or floats
Outputs: None
def set2DParmValues(self, parm, values):
"""set2DParmValues sets all 2D value in all rows for a given 2D parameter
Inputs:
parm - can be defined as code (integer) or case-insensitive mnemonic string (eg, "Gdalt")
values - list, tuple, or numpy array of int64 or float64 type. Must match the number of rows (nrow). User is responsible
for having set all special values to missing, assumed and knownbad as defined at top
of this class for either ints or floats
Outputs: None
"""
# make sure this is a two-d parm
parm = self._madParmObj.getParmMnemonic(parm).lower()
isString = self._madParmObj.isString(parm)
try:
dim = self._recordSet[parm]
except ValueError:
raise ValueError('parm %s does not exist' % (str(parm)))
if dim not in (2, 3):
raise ValueError('parm %s is 1D, not 2D' % (str(parm)))
if parm in [p.mnemonic.lower() for p in self._ind2DList]:
if not isString:
if numpy.any(numpy.isnan(values)):
raise ValueError('Cannot set any ind parm %s value to nan: %s' % (parm, str(values)))
# set it
self._dataset[parm] = values
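A sketch of filling a whole 2D column at once (values hypothetical; the array length must equal getNrow(), and an independent spatial parm may not contain nan):
import numpy
alts = 100.0 + 20.0 * numpy.arange(rec.getNrow())  # one altitude per 2D row
rec.set2DParmValues('gdalt', alts)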
def setEndTimeList(
self, eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec=0)
setEndTimeList changes the data record end time
Inputs: integers eYear, eMonth, eDay, eHour, eMin, eSec. eCentisec defaults to 0
Outputs: None
Affects: changes self._dataset fields ut2_unix, year, month, day, hour, min, sec
Prints warning if new end time before present start time
def setEndTimeList(self, eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec=0):
"""setEndTimeList changes the data record end time
Inputs: integers eYear, eMonth, eDay, eHour, eMin, eSec. eCentisec defaults to 0
Outputs: None
Affects: changes self._dataset fields ut2_unix, year, month, day, hour, min, sec
Prints warning if new end time before present start time
"""
# check validity of input time
eCentisec = int(eCentisec)
if eCentisec < 0 or eCentisec > 99:
raise ValueError('Illegal eCentisec %i' % (eCentisec))
try:
eDT = datetime.datetime(eYear, eMonth, eDay, eHour, eMin, eSec, int(eCentisec*1E4))
except:
raise ValueError('Illegal datetime %s' % (str((eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec))))
if eDT < self.getStartDatetime():
sys.stderr.write('Warning: New ending time %s before present starting time %s\n' % (str(eDT),
str(self.getStartDatetime())))
ut2_unix = (eDT - datetime.datetime(1970,1,1)).total_seconds()
self._dataset['ut2_unix'] = ut2_unix
# need to reset average time
aveDT = eDT - (eDT - self.getStartDatetime())/2
self._dataset['year'] = aveDT.year
self._dataset['month'] = aveDT.month
self._dataset['day'] = aveDT.day
self._dataset['hour'] = aveDT.hour
self._dataset['min'] = aveDT.minute
self._dataset['sec'] = aveDT.second
def setKindat(
self, newKindat)
setKindat sets the kind of data code (int) for a given data record.
Inputs: newKindat (integer)
Outputs: None
Affects: sets self._dataset['kindat']
def setKindat(self, newKindat):
"""setKindat sets the kind of data code (int) for a given data record.
Inputs: newKindat (integer)
Outputs: None
Affects: sets self._dataset['kindat']
"""
if int(newKindat) < 0:
raise ValueError('kindat cannot be negative: %i' % (int(newKindat)))
self._dataset['kindat'] = int(newKindat)
def setKinst(
self, newKinst)
setKinst sets the kind of instrument code (int) for a given data record.
Inputs: newKinst - new instrument code (integer)
Outputs: None
Affects: sets self._dataset['kinst']
def setKinst(self, newKinst):
"""setKinst sets the kind of instrument code (int) for a given data record.
Inputs: newKinst - new instrument code (integer)
Outputs: None
Affects: sets self._dataset['kinst']
"""
newKinst = int(newKinst)
if newKinst < 0:
raise ValueError('Kinst must not be less than 0, not %i' % (newKinst))
# verify and set kinst
instList = self._madInstObj.getInstrumentList()
found = False
for inst in instList:
if inst[2] == newKinst:
self._instrumentName = inst[0]
found = True
break
if found == False:
self._instrumentName = 'Unknown instrument'
sys.stderr.write('Warning: kinst %i not found in instTab.txt\n' % (newKinst))
self._dataset['kinst'] = newKinst
def setRecno(
self, newRecno)
setRecno sets the recno (int) for a given data record.
Inputs: newRecno (integer)
Outputs: None
Affects: sets self._dataset['recno']
def setRecno(self, newRecno):
"""setRecno sets the recno (int) for a given data record.
Inputs: newRecno (integer)
Outputs: None
Affects: sets self._dataset['recno']
"""
if int(newRecno) < 0:
raise ValueError('recno cannot be negative: %i' % (int(newRecno)))
self._dataset['recno'] = int(newRecno)
def setRow(
self, row, values)
setRow sets an entire row of data at once
Inputs:
row - row number to set
values - a tuple of values in the right format to match self._dataset.dtype
def setRow(self, row, values):
"""setRow sets an entire row of data at once
Inputs:
row - row number to set
values - a tuple of values in the right format to match self._dataset.dtype
"""
self._dataset[row] = values
def setStartTimeList(
self, sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec=0)
setStartTimeList changes the data record start time
Inputs: integers sYear, sMonth, sDay, sHour, sMin, sSec. sCentisec defaults to 0
Outputs: None
Affects: changes self._dataset fields ut1_unix, year, month, day, hour, min, sec
Prints warning if new start time after present end time
def setStartTimeList(self, sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec=0):
"""setStartTimeList changes the data record start time
Inputs: integers sYear, sMonth, sDay, sHour, sMin, sSec. sCentisec defaults to 0
Outputs: None
Affects: changes self._dataset fields ut1_unix, year, month, day, hour, min, sec
Prints warning if new start time after present end time
"""
# check validity of input time
sCentisec = int(sCentisec)
if sCentisec < 0 or sCentisec > 99:
raise ValueError('Illegal sCentisec %i' % (sCentisec))
try:
sDT = datetime.datetime(sYear, sMonth, sDay, sHour, sMin, sSec, int(sCentisec*1E4))
except:
raise ValueError('Illegal datetime %s' % (str((sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec))))
if sDT > self.getEndDatetime():
sys.stderr.write('Warning: New starting time %s after present ending time %s\n' % (str(sDT),
str(self.getEndDatetime())))
ut1_unix = (sDT - datetime.datetime(1970,1,1)).total_seconds()
self._dataset['ut1_unix'] = ut1_unix
# need to reset average time
aveDT = sDT + (self.getEndDatetime() - sDT)/2
self._dataset['year'] = aveDT.year
self._dataset['month'] = aveDT.month
self._dataset['day'] = aveDT.day
self._dataset['hour'] = aveDT.hour
self._dataset['min'] = aveDT.minute
self._dataset['sec'] = aveDT.second
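A sketch of resetting the record times (values hypothetical). Note that both calls recompute the average time stored in the year/month/day/hour/min/sec fields:
rec.setStartTimeList(2020, 1, 1, 0, 0, 0)
rec.setEndTimeList(2020, 1, 1, 0, 5, 0)
# ut1_unix and ut2_unix now bracket the record; year through sec hold the midpoint time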
class MadrigalHeaderRecord
MadrigalHeaderRecord holds all the information in a Cedar header record.
class MadrigalHeaderRecord:
"""MadrigalHeaderRecord holds all the information in a Cedar header record."""
def __init__(self, kinst = None,
kindat = None,
sYear = None, sMonth = None, sDay = None,
sHour = None, sMin = None, sSec = None, sCentisec = None,
eYear = None, eMonth = None, eDay = None,
eHour = None, eMin = None, eSec = None, eCentisec = None,
jpar = None, mpar = None,
text = None,
madInstObj = None, madKindatObj = None,
expNotesLines=None):
"""__init__ creates a MadrigalCatalogRecord.
Note: all inputs have default values because there are two ways to populate this structure:
1) with all inputs from kinst to text when new data is being created, or
2) with catalog line list from existing Hdf5 file Experiment Notes metadata, plus non-default inputs
Inputs:
kinst - the kind of instrument code. A warning will be raised if not in instTab.txt.
kindat - kind of data code. Must be a non-negative integer.
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - experiment start time. sCentisec must be 0-99
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - experiment end time. eCentisec must be 0-99
jpar - the number of 1d parameters in the following data records
mpar - the number of 2d parameters in the following data records
text - string containing text in header record. Length must be divisible by 80. No linefeeds
allowed.
madInstObj - a madrigal.metadata.MadrigalInstrument object. If None, one will be created.
Used to verify kinst.
madKindatObj - a madrigal.metadata.MadrigalKindat object. If None, one will be created.
Used to verify kindat.
expNotesLines - a list of all lines in an existing header section in "Experiment Notes"
metadata table. All the above attributes are parsed from these lines.
Outputs: None
Returns: None
"""
# create any needed Madrigal objects, if not passed in
if madInstObj is None:
self._madInstObj = madrigal.metadata.MadrigalInstrument()
else:
self._madInstObj = madInstObj
if madKindatObj is None:
self._madKindatObj = madrigal.metadata.MadrigalKindat()
else:
self._madKindatObj = madKindatObj
if expNotesLines is not None:
# get all information from this dataset
self._parseExpNotesLines(expNotesLines)
if not kinst is None:
# kinst set via header record overrides kinst argument
try:
self.getKinst()
except AttributeError:
self.setKinst(kinst)
# verify kinst set, or raise error
try:
self.getKinst()
except AttributeError:
raise ValueError('kinst not set when MadrigalHeaderRecord created - required')
if not kindat is None:
# kindat set via header record overrides kindat argument
try:
self.getKindat()
except AttributeError:
self.setKindat(kindat)
try:
self.setTimeLists(sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec)
except:
pass
if not jpar is None:
self.setJpar(jpar)
if not mpar is None:
self.setMpar(mpar)
if not text is None:
self.setText(text)
def getType(self):
""" returns the type 'header'"""
return 'header'
def getKinst(self):
"""getKinst returns the kind of instrument code (int) for a given header record.
Inputs: None
Outputs: the kind of instrument code (int) for a given header record.
"""
return(self._kinst)
def setKinst(self, kinst):
"""setKinst sets the kind of instrument code (int) for a given header record.
Inputs: kind of instrument code (integer)
Outputs: None
Affects: sets the kind of instrument code (int) (self._kinst) for a given header record.
Prints warning if kinst not found in instTab.txt
"""
kinst = int(kinst)
# verify and set kinst
instList = self._madInstObj.getInstrumentList()
found = False
for inst in instList:
if inst[2] == kinst:
self._instrumentName = inst[0]
found = True
break
if found == False:
self._instrumentName = 'Unknown instrument'
sys.stderr.write('Warning: kinst %i not found in instTab.txt\n' % (kinst))
self._kinst = kinst
def getKindat(self):
"""getKindat returns the kind of data code (int) for a given header record.
Inputs: None
Outputs: the kind of data code (int) for a given header record.
"""
return(self._kindat)
def setKindat(self, kindat):
"""setKindat sets the mode of kind of data code (int) for a given header record.
Inputs: the kind of data code (int)
Outputs: None
Affects: sets the kind of data code (int) (self._kindat)
Exceptions: Raises ValueError if kindat less than 0
"""
self._kindat = int(kindat)
if self._kindat < 0:
raise ValueError('kindat must not be less than 0, not %i' % (self._kindat))
def getText(self):
"""getText returns the header text.
Inputs: None
Outputs: the header text.
"""
return(self._text)
def getTextLineCount(self):
"""getTextLineCount returns the number of 80 character lines in self._text
"""
if len(self._text) % 80 == 0:
return(int(len(self._text) / 80))
else:
return(int(1 + int(len(self._text) / 80)))
def setText(self, text):
"""setText sets the header text.
Inputs: text: text to be set. Must be length divisible by 80, and not contain line feeds.
For now, must not exceed 2^16 - 80 bytes to be able to be handled by Cedar format.
Outputs: None.
Affects: sets self._text
Raises TypeError if problem with text
"""
textTypes = [str]
if type(text) not in textTypes:
raise TypeError('text must be of type string')
if len(text) % 80 != 0:
raise TypeError('text length must be divisible by 80: len is %i' % (len(text)))
if text.find('\n') != -1:
raise TypeError('text must not contain linefeed character')
if len(text) > 65536 - 80:
raise TypeError('text exceeds ability of Cedar format to store')
self._text = text
def getJpar(self):
"""returns the number of one-dimensional parameters in the associated data records.
"""
return self._jpar
def setJpar(self, jpar):
""" set the number of one-dimensional parameters in the associated data records.
Must not be negative.
"""
self._jpar = int(jpar)
if self._jpar < 0:
raise TypeError('jpar must not be less than 0')
def getMpar(self):
"""returns the number of two-dimensional parameters in the associated data records.
"""
return self._mpar
def setMpar(self, mpar):
""" set the number of two-dimensional parameters in the associated data records.
Must not be negative.
"""
self._mpar = int(mpar)
if self._mpar < 0:
raise TypeError('mpar must not be less than 0')
def getStartTimeList(self):
"""getStartTimeList returns a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec
Inputs: None
Outputs: a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec.
"""
return((self._sYear,
self._sMonth,
self._sDay,
self._sHour,
self._sMin,
self._sSec,
self._sCentisec))
def getEndTimeList(self):
"""getEndTimeList returns a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec
Inputs: None
Outputs: a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec.
"""
return((self._eYear,
self._eMonth,
self._eDay,
self._eHour,
self._eMin,
self._eSec,
self._eCentisec))
def getLines(self):
"""getLines returns a numpy recarray of the format expected by the "Experiment Notes" dataset
"""
# templates
krechStr = 'KRECH 3002 Header Record, Version 3'
kinstTempStr = 'KINST %i %s'
kindatTempStr = 'KINDAT %i %s'
byearTempStr = 'IBYRT %04i Beginning year'
bmdTempStr = 'IBDTT %04i Beginning month and day'
bhmTempStr = 'IBHMT %04i Beginning UT hour and minute'
bcsTempStr = 'IBCST %04i Beginning centisecond'
eyearTempStr = 'IEYRT %04i Ending year'
emdTempStr = 'IEDTT %04i Ending month and day'
ehmTempStr = 'IEHMT %04i Ending UT hour and minute'
ecsTempStr = 'IECST %04i Ending centisecond'
numLines = int(self.getTextLineCount() + 12) # 8 time lines, KRECH, KINST, KINDAT, and final blank
textArr = numpy.recarray((numLines,), dtype=[('File Notes', h5py.special_dtype(vlen=str))])
for i in range(numLines-9):
if i == 0:
textArr[i]['File Notes'] = krechStr + ' ' * (80 - len(krechStr))
elif i == 1:
kinstName = self._madInstObj.getInstrumentName(self.getKinst())
kinstStr = kinstTempStr % (self.getKinst(), kinstName)
if len(kinstStr) > 80:
kinstStr = kinstStr[:80]
textArr[i]['File Notes'] = kinstStr + ' ' * (80 - len(kinstStr))
elif i == 2:
kindatStr = kindatTempStr % (self.getKindat(),
self._madKindatObj.getKindatDescription(self.getKindat(),
self.getKinst()))
if len(kindatStr) > 80:
kindatStr = kindatStr[:80]
textArr[i]['File Notes'] = kindatStr + ' ' * (80 - len(kindatStr))
else:
textArr[i]['File Notes'] = self.getText()[(i-3)*80:(i-2)*80]
# finally add time lines
sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec = self.getStartTimeList()
eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec = self.getEndTimeList()
ibdtt = sMonth*100 + sDay
ibhmt = sHour*100 + sMin
ibcst = sSec*100 + sCentisec
iedtt = eMonth*100 + eDay
iehmt = eHour*100 + eMin
iecst = eSec*100 + eCentisec
sYearStr = byearTempStr % (sYear)
textArr[i+1]['File Notes'] = sYearStr + ' ' * (80 - len(sYearStr))
sMDStr = bmdTempStr % (ibdtt)
textArr[i+2]['File Notes'] = sMDStr + ' ' * (80 - len(sMDStr))
sHMStr = bhmTempStr % (ibhmt)
textArr[i+3]['File Notes'] = sHMStr + ' ' * (80 - len(sHMStr))
sCSStr = bcsTempStr % (ibcst)
textArr[i+4]['File Notes'] = sCSStr + ' ' * (80 - len(sCSStr))
eYearStr = eyearTempStr % (eYear)
textArr[i+5]['File Notes'] = eYearStr + ' ' * (80 - len(eYearStr))
eMDStr = emdTempStr % (iedtt)
textArr[i+6]['File Notes'] = eMDStr + ' ' * (80 - len(eMDStr))
eHMStr = ehmTempStr % (iehmt)
textArr[i+7]['File Notes'] = eHMStr + ' ' * (80 - len(eHMStr))
eCSStr = ecsTempStr % (iecst)
textArr[i+8]['File Notes'] = eCSStr + ' ' * (80 - len(eCSStr))
textArr[i+9]['File Notes'] = ' ' * 80
return(textArr)
def setTimeLists(self, sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec):
"""setTimeList resets start and end times
Inputs:
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - experiment start time. sCentisec must be 0-99
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - experiment end time. eCentisec must be 0-99
Outputs: None
Affects: sets all time attributes (see code).
Exceptions: Raises ValueError if startTime > endTime
"""
# verify times
sTime = datetime.datetime(sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec*10000)
eTime = datetime.datetime(eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec*10000)
if eTime < sTime:
raise ValueError('Starting time cannot be after ending time')
self._sTime = madrigal.metadata.getMadrigalUTFromDT(sTime)
self._eTime = madrigal.metadata.getMadrigalUTFromDT(eTime)
self._sYear = sYear
self._sMonth = sMonth
self._sDay = sDay
self._sHour = sHour
self._sMin = sMin
self._sSec = sSec
self._sCentisec = sCentisec
self._eYear = eYear
self._eMonth = eMonth
self._eDay = eDay
self._eHour = eHour
self._eMin = eMin
self._eSec = eSec
self._eCentisec = eCentisec
def _parseExpNotesLines(self, expNotesLines):
"""_parseExpNotesLines populates all attributes in MadrigalHeaderRecord
from text from metadata table "Experiment Notes"
"""
if len(expNotesLines) % 80 != 0:
raise ValueError('Len of expNotesLines must be divisible by 80, len %i is not' % (len(expNotesLines)))
self._text = '' # init to empty
delimiter = ' '
# default times
byear = None # to verify lines found
addItem = 0 # check for the case where there is an additional item in front of the date field
bsec = 0
bcsec = 0
esec = 0
ecsec = 0
for i in range(int(len(expNotesLines) / 80)):
line = expNotesLines[i*80:(i+1)*80]
items = line.split()
if len(items) == 0:
# blank line
self.setText(self.getText() + line)
continue
elif items[0].upper() == 'KRECH':
# ignore
continue
elif items[0].upper() == 'KINST':
if int(items[1]) != 3:
self.setKinst(int(items[1]))
else:
self.setKinst(int(items[2]))
elif items[0].upper() == 'KINDAT':
try:
if int(items[1]) != 4:
self.setKindat(int(items[1]))
else:
self.setKindat(int(items[2]))
except:
self.setKindat(0)
# start time
elif items[0].upper() == 'IBYRT':
byear = int(items[1+addItem])
if byear < 1950:
# wrong column parsed
addItem = 1
byear = int(items[1+addItem])
elif items[0].upper() in ('IBDTT', 'IBDT'):
ibdte = int(items[1+addItem])
bmonth = ibdte // 100 # integer division - these values feed datetime.datetime
bday = ibdte % 100
elif items[0].upper() == 'IBHMT':
ibhme = int(items[1+addItem])
bhour = ibhme // 100
bmin = ibhme % 100
elif items[0].upper() == 'IBCST':
ibcse = int(float(items[1+addItem]))
bsec = ibcse // 100
bcsec = ibcse % 100
# end time
elif items[0].upper() == 'IEYRT':
eyear = int(items[1+addItem])
elif items[0].upper() in ('IEDTT', 'IEDT'):
iedte = int(items[1+addItem])
emonth = iedte // 100
eday = iedte % 100
elif items[0].upper() == 'IEHMT':
iehme = int(items[1+addItem])
ehour = iehme // 100
emin = iehme % 100
elif items[0].upper() == 'IECST':
iecse = int(float(items[1+addItem]))
esec = iecse // 100
ecsec = iecse % 100
else:
self.setText(self.getText() + line)
try:
# set times
self.setTimeLists(byear, bmonth, bday, bhour, bmin, bsec, bcsec,
eyear, emonth, eday, ehour, emin, esec, ecsec)
except:
pass
def __str__(self):
""" returns a string representation of a MadrigalHeaderRecord """
retStr = 'Header Record:\n'
retStr += 'kinst = %i (%s)\n' % (self._kinst, self._instrumentName)
retStr += 'kindat = %i\n' % (self._kindat)
retStr += 'record start: %04i-%02i-%02i %02i:%02i:%02i.%02i\n' % (self._sYear,
self._sMonth,
self._sDay,
self._sHour,
self._sMin,
self._sSec,
self._sCentisec)
retStr += 'record end: %04i-%02i-%02i %02i:%02i:%02i.%02i\n' % (self._eYear,
self._eMonth,
self._eDay,
self._eHour,
self._eMin,
self._eSec,
self._eCentisec)
retStr += 'jpar = %i, mpar = %i' % (self._jpar, self._mpar)
for i in range(0, len(self._text) -1, 80):
retStr += '%s\n' % (self._text[i:i+80])
return(retStr)
def __cmp__(self, other):
"""cmpRecords compares two cedar records to allow them to be sorted
"""
if other is None:
return(1)
# compare record start times
fList = self.getStartTimeList()
sList = other.getStartTimeList()
fDT = datetime.datetime(*fList)
sDT = datetime.datetime(*sList)
result = cmp(fDT, sDT)
if result:
return(result)
# compare record type
typeList = ('catalog', 'header', 'data')
fType = self.getType()
sType = other.getType()
result = cmp(typeList.index(fType), typeList.index(sType))
if result:
return(result)
# compare record stop times
fList = self.getEndTimeList()
sList = other.getEndTimeList()
fDT = datetime.datetime(*fList)
sDT = datetime.datetime(*sList)
result = cmp(fDT, sDT)
if result:
return(result)
# compare kindat if both data
if fType == 'data' and sType == 'data':
result = cmp(self.getKindat(), other.getKindat())
if result:
return(result)
return(0)
Ancestors (in MRO)
- MadrigalHeaderRecord
- builtins.object
Methods
def __init__(
self, kinst=None, kindat=None, sYear=None, sMonth=None, sDay=None, sHour=None, sMin=None, sSec=None, sCentisec=None, eYear=None, eMonth=None, eDay=None, eHour=None, eMin=None, eSec=None, eCentisec=None, jpar=None, mpar=None, text=None, madInstObj=None, madKindatObj=None, expNotesLines=None)
__init__ creates a MadrigalHeaderRecord.
Note: all inputs have default values because there are two ways to populate this structure: 1) with all inputs from kinst to text when new data is being created, or 2) with the header line list from an existing Hdf5 file Experiment Notes metadata, plus non-default inputs
Inputs:
kinst - the kind of instrument code. A warning will be raised if not in instTab.txt.
kindat - kind of data code. Must be a non-negative integer.
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - experiment start time. sCentisec must be 0-99
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - experiment end time. eCentisec must be 0-99
jpar - the number of 1d parameters in the following data records
mpar - the number of 2d parameters in the following data records
text - string containing text in header record. Length must be divisible by 80. No linefeeds
allowed.
madInstObj - a madrigal.metadata.MadrigalInstrument object. If None, one will be created.
Used to verify kinst.
madKindatObj - a madrigal.metadata.MadrigalKindat object. If None, one will be created.
Used to verify kindat.
expNotesLines - a list of all lines in an existing header section in "Experiment Notes"
metadata table. All the above attributes are parsed from these lines.
Outputs: None
Returns: None
def __init__(self, kinst = None,
kindat = None,
sYear = None, sMonth = None, sDay = None,
sHour = None, sMin = None, sSec = None, sCentisec = None,
eYear = None, eMonth = None, eDay = None,
eHour = None, eMin = None, eSec = None, eCentisec = None,
jpar = None, mpar = None,
text = None,
madInstObj = None, madKindatObj = None,
expNotesLines=None):
"""__init__ creates a MadrigalCatalogRecord.
Note: all inputs have default values because there are two ways to populate this structure:
1) with all inputs from kinst to text when new data is being created, or
2) with catalog line list from existing Hdf5 file Experiment Notes metadata, plus non-default inputs
Inputs:
kinst - the kind of instrument code. A warning will be raised if not in instTab.txt.
kindat - kind of data code. Must be a non-negative integer.
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - experiment start time. sCentisec must be 0-99
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - experiment end time. eCentisec must be 0-99
jpar - the number of 1d parameters in the following data records
mpar - the number of 2d parameters in the following data records
text - string containing text in header record. Length must be divisible by 80. No linefeeds
allowed.
madInstObj - a madrigal.metadata.MadrigalInstrument object. If None, one will be created.
Used to verify kinst.
madKindatObj - a madrigal.metadata.MadrigalKindat object. If None, one will be created.
Used to verify kindat.
expNotesLines - a list of all lines in an existing header section in "Experiment Notes"
metadata table. All the above attributes are parsed from these lines.
Outputs: None
Returns: None
"""
# create any needed Madrigal objects, if not passed in
if madInstObj is None:
self._madInstObj = madrigal.metadata.MadrigalInstrument()
else:
self._madInstObj = madInstObj
if madKindatObj is None:
self._madKindatObj = madrigal.metadata.MadrigalKindat()
else:
self._madKindatObj = madKindatObj
if expNotesLines is not None:
# get all information from this dataset
self._parseExpNotesLines(expNotesLines)
if not kinst is None:
# kinst set via header record overrides kinst argument
try:
self.getKinst()
except AttributeError:
self.setKinst(kinst)
# verify kinst set, or raise error
try:
self.getKinst()
except AttributeError:
raise ValueError('kinst not set when MadrigalHeaderRecord created - required')
if not kindat is None:
# kindat set via header record overrides kindat argument
try:
self.getKindat()
except AttributeError:
self.setKindat(kindat)
try:
self.setTimeLists(sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec)
except:
pass
if not jpar is None:
self.setJpar(jpar)
if not mpar is None:
self.setMpar(mpar)
if not text is None:
self.setText(text)
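A minimal construction sketch (the kinst/kindat codes and text are hypothetical; text must be a multiple of 80 characters with no linefeeds):
hdr = madrigal.cedar.MadrigalHeaderRecord(kinst=30, kindat=3408,
    sYear=2020, sMonth=1, sDay=1, sHour=0, sMin=0, sSec=0, sCentisec=0,
    eYear=2020, eMonth=1, eDay=1, eHour=23, eMin=59, eSec=59, eCentisec=99,
    jpar=2, mpar=4,
    text='ANALYST Jane Doe'.ljust(80) + 'COMMENT example header text'.ljust(80))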
def getEndTimeList(
self)
getEndTimeList returns a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec
Inputs: None
Outputs: a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec.
def getEndTimeList(self):
"""getEndTimeList returns a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec
Inputs: None
Outputs: a tuple containing eYear, eMonth, eDay, eHour, eMin, eSec, and eCentisec.
"""
return((self._eYear,
self._eMonth,
self._eDay,
self._eHour,
self._eMin,
self._eSec,
self._eCentisec))
def getJpar(
self)
returns the number of one-dimensional parameters in the associated data records.
def getJpar(self):
"""returns the number of one-dimensional parameters in the associated data records.
"""
return self._jpar
def getKindat(
self)
getKindat returns the kind of data code (int) for a given header record.
Inputs: None
Outputs: the kind of data code (int) for a given header record.
def getKindat(self):
"""getKindat returns the kind of data code (int) for a given header record.
Inputs: None
Outputs: the kind of data code (int) for a given header record.
"""
return(self._kindat)
def getKinst(
self)
getKinst returns the kind of instrument code (int) for a given header record.
Inputs: None
Outputs: the kind of instrument code (int) for a given header record.
def getKinst(self):
"""getKinst returns the kind of instrument code (int) for a given header record.
Inputs: None
Outputs: the kind of instrument code (int) for a given header record.
"""
return(self._kinst)
def getLines(
self)
getLines returns a numpy recarray of the format expected by the "Experiment Notes" dataset
def getLines(self):
"""getLines returns a numpy recarray of the format expected by the "Experiment Notes" dataset
"""
# templates
krechStr = 'KRECH 3002 Header Record, Version 3'
kinstTempStr = 'KINST %i %s'
kindatTempStr = 'KINDAT %i %s'
byearTempStr = 'IBYRT %04i Beginning year'
bmdTempStr = 'IBDTT %04i Beginning month and day'
bhmTempStr = 'IBHMT %04i Beginning UT hour and minute'
bcsTempStr = 'IBCST %04i Beginning centisecond'
eyearTempStr = 'IEYRT %04i Ending year'
emdTempStr = 'IEDTT %04i Ending month and day'
ehmTempStr = 'IEHMT %04i Ending UT hour and minute'
ecsTempStr = 'IECST %04i Ending centisecond'
numLines = int(self.getTextLineCount() + 12) # 8 time lines, KRECH, KINST, KINDAT, and final blank
textArr = numpy.recarray((numLines,), dtype=[('File Notes', h5py.special_dtype(vlen=str))])
for i in range(numLines-9):
if i == 0:
textArr[i]['File Notes'] = krechStr + ' ' * (80 - len(krechStr))
elif i == 1:
kinstName = self._madInstObj.getInstrumentName(self.getKinst())
kinstStr = kinstTempStr % (self.getKinst(), kinstName)
if len(kinstStr) > 80:
kinstStr = kinstStr[:80]
textArr[i]['File Notes'] = kinstStr + ' ' * (80 - len(kinstStr))
elif i == 2:
kindatStr = kindatTempStr % (self.getKindat(),
self._madKindatObj.getKindatDescription(self.getKindat(),
self.getKinst()))
if len(kindatStr) > 80:
kindatStr = kindatStr[:80]
textArr[i]['File Notes'] = kindatStr + ' ' * (80 - len(kindatStr))
else:
textArr[i]['File Notes'] = self.getText()[(i-3)*80:(i-2)*80]
# finally add time lines
sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec = self.getStartTimeList()
eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec = self.getEndTimeList()
ibdtt = sMonth*100 + sDay
ibhmt = sHour*100 + sMin
ibcst = sSec*100 + sCentisec
iedtt = eMonth*100 + eDay
iehmt = eHour*100 + eMin
iecst = eSec*100 + eCentisec
sYearStr = byearTempStr % (sYear)
textArr[i+1]['File Notes'] = sYearStr + ' ' * (80 - len(sYearStr))
sMDStr = bmdTempStr % (ibdtt)
textArr[i+2]['File Notes'] = sMDStr + ' ' * (80 - len(sMDStr))
sHMStr = bhmTempStr % (ibhmt)
textArr[i+3]['File Notes'] = sHMStr + ' ' * (80 - len(sHMStr))
sCSStr = bcsTempStr % (ibcst)
textArr[i+4]['File Notes'] = sCSStr + ' ' * (80 - len(sCSStr))
eYearStr = eyearTempStr % (eYear)
textArr[i+5]['File Notes'] = eYearStr + ' ' * (80 - len(eYearStr))
eMDStr = emdTempStr % (iedtt)
textArr[i+6]['File Notes'] = eMDStr + ' ' * (80 - len(eMDStr))
eHMStr = ehmTempStr % (iehmt)
textArr[i+7]['File Notes'] = eHMStr + ' ' * (80 - len(eHMStr))
eCSStr = ecsTempStr % (iecst)
textArr[i+8]['File Notes'] = eCSStr + ' ' * (80 - len(eCSStr))
textArr[i+9]['File Notes'] = ' ' * 80
return(textArr)
def getMpar(
self)
returns the number of two-dimensional parameters in the associated data records.
def getMpar(self):
"""returns the number of two-dimensional parameters in the associated data records.
"""
return self._mpar
def getStartTimeList(
self)
getStartTimeList returns a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec
Inputs: None
Outputs: a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec.
def getStartTimeList(self):
"""getStartTimeList returns a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec
Inputs: None
Outputs: a tuple containing sYear, sMonth, sDay, sHour, sMin, sSec, and sCentisec.
"""
return((self._sYear,
self._sMonth,
self._sDay,
self._sHour,
self._sMin,
self._sSec,
self._sCentisec))
def getText(self)
getText returns the header text.
Inputs: None
Outputs: the header text.
def getText(self):
"""getText returns the header text.
Inputs: None
Outputs: the header text.
"""
return(self._text)
def getTextLineCount(self)
getTextLineCount returns the number of 80 character lines in self._text
def getTextLineCount(self):
"""getTextLineCount returns the number of 80 character lines in self._text
"""
if len(self._text) % 80 == 0:
return(int(len(self._text) / 80))
else:
return(int(1 + int(len(self._text) / 80)))
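The computation above is a ceiling division by 80; an equivalent one-line form, shown only as an illustration:
# Illustration only: ceiling division equivalent to getTextLineCount
text = 'x' * 161
lineCount = (len(text) + 79) // 80   # -> 3, matching 1 + 161 // 80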
def getType(self)
returns the type 'header'
def getType(self):
""" returns the type 'header'"""
return 'header'
def setJpar(self, jpar)
set the number of one-dimensional parameters in the associated data records.
Must not be negative.
def setJpar(self, jpar):
""" set the number of one-dimensional parameters in the associated data records.
Must not be negative.
"""
self._jpar = int(jpar)
if self._jpar < 0:
raise TypeError('jpar must not be less than 0')
def setKindat(self, kindat)
setKindat sets the kind of data code (int) for a given header record.
Inputs: the kind of data code (int)
Outputs: None
Affects: sets the kind of data code (int) (self._kindat)
Exceptions: Raises ValueError if kindat less than 0
def setKindat(self, kindat):
"""setKindat sets the mode of kind of data code (int) for a given header record.
Inputs: the kind of data code (int)
Outputs: None
Affects: sets the kind of data code (int) (self._kindat)
Exceptions: Raises ValueError if kindat less than 0
"""
self._kindat = int(kindat)
if self._kindat < 0:
raise ValueError('kindat must not be less than 0, not %i' % (self._kindat))
def setKinst(self, kinst)
setKinst sets the kind of instrument code (int) for a given header record.
Inputs: kind of instrument code (integer)
Outputs: None
Affects: sets the kind of instrument code (int) (self._kinst) for a given header record. Prints warning if kinst not found in instTab.txt
def setKinst(self, kinst):
"""setKinst sets the kind of instrument code (int) for a given header record.
Inputs: kind of instrument code (integer)
Outputs: None
Affects: sets the kind of instrument code (int) (self._kinst) for a given header record.
Prints warning if kinst not found in instTab.txt
"""
kinst = int(kinst)
# verify and set kinst
instList = self._madInstObj.getInstrumentList()
found = False
for inst in instList:
if inst[2] == kinst:
self._instrumentName = inst[0]
found = True
break
if found == False:
self._instrumentName = 'Unknown instrument'
sys.stderr.write('Warning: kinst %i not found in instTab.txt\n' % (kinst))
self._kinst = kinst
def setMpar(self, mpar)
set the number of two-dimensional parameters in the associated data records.
Must not be negative.
def setMpar(self, mpar):
""" set the number of two-dimensional parameters in the associated data records.
Must not be negative.
"""
self._mpar = int(mpar)
if self._mpar < 0:
raise TypeError('mpar must not be less than 0')
def setText(self, text)
setText sets the header text.
Inputs: text: text to be set. Must be length divisible by 80, and not contain line feeds. For now, must not exceed 2^16 - 80 bytes to be able to be handled by Cedar format.
Outputs: None.
Affects: sets self._text
Raises TypeError if problem with text
def setText(self, text):
"""setText sets the header text.
Inputs: text: text to be set. Must be length divisible by 80, and not contain line feeds.
For now, must not exceed 2^16 - 80 bytes to be able to be handled by Cedar format.
Outputs: None.
Affects: sets self._text
Raises TypeError if problem with text
"""
textTypes = [str]
if type(text) not in textTypes:
raise TypeError('text must be of type string')
if len(text) % 80 != 0:
raise TypeError('text length must be divisible by 80: len is %i' % (len(text)))
if text.find('\n') != -1:
raise TypeError('text must not contain linefeed character')
if len(text) > 65536 - 80:
raise TypeError('text exceeds ability of Cedar format to store')
self._text = text
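A minimal usage sketch (the header object and note lines are hypothetical); the caller pads each line to 80 characters before setting the text:
# Hypothetical usage: pad each note line to 80 characters, then set.
lines = ['Analyzed by J. Smith', 'Preliminary calibration applied']
headerRec.setText(''.join(line.ljust(80) for line in lines))  # headerRec: an existing header record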
def setTimeLists(self, sYear, sMonth, sDay, sHour, sMin, sSec, sCentisec, eYear, eMonth, eDay, eHour, eMin, eSec, eCentisec)
setTimeLists resets start and end times
Inputs:
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - experiment start time. sCentisec must be 0-99
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - experiment end time. eCentisec must be 0-99
Outputs: None
Affects: sets all time attributes (see code).
Exceptions: Raises ValueError if startTime > endTime
def setTimeLists(self, sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec,
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec):
"""setTimeList resets start and end times
Inputs:
sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec - experiment start time. sCentisec must be 0-99
eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec - experiment end time. eCentisec must be 0-99
Outputs: None
Affects: sets all time attributes (see code).
Exceptions: Raises ValueError if startTime > endTime
"""
# verify times
sTime = datetime.datetime(sYear,sMonth,sDay,sHour,sMin,sSec,sCentisec*10000)
eTime = datetime.datetime(eYear,eMonth,eDay,eHour,eMin,eSec,eCentisec*10000)
if eTime < sTime:
raise ValueError('Starting time cannot be after ending time')
self._sTime = madrigal.metadata.getMadrigalUTFromDT(sTime)
self._eTime = madrigal.metadata.getMadrigalUTFromDT(eTime)
self._sYear = sYear
self._sMonth = sMonth
self._sDay = sDay
self._sHour = sHour
self._sMin = sMin
self._sSec = sSec
self._sCentisec = sCentisec
self._eYear = eYear
self._eMonth = eMonth
self._eDay = eDay
self._eHour = eHour
self._eMin = eMin
self._eSec = eSec
self._eCentisec = eCentisec
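A short usage sketch (the header object and times are hypothetical):
# Hypothetical usage: a one-hour experiment starting 1998-01-20 13:30:00 UT.
headerRec.setTimeLists(1998, 1, 20, 13, 30, 0, 0, 1998, 1, 20, 14, 30, 0, 0)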
class convertToNetCDF4
class convertToNetCDF4:
def __init__(self, inputHdf5, outputNC):
"""convertToNetCDF4 converts a Madrigal HDF5 file to netCDF4 using Array Layout
rather than using Table Layout as cedar module does. Can handle large Hdf5 file
without large memory footprint, and is much faster than reading in using
madrigal.cedar.MadrigalCedarFile
Inputs:
inputHdf5 - filename of input Madrigal Hdf5 file
outputNC - output netCDF4 file
"""
madParmObj = madrigal.data.MadrigalParameters()
self._fi = h5py.File(inputHdf5, 'r')
if 'Array Layout' not in self._fi['Data']:
if os.path.getsize(inputHdf5) < 50000000:
# for smaller files we simply go through the slower full cedar conversion
cedarObj = MadrigalCedarFile(inputHdf5)
cedarObj.write('netCDF4', outputNC)
return
else:
# file is too big to load into memory at once; read only 10 records at a time and write to file
# parm IndexDict is a dictionary with key = timestamps and ind spatial parm names,
# value = dictionary of keys = unique values, value = index
# temp only
total = 0
t = time.time()
parmIndexDict = self._getParmIndexDict()
self._fi.close()
madCedarObj = madrigal.cedar.MadrigalCedarFile(inputHdf5, maxRecords=10)
madCedarObj.dump('netCDF4', outputNC, parmIndexDict)
total += 10
while (True):
# temp only
print('%i done so far in %f secs' % (total, time.time()-t))
newRecs, isComplete = madCedarObj.loadNextRecords(10)
if isComplete:
break
madCedarObj.dump('netCDF4', outputNC, parmIndexDict)
if newRecs < 10:
break
total += newRecs
# compress
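# (netCDF4-format files are HDF5 files underneath, so h5repack can recompress them)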
filename, file_extension = os.path.splitext(outputNC)
# tmp file name to use to run h5repack
tmpFile = filename + '_tmp' + file_extension
cmd = 'h5repack -i %s -o %s --filter=GZIP=4' % (outputNC, tmpFile)
try:
subprocess.check_call(shlex.split(cmd))
except:
traceback.print_exc()
return
shutil.move(tmpFile, outputNC)
return
self._fo = netCDF4.Dataset(outputNC, 'w', format='NETCDF4')
self._fo.catalog_text = self.getCatalogText()
self._fo.header_text = self.getHeaderText()
# write Experiment Parameters
experimentParameters = self._fi['Metadata']['Experiment Parameters']
for i in range(len(experimentParameters)):
name = experimentParameters['name'][i]
if type(name) in (bytes, numpy.bytes_):
name = name.decode("utf8")
# make text acceptable attribute names
name = name.replace(' ', '_')
name = name.replace('(s)', '')
self._fo.setncattr(name, experimentParameters['value'][i])
indParmList = [parm[0].lower() for parm in self._fi['Metadata']['Independent Spatial Parameters']]
# split parms - if any
has_split = 'Parameters Used to Split Array Data' in list(self._fi['Metadata'].keys())
arraySplittingMnemonics = []
if has_split:
arraySplittingParms = self._fi['Metadata']['Parameters Used to Split Array Data']
arrSplitParmDesc = ''
for i in range(len(arraySplittingParms)):
arrSplitParmDesc += '%s: ' % (arraySplittingParms[i]['mnemonic'].lower())
arrSplitParmDesc += '%s' % (arraySplittingParms[i]['description'].lower())
arraySplittingMnemonics.append(arraySplittingParms[i]['mnemonic'].lower())
if arraySplittingParms[i] != arraySplittingParms[-1]:
arrSplitParmDesc += ' -- '
self._fo.parameters_used_to_split_data = arrSplitParmDesc
if has_split:
names = list(self._fi['Data']['Array Layout'].keys())
groups = [self._fi['Data']['Array Layout'][name] for name in names]
else:
names = [None]
groups = [self._fi['Data']['Array Layout']]
# loop through each split array (or just top level, if none)
for i in range(len(groups)):
name = names[i]
if not name is None:
nc_name = name.strip().replace(' ', '_')
thisGroup = self._fo.createGroup(nc_name)
hdf5Group = self._fi['Data']['Array Layout'][name]
else:
thisGroup = self._fo
hdf5Group = self._fi['Data']['Array Layout']
times = hdf5Group['timestamps']
# next step - create dimensions
dims = []
# first time dim
thisGroup.createDimension("timestamps", len(times))
timeVar = thisGroup.createVariable("timestamps", 'f8', ("timestamps",),
zlib=True)
timeVar.units = 'Unix seconds'
timeVar.description = 'Number of seconds since UT midnight 1970-01-01'
timeVar[:] = times
dims.append("timestamps")
# next the ind parms, because ncview works well with that ordering
for indParm in indParmList:
if type(indParm) == bytes:
indParmString = indParm.decode('utf8')
else:
indParmString = indParm
if indParmString in arraySplittingMnemonics:
continue
thisGroup.createDimension(indParmString, len(hdf5Group[indParmString]))
if madParmObj.isInteger(indParmString):
thisVar = thisGroup.createVariable(indParmString, 'i8', (indParmString,),
zlib=True)
thisVar[:] = hdf5Group[indParmString]
elif madParmObj.isString(indParmString):
slen = len(hdf5Group[indParmString][0])
dtype = 'S%i' % (slen)
thisVar = thisGroup.createVariable(indParmString, dtype, (indParmString,),
zlib=True)
for i in range(len(hdf5Group[indParmString])):
thisVar[i] = str(hdf5Group[indParmString][i])
else:
thisVar = thisGroup.createVariable(indParmString, 'f8', (indParmString,),
zlib=True)
thisVar[:] = hdf5Group[indParmString]
thisVar.units = madParmObj.getParmUnits(indParmString)
thisVar.description = madParmObj.getSimpleParmDescription(indParmString)
dims.append(indParmString)
# get all one d data
oneDParms = list(hdf5Group['1D Parameters'].keys())
for oneDParm in oneDParms:
if oneDParm in indParmList:
if oneDParm not in arraySplittingMnemonics:
continue
if oneDParm.find('Data Parameters') != -1:
continue
if madParmObj.isInteger(oneDParm):
oneDVar = thisGroup.createVariable(oneDParm, 'i8', (dims[0],),
zlib=True)
elif madParmObj.isString(oneDParm):
slen = len(hdf5Group['1D Parameters'][oneDParm][0])
dtype = 'S%i' % (slen)
oneDVar = thisGroup.createVariable(oneDParm, dtype, (dims[0],),
zlib=True)
else:
oneDVar = thisGroup.createVariable(oneDParm, 'f8', (dims[0],),
zlib=True)
oneDVar.units = madParmObj.getParmUnits(oneDParm)
oneDVar.description = madParmObj.getSimpleParmDescription(oneDParm)
try:
oneDVar[:] = hdf5Group['1D Parameters'][oneDParm]
except:
oneDVar[:] = hdf5Group['1D Parameters'][oneDParm][()]
# get all two d data
twoDParms = list(hdf5Group['2D Parameters'].keys())
for twoDParm in twoDParms:
if twoDParm.find('Data Parameters') != -1:
continue
if twoDParm in indParmList:
if twoDParm not in arraySplittingMnemonics:
continue
if madParmObj.isInteger(twoDParm):
twoDVar = thisGroup.createVariable(twoDParm, 'i8', dims,
zlib=True)
elif madParmObj.isString(twoDParm):
slen = len(hdf5Group['2D Parameters'][twoDParm][0])
dtype = 'S%i' % (slen)
twoDVar = thisGroup.createVariable(twoDParm, dtype, dims,
zlib=True)
else:
twoDVar = thisGroup.createVariable(twoDParm, 'f8', dims,
zlib=True)
twoDVar.units = madParmObj.getParmUnits(twoDParm)
twoDVar.description = madParmObj.getSimpleParmDescription(twoDParm)
# move the last dim in Hdf5 (time) to be the first now
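# (illustration: with dims = ['timestamps', 'gdalt'], reshape = [0, 1] and newShape = [1, 0],
# so a hypothetical (gdalt, time) Hdf5 array becomes the (time, gdalt) layout used here)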
reshape = list(range(len(dims)))
newShape = reshape[-1:] + reshape[0:-1]
data = numpy.transpose(hdf5Group['2D Parameters'][twoDParm], newShape)
twoDVar[:] = data
data = None
self._fo.close()
self._fi.close()
def getCatalogText(self):
"""getCatalogText returns the catalog record text as a string
"""
if not 'Experiment Notes' in list(self._fi['Metadata'].keys()):
return('')
notes = self._fi['Metadata']['Experiment Notes']
retStr = ''
for substr in notes:
if substr[0].find(b'Header information') != -1:
break
retStr += substr[0].decode('utf-8')
return(retStr)
def getHeaderText(self):
"""getHeaderText returns the header record text as a string
"""
if not 'Experiment Notes' in list(self._fi['Metadata'].keys()):
return('')
notes = self._fi['Metadata']['Experiment Notes']
retStr = ''
headerFound = False
for substr in notes:
if substr[0].find(b'Header information') != -1:
headerFound = True
if headerFound:
retStr += substr[0].decode('utf-8')
return(retStr)
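Both methods split the same 'Experiment Notes' dataset on the first line containing 'Header information'; a hypothetical layout (line text invented for illustration):
# 'Catalog text line 1...'              <- returned by getCatalogText
# 'Catalog text line 2...'              <- returned by getCatalogText
# 'Header information from record 1:'   <- from here on, returned by getHeaderText
# 'Header text line 1...'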
def _getParmIndexDict(self):
"""_getParmIndexDict returns a dictionary with key = timestamps and ind spatial parm names,
value = dictionary of keys = unique values, value = index of that value
"""
retDict = {}
parmList = ['ut1_unix'] + [parm[0].lower() for parm in self._fi['Metadata']['Independent Spatial Parameters']]
for parm in parmList:
if type(parm) == bytes:
parm = parm.decode('utf-8')
values = self._fi['Data']['Table Layout'][parm]
unique_values = numpy.unique(values)
sorted_values = numpy.sort(unique_values)
retDict[parm] = collections.OrderedDict()
for value, key in numpy.ndenumerate(sorted_values):
if type(key) in (numpy.bytes_, bytes):
key = key.decode('utf-8')
retDict[parm][key] = value[0]
return(retDict)
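To illustrate the structure _getParmIndexDict returns, a hypothetical result for a file whose only independent spatial parameter is gdalt:
# Hypothetical return value of _getParmIndexDict:
# {'ut1_unix': OrderedDict([(1442836800.0, 0), (1442836860.0, 1)]),
#  'gdalt': OrderedDict([(100.0, 0), (150.0, 1), (200.0, 2)])}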
Ancestors (in MRO)
- convertToNetCDF4
- builtins.object
Static methods
def __init__(self, inputHdf5, outputNC)
convertToNetCDF4 converts a Madrigal HDF5 file to netCDF4 using Array Layout rather than using Table Layout as the cedar module does. Can handle a large Hdf5 file without a large memory footprint, and is much faster than reading in using madrigal.cedar.MadrigalCedarFile
Inputs:
inputHdf5 - filename of input Madrigal Hdf5 file
outputNC - output netCDF4 file
def getCatalogText(self)
getCatalogText returns the catalog record text as a string
def getHeaderText(self)
getHeaderText returns the header record text as a string
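A minimal usage sketch of convertToNetCDF4 (file names are hypothetical); the conversion runs entirely inside __init__:
import madrigal.cedar
madrigal.cedar.convertToNetCDF4('/tmp/mlh980120g.001.hdf5', '/tmp/mlh980120g.001.nc')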
class convertToText
class convertToText:
def __init__(self, inputHdf5, outputTxt, summary='plain', showHeaders=False,
filterList=None, missing=None, assumed=None, knownbad=None):
"""convertToText converts a Madrigal HDF5 file to a text file. Designed to be able
to handle large files without a large memory footprint
Inputs:
inputHdf5 - filename of input Madrigal Hdf5 file
outputTxt - output text file
summary - type of summary line to print at top. Allowed values are:
'plain' - text-only mnemonic names, but only if not showHeaders
'html' - mnemonic names wrapped in standard javascript code to allow descriptive popups
'summary' - print overview of file and filters used. Also text-only mnemonic names,
but only if not showHeaders
None - no summary line
showHeaders - if True, print a header for each record. If False, the default,
do not.
filterList - a list of madrigal.derivation.MadrigalFilter objects to be described in the
summary. Default is None, in which case not described in summary. Ignored if summary
is not 'summary'
missing, assumed, knownbad - how to print Cedar special values. Default is None for
all, so that the value printed is the value in the numpy table as per spec.
"""
madCedarObj = madrigal.cedar.MadrigalCedarFile(inputHdf5, maxRecords=10)
madCedarObj.writeText(outputTxt, summary=summary, showHeaders=showHeaders, filterList=filterList,
missing=missing, assumed=assumed, knownbad=knownbad, append=True,
firstWrite=True)
while (True):
newRecs, isComplete = madCedarObj.loadNextRecords(10)
if isComplete:
break
madCedarObj.writeText(outputTxt, summary=summary, showHeaders=showHeaders,
missing=missing, assumed=assumed, knownbad=knownbad,
append=True, firstWrite=False)
if newRecs < 10:
break
Ancestors (in MRO)
- convertToText
- builtins.object
Static methods
def __init__(self, inputHdf5, outputTxt, summary='plain', showHeaders=False, filterList=None, missing=None, assumed=None, knownbad=None)
convertToText converts a Madrigal HDF5 file to a text file. Designed to be able to handle large files without a large memory footprint
Inputs:
inputHdf5 - filename of input Madrigal Hdf5 file
outputTxt - output text file
summary - type of summary line to print at top. Allowed values are:
'plain' - text-only mnemonic names, but only if not showHeaders
'html' - mnemonic names wrapped in standard javascript code to allow descriptive popups
'summary' - print overview of file and filters used. Also text-only mnemonic names,
but only if not showHeaders
None - no summary line
showHeaders - if True, print a header for each record. If False, the default,
do not.
filterList - a list of madrigal.derivation.MadrigalFilter objects to be described in the
summary. Default is None, in which case not described in summary. Ignored if summary
is not 'summary'
missing, assumed, knownbad - how to print Cedar special values. Default is None for
all, so that the value printed is the value in the numpy table as per spec.
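A minimal usage sketch of convertToText (file names and option values are hypothetical):
import madrigal.cedar
madrigal.cedar.convertToText('/tmp/mlh980120g.001.hdf5', '/tmp/mlh980120g.001.txt', summary='summary', missing='missing')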