Data Storage Options | Moore Good Ideas - LabVIEW Consulting Services

Published December 3, 2020

This article is part of the Flight Test series:

Typical Data Rates per Hour

Flight test aircraft can generate massive amounts of data, especially when audio and video streams are added to the acquisition system. A minimal flight test telemetry package consists of a 2 Mb/s stream of digital and analog recorded data. Saving at this data rate produces about 1 GB/hour. Add multiple data streams, higher rate data streams, or audio and video recording, and modern flight tests can generate 40 GB/hour. One aircraft manufacturer reported that over 90% of their entire corporate IT network infrastructure was dedicated to the storage and processing of flight test data.
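As a quick check of the figures above, the conversion from a sustained stream rate to hourly storage can be sketched as follows. The function name is illustrative only, and decimal units (1 GB = 10^9 bytes) are assumed.

```python
def gigabytes_per_hour(bits_per_second: float) -> float:
    """Convert a sustained stream rate in bits/s to decimal GB per hour."""
    bytes_per_second = bits_per_second / 8
    return bytes_per_second * 3600 / 1e9


# The minimal 2 Mb/s telemetry stream
minimal_stream = gigabytes_per_hour(2e6)
print(f"{minimal_stream:.2f} GB/hour")  # 0.90 GB/hour

# Sustained rate implied by 40 GB/hour of recording
rate_for_40_gb = 40e9 * 8 / 3600 / 1e6
print(f"{rate_for_40_gb:.1f} Mb/s")  # 88.9 Mb/s
```

In other words, the 40 GB/hour case corresponds to roughly 45 times the minimal telemetry stream running continuously.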

Consider how you will manage the storage of multiple streams of audio or video before adding the capability to your flight test aircraft, because data storage will cost several times more than the hardware and software used for acquisition.

File Format Efficiency

Given the very large volumes of flight test data, care must be taken to minimize data format inefficiency and data duplication. For example, 16-bit analog to digital (A:D) converters are the norm in NI hardware and in flight test acquisition systems as a whole, but older systems may run 12-bit A:D conversions. A 12-bit A:D discretizes the data into 4096 steps for a resolution of 0.024%, and a 16-bit A:D discretizes the data into 65,536 steps for a resolution of 0.0015%. In either case, this data can be stored in a u16 with no loss of precision. Is it worth the level of effort to develop an in-house file format to save 12-bit data natively? Probably not. However, arbitrarily storing all data in a 64-bit datatype would be only 19% to 25% space efficient, depending on whether the source analog data is 12-bit or 16-bit. That is in no way acceptable when you consider how many TB of an expensive NAS server would be filled with meaningless insignificant figures.
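The resolution and efficiency figures above follow from simple ratios. A minimal sketch, with illustrative function names, where efficiency is taken as significant bits divided by container bits:

```python
def adc_resolution_percent(bits: int) -> float:
    """Size of one ADC step as a percentage of full scale."""
    return 100.0 / (2 ** bits)


def storage_efficiency_percent(significant_bits: int, container_bits: int) -> float:
    """Share of the stored bits that actually carry measurement data."""
    return 100.0 * significant_bits / container_bits


for adc_bits in (12, 16):
    print(
        f"{adc_bits}-bit: {2 ** adc_bits} steps, "
        f"{adc_resolution_percent(adc_bits):.4f}% resolution, "
        f"{storage_efficiency_percent(adc_bits, 16):.2f}% efficient in a u16, "
        f"{storage_efficiency_percent(adc_bits, 64):.2f}% efficient in 64 bits"
    )
```

Storing 12-bit data in a u16 is already 75% efficient, which is why a custom 12-bit format is rarely worth the effort.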

The use of a 32-bit float is common in flight test data systems as it has sufficient precision to hold analog data after scaling and calibration, and it can hold all the data bits of common ARINC 429 data (the SDI, the data field, and the SSM bits fit within the 24 bits of precision available in a 32-bit float). There is a tradeoff to be made though. Do we store u16 data and then apply calibrations and compute derived equations when a user requests the data, or do we store the processed result in a 32-bit float so that it is available with less access delay?
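The claim that the SDI, data, and SSM bits survive storage in a 32-bit float can be checked with a short sketch. It assumes the conventional ARINC 429 word layout (label in bits 1-8, SDI in bits 9-10, data in bits 11-29, SSM in bits 30-31, parity in bit 32, with bit 1 as the least significant bit) and that the 23 bits are stored as one unsigned integer value. The helper names are illustrative and not part of any library.

```python
import struct


def payload_bits(word: int) -> int:
    """Return bits 9-31 (SDI, data, SSM) of a 32-bit word as a 23-bit integer."""
    return (word >> 8) & 0x7FFFFF


def through_float32(value: int) -> int:
    """Store an integer in an IEEE 754 single-precision float and read it back."""
    return int(struct.unpack("<f", struct.pack("<f", float(value)))[0])


word = 0xFFFFFFFF  # worst case: every bit set
payload = payload_bits(word)

print(through_float32(payload) == payload)  # True: 23 bits fit in a 24-bit significand
print(through_float32(word) == word)        # False: a full 32-bit word does not fit
```

This is why the raw word is better kept in a u32, while the 23 payload bits can safely pass through a 32-bit float.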

Data Traceability

One requirement of your data storage system is that it has to maintain the original source data without modification. So if you start with a 16-bit analog measurement, apply a calibration, and save the result, you will now be holding both the original 16-bit value and the 32-bit value. You have tripled your data storage space for the sake of speeding up data access. This might be an acceptable approach though, and indeed the best and possibly only approach. What if you were asked to reproduce data from a 20-year-old test program for further analysis? What if the algorithms used to convert data to engineering units had changed? This would be rare for a linear calibration, but what about strain gauges, thermocouples, derived parameter processing, or digital filtering? Would it not be better to have a record of the raw data and the previously processed result just in case that processing method has changed in the time since? It is more likely that you will be able to open a file in 30 years than it is that you could make a processing algorithm work in 30 years.
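The storage cost of keeping both the raw and the processed values can be illustrated with Python's `array` module. This is a minimal sketch: the linear calibration (gain and offset for a +/-10 V range) is hypothetical, and the item sizes are those of typical platforms (2 bytes for `"H"`, 4 bytes for `"f"`).

```python
from array import array

# Raw u16 ADC counts, kept exactly as acquired
raw_counts = array("H", [0, 16384, 32768, 65535])

# Hypothetical linear calibration: counts -> volts over a +/-10 V range
GAIN = 20.0 / 65535
OFFSET = -10.0
engineering_units = array("f", (GAIN * c + OFFSET for c in raw_counts))

raw_bytes = len(raw_counts) * raw_counts.itemsize
processed_bytes = len(engineering_units) * engineering_units.itemsize
ratio = (raw_bytes + processed_bytes) / raw_bytes

print(raw_bytes, processed_bytes, ratio)  # 8 16 3.0
```

The raw array is never modified; the calibrated values are written to a second array, which is what makes the total three times the size of the raw data alone.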

A good practice is to store the source data in the most efficient native data type available, which would be u16 for analog or u32 for ARINC, and then store floating point data in a 32-bit float. To date, humanity has only invented one common airborne instrument capable of higher precision than a 32-bit float can hold, and that is the Global Positioning System. When supplemented with the Wide Area Augmentation System or other differential GPS corrections, the Latitude and Longitude measurement has sufficient precision to require 64 bits of floating-point precision. Luckily, it is common practice to split Latitude and Longitude into a coarse and a fine component, which can be stored in two 32-bit floats and then recombined when the full precision is needed. The coarse values can be used alone for most enroute navigational purposes, but we can usually expect position to be biased up to about 30 feet in the direction of “Null Island” when the fine component is ignored. Full precision location is normally only needed for approach, landing, and ground operations.
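The coarse and fine split can be sketched as below. This is an illustration under stated assumptions, not the scheme of any particular avionics standard: the coarse resolution `COARSE_STEP_DEG` is a hypothetical value, the coarse part is assumed to be truncated toward zero (which is what produces a bias toward latitude 0, longitude 0), and all function names are invented for the example.

```python
import math
import struct
from typing import Tuple

# Hypothetical coarse resolution in degrees. A power of two keeps every
# coarse value exactly representable in a 32-bit float.
COARSE_STEP_DEG = 2.0 ** -13


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def split_angle(degrees: float) -> Tuple[float, float]:
    """Split an angle into coarse and fine parts, each held in a 32-bit float."""
    # math.trunc rounds toward zero, so the coarse part never overshoots
    coarse = math.trunc(degrees / COARSE_STEP_DEG) * COARSE_STEP_DEG
    fine = degrees - coarse
    return to_float32(coarse), to_float32(fine)


def combine_angle(coarse: float, fine: float) -> float:
    """Recombine the two parts in 64-bit arithmetic."""
    return coarse + fine


latitude = 47.123456789
coarse, fine = split_angle(latitude)
single = to_float32(latitude)

print("single float32 error (deg):", abs(single - latitude))
print("coarse + fine error (deg): ", abs(combine_angle(coarse, fine) - latitude))
print("coarse alone is biased toward zero:", abs(coarse) <= abs(latitude))
```

Storing the value in one 32-bit float loses precision on the order of a millionth of a degree at this latitude, while the recombined pair is accurate to far better than a billionth of a degree.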

Now that we have discussed some considerations for our data storage, let us consider some specific options.

Flight Test Data Storage Options

The options for Flight Test data file formats are as follows, in order from best to worst, from the LabVIEW developer’s perspective.

LabVIEW Native TDMS

The LabVIEW TDMS file should be your prime consideration for data storage in LabVIEW applications though it does come with some negatives when applied to Flight Test.

Positives:

Easiest file format to use in LabVIEW.
Can be opened and viewed with an Excel plugin - read only, but a nice capability to alleviate concerns related to file compatibility in the future.
The file format is published by National Instruments allowing organizations to program access libraries in other languages if needed, though this would perhaps be a daunting effort.
The available data formats make this a rather space-efficient option.
Application of on-access calibrations and scaling is supported natively.
Single write / simultaneous multiple read is supported.
Data access is very fast.

Negatives:

Data is organized by Groups of Channels, but additional hierarchy is not available. This is frankly unfortunate as it is a significant limitation to an otherwise excellent and easy-to-implement data storage format that could be used for flight test. Flight Test normally needs Groups of Groups of Channels at a minimum, and in some applications even more levels. The particulars of data grouping in a file format are worthy of an entire paper in themselves, but feel free to reach out to us to discuss this in detail.
Appending of data is not easily supported. The TDMS native functions are based on writing a waveform defined by start time, dt, and an array of data. This is highly space efficient, but if you stop recording and restart recording, you would need to start a new file as TDMS doesn’t allow groups of groups (recording 1, recording 2, etc). There are workarounds, but this should be supported within the file format without a workaround.
Finally, use of the TDMS file format effectively results in vendor lock-in.
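To see why a waveform defined by start time, dt, and an array of data is awkward to append to, consider this conceptual model. It is a plain Python sketch of the idea only; it does not use or represent the TDMS API, and the `Waveform` class and its fields are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Waveform:
    t0: float           # start time of the recording, seconds
    dt: float           # fixed interval between samples, seconds
    samples: List[int]  # sample values; timestamps are implied, not stored

    def time_of(self, index: int) -> float:
        """Timestamp of a sample, reconstructed from t0 and dt."""
        return self.t0 + index * self.dt


first = Waveform(t0=0.0, dt=0.125, samples=[1, 2, 3])
second = Waveform(t0=60.0, dt=0.125, samples=[4, 5, 6])  # recording restarted at 60 s

# Naively appending the second recording to the first discards its start time
merged = Waveform(first.t0, first.dt, first.samples + second.samples)

print(merged.time_of(3))  # 0.375, although the sample was taken at t = 60.0
print(second.time_of(0))  # 60.0
```

Because no per-sample timestamps are stored, each recording needs its own start time, which is why a restart calls for a separate container (a new file, or a group of groups where the format allows it).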

Hierarchical Data Format, HDF5

The HDF5 file format is a common industry file format for the storage of scientific data. In many ways it is similar to the LabVIEW TDMS file, but there are plugins and libraries for HDF5 in nearly all modern programming languages. Having said that, it is also not straightforward for an end user to open and manipulate HDF5 stored data without some effort.

Positives:

Groups of groups of groups (…) are supported. This allows for one file container to hold multiple data events without any effort needed to segregate data.
Second most highly integrated file format in LabVIEW, after the TDMS file.
Supported data types are comprehensive, easily surpassing TDMS.
Supports network packet recording, useful for projects that use 1553 and other network based data transmissions.
Can be opened and viewed with virtually any programming language as well as many commercial third-party applications. No concerns for file format obsolescence or vendor lock-in.
Files can be hosted on the HDF Server with built in user authentication and file edit restrictions.
The file format is open source and has been extended in the past as needed by large flight test organizations with specific needs.
The available data formats are space efficient and more extensive than even the TDMS file.
Single write / simultaneous multiple read is supported.
Data access is very fast.

Negatives:

Initial effort to begin using HDF files is higher than TDMS, but the feature set is less restrictive, especially related to multiple levels of Group hierarchy.

MATLAB Native .mat Files

The MATLAB .mat file is reported to save all data in a compressed 64-bit float. This is a compromise if true. Data storage space would be reasonably efficient, but at the cost of CPU load and access delay, especially when scanning for time indexes prior to being able to start a data read.

Positives:

None from the perspective of a LabVIEW developer.

Negatives:

The file format allows engineers to easily access the data file, manipulate that data, and save it without any record of the change. This may sound convenient, but it must be avoided. Original source data must never be manipulated. It can only be flagged as invalid, or processed into new parameters. Overwrite of raw acquired data can never be allowed if the data will be used in the future for evidence of regulatory or government contract compliance.

ASCII Text Files

ASCII files are useful for data sharing and occasionally reporting, but they should never be used as a data storage format.

Positives:

Often requested if Excel is the only reporting tool available to the engineer, but the negatives mean this will never be an acceptable data storage file format.

Negatives:

Precision of the stored data must be pre-defined on a per-channel basis, not globally. Some channels will be allotted too many decimal places and take up too much storage space; others will be allotted too few and precision will be lost. This is inevitable: either too much or too little precision. The most space efficient you can get is a format of %.4e, or 1.2345E+3, for a typical parameter, which is 9 ASCII characters plus a delimiter. That is equivalent to 10 bytes or 80 bits, but with a precision roughly on par with a 16-bit value. So, ASCII data storage is probably only about 20% efficient at best, and carries a high risk of recording with insufficient or excess precision, especially if floating point notation is used instead of scientific notation.
Like MATLAB native files, the ASCII text file format allows engineers to easily access the data file, manipulate that data, and save it without any record of the change. As such, ASCII text files are unacceptable for evidence of regulatory or government contract compliance.
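The size and precision problems of text storage are easy to demonstrate. The sketch below is illustrative only; note that Python's `%e` formatting always writes at least two exponent digits, so its character counts are slightly higher than for the `1.2345E+3` form discussed above. The helper name is hypothetical.

```python
import struct


def text_bytes(value: float, fmt: str) -> int:
    """Bytes needed to store one value as ASCII text plus a one-character delimiter."""
    return len((fmt % value).encode("ascii")) + 1


value = 0.00012345

fixed = "%.3f" % value       # fixed number of decimal places
scientific = "%.4e" % value  # scientific notation, 5 significant digits

print(fixed)       # 0.000 (all precision lost)
print(scientific)  # 1.2345e-04

print(text_bytes(value, "%.4e"), "bytes as text")  # 11 bytes as text
print(struct.calcsize("<H"), "bytes as a u16")     # 2 bytes as a u16
```

A channel whose values are small relative to the chosen number of decimal places is silently recorded as zero, while the scientific form keeps its digits at several times the storage cost of the binary value.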
