ratal/mdfreader

MDFREADER


Abstract:

This module imports MDF files (Measurement Data Format V3.x and V4.x), typically produced by INCA (ETAS), CANape or CANoe. The MDF format is widely used in the automotive industry to record data from ECUs. The main module mdfreader.py inherits from two module pairs (one per MDF version): the first reads the file's block structure (mdfinfoX), and the second reads the raw data (mdfXreader). It can optionally run multithreaded and was designed for efficient batch processing of large endurance-evaluation files for data mining.

Performance:

When Cython is available (strongly recommended), mdfreader uses several low-level optimisations:

  • Fast CN/CC/SI/TX metadata reader (read_cn_chain_fast in dataRead.pyx): walks the entire MDF4 channel linked list in a single Cython function using POSIX pread() (no Python file-object dispatch, no GIL during I/O) and C packed-struct memcpy parsing. A fast <TX>…</TX> bytes scan replaces lxml.objectify for the common MD-block pattern (~95% of files). Result: 3–4× speedup on large files compared to the pure-Python path.

  • SymBufReader: a Cython wrapper around the raw file object that buffers in both directions. MDF4 metadata blocks are linked by pointers that point backward in the file; SymBufReader keeps a 64 KB buffer centred on the current position so that most seeks are served from cache without a kernel read().

  • Vectorised data reading: sorted channel groups are read in a single readinto() call into a flat uint8 buffer that is then reinterpreted as a structured record array — zero copies, no per-chunk Python loop.
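The centred-buffer idea behind SymBufReader can be illustrated with a minimal pure-Python sketch. This is not the Cython implementation from dataRead.pyx: the class name CentredBufferReader, its refills counter and the refill policy are assumptions made for illustration only.

```python
import io


class CentredBufferReader:
    """Hypothetical illustration of a read buffer centred on the current position."""

    def __init__(self, raw, buffer_size=64 * 1024):
        self._raw = raw              # any seekable binary file object
        self._size = buffer_size
        self._start = 0              # file offset of the first buffered byte
        self._buf = b""
        self._pos = 0                # logical position seen by the caller
        self.refills = 0             # number of reads issued to the raw file

    def seek(self, pos):
        self._pos = pos              # no I/O here: only the logical position moves

    def read(self, n):
        end = self._pos + n
        hit = self._start <= self._pos and end <= self._start + len(self._buf)
        if not hit:
            # Refill with a window centred on the requested position, so that
            # nearby backward and forward links both land inside the buffer.
            self._start = max(0, self._pos - self._size // 2)
            self._raw.seek(self._start)
            self._buf = self._raw.read(max(self._size, n + self._size // 2))
            self.refills += 1
        offset = self._pos - self._start
        data = self._buf[offset:offset + n]
        self._pos += len(data)
        return data


# Synthetic 256 KiB "file": jump forward, then follow a backward link.
payload = bytes(range(256)) * 1024
reader = CentredBufferReader(io.BytesIO(payload))
reader.seek(200_000)
first = reader.read(16)
reader.seek(199_000)                 # backward jump that stays inside the window
second = reader.read(16)
```

In this sketch the second read is served from the buffer, so only one read reaches the underlying file object.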

Typical timings on a 184 MB / 36 000-channel MDF4 file:

| Scenario | Time |
|---|---|
| Pure Python path | ~1.9 s |
| v4.2 with Cython | ~1.9 s |
| v4.3 (this version) | ~0.6 s |

Structure of the mdf object (inherits from the Python dict):

For each channel mdf[channelName] the following keys exist:

| Key | Description |
|---|---|
| data | numpy array of channel values |
| unit | unit string |
| master | name of the master (time/angle/…) channel |
| masterType | master channel type: 0=None, 1=Time, 2=Angle, 3=Distance, 4=Index |
| description | channel description string |
| conversion | present when convert_after_read=False; dict describing raw→physical mapping |

mdf.masterChannelList is a dict mapping each master channel name to the list of channels sampled at the same raster.
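To make the layout concrete, here is a small sketch that mimics the structure with a plain dict and rebuilds a masterChannelList-style grouping from the per-channel master key. The channel names and values are invented, plain lists stand in for numpy arrays, and build_master_channel_list is a hypothetical helper, not part of mdfreader.

```python
from collections import defaultdict

# Hypothetical stand-in for a loaded mdf object: a dict keyed by channel name.
mdf = {
    "time_10ms": {"data": [0.00, 0.01, 0.02], "unit": "s",
                  "master": "time_10ms", "masterType": 1,
                  "description": "10 ms raster"},
    "engine_speed": {"data": [800, 820, 845], "unit": "rpm",
                     "master": "time_10ms", "masterType": 1,
                     "description": "Engine speed"},
    "time_100ms": {"data": [0.0, 0.1], "unit": "s",
                   "master": "time_100ms", "masterType": 1,
                   "description": "100 ms raster"},
    "coolant_temp": {"data": [88.5, 88.6], "unit": "degC",
                     "master": "time_100ms", "masterType": 1,
                     "description": "Coolant temperature"},
}


def build_master_channel_list(channels):
    """Group channel names by the master channel they are sampled against."""
    groups = defaultdict(list)
    for name, entry in channels.items():
        groups[entry["master"]].append(name)
    return dict(groups)


master_channel_list = build_master_channel_list(mdf)
```

In this sketch each master channel lists itself as its own master, so it appears in its own group.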

Mdfreader module methods:

  • resample channels to one sampling frequency
  • merge files
  • plot one channel, several channels on one graph (list) or several channels on subplots (list of lists)
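As an illustration of what resampling to one sampling frequency involves, the sketch below linearly interpolates one channel onto a uniform master grid. It is a pure-Python approximation under stated assumptions (a strictly increasing master, linear interpolation); resample_linear is a hypothetical helper, and the README does not say which interpolation scheme mdfreader itself applies.

```python
from bisect import bisect_right


def resample_linear(master, values, period):
    """Interpolate values onto a uniform grid starting at master[0]."""
    if len(master) != len(values) or len(master) < 2:
        raise ValueError("need at least two samples of equal length")
    start, stop = master[0], master[-1]
    # Small epsilon guards against floating-point round-off at the last point.
    count = int((stop - start) / period + 1e-9) + 1
    new_master = [start + i * period for i in range(count)]
    new_values = []
    for t in new_master:
        # Index of the sample just after t, clamped to the last interval.
        hi = min(bisect_right(master, t), len(master) - 1)
        lo = hi - 1
        frac = (t - master[lo]) / (master[hi] - master[lo])
        new_values.append(values[lo] + frac * (values[hi] - values[lo]))
    return new_master, new_values


new_master, new_values = resample_linear([0.0, 1.0, 2.0], [0.0, 10.0, 30.0], 0.5)
```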

It is also possible to export mdf data into:

  • CSV file (Excel dialect by default)
  • NetCDF file for compatibility with Uniplot (needs netcdf4, Scientific.IO)
  • HDF5 (needs h5py)
  • Excel 95–2003 (needs xlwt — very slow for large files)
  • Excel 2007/2010 (needs openpyxl — can also be slow with large files)
  • Matlab .mat (needs hdf5storage)
  • MDF file — allows creating, converting or modifying data, units and descriptions
  • Pandas DataFrame(s) (command line only, not in mdfconverter) — one DataFrame per raster
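For readers unfamiliar with the "Excel dialect" mentioned for CSV, the following sketch writes the channels of one raster with Python's standard csv module, whose default dialect is "excel". The header layout (a names row followed by a units row) and the write_raster_csv helper are assumptions for illustration, not a description of the file that export_to_csv produces.

```python
import csv
import io


def write_raster_csv(stream, channels, names):
    """Write the named channels, which share one raster, as CSV columns."""
    writer = csv.writer(stream)          # default dialect is "excel"
    writer.writerow(names)
    writer.writerow([channels[name]["unit"] for name in names])
    # zip() walks the columns in lockstep, yielding one row per sample.
    for row in zip(*(channels[name]["data"] for name in names)):
        writer.writerow(row)


# Invented two-channel raster used only for this example.
raster = {
    "time": {"data": [0.0, 0.1], "unit": "s"},
    "speed": {"data": [800, 820], "unit": "rpm"},
}
stream = io.StringIO(newline="")
write_raster_csv(stream, raster, ["time", "speed"])
lines = stream.getvalue().splitlines()
```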

Compatibility:

Python 3.9+ — tested on Linux and Windows (x86-64)

Requirements:

Core: numpy, lxml, sympy

lxml is used for MDF4 metadata XML blocks. When Cython is compiled, the fast path handles the common <TX>…</TX> pattern directly from bytes and only falls back to lxml for complex XML (CDATA, namespaces).
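The fast-path-with-fallback pattern described above can be sketched as follows. This is only an approximation: it uses a regular expression where the Cython code scans bytes directly, and the standard library's xml.etree.ElementTree stands in for lxml so that the example has no third-party dependency. The function name extract_tx is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches a TX element whose content is plain text (no nested tags, CDATA or entities).
_SIMPLE_TX = re.compile(rb"<TX>([^<&]*)</TX>")


def extract_tx(md_block: bytes) -> str:
    """Return the text of the first TX element of an MD-block payload."""
    md_block = md_block.rstrip(b"\x00")      # drop zero-byte padding, if any
    match = _SIMPLE_TX.search(md_block)
    if match:
        return match.group(1).decode("utf-8")        # fast path: no XML parser
    # Slow path: a real XML parser handles CDATA, entities and namespaces.
    root = ET.fromstring(md_block)
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] == "TX":   # ignore any namespace prefix
            return element.text or ""
    return ""


simple = extract_tx(b"<CNcomment><TX>Engine speed</TX></CNcomment>")
complex_case = extract_tx(
    b'<CNcomment xmlns="http://www.asam.net/mdf/v4">'
    b"<TX><![CDATA[a < b]]></TX></CNcomment>"
)
```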

Reading channels defined by a formula requires sympy.

Cython is strongly advised. It compiles dataRead.pyx, which provides:

  • fast metadata parsing via pread() + C packed structs
  • the SymBufReader bidirectional file buffer
  • bit-exact reading for non-byte-aligned or record-padded channels
  • VLSD/VLSC string data reading helpers
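To show what bit-exact reading means for a channel that is not byte-aligned, here is a minimal pure-Python sketch that extracts an unsigned little-endian value from a record given a byte offset, a bit offset and a bit count. The function name and signature are hypothetical; the actual helpers in dataRead.pyx are compiled and work on whole arrays.

```python
def read_unsigned_bits(record: bytes, byte_offset: int,
                       bit_offset: int, bit_count: int) -> int:
    """Extract an unsigned little-endian bit field from one record."""
    # Number of bytes that the bit field touches.
    n_bytes = (bit_offset + bit_count + 7) // 8
    chunk = int.from_bytes(record[byte_offset:byte_offset + n_bytes], "little")
    # Shift away the leading bits, then mask off everything above the field.
    return (chunk >> bit_offset) & ((1 << bit_count) - 1)


record = bytes([0b10110100, 0b00000011])
ten_bit_value = read_unsigned_bits(record, 0, 2, 10)   # field spans two bytes
high_nibble = read_unsigned_bits(record, 0, 4, 4)      # upper 4 bits of byte 0
```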

If Cython compilation fails, bitarray is used as a fallback (slower, pure Python).

Export requirements (optional): scipy, h5py, hdf5storage, openpyxl, pandas, fastparquet

Data compression in memory (optional): blosc

Graphical converter: PyQt5

Installation:

From PyPI:

pip install mdfreader

From source:

pip install cython numpy        # build prerequisites
python setup.py build_ext --inplace
python setup.py develop

Graphical interface: mdfconverter

A PyQt5 GUI to convert batches of files. Launch with:

mdfconverter

Right-click a channel in the list to plot it. Channels can be dragged between columns. A .lab channel-list file can be imported. Multiple files can be merged into one and resampled.

Memory-saving options:

For large files or limited memory:

  • Channel list only — pass channel_list=['ch1', 'ch2']; call mdfreader.MdfInfo(file) to get the full channel list without loading data.
  • Raw data mode — pass convert_after_read=False; data stays as stored in the MDF file and is converted on-the-fly by get_channel_data, plot, export_to_*, etc.
  • Blosc compression — pass compression=True (default level 9) to compress data in memory after reading.
  • No-data skeleton — pass no_data_loading=True to build the channel metadata dict without reading any samples; data is fetched on demand via get_channel_data.
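The raw data mode can be pictured as deferring the raw→physical conversion until a channel is actually requested. The sketch below shows that idea for a linear mapping only. The "type" and "parameters" keys of the conversion dict and the to_physical helper are hypothetical; the real layout of the conversion dict is not specified here.

```python
def to_physical(entry):
    """Return physical values, applying a pending linear conversion if present."""
    conversion = entry.get("conversion")
    if conversion is None:
        return entry["data"]                 # already converted at read time
    if conversion["type"] != "linear":
        raise NotImplementedError("this sketch handles linear conversions only")
    offset, factor = conversion["parameters"]
    return [offset + factor * raw for raw in entry["data"]]


# Invented channel kept as raw counts, with the conversion stored alongside.
raw_channel = {
    "data": [0, 100, 200],
    "unit": "degC",
    "conversion": {"type": "linear", "parameters": (-40.0, 0.5)},
}
physical = to_physical(raw_channel)
```

Keeping raw integer samples in memory and converting only on access is what makes this mode cheaper than converting every channel up front.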

For data visualisation, a dataPlugin for Veusz (≥ 1.16) is also available; follow the instructions in Veusz's documentation and the plugin file's header.

Command example in ipython:

    import mdfreader
    # loads whole mdf file content in yop mdf object.
    yop=mdfreader.Mdf('NameOfFile')
    # you can print file content in ipython with a simple:
    yop
    # alternatively, for max speed and smaller memory footprint, read only a few channels
    yop=mdfreader.Mdf('NameOfFile', channel_list=['channel1', 'channel2'], convert_after_read=False)
    # also possible to keep data compressed for small memory footprint, using Blosc module
    yop=mdfreader.Mdf('NameOfFile', compression=True)
    # for interactive file exploration, possible to read the file but not its data to save memory
    yop=mdfreader.Mdf('NameOfFile', no_data_loading=True) # channel data will be loaded from file if needed
    # parsing xml metadata from mdf4.x for many channels can take longer than reading the data itself.
    # You can reduce metadata reading to the minimum with the argument below (no source information, attachments, etc.)
    yop=mdfreader.Mdf('NameOfFile', metadata=2)  # 0: full, 2: minimal
    # only for mdf4.x, you can search for the mdf key of a channel name that may have been recorded by different sources
    yop.get_channel_name4('channelName', 'source path or name')  # returns list of mdf keys
    # to yield one channel and keep its content in mdf object
    yop.get_channel('channelName')
    # to yield one channel numpy array
    yop.get_channel_data('channelName')
    # to get file mdf version
    yop.MDFVersionNumber
    # to get file structure or attachments, you can create a mdfinfo instance
    info=mdfreader.MdfInfo()
    info.list_channels('NameOfFile') # returns only the list of channels
    info.read_info('NameOfFile') # complete file structure object
    yop.info # same class is stored in mdfreader class
    # to list channels names after reading
    yop.keys()
    # to list channels names grouped by raster, below dict mdf attribute contains
    # pairs (key=masterChannelName : value=listOfChannelNamesForThisMaster)
    yop.masterChannelList
    # quick plot or subplot (with lists) of channel(s)
    yop.plot(['channel1',['channel2','channel3']])
    # file manipulations
    yop.resample(0.1)
    # or
    yop.resample(master_channel='master3')
    # keep only data between begin and end
    yop.cut(begin=10, end=15)
    # export to other file formats :
    yop.export_to_csv(sampling=0.01)
    yop.export_to_NetCDF()
    yop.export_to_hdf5()
    yop.export_to_matlab()
    yop.export_to_xlsx()
    yop.export_to_parquet()
    # return pandas dataframe from master channel name
    yop.return_pandas_dataframe('master_channel_name')
    # converts data groups into pandas dataframes and keeps it in mdf object
    yop.convert_to_pandas()
    # drops all the channels except the ones in argument
    yop.keep_channels({'channel1','channel2','channel3'})
    # merge 2 files
    yop2=mdfreader.Mdf('NameOfFile_2')
    yop.merge_mdf(yop2)
    # can write an mdf file after modifications or creation from scratch
    # write4 and write3 also allow converting between file versions
    yop.write('NewNameOfFile')  # write in same version as original file after modifications
    yop.write4('NameOfFile', compression=True)  # write mdf version 4.1 file, data compressed
    yop.write3()  # write mdf version 3 file
    yop.attachments  # to get attachments, embedded or paths to files 

License:

GPL-3.0 (COPYING). A second license file, LICENSE, is listed with an unidentified license.
