How to write parser plugins

NOMAD uses parsers to convert raw code input and output files into NOMAD's common Archive format. This is the documentation on how to develop such a parser.

Getting started

Fork and clone the parser example project as described before. Follow the original how-to on writing a parser.

Parser plugin metadata

A Parser describes a NOMAD parser that can be loaded as a plugin.

The parser itself is referenced via parser_class_name. For Parser instances, parser_class_name must refer to a Python class that has a parse function. The other properties are used to create a MatchingParserInterface. They comprise general metadata that allows users to understand what the parser is, and metadata used to decide whether a given file "matches" the parser.
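
As a rough illustration of the class requirement described above, here is a minimal sketch of a class exposing a parse function with the signature listed in the table below. The class name ExampleParser, the first_line attribute, and the use of a plain namespace object in place of a NOMAD archive are assumptions made for illustration only; a real parser would populate NOMAD archive sections instead.

```python
import logging
from types import SimpleNamespace


class ExampleParser:
    """Hypothetical parser class; only the parse signature is taken from the docs."""

    def parse(self, mainfile, archive, logger):
        # mainfile: path to the file that matched this parser
        # archive: object to populate (a stand-in here, not a real NOMAD archive)
        # logger: logger used to report progress and problems
        logger.info(f"parsing {mainfile}")
        with open(mainfile, encoding="utf-8") as f:
            first_line = f.readline().strip()
        # Assumption: store a single value on the stand-in archive object.
        archive.first_line = first_line


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        result = SimpleNamespace()
        ExampleParser().parse(sys.argv[1], result, logging.getLogger(__name__))
        print(result.first_line)
```

The fully qualified name of such a class (package, module, and class name) is what the parser_class_name field would point to.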

name type description
name str A short descriptive human readable name for the plugin.
description str A human readable description of the plugin.
plugin_documentation_url str The URL of the plugin's main documentation page.
plugin_source_code_url str The URL of the plugin's main source code repository.
python_package str Name of the Python package that contains the plugin code and a plugin metadata file called nomad_plugin.yaml.
plugin_type str The type of the plugin. This has to be the string parser for parser plugins.
default: parser
parser_class_name str The fully qualified name of the Python class that implements the parser. This class must have a function def parse(self, mainfile, archive, logger).
parser_as_interface bool By default, the parser metadata from this config (and the loaded nomad_plugin.yaml) is used to instantiate a parser interface that lazily loads the actual parser and performs the mainfile matching. If matching based on the parser metadata is not sufficient and you have implemented your own is_mainfile parser method, this setting can be enabled to use the given parser class directly for both parsing and matching.
default: False
mainfile_contents_re str A regular expression that is applied to the content of a potential mainfile. If this expression is given, the parser is only considered for a file if the expression matches.
mainfile_name_re str A regular expression that is applied to the name of a potential mainfile. If this expression is given, the parser is only considered for a file if the expression matches.
default: .*
mainfile_mime_re str A regular expression that is applied to the MIME type of a potential mainfile. If this expression is given, the parser is only considered for a file if the expression matches.
default: text/.*
mainfile_binary_header bytes Matches a binary file if the given bytes are included in the file.
mainfile_binary_header_re bytes Matches a binary file if the given binary regular expression matches the file contents.
mainfile_alternative bool If True, the parser only matches a file if no other file in the same directory matches a parser.
default: False
mainfile_contents_dict dict Is used to match structured data files like JSON or HDF5.
supported_compressions List[str] Files compressed with the given formats (e.g. xz, gz) are uncompressed and matched like normal files.
default: []
domain str The domain value dft will apply all normalizers for atomistic codes. Deprecated.
default: dft
level int The order in which the parser is executed relative to other parsers.
default: 0
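
To make the matching fields above more concrete, the following sketch shows one way the name, MIME type, binary header, and contents criteria could be combined into a single decision. It is not NOMAD's actual implementation: the MatchSpec class, the matches function, the choice of fullmatch, match, and search, and the idea of testing only the leading bytes of the file are all assumptions. Only the field names and their defaults are taken from the table.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchSpec:
    """Hypothetical container for a subset of the matching fields."""

    mainfile_name_re: str = r".*"          # default from the table
    mainfile_mime_re: str = r"text/.*"     # default from the table
    mainfile_contents_re: Optional[str] = None
    mainfile_binary_header: Optional[bytes] = None


def matches(spec: MatchSpec, filename: str, mime: str, head: bytes) -> bool:
    """Return True if every criterion that is set accepts the file.

    head holds the leading bytes of the file (an assumption of this sketch).
    """
    if re.fullmatch(spec.mainfile_name_re, filename) is None:
        return False
    if re.match(spec.mainfile_mime_re, mime) is None:
        return False
    if spec.mainfile_binary_header is not None:
        if spec.mainfile_binary_header not in head:
            return False
    if spec.mainfile_contents_re is not None:
        text = head.decode("utf-8", errors="replace")
        if re.search(spec.mainfile_contents_re, text) is None:
            return False
    return True


if __name__ == "__main__":
    spec = MatchSpec(
        mainfile_name_re=r".*\.out",
        mainfile_contents_re=r"Program\s+EXAMPLE",
    )
    print(matches(spec, "run/job.out", "text/plain", b"Program EXAMPLE v1\n"))
```

In this sketch, criteria that are left unset never reject a file, which mirrors the table's wording that an expression only restricts matching "if this expression is given".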