How to write parser plugins

NOMAD uses parsers to convert raw code input and output files into NOMAD's common Archive format. This is the documentation on how to develop such a parser.

Getting started

Fork and clone the parser example project as described previously, then follow the original how-to on writing a parser.

Parser plugin metadata

A Parser describes a NOMAD parser that can be loaded as a plugin.

The parser itself is referenced via python_name. For Parser instances, python_name must refer to a Python class that has a parse function. The other properties are used to create a MatchingParserInterface. They comprise general metadata that helps users understand what the parser is, and metadata used to decide whether a given file "matches" the parser.
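As a minimal sketch of what such a class could look like, the following example defines a class with a `parse` function that takes a mainfile path, an archive, and a logger. The names `ExampleParser` and `ExampleArchive` are hypothetical; `ExampleArchive` is only a stand-in so the sketch runs without NOMAD installed and is not NOMAD's actual archive class.

```python
import logging


class ExampleArchive:
    """Hypothetical stand-in for a NOMAD archive; used for illustration only."""

    def __init__(self):
        self.data = {}


class ExampleParser:
    """Hypothetical parser class exposing the required `parse` function."""

    def parse(self, mainfile, archive, logger):
        # Read the raw file and store a few extracted values on the archive.
        logger.info("parsing mainfile %s", mainfile)
        with open(mainfile, encoding="utf-8") as handle:
            lines = handle.read().splitlines()
        archive.data["n_lines"] = len(lines)
        archive.data["first_line"] = lines[0] if lines else None
```

A real parser would populate the sections of the NOMAD archive instead of a plain dictionary; the point here is only the shape of the class and its `parse` function.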

name	type	description
name	`str`	A short, descriptive, human-readable name for the plugin.
description	`str`	A human-readable description of the plugin.
plugin_documentation_url	`str`	The URL of the plugin's main documentation page.
plugin_source_code_url	`str`	The URL of the plugin's main source code repository.
python_package	`str`	Name of the Python package that contains the plugin code and a plugin metadata file called `nomad_plugin.yaml`.
plugin_type	`str`	The type of the plugin. This has to be the string `parser` for parser plugins. default: `parser`
parser_class_name	`str`	The fully qualified name of the Python class that implements the parser. This class must have a function `def parse(self, mainfile, archive, logger)`.
parser_as_interface	`int`	By default, the parser metadata from this config (and the loaded `nomad_plugin.yaml`) is used to instantiate a parser interface that lazily loads the actual parser and performs the mainfile matching. If matching based on parser metadata is not sufficient and you have implemented your own `is_mainfile` parser method, enable this setting to use the given parser class directly for both parsing and matching. default: `False`
mainfile_contents_re	`str`	A regular expression that is applied to the content of a potential mainfile. If this expression is given, the parser is only considered for a file if the expression matches.
mainfile_name_re	`str`	A regular expression that is applied to the name of a potential mainfile. If this expression is given, the parser is only considered for a file if the expression matches. default: `.*`
mainfile_mime_re	`str`	A regular expression that is applied to the mime type of a potential mainfile. If this expression is given, the parser is only considered for a file if the expression matches. default: `text/.*`
mainfile_binary_header	`bytes`	Matches a binary file if the given bytes are included in the file.
mainfile_binary_header_re	`bytes`	Matches a binary file if the given binary regular expression matches the file contents.
mainfile_alternative	`int`	If True, the parser only matches a file if no other file in the same directory matches a parser. default: `False`
mainfile_contents_dict	`dict`	Used to match structured data files like JSON or HDF5.
supported_compressions	`List[str]`	Files compressed with the given formats (e.g. xz, gz) are uncompressed and matched like normal files. default: `[]`
domain	`str`	The domain value `dft` will apply all normalizers for atomistic codes. Deprecated. default: `dft`
level	`int`	The order in which the parser is executed with respect to other parsers. default: `0`
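To illustrate how the three regular-expression properties could combine when deciding whether a file "matches", here is a rough sketch using Python's `re` module. It is not NOMAD's actual matching code: the function `matches`, the `example_metadata` values, and the choice of `re.fullmatch` for the name, `re.match` for the mime type, and `re.search` for the contents are all assumptions made for illustration. Only the property names and their defaults are taken from the table above.

```python
import re


def matches(metadata, filename, mime, contents):
    """Return True if a file passes the name, mime type, and content checks."""
    # Fall back to the defaults listed in the table when a property is absent.
    name_re = metadata.get("mainfile_name_re", r".*")
    mime_re = metadata.get("mainfile_mime_re", r"text/.*")
    contents_re = metadata.get("mainfile_contents_re")

    if re.fullmatch(name_re, filename) is None:
        return False
    if re.match(mime_re, mime) is None:
        return False
    # The contents expression is optional; it only restricts matching if given.
    if contents_re is not None and re.search(contents_re, contents) is None:
        return False
    return True


# Hypothetical metadata for a code whose output files end in ".out".
example_metadata = {
    "mainfile_name_re": r".*\.out",
    "mainfile_contents_re": r"^#\s*Example code output",
}
```

With this metadata, a text file named `run.out` whose first line is `# Example code output` would pass all three checks, while `run.log` would be rejected by the name expression alone.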