How to write parser plugins
NOMAD uses parsers to convert raw code input and output files into NOMAD's common Archive format. This is the documentation on how to develop such a parser.
Getting started
Fork and clone the parser example project as described before, then follow the original how-to on writing a parser.
Parser plugin metadata
A `Parser` describes a NOMAD parser that can be loaded as a plugin.
The parser itself is referenced via `python_name`. For `Parser` instances, `python_name` must refer to a Python class that has a `parse` function. The other properties are used to create a `MatchingParserInterface`. This comprises general metadata that allows users to understand what the parser is, and metadata used to decide if a given file "matches" the parser.
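As a rough illustration of a class with a `parse` function, here is a minimal sketch. Only the `parse(self, mainfile, archive, logger)` signature is taken from this page; the class name `ExampleParser`, the `first_line` attribute, and the use of `SimpleNamespace` as a stand-in for the NOMAD archive are assumptions made so that the example runs without NOMAD installed.

```python
import logging
import os
import tempfile
from types import SimpleNamespace


class ExampleParser:
    """Hypothetical parser class; only the parse signature comes from the docs."""

    def parse(self, mainfile, archive, logger):
        # Assumption: mainfile is the path of the file that matched this parser.
        with open(mainfile, encoding="utf-8") as f:
            first_line = f.readline().strip()
        logger.info("parsed mainfile %s", mainfile)
        # Stand-in for filling the archive; a real parser populates archive sections.
        archive.first_line = first_line


def run_example():
    """Write a small fake output file and run the parser on it."""
    archive = SimpleNamespace()  # placeholder, not a real NOMAD archive
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.out")
        with open(path, "w", encoding="utf-8") as f:
            f.write("program: example-code\nenergy: -1.0\n")
        ExampleParser().parse(path, archive, logging.getLogger("example"))
    return archive
```

Calling `run_example()` returns the stand-in archive with the first line of the fake mainfile stored on it. In an actual plugin, the archive would be provided by NOMAD rather than constructed by hand.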
name | type | description |
---|---|---|
name | str | A short, descriptive, human-readable name for the plugin. |
description | str | A human-readable description of the plugin. |
plugin_documentation_url | str | The URL of the plugin's main documentation page. |
plugin_source_code_url | str | The URL of the plugin's main source code repository. |
python_package | str | Name of the Python package that contains the plugin code and a plugin metadata file called `nomad_plugin.yaml`. |
plugin_type | str | The type of the plugin. This has to be the string `parser` for parser plugins. Default: `parser` |
parser_class_name | str | The fully qualified name of the Python class that implements the parser. This class must have a function `def parse(self, mainfile, archive, logger)`. |
parser_as_interface | int | By default, the parser metadata from this config (and the loaded `nomad_plugin.yaml`) is used to instantiate a parser interface that lazily loads the actual parser and performs the mainfile matching. If matching based on parser metadata is not sufficient and you implemented your own `is_mainfile` parser method, this setting can be used to use the given parser class directly for parsing and matching. Default: `False` |
mainfile_contents_re | str | A regular expression that is applied to the content of a potential mainfile. If this expression is given, the parser is only considered for a file if the expression matches. |
mainfile_name_re | str | A regular expression that is applied to the name of a potential mainfile. If this expression is given, the parser is only considered for a file if the expression matches. Default: `.*` |
mainfile_mime_re | str | A regular expression that is applied to the MIME type of a potential mainfile. If this expression is given, the parser is only considered for a file if the expression matches. Default: `text/.*` |
mainfile_binary_header | bytes | Matches a binary file if the given bytes are included in the file. |
mainfile_binary_header_re | bytes | Matches a binary file if the given binary regular expression matches the file contents. |
mainfile_alternative | int | If `True`, the parser only matches a file if no other file in the same directory matches a parser. Default: `False` |
mainfile_contents_dict | dict | Is used to match structured data files like JSON or HDF5. |
supported_compressions | List[str] | Files compressed with the given formats (e.g. xz, gz) are uncompressed and matched like normal files. Default: `[]` |
domain | str | The domain value `dft` will apply all normalizers for atomistic codes. Deprecated. Default: `dft` |
level | int | The order in which the parser is executed with respect to other parsers. Default: `0` |
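To make the matching properties more concrete, the following sketch shows one way the three regular-expression settings could be combined. It is not NOMAD's implementation: the function name `is_candidate` is hypothetical, and the choice of `re.match` for the name and MIME type and `re.search` for the contents is an assumption. Only the parameter names and their defaults come from the table above.

```python
import re


def is_candidate(name, mime, contents, *,
                 mainfile_name_re=r".*",
                 mainfile_mime_re=r"text/.*",
                 mainfile_contents_re=None):
    """Return True if a file passes all given regex checks (illustrative only)."""
    # Assumption: name and MIME type are matched from the start of the string.
    if re.match(mainfile_name_re, name) is None:
        return False
    if re.match(mainfile_mime_re, mime) is None:
        return False
    # Assumption: the contents expression may match anywhere in the text.
    if mainfile_contents_re is not None:
        if re.search(mainfile_contents_re, contents) is None:
            return False
    return True
```

For example, `is_candidate("run.out", "text/plain", "EXAMPLE CODE v1", mainfile_contents_re=r"EXAMPLE CODE")` passes every check, while a file with MIME type `application/octet-stream` is rejected by the default `mainfile_mime_re`. The binary header, `mainfile_contents_dict`, and compression settings are left out of this sketch.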