PENMAN notation, originally called Sentence Plan Notation in the PENMAN project ([KAS1989]), is a serialization format for the directed, rooted graphs used to encode semantic dependencies, most notably in the Abstract Meaning Representation (AMR) framework. It looks similar to Lisp’s S-Expressions in using parentheses to indicate nested structures. For example, here is an AMR for “He drives carelessly.”:
(d / drive-01 :ARG0 (h / he) :manner (c / care-04 :polarity -))
Below is a breakdown of the parts of the PENMAN graph above, followed by a formal grammar for PENMAN graphs in general.
The following diagram explains what each part of the graph above is:
;  ┌────────────────────────── Variable (this one is the graph's top)
;  │     ┌──────────────────── Instance relation
;  ┴ ────┴─────
  (d / drive-01
;    ┬ ───┬────
;    |    └─────────────────── Concept (node label)
;    └──────────────────────── Indicates the node's concept
;          ┌────────────────── Edge relation
;    ──────┴───────
     :ARG0 (h / he)
;    ──┬──
;      └────────────────────── Role (edge label)
     :manner (c / care-04
;                    ┌──────── Attribute relation
;               ─────┴─────
                :polarity -))
;                         ┬
;                         └─── Atom (or "constant")
The linearized form can only describe projective structures such as
trees, so in order to capture non-projective graphs, nodes get
identifiers (called variables; e.g., d, h, and c in the graph above),
which can be referred to later to establish a reentrancy.
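To make reentrancy concrete, the following sketch encodes a graph for "The boy wants to go." as triples, in which the variable b is introduced once and referred to a second time. The triple encoding and the function name reentrant_variables are assumptions made for illustration; they are not part of the notation itself.

```python
from collections import Counter

# PENMAN: (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))
# The same graph as (source, role, target) triples (an assumed encoding).
triples = [
    ("w", ":instance", "want-01"),
    ("w", ":ARG0", "b"),
    ("b", ":instance", "boy"),
    ("w", ":ARG1", "g"),
    ("g", ":instance", "go-02"),
    ("g", ":ARG0", "b"),  # reentrancy: "b" is referred to a second time
]


def reentrant_variables(triples):
    """Return the variables that are the target of more than one edge."""
    # Assumption: every node contributes exactly one ":instance" triple.
    variables = {src for src, role, _ in triples if role == ":instance"}
    incoming = Counter(
        tgt for _, role, tgt in triples
        if role != ":instance" and tgt in variables
    )
    return sorted(var for var, count in incoming.items() if count > 1)


print(reentrant_variables(triples))  # ['b']
```

Without the variable, the second mention of the boy would have to be a separate node, and the graph would collapse into a tree.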
<node> ::= '(' <id> '/' <node-label> <edge>* ')'
<edge> ::= ':'<edge-label> (<const>|<id>|<node>)
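The two productions above map directly onto a small recursive-descent parser. The following is a minimal sketch, not the Penman package's implementation: the names TOKEN and parse are hypothetical, quoted strings and alignments are not handled, and tokens are not validated beyond what the two rules require.

```python
import re

# Tokens: parentheses, the slash, ":"-prefixed edge labels, and bare symbols.
# Quoted strings and alignments are deliberately left out of this sketch.
TOKEN = re.compile(r'[()/]|:[^\s()/:]*|[^\s()/:]+')


def parse(text):
    """Parse a PENMAN string into (top variable, list of triples)."""
    tokens = TOKEN.findall(text)
    triples = []
    pos = 0

    def take(expected=None):
        """Consume and return the next token, optionally checking it."""
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        if expected is not None and tok != expected:
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def node():
        # <node> ::= '(' <id> '/' <node-label> <edge>* ')'
        take("(")
        var = take()
        take("/")
        triples.append((var, ":instance", take()))
        while pos < len(tokens) and tokens[pos].startswith(":"):
            # <edge> ::= ':'<edge-label> (<const>|<id>|<node>)
            role = take()
            if pos < len(tokens) and tokens[pos] == "(":
                slot = len(triples)
                triples.append(None)  # reserve a slot before the child's triples
                triples[slot] = (var, role, node())
            else:
                triples.append((var, role, take()))
        take(")")
        return var

    top = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the top node")
    return top, triples


top, triples = parse(
    "(d / drive-01 :ARG0 (h / he) :manner (c / care-04 :polarity -))"
)
print(top)  # d
for triple in triples:
    print(triple)
```

A nested node contributes its variable as the target of the enclosing edge, which is how the nesting in the text becomes a flat list of triples.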
The following PEG grammar gives a more complete description and also extends the syntax to allow for surface alignments.
# Syntactic productions (whitespace is allowed around non-terminals)
Start     <- Node
Node      <- '(' Variable NodeLabel? Relation* ')'
NodeLabel <- '/' Concept Alignment?
Concept   <- Constant
Relation  <- Role Alignment? (Node / Atom Alignment?)
Atom      <- Variable / Constant
Constant  <- String / Symbol
Variable  <- Symbol
# Lexical productions (whitespace is not allowed)
Symbol    <- NameChar+
Role      <- ':' NameChar*
Alignment <- '~' ([a-zA-Z] '.'?)? Digit+ (',' Digit+)*
String    <- '"' (!'"' ('\\' . / .))* '"'
NameChar  <- ![ \n\t\r\f\v()/:~] .
Digit     <- [0-9]
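As an illustration of the lexical productions, the sketch below transcribes them into Python regular expressions and uses them to split a string into typed tokens. The names NAMECHAR, LEXICON, and tokenize, and the PUNCT token type for the literal characters of the syntactic rules, are assumptions made for this example. Characters that match no production are silently skipped here, which a real tokenizer would likely report as errors.

```python
import re

# NameChar <- ![ \n\t\r\f\v()/:~] .
NAMECHAR = r'[^ \n\t\r\f\v()/:~]'

LEXICON = re.compile(
    # String <- '"' (!'"' ('\\' . / .))* '"'
    r'(?P<STRING>"(?:\\.|[^"\\])*")'
    # Alignment <- '~' ([a-zA-Z] '.'?)? Digit+ (',' Digit+)*
    r'|(?P<ALIGNMENT>~(?:[a-zA-Z]\.?)?[0-9]+(?:,[0-9]+)*)'
    # Role <- ':' NameChar*
    rf'|(?P<ROLE>:{NAMECHAR}*)'
    # Symbol <- NameChar+
    rf'|(?P<SYMBOL>{NAMECHAR}+)'
    # Literal characters used by the syntactic productions
    r'|(?P<PUNCT>[()/])'
)


def tokenize(text):
    """Return (token type, token text) pairs for a PENMAN string."""
    return [(m.lastgroup, m.group()) for m in LEXICON.finditer(text)]


for kind, value in tokenize('(a / alpha~e.1 :mod "b c"~2,3)'):
    print(kind, value)
```

STRING is tried before SYMBOL, mirroring the ordered choice in Constant, because a double quote is itself a valid NameChar.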
This grammar has some seemingly unnecessary ambiguity in that both the
Variable and Constant alternatives for Atom can resolve to Symbol, but
it is written this way to accommodate syntax variants that further
restrict the form of variables. Also, the distinction between edge
relations and attribute relations is semantic: if the target of a
relation is the variable of some other node, then it is an edge;
otherwise it is an attribute.
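Because the edge/attribute distinction is semantic, it can only be decided once all variables of the graph are known. The following sketch applies the rule to a list of triples; the triple encoding and the function name classify are assumptions for illustration, and every node is assumed to have an instance triple.

```python
def classify(triples):
    """Split non-instance triples into (edges, attributes)."""
    # Assumption: each node's variable is the source of an ":instance" triple.
    variables = {src for src, role, _ in triples if role == ":instance"}
    edges, attributes = [], []
    for triple in triples:
        _, role, target = triple
        if role == ":instance":
            continue
        # A target that names a node is an edge; anything else is an attribute.
        (edges if target in variables else attributes).append(triple)
    return edges, attributes


# (d / drive-01 :ARG0 (h / he) :manner (c / care-04 :polarity -))
triples = [
    ("d", ":instance", "drive-01"),
    ("d", ":ARG0", "h"),
    ("h", ":instance", "he"),
    ("d", ":manner", "c"),
    ("c", ":instance", "care-04"),
    ("c", ":polarity", "-"),
]
edges, attributes = classify(triples)
print(edges)       # [('d', ':ARG0', 'h'), ('d', ':manner', 'c')]
print(attributes)  # [('c', ':polarity', '-')]
```

The same symbol could therefore be an edge target in one graph and an attribute value in another, depending only on whether some node uses it as a variable.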
Note that the implementation in the Penman package deviates from this
grammar in that the Alignment production is not parsed together with
the rest of the structure. Instead, the ~ character is treated as a
NameChar, and alignments are thus part of the Atom tokens. They are
later detected and extracted during graph interpretation.
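The two-stage approach described above can be sketched as a post-processing step that splits an alignment suffix off a token that has already been read. This is an illustration under stated assumptions, not the Penman package's actual code: the names ALIGNMENT and split_alignment and the shape of the return value are hypothetical.

```python
import re

# Alignment <- '~' ([a-zA-Z] '.'?)? Digit+ (',' Digit+)*
# Anchored at the end so that a "~" inside a quoted string is left alone.
ALIGNMENT = re.compile(
    r'~(?P<prefix>[a-zA-Z]\.?)?(?P<indices>[0-9]+(?:,[0-9]+)*)$'
)


def split_alignment(token):
    """Split a token into (base, prefix, indices).

    The prefix is None and the indices are empty when the token carries
    no alignment.
    """
    match = ALIGNMENT.search(token)
    if match is None:
        return token, None, ()
    indices = tuple(int(i) for i in match.group("indices").split(","))
    return token[:match.start()], match.group("prefix") or "", indices


print(split_alignment("drive-01~e.2"))  # ('drive-01', 'e.', (2,))
print(split_alignment("he~1,3"))        # ('he', '', (1, 3))
print(split_alignment("-"))             # ('-', None, ())
```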
[KAS1989] Robert T. Kasper. A Flexible Interface for Linking Applications to Penman’s Sentence Generator. Speech and Natural Language: Proceedings of a Workshop Held at Philadelphia, Pennsylvania. http://www.aclweb.org/anthology/H89-1022. February 21-23, 1989.
[GOO2019] Michael Wayne Goodman. AMR Normalization for Fairer Evaluation. Proceedings of the 33rd Pacific Asia Conference on Language, Information, and Computation (PACLIC 33). https://arxiv.org/pdf/1909.01568.pdf. 2019.