PENMAN Notation
PENMAN notation, originally called Sentence Plan Notation in the PENMAN project ([KAS1989]), is a serialization format for the directed, rooted graphs used to encode semantic dependencies, most notably in the Abstract Meaning Representation (AMR) framework. It resembles Lisp’s S-expressions in its use of parentheses to indicate nested structure. For example, here is an AMR for “He drives carelessly.”:
(d / drive-01
   :ARG0 (h / he)
   :manner (c / care-04
              :polarity -))
Let’s break that down a bit:
;  ┌────────────────────────── Variable (this one is the graph's top)
;  │ ┌──────────────────────── Indicates the node's concept
;  │ │    ┌─────────────────── Concept (node label)
;  ┴ ┴ ───┴────
  (d / drive-01
;          ┌────────────────── Edge relation
;    ──────┴───────
     :ARG0 (h / he)
;    ──┬──
;      └────────────────────── Role (edge label)
     :manner (c / care-04
;                    ┌──────── Attribute relation
;               ─────┴─────
                :polarity -))
;                         ┬
;                         └─── Atom (or "constant")
The linearized form can only describe projective structures such as
trees, so in order to capture non-projective graphs, nodes get
identifiers (called variables; e.g., d, h, and c above), which can be
referred to later to establish a reentrancy.
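As an illustration of reentrancy, the following Python sketch counts how often a variable is referred to again as the target of a relation. The graph (for "The boy wants to go") and the helper name `reentrant_variables` are illustrative and not part of the original text, and the regular expressions assume input without quoted strings or alignments.

```python
import re
from collections import Counter

# Illustrative graph: the variable b is introduced once as (b / boy)
# and then referred to again under the go-02 node.
AMR = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02
            :ARG0 b))
"""

def reentrant_variables(text):
    """Count references to already-introduced variables.

    Simplifying assumption: no quoted strings and no alignments,
    so plain regular expressions are sufficient.
    """
    # Variables are the symbols that directly follow an opening parenthesis.
    variables = set(re.findall(r"\(\s*([^\s()/:~,]+)", text))
    # Relation targets that are bare symbols rather than nested nodes.
    targets = re.findall(r":[^\s()/:~,]*\s+([^\s()/:~,]+)", text)
    return dict(Counter(t for t in targets if t in variables))

print(reentrant_variables(AMR))
```

Here only b is reused, so it is the single reentrant variable; for the "He drives carelessly." graph the result is empty, because that graph is a tree.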
PENMAN notation can be very roughly described with the following BNF grammar (from [GOO2019]):
<node> ::= '(' <id> '/' <node-label> <edge>* ')'
<edge> ::= ':'<edge-label> (<const>|<id>|<node>)
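To make the two productions concrete, here is a minimal recursive-descent sketch in Python that follows the rough BNF only. The names `TOKEN` and `parse` are hypothetical, and the tokenizer deliberately ignores quoted strings and alignments, so this is not a complete PENMAN parser.

```python
import re

# Structural characters, roles (':' plus a name), or any other run of
# non-space characters. Strings and alignments are deliberately ignored.
TOKEN = re.compile(r"[()/]|:[^\s()/:]*|[^\s()/:]+")

def parse(text):
    """Return a nested tuple (id, node_label, edges) for one <node>."""
    tokens = TOKEN.findall(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, found {tok!r}")
        pos += 1
        return tok

    def symbol():
        tok = take()
        if tok in ("(", ")", "/") or tok.startswith(":"):
            raise ValueError(f"expected an id or label, found {tok!r}")
        return tok

    def node():
        # <node> ::= '(' <id> '/' <node-label> <edge>* ')'
        take("(")
        var = symbol()
        take("/")
        label = symbol()
        edges = []
        while peek() is not None and peek().startswith(":"):
            # <edge> ::= ':'<edge-label> (<const>|<id>|<node>)
            role = take()
            target = node() if peek() == "(" else symbol()
            edges.append((role, target))
        take(")")
        return (var, label, edges)

    tree = node()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return tree

print(parse("(d / drive-01 :ARG0 (h / he) :manner (c / care-04 :polarity -))"))
```

Nested nodes come back as nested tuples, while constants and variable references are plain strings; telling those two apart requires knowing which symbols are variables.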
A more complete description is given by the following PEG grammar, which also extends the notation to allow for surface alignments.
# Syntactic productions (whitespace is allowed around non-terminals)
Start <- Node
Node <- '(' Variable NodeLabel? Relation* ')'
NodeLabel <- '/' Concept Alignment?
Concept <- Atom
Relation <- Role Alignment? (Node / Atom Alignment?)
Atom <- Variable / Constant
Constant <- String / Float / Integer / Symbol
Variable <- Symbol
# Lexical productions (whitespace is not allowed)
Symbol <- NameChar+
Role <- ':' NameChar*
Alignment <- '~' ([a-zA-Z] '.'?)? Digit+ (',' Digit+)*
String <- '"' (!'"' ('\\' . / .))* '"'
Float <- Decimal Exponent? / Integer Exponent
Decimal <- [-+]? (Digit+ '.' Digit* / '.' Digit+ )
Exponent <- [eE] Integer
Integer <- [-+]? Digit+
NameChar <- ![ \n\t\r\f\v()/,:~] .
Digit <- [0-9]
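The lexical productions translate fairly directly into regular expressions. The Python sketch below is one such translation, checked against whole tokens in the same order as the Constant production; the names (`constant_kind`, `CONSTANT_KINDS`, and the pattern constants) are hypothetical, and matching isolated tokens only approximates how a PEG parser would apply the rules in context.

```python
import re

# Lexical productions from the PEG grammar, rewritten as regular expressions.
DIGITS = r"[0-9]+"
INTEGER = rf"[-+]?{DIGITS}"
EXPONENT = rf"[eE]{INTEGER}"
DECIMAL = rf"[-+]?(?:{DIGITS}\.[0-9]*|\.{DIGITS})"
FLOAT = rf"(?:{DECIMAL}(?:{EXPONENT})?|{INTEGER}{EXPONENT})"
STRING = r'"(?:\\.|[^"\\])*"'
SYMBOL = r"[^ \n\t\r\f\v()/,:~]+"
ALIGNMENT = rf"~(?:[a-zA-Z]\.?)?{DIGITS}(?:,{DIGITS})*"

# Same order as: Constant <- String / Float / Integer / Symbol
CONSTANT_KINDS = [
    ("String", STRING),
    ("Float", FLOAT),
    ("Integer", INTEGER),
    ("Symbol", SYMBOL),
]

def constant_kind(token):
    """Return the first production that matches the whole token, or None."""
    for name, pattern in CONSTANT_KINDS:
        if re.fullmatch(pattern, token, flags=re.DOTALL):
            return name
    return None

for token in ['"a b"', "-0.5", "1e5", "1", "-"]:
    print(token, constant_kind(token))
```

The order matters: a token such as 1 would also satisfy the Symbol pattern, so the more specific productions have to be tried first, as in the grammar.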
The PEG grammar above has some seemingly unnecessary ambiguity in that
both the Variable and Constant alternatives for Atom can resolve to
Symbol, but it is written this way to accommodate syntax variants
that further restrict the form of variables. Also, the distinction
between edge relations and attribute relations is semantic: if the
target of a relation is the variable of some other node, then it is an
edge; otherwise it is an attribute.
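The edge/attribute rule can be sketched in a few lines of Python. The triple layout (source, role, target) and the helper name `split_relations` are assumptions made for this illustration; the data is the "He drives carelessly." graph written out by hand.

```python
def split_relations(variables, relations):
    """Split relations into edges and attributes.

    A relation is an edge if its target is the variable of some node;
    otherwise it is an attribute.
    """
    edges, attributes = [], []
    for source, role, target in relations:
        if target in variables:
            edges.append((source, role, target))
        else:
            attributes.append((source, role, target))
    return edges, attributes

# Variables and relations of the example graph, listed by hand.
variables = {"d", "h", "c"}
relations = [
    ("d", ":ARG0", "h"),
    ("d", ":manner", "c"),
    ("c", ":polarity", "-"),
]

edges, attributes = split_relations(variables, relations)
print(edges)
print(attributes)
```

Because the decision depends on the full set of variables, it can only be made after the whole graph has been read, which is why the grammar itself does not encode it.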
- KAS1989
Robert T. Kasper. A Flexible Interface for Linking Applications to Penman’s Sentence Generator. Speech and Natural Language: Proceedings of a Workshop Held at Philadelphia, Pennsylvania. http://www.aclweb.org/anthology/H89-1022. February 21-23, 1989.
- GOO2019
Michael Wayne Goodman. AMR Normalization for Fairer Evaluation. Proceedings of the 33rd Pacific Asia Conference on Language, Information, and Computation (PACLIC 33). https://arxiv.org/pdf/1909.01568.pdf. 2019.