Using the penman Command#

The penman command allows users to perform most reformatting tasks and predefined transformations without having to write any code. For example, the following reformats a graph in one line to a more conventional presentation:

$ penman --indent 3 --compact <<< '(s / sleep-01 :polarity - :ARG0 (i / i))'
(s / sleep-01 :polarity -
   :ARG0 (i / i))

The command becomes available at the command-line after installing Penman (see Installation and Setup). This guide will explain how to use the command for several use cases.

Command Usage#

The penman command reads in data from stdin or from one or more files then prints the results to stdout. By default, the command will do nothing but apply the default formatting to the graphs, but any input content that is not a graph or a metadata comment will be discarded. To see what features are available for the current version and how to call the command, run penman --help:

usage: penman [-h] [-V] [-v] [-q] [--model FILE | --amr | --noop] [--check]
              [--indent N] [--compact] [--triples] [--make-variables FMT]
              [--rearrange KEY] [--reconfigure KEY] [--canonicalize-roles]
              [--reify-edges] [--dereify-edges] [--reify-attributes]
              [--indicate-branches]
              [FILE [FILE ...]]

Read and write graphs in the PENMAN notation.

positional arguments:
  FILE                  read graphs from FILEs instead of stdin

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -v, --verbose         increase verbosity
  -q, --quiet           suppress output on <stdout> and <stderr>
  --model FILE          JSON model file describing the semantic model
  --amr                 use the AMR model
  --noop                use the no-op model
  --check               check graphs for compliance with the model

formatting options:
  --indent N            indent N spaces per level ("no" for no newlines)
  --compact             compactly print node attributes on one line
  --triples             print graphs as triple conjunctions

normalization options:
  --make-variables FMT  recreate node variables with FMT (e.g.: '{prefix}{j}')
  --rearrange KEY       reorder the branches of the tree
  --reconfigure KEY     reconfigure the graph layout with reordered triples
  --canonicalize-roles  canonicalize role forms
  --reify-edges         reify all eligible edges
  --dereify-edges       dereify all eligible edges
  --reify-attributes    reify all attributes
  --indicate-branches   insert triples to indicate tree structure

Reformatting#

There are two options for reformatting graphs that use newlines and indentation to make them more friendly to human eyes. The --indent option controls how much each nesting level indents and the --compact option determines whether attributes immediately following a concept appear on the same line as the concept or on their own lines. For this section, consider the following graph:

$ x="(w / want-01 :polarity - :ARG0 (c / child) :ARG1 (g / go :ARG0 c))"

Default Formatting#

By default, compact mode is off and --indent has the special value -1, which performs “adaptive indenting”. This appears as follows:

$ echo "$x" | penman
(w / want-01
   :polarity -
   :ARG0 (c / child)
   :ARG1 (g / go
            :ARG0 c))

Changing the Indentation#

Giving a specific indent number makes Penman always indent that number of spaces:

$ echo "$x" | penman --indent 3
(w / want-01
   :polarity -
   :ARG0 (c / child)
   :ARG1 (g / go
      :ARG0 c))

Compact Attributes#

Compact mode puts attributes on the same line as the concept of their node, but only if they appear in that position in the tree:

$ echo "$x" | penman --compact
(w / want-01 :polarity -
   :ARG0 (c / child)
   :ARG1 (g / go
            :ARG0 c))

Single-Line Graphs#

With --indent=no, Penman outputs a full graph on one line. This can be useful for programs that read data line-by-line or for creating bilingually aligned files:

$ echo "$x" | penman
(w / want-01
   :polarity -
   :ARG0 (c / child)
   :ARG1 (g / go
            :ARG0 c))
$ echo "$x" | penman | penman --indent=no
(w / want-01 :polarity - :ARG0 (c / child) :ARG1 (g / go :ARG0 c))

Note that --indent=0 is not the same as --indent=no. The former delimits parts with a single newline but no leading space whereas the latter delimits parts with a single space and no newlines. Also, the --compact option is relevant when --indent has a numeric value but not for --indent=no.

Specifying a Model#

While the formatting options do not require knowledge of the semantic model, others, such as --check and many transformations, do require it. For Abstract Meaning Representation (AMR) graphs, the --amr option uses the built-in AMR model:

$ penman --amr [...]

This model contains information about AMR’s valid roles, canonical role inversions (such as :domain to :mod), and relation reifications. Also available is the no-op model via --noop, which does not deinvert tree edges when interpreting the graph so that a role like :ARG0-of is the role used in the graph triples.

Other models can be given by using the --model option with a path to a JSON file containing the model information:

$ penman --model=xyz.json [...]

Custom models can be used for variations of AMR (e.g., different versions or task-specific definitions) or even for different semantic frameworks altogether.

Checking for Model Compliance#

With a model specified, a graph can be checked for compliance with respect to the model using the --check option. For graphs already in PENMAN notation, the only relevant test is whether a role is defined by the model. When graphs are constructed programatically, there are additional checks for graphical well-formedness, such as for an appropriate graph-top being set and for graph connectedness. When used as a command, the exit code of the command will be 0 when there are no errors or 1 when any errors are found. This helps make the check be scriptable. Also, the individual errors are inserted as metadata comments on each graph to help users resolve errors:

$ good="(s / swim-01 :ARG0 (i / i))"                          # I swim.
$ bad="(s / swim-01 :ARG0 (i / i) :stroke (b / backstroke))"  # I swim backstroke.
$ if ( echo "$good" | penman --amr --check ); then
>   echo "valid"
> else
>   echo "invalid"
> fi
(s / swim-01
   :ARG0 (i / i))
valid
$ if ( echo "$bad" | penman --amr --check ); then
>   echo "valid"
> else
>   echo "invalid"
> fi
# ::error-1 (s :stroke b) invalid role
(s / swim-01
   :ARG0 (i / i)
   :stroke (b / backstroke))
invalid

Transforming Graphs#

Penman’s transformations work either on the tree or the graph representation.

Relabeling Nodes#

The simplest transformation maps variables to a new form with the --make-variables option. In English AMR the variables use the first letter of the concept and, if it is not unique, the 1-based index starting from the second when traversing the tree in depth-first order. AMR’s primary evaluation tool smatch relabels all nodes internally so one side uses a0, a1, etc. and the other side uses b0, b1, etc. Penman allows users to specify the variable format with three template variables:

  • {prefix} uses the first character of a node’s concept

  • {i} is the 0-based index of a node’s occurrence

  • {j} is the 1-based index of a node’s occurrence, where index 1 is blank

Unlike the other transformations, --make-variables does not require a model:

$ original="(x0 / chase-01 :ARG0 (x1 / cat) :ARG1 (x2 / mouse))"
$ echo "$original" | penman --make-variables='a{i}'
(a0 / chase-01
    :ARG0 (a1 / cat)
    :ARG1 (a2 / mouse))
$ echo "$original" | penman --make-variables='{prefix}{j}'
(c / chase-01
   :ARG0 (c2 / cat)
   :ARG1 (m / mouse))

Rearranging Branches#

Tree branches can be rearranged without changing the overall tree structure using the --rearrange option. It takes the name of a method for sorting the branches on a node:

$ original="(c / chase-01 :ARG1 (m / mouse) :polarity - :ARG0 (c2 / cat))"
$ echo "$original" | penman --rearrange=attributes-first
(c / chase-01
   :polarity -
   :ARG1 (m / mouse)
   :ARG0 (c2 / cat))
$ echo "$original" | penman --rearrange=alphanumeric
(c / chase-01
   :ARG0 (c2 / cat)
   :ARG1 (m / mouse)
   :polarity -)

The sorting methods can be combined in prioritized order:

$ echo "$original" | penman --rearrange=attributes-first,alphanumeric
(c / chase-01
   :polarity -
   :ARG0 (c2 / cat)
   :ARG1 (m / mouse))

Reconfiguring the Tree#

In Penman, the epigraph is a side-channel of information that allows it to configure (reconstruct) the original tree that led to a graph representation. The --reconfigure option first discards this epigraphical information then configures the tree afresh, which may lead to more drastic restructuring than just rearranging tree branches. Like --rearrange, it takes a sorting method as its argument. Often it is helpful to use --rearrange with --reconfigure, so the reconfigured tree still follows an expected branch order:

$ original="(s / sell-01 :ARG0 (i / i) :ARG1 (b / book :ARG1-of (r / read :ARG0 i)))"
$ echo "$original" | penman
(s / sell-01
 :ARG0 (i / i)
 :ARG1 (b / book
          :ARG1-of (r / read
                      :ARG0 i)))
$ echo "$original" | penman --reconfigure=random --rearrange=alphanumeric
(s / sell-01
   :ARG0 (i / i
            :ARG0-of (r / read
                        :ARG1 (b / book)))
   :ARG1 b)

Note that --reconfigure does not change which variable is the graph’s top. This is because the resulting graph should encode the same information, and the top node is treated specially. For example, in AMR it is considered the focused node. A reconfigured graph will return a perfect score with the original using a metric like smatch.

Normalizations#

The remaining options are normalizations that may alter the content of the graph. The --canonicalize-roles option will replace roles that the model defines as equivalent, such as :domain-of and :mod in AMR:

$ echo "(c / chapter :domain-of 7)" | penman --amr --canonicalize-roles
(c / chapter
   :mod 7)

Penman can handle relations that are over-inverted one time, but does not check further than that. The --canonicalize-roles option will try harder to resolve over-inversions. For this functionality, a model is not strictly necessary unless the over-inverted role itself needs to be canonicalized:

$ echo "(b / bark-01 :ARG0-of-of (d / dog))" | penman
(b / bark-01
   :ARG0 (d / dog))
$ echo "(b / bark-01 :ARG0-of-of-of-of (d / dog))" | penman
(b / bark-01
   :ARG0-of-of (d / dog))
$ echo "(b / bark-01 :ARG0-of-of-of-of (d / dog))" | penman --canonicalize-roles
(b / bark-01
   :ARG0 (d / dog))

The --reify-edges option converts edges into nodes for edges that have a reification defined in the model:

$ echo "(c / chapter :mod 7)" | penman --amr --reify-edges
(c / chapter
   :ARG1-of (_ / have-mod-91
               :ARG2 7))

The _ (_2, etc.) variables indicate which have been reified. Combine with --make-variables to use standard variable names (e.g., h in this example). The --dereify-edges is the reverse of --reify-edges:

$ echo "(c / chapter :mod 7)" | penman --amr --reify-edges | penman --amr --dereify-edges
(c / chapter
   :mod 7)

The --reify-attributes option reifies attribute relations (those where the value is a constant) so the constant value becomes the concept of a new node:

$ echo "(c / chapter :mod 7)" | penman --amr --reify-attributes
(c / chapter
   :mod (_ / 7))

Finally, the --indicate-branches option inserts relations that hint at the original tree structure. This can be useful if a tool that produces PENMAN graphs, like an AMR parser, wants to use a tool like smatch to compare its output to gold trees and not just gold graphs.