0% found this document useful (0 votes)
50 views

Dependency Parsing 2: CMSC 723 / LING 723 / INST 725

- Transition-based dependency parsing uses a stack and buffer to build dependency trees incrementally through transition operations like shift and reduce. It can be framed as a structured prediction problem. - Two main approaches are transition-based parsing, which predicts transitions, and graph-based parsing, which scores dependency graphs directly. - Transition-based parsers use a classifier to predict the next transition as an oracle. They are trained on examples generated by simulating gold trees. Features typically represent the top of the stack and buffer.

Uploaded by

Ritik Bh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
50 views

Dependency Parsing 2: CMSC 723 / LING 723 / INST 725

- Transition-based dependency parsing uses a stack and buffer to build dependency trees incrementally through transition operations like shift and reduce. It can be framed as a structured prediction problem. - Two main approaches are transition-based parsing, which predicts transitions, and graph-based parsing, which scores dependency graphs directly. - Transition-based parsers use a classifier to predict the next transition as an oracle. They are trained on examples generated by simulating gold trees. Features typically represent the top of the stack and buffer.

Uploaded by

Ritik Bh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 52

Dependency

Parsing 2
CMSC 723 / LING 723 / INST 725

Marine Carpuat

Fig credits: Joakim Nivre, Dan


Jurafsky & James Martin
Dependency Parsing
• Formalizing dependency trees

• Transition-based dependency parsing


• Shift-reduce parsing
• Transition system
• Oracle
• Learning/predicting parsing actions
Data-driven dependency parsing
Goal: learn a good predictor of dependency graphs
Input: sentence
Output: dependency graph/tree G = (V,A)

Can be framed as a structured prediction task


- very large output space
- with interdependent labels

2 dominant approaches: transition-based parsing and graph-based


parsing
Transition-based dependency parsing
• Builds on shift-reduce parsing
[Aho & Ullman, 1927]

• Configuration
• Stack
• Input buffer of words
• Set of dependency relations

• Goal of parsing
• find a final configuration where
• all words accounted for
• Relations form dependency tree
Transition operators
• Transitions: produce a new • Start state
configuration given current • Stack initialized with ROOT node
configuration • Input buffer initialized with words
in sentence
• Dependency relation set = empty
• Parsing is the task of
• Finding a sequence of transitions
• End state
• That leads from start state to
desired goal state • Stack and word lists are empty
• Set of dependency relations = final
parse
Arc Standard Transition System
• Defines 3 transition operators [Covington, 2001; Nivre 2003]
• LEFT-ARC:
• create head-dependent rel. between word at top of stack and 2nd word
(under top)
• remove 2nd word from stack
• RIGHT-ARC:
• Create head-dependent rel. between word on 2nd word on stack and word on
top
• Remove word at top of stack
• SHIFT
• Remove word at head of input buffer
• Push it on the stack
Arc standard transition systems
• Preconditions
• ROOT cannot have incoming arcs
• LEFT-ARC cannot be applied when ROOT is the 2nd element in stack
• LEFT-ARC and RIGHT-ARC require 2 elements in stack to be applied
Transition-based Dependency Parser
• Assume an oracle

• Parsing complexity
• Linear in sentence
length!

• Greedy algorithm
• Unlike Viterbi for POS
tagging
Transition-Based Parsing Illustrated
Where to we get an oracle?
• Multiclass classification problem
• Input: current parsing state (e.g., current and previous configurations)
• Output: one transition among all possible transitions
• Q: size of output space?

• Supervised classifiers can be used


• E.g., perceptron
• Open questions
• What are good features for this task?
• Where do we get training examples?
Generating Training Examples
• What we have in a treebank • What we need to train an oracle
• Pairs of configurations and
predicted parsing action
Generating training examples
• Approach: simulate parsing to generate reference tree

• Given
• A current config with stack S, dependency relations Rc
• A reference parse (V,Rp)
• Do
Let’s try it out
Features
• Configuration consist of stack, buffer, current set of relations

• Typical features
• Features focus on top level of stack
• Use word forms, POS, and their location in stack and buffer
Features example
• Given configuration • Example of useful features
Features example
Research highlight:
Dependency parsing with stack-LSTMs
• From Dyer et al. 2015: http://www.aclweb.org/anthology/P15-1033

• Idea
• Instead of hand-crafted feature
• Predict next transition using recurrent neural networks to learn
representation of stack, buffer, sequence of transitions
Research highlight:
Dependency parsing with stack-LSTMs
Research highlight:
Dependency parsing with stack-LSTMs
Alternate Transition Systems
Note: A different way of writing arc-standard
transition system
A weakness of arc-standard parsing

Right dependents cannot be attached to their head


until all their dependents have been attached
Arc Eager Parsing
• LEFT-ARC:
• Create head-dependent rel. between word at front of buffer and word at top of
stack
• pop the stack
• RIGHT-ARC:
• Create head-dependent rel. between word on top of stack and word at front of
buffer
• Shift buffer head to stack
• SHIFT
• Remove word at head of input buffer
• Push it on the stack
• REDUCE
• Pop the stack
Arc Eager Parsing Example
Trees & Forests
• A dependency forest (here) is a dependency graph satisfying
• Root
• Single-Head
• Acyclicity
• but not Connectedness
Properties of this transition-based
parsing algorithm

- Correctness
- For every complete transition sequence, the resulting graph is a projective
dependency forest (soundness)
- For every projective dependency forest G, there is a transition sequence that
generates G (completeness)

- Trick: forest can be turned into tree by adding links to ROOT0


Dealing with
non-projectivity
Projectivity
• Arc from head to dependent is projective
• If there is a path from head to every word between head and
dependent

• Dependency tree is projective


• If all arcs are projective
• Or equivalently, if it can be drawn with no crossing edges

• Projective trees make computation easier


• But most theoretical frameworks do not assume projectivity
• Need to capture long-distance dependencies, free word order
Arc-standard parsing can’t produce non-
projective trees
How frequent are non-projective structures?
• Statistics from CoNLL shared task
• NPD = non projective dependencies
• NPS = non projective sentences
How to deal with non-projectivity?
(1) change the transition system

• Add new transitions


• That apply to 2nd word of the stack
• Top word of stack is treated as context

[Attardi 2006]
How to deal with non-projectivity?
(2) pseudo-projective parsing
Solution:
• “projectivize” a non-projective tree by creating
new projective arcs
• That can be transformed back into non-projective
arcs in a post-processing step
How to deal with non-projectivity?
(2) pseudo-projective parsing
Solution:
• “projectivize” a non-projective tree by creating
new projective arcs
• That can be transformed back into non-projective
arcs in a post-processing step
Graph-based parsing
Graph concepts refresher
Directed Spanning Trees
Maximum Spanning Tree
• Assume we have an arc factored model
i.e. weight of graph can be factored as sum or product of weights of its arcs

• Chu-Liu-Edmonds algorithm can find the maximum spanning tree for


us!
• Greedy recursive algorithm
• Naïve implementation: O(n^3)
Chu-Liu-Edmonds illustrated
Chu-Liu-Edmonds illustrated
Chu-Liu-Edmonds illustrated
Chu-Liu-Edmonds illustrated
Chu-Liu-Edmonds illustrated
Arc weights as linear classifiers
Example of classifier features
How to score a graph G
using features?
Arc-factored model By definition of arc weights
assumption as linear classifiers
How can we learn
the classifier from data?
Dependency Parsing: what you should know
• Formalizing dependency trees

• Transition-based dependency parsing


• Shift-reduce parsing
• Transition system: arc standard, arc eager
• Oracle
• Learning/predicting parsing actions

• Graph-based dependency parsing

• A flexible framework that allows many extensions


• RNNs vs feature engineering, non-projectivity
Extension: dynamic oracle
Problem with standard classifier-based oracle:
- It is “static”
- ie tied to optimal config sequence that produces gold tree
- What if there are multiple sequences for a single gold tree?
- How can we recover if the parser deviates from gold sequence?

One solution: “dynamic oracle” [Goldberg & Nivre 2012]

See also Locally Optimal Learning to Search [Chang et al. ICML 2015]
Extension: dynamic oracle
Problem with standard

See [Goldberg & Nivre 2012] for details

You might also like