Dependency Parsing 2: CMSC 723 / LING 723 / INST 725
Parsing 2
CMSC 723 / LING 723 / INST 725
Marine Carpuat
• Configuration
• Stack
• Input buffer of words
• Set of dependency relations
• Goal of parsing
• find a final configuration where
• all words accounted for
• Relations form dependency tree
Transition operators
• Transitions: produce a new configuration given current configuration
• Parsing is the task of
• Finding a sequence of transitions
• That leads from start state to desired goal state
• Start state
• Stack initialized with ROOT node
• Input buffer initialized with words in sentence
• Dependency relation set = empty
• End state
• Stack and word lists are empty
• Set of dependency relations = final parse
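The configuration and start/end states above can be sketched in code. This is a minimal illustration; the representation (word indices, with 0 standing for ROOT) and all names are assumptions, not from the lecture:

```python
from dataclasses import dataclass, field

@dataclass
class Configuration:
    stack: list                                    # word indices; 0 is the ROOT node
    buffer: list                                   # remaining input word indices
    relations: set = field(default_factory=set)    # (head, dependent) arcs

def start_state(sentence):
    """Start state: stack holds ROOT, buffer holds all words, no relations yet."""
    return Configuration(stack=[0], buffer=list(range(1, len(sentence) + 1)))

def is_final(config):
    """Goal state: buffer consumed and only ROOT left on the stack."""
    return not config.buffer and config.stack == [0]
```

At the final configuration, `config.relations` is the dependency parse.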
Arc Standard Transition System
• Defines 3 transition operators [Covington, 2001; Nivre 2003]
• LEFT-ARC:
• create head-dependent rel. between word at top of stack and 2nd word
(under top)
• remove 2nd word from stack
• RIGHT-ARC:
• Create head-dependent rel. between 2nd word on stack and word at top
• Remove word at top of stack
• SHIFT
• Remove word at head of input buffer
• Push it on the stack
Arc standard transition systems
• Preconditions
• ROOT cannot have incoming arcs
• LEFT-ARC cannot be applied when ROOT is the 2nd element in stack
• LEFT-ARC and RIGHT-ARC require 2 elements in stack to be applied
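The three operators and their preconditions can be sketched as follows. This is an illustrative sketch: the configuration is represented as plain lists of word indices (0 = ROOT), which is an assumed encoding, not the lecture's:

```python
def left_arc(stack, buffer, rels):
    # Preconditions: two items on the stack, and the 2nd element is not ROOT
    # (ROOT cannot receive an incoming arc).
    assert len(stack) >= 2 and stack[-2] != 0
    rels.add((stack[-1], stack[-2]))   # top of stack is head of the 2nd word
    del stack[-2]                      # remove the 2nd word (under top)

def right_arc(stack, buffer, rels):
    # Precondition: two items on the stack.
    assert len(stack) >= 2
    rels.add((stack[-2], stack[-1]))   # 2nd word is head of the word on top
    stack.pop()                        # remove the word at top of stack

def shift(stack, buffer, rels):
    # Move the word at the head of the input buffer onto the stack.
    stack.append(buffer.pop(0))

# Tiny demo on a 2-word sentence where word 2 heads word 1 and ROOT heads word 2:
stack, buffer, rels = [0], [1, 2], set()
shift(stack, buffer, rels)
shift(stack, buffer, rels)
left_arc(stack, buffer, rels)
right_arc(stack, buffer, rels)
```

After the demo run the stack is back to `[0]` and `rels` holds the two arcs.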
Transition-based Dependency Parser
• Assume an oracle
• Parsing complexity
• Linear in sentence length!
• Greedy algorithm
• Unlike Viterbi for POS tagging
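The greedy loop can be sketched as below: the parser queries the oracle once per step and applies its decision without backtracking, so each word is shifted once and reduced once (linear time). The oracle interface and the toy oracle are assumptions for illustration:

```python
def parse(n_words, oracle):
    """Greedy transition-based parsing loop (arc-standard)."""
    stack, buffer, rels = [0], list(range(1, n_words + 1)), set()
    while buffer or len(stack) > 1:
        action = oracle(stack, buffer)     # one classifier call per step
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            rels.add((stack[-1], stack[-2]))
            del stack[-2]
        else:                              # RIGHT-ARC
            rels.add((stack[-2], stack[-1]))
            stack.pop()
    return rels

# Toy stand-in oracle: attach every word to its left neighbor (a head-initial chain).
def chain_oracle(stack, buffer):
    return "SHIFT" if buffer else "RIGHT-ARC"

rels = parse(3, chain_oracle)
```

The loop never revisits a decision, which is exactly why it is greedy (unlike Viterbi, which keeps all partial hypotheses).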
Transition-Based Parsing Illustrated
Where do we get an oracle?
• Multiclass classification problem
• Input: current parsing state (e.g., current and previous configurations)
• Output: one transition among all possible transitions
• Q: size of output space? (3 transitions for unlabeled arc-standard parsing; 2|R|+1 when each arc also carries one of |R| dependency labels)
• Given
• A current config with stack S, dependency relations Rc
• A reference parse (V,Rp)
• Do
• Choose LEFT-ARC if it produces a correct head-dependent relation given Rp and the current config
• Otherwise choose RIGHT-ARC if (1) it produces a correct relation given Rp and (2) all dependents of the word at the top of the stack have already been assigned
• Otherwise choose SHIFT
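This training-time oracle rule can be sketched directly. Representation is assumed for illustration: `stack` holds word indices (0 = ROOT), `Rc` is the set of arcs built so far, `Rp` the set of reference arcs, both as `(head, dependent)` pairs:

```python
def oracle(stack, Rc, Rp):
    """Return the correct next arc-standard transition given the reference parse."""
    if len(stack) >= 2:
        top, second = stack[-1], stack[-2]
        # LEFT-ARC if the top of the stack is the head of the 2nd word
        # (never allowed when the 2nd element is ROOT).
        if second != 0 and (top, second) in Rp:
            return "LEFT-ARC"
        # RIGHT-ARC only once the top word has collected all of its own
        # dependents; otherwise it would be popped too early.
        if (second, top) in Rp and all(
            (top, d) in Rc for (h, d) in Rp if h == top
        ):
            return "RIGHT-ARC"
    return "SHIFT"
```

The extra condition on RIGHT-ARC is the key subtlety: popping the top word is irreversible, so the oracle must delay until that word's dependents are attached.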
Let’s try it out
Features
• Configurations consist of stack, buffer, and current set of relations
• Typical features
• Features focus on top level of stack
• Use word forms, POS, and their location in stack and buffer
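Typical hand-crafted templates of this kind can be sketched as string-valued features over the top of the stack and the front of the buffer. Template names (`s1.w`, `b1.t`, …) and the representation are illustrative assumptions:

```python
def extract_features(stack, buffer, words, tags):
    """Word-form and POS features of the top two stack items and first buffer item."""
    feats = []
    if stack:
        s1 = stack[-1]
        feats += [f"s1.w={words[s1]}", f"s1.t={tags[s1]}"]
    if len(stack) >= 2:
        s2 = stack[-2]
        feats += [f"s2.w={words[s2]}", f"s2.t={tags[s2]}",
                  # conjoined template: POS of top two stack items together
                  f"s1.t+s2.t={tags[stack[-1]]}+{tags[s2]}"]
    if buffer:
        b1 = buffer[0]
        feats += [f"b1.w={words[b1]}", f"b1.t={tags[b1]}"]
    return feats
```

Each feature string becomes one dimension of a sparse input vector for the multiclass transition classifier.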
Features example
• Given configuration
• Example of useful features
Features example
Research highlight:
Dependency parsing with stack-LSTMs
• From Dyer et al. 2015: http://www.aclweb.org/anthology/P15-1033
• Idea
• Instead of hand-crafted features
• Predict next transition using recurrent neural networks to learn representations of the stack, buffer, and sequence of transitions
Alternate Transition Systems
Note: A different way of writing arc-standard
transition system
A weakness of arc-standard parsing
- Correctness
- For every complete transition sequence, the resulting graph is a projective
dependency forest (soundness)
- For every projective dependency forest G, there is a transition sequence that
generates G (completeness)
- Weakness: completeness holds only for projective trees, so non-projective
dependencies cannot be recovered
[Attardi 2006]
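The projectivity property above can be checked directly: an arc (h, d) is projective iff every word strictly between h and d is a descendant of h. A sketch, assuming a `heads` array where `heads[i]` is the head index of word `i` (1-based words, entry 0 unused):

```python
def is_projective(heads):
    """Return True iff the dependency tree encoded by `heads` is projective."""
    n = len(heads) - 1
    for d in range(1, n + 1):
        h = heads[d]
        # Every word strictly between h and d must reach h by following heads.
        for k in range(min(h, d) + 1, max(h, d)):
            a = k
            while a != 0 and a != h:
                a = heads[a]
            if a != h:
                return False   # k's head chain escapes the arc's span
    return True
```

For example, crossing arcs such as (3, 1) and (2, 4) make the check fail.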
How to deal with non-projectivity?
(2) pseudo-projective parsing
Solution:
• “projectivize” a non-projective tree by creating
new projective arcs
• That can be transformed back into non-projective
arcs in a post-processing step
Graph-based parsing
Graph concepts refresher
Directed Spanning Trees
Maximum Spanning Tree
• Assume we have an arc-factored model
• i.e., the weight of a graph factors as the sum or product of the weights of its arcs
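Arc-factored scoring can be illustrated in a few lines: the score of any candidate tree is just the sum of its individual arc scores, which is what makes maximum-spanning-tree decoding possible. The arc weights below are made up for illustration:

```python
# Hypothetical arc scores for a 3-word sentence, keyed by (head, dependent).
arc_score = {(0, 2): 10, (2, 1): 8, (2, 3): 7, (0, 1): 3, (1, 2): 2}

def tree_score(arcs):
    """Arc-factored model: tree score = sum of its arc scores."""
    return sum(arc_score.get(a, 0) for a in arcs)

good = {(0, 2), (2, 1), (2, 3)}   # spanning tree rooted at word 2
bad = {(0, 1), (1, 2), (2, 3)}    # a lower-scoring alternative tree
```

Decoding then amounts to finding the maximum spanning tree over all possible arcs, e.g. with the Chu-Liu/Edmonds algorithm.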
See also Locally Optimal Learning to Search [Chang et al. ICML 2015]
Extension: dynamic oracle
Problem with standard