Wednesday, March 20, 2024

Event Extraction by Answering (Almost) Natural Questions

2004.13625v2.pdf (arxiv.org)


Introduction


The goal of event extraction is to create structured information from unstructured text. 

So, basically, it tries to answer questions like: “What is happening?”, “Who is involved?”, “What is involved?”.


Let’s see a practical example to get a better understanding:


  • Input ( unstructured information ):

“As part of the 11-billion-dollar sale of USA Interactive's film and television operations to the French company and its parent in December 2001, Interactive USA received 2.5 billion dollars in preferred shares in Vivendi Universal Entertainment.”


  • Output ( structured information ):

Event type: Transaction - Transfer - Ownership

Trigger: sale

Buyer: French company

Seller: Interactive USA

Artifact: operations


  • The trigger is essentially the word in the unstructured text that gives away the event type ( or whatever made the system reach the current conclusion ).

  • The arguments here ( Buyer, Seller, Artifact ) are roles that carry a semantic meaning for the current event type: Transaction - Transfer - Ownership.


The problems so far:

  • Previous approaches rely too much on entity information ( entity example: French company ) for the extraction of arguments. This means they use pre-trained models to identify entities ( such as persons, places, organisations and so on ) and only after that ( once we have the entities established ) can we assign argument roles to them. The problem here is that if we misidentify an entity or mislabel its semantic class, it’s game over.

  • We need two steps: first identify and categorize the entities, and only after that assign the semantic roles. If we get something wrong, the next step builds on top of the previous one, so we end up with error propagation.

  • We can’t exploit the similarities between arguments that may be related but are part of different event types.


To get a better understanding of the flow imposed by the previous approaches: they first identify the event trigger, then the entities ( with no semantics attached, just: French company and operations ), and only then assign semantic classes ( argument roles ) to them: Buyer, Seller, Artifact and so on.


Different approach

Let’s try a different approach: QA ( Question Answering )


We are going to use two BERTs:

  • One for detecting the trigger

  • One that will answer the questions and find the arguments of the event ( with argument roles already attached )


Improvements:

  • This approach does not need to identify any entities as a prerequisite.

  • The QA templates permit transferring knowledge between similar event types

  • It can also work in zero-shot settings ( on data that was not included in the training set, meaning it can extract arguments for event types it has never seen before )


General structure: 

  • The first BERT is used to extract a single token from the unstructured text and associate it with an event type ( from a pre-defined set of event types ).

  • The second BERT tries to answer the questions and identify the arguments of the event.

  • A dynamic threshold is applied to only retain the candidates that are above it.


How do you ask the questions?

For identifying the trigger we can use one of the following questions / keywords:

  • What is the trigger

  • Trigger

  • Action

  • Verb


So for Action ( the third option from the ones presented above ), it would be like this: [CLS] Action [SEP] As part of the 11 billion-dollar sale ... [SEP]


Note: [CLS] and [SEP] are part of the standard BERT-style format. 
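To make this concrete, here is a minimal sketch of how such a sequence can be built with the HuggingFace BERT tokenizer ( the choice of tokenizer is my assumption for illustration, not something stated in the paper ):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

question = "Action"   # one of the trigger query templates above
sentence = ("As part of the 11-billion-dollar sale of USA Interactive's "
            "film and television operations to the French company ...")

# Passing two texts produces the [CLS] <question> [SEP] <sentence> [SEP] layout.
encoding = tokenizer(question, sentence)
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
# roughly: ['[CLS]', 'action', '[SEP]', 'as', 'part', 'of', 'the', ..., '[SEP]']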


After that, for identifying the arguments we can use:

  • The role name ( like: agent, place ).

  • Use a WH question: “Who is the …” for persons, “Where is the …” for places. 

  • Use a more natural-sounding question based on the descriptions in the ACE (Automatic Content Extraction) annotation guidelines, and include the trigger itself in the question: “Who is the person in <trigger>?” ( a small example of all three styles follows below ).
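Here is a small, hypothetical helper that shows the three question styles side by side ( the guideline-based wording below is just a placeholder example; the real questions come from the ACE guidelines ):

def build_argument_question(role, wh_word, guideline_question, trigger, template):
    # Template 1: just the role name, e.g. "buyer"
    if template == 1:
        return role
    # Template 2: standard WH question, e.g. "Who is the buyer?"
    if template == 2:
        return f"{wh_word} is the {role}?"
    # Template 3: annotation-guideline question + "in <trigger>"
    return f"{guideline_question} in {trigger}?"

print(build_argument_question("buyer", "Who", "Who is the buying agent", "sale", 3))
# Who is the buying agent in sale?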


What models are there to answer the questions?


So, as said before, we use BERT for both trigger and argument detection:

  • BERT_QA_Trigger

  • BERT_QA_Arg

First we decide on a template ( for asking the questions ). The question is then translated into a BERT-like sequence: 

[CLS]<question>[SEP]<sentence>[SEP]. 

That sequence then gets contextualized by the BERT encoder.
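For example ( a sketch assuming the HuggingFace BertModel as a stand-in for the fine-tuned encoders used in the paper ):

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

enc = tokenizer("what is the trigger",
                "Interactive USA received 2.5 billion dollars in preferred shares.",
                return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**enc).last_hidden_state   # [1, seq_len, hidden_size]
print(hidden_states.shape)   # every token now has a contextualized vector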



Now, the output layers of the two QA models are different:

  • BERT_QA_Trigger predicts the event type for each token in the sentence (or None, if it is not an event trigger)

    • For the trigger prediction we use a parameter matrix W_tr ∈ R^(H x T), where T is the total number of event types + 1 ( for the non-trigger “None” label ) and H is the hidden size of the transformer. A softmax layer converts the logits into a multi-class probability ( over all the event types, basically ). Then, when testing on new, unseen data, we apply argmax over these probabilities to pick the event type with the highest probability ( obviously 🙂 ). A small sketch of both output layers follows this list.

  • BERT_QA_Args predicts the start and end offsets for the argument span (so the start and end index of the words that represent an argument)

    • We use 2 new matrices here: W_s ( “weights start” ) and W_e ( “weights end” ), and apply softmax over the resulting logits to get a multi-class probability of each token ( word ) being selected as the start / end of the span.

    • When training we use the sum of the start token loss and the end token loss. We obviously try to minimize that.

    • When testing, this turns out to be more complicated than expected: for a given role there can be many candidate spans or none at all. That is why we use a dynamic threshold.  
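Putting the two output layers side by side, here is a minimal PyTorch sketch of what is described above ( shapes and variable names are illustrative, not the authors’ exact code ):

import torch
import torch.nn as nn
import torch.nn.functional as F

H, T = 768, 34                         # hidden size; 33 event types + "None"
hidden_states = torch.randn(1, 12, H)  # [batch, seq_len, H] coming out of BERT

# BERT_QA_Trigger head: W_tr in R^(H x T), one event-type distribution per token
W_tr = nn.Linear(H, T)
trigger_probs = torch.softmax(W_tr(hidden_states), dim=-1)
predicted_types = trigger_probs.argmax(dim=-1)   # test time: pick the best type

# BERT_QA_Arg head: W_s and W_e score each token as a span start / end
W_s = nn.Linear(H, 1)
W_e = nn.Linear(H, 1)
start_logits = W_s(hidden_states).squeeze(-1)    # [batch, seq_len]
end_logits = W_e(hidden_states).squeeze(-1)

# Training loss: sum of the start-token and end-token cross-entropies
gold_start, gold_end = torch.tensor([3]), torch.tensor([5])   # dummy gold offsets
loss = F.cross_entropy(start_logits, gold_start) + F.cross_entropy(end_logits, gold_end)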


Now, let’s describe how we are going to get the arguments:

We do this in two steps ( a small sketch follows the list below ):


  1. Harvest all valid candidate argument spans ( for each argument role ): enumerate all possible ( start, end ) combinations ( via 2 nested loops ), then eliminate the ones that do not satisfy the following conditions:

    1. The start and end must be inside the input sentence.

    2. The length of the span itself must ( obviously 🙂 ) be smaller than the maximum allowed.

    3. The argument span's probability must be higher than that of the [CLS] ( no argument ) token.

  For each surviving candidate, calculate its relative score.

  2. Filter the candidate spans:

    1. Compute a threshold.

    2. Get rid of the spans whose score is below the previously computed threshold ( keep only those above it ).
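A rough sketch of these two steps ( the exact scoring and threshold selection in the paper may differ; this only illustrates the idea ):

import torch

def harvest_and_filter(start_probs, end_probs, sent_start, sent_end,
                       max_span_len, threshold):
    # start_probs / end_probs: 1-D tensors of per-token probabilities;
    # index 0 is the [CLS] token, used here as the "no argument" score.
    no_answer = start_probs[0] + end_probs[0]
    candidates = []
    for s in range(sent_start, sent_end + 1):                      # condition 1
        for e in range(s, min(s + max_span_len, sent_end + 1)):    # condition 2
            score = start_probs[s] + end_probs[e]
            if score > no_answer:                                  # condition 3
                candidates.append((s, e, (score - no_answer).item()))  # relative score
    # Step 2: keep only the candidates whose relative score is above the threshold
    return [c for c in candidates if c[2] > threshold]

start_p = torch.softmax(torch.randn(12), dim=0)   # dummy model outputs
end_p = torch.softmax(torch.randn(12), dim=0)
print(harvest_and_filter(start_p, end_p, 5, 11, 4, 0.0))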

Experiments

Experiments were conducted on:

  • ACE 2005 corpus, with 5,272 event triggers and 9,612 arguments ( fully annotated )


Evaluation

In order to evaluate, we consider the following:

  • A trigger is correctly identified if its offsets match those of the gold-standard trigger, and correctly classified if its event type ( 33 types in total ) also matches.

  • An event argument is correctly identified if its offsets and event type match the gold standard, and correctly classified if its semantic role ( 22 roles in total ) is also correct.

The performance of the argument extraction is directly impacted by that of the trigger extraction.
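As a tiny illustration of the trigger criterion above ( field names here are made up for the example ):

def trigger_correct(pred, gold):
    # identified: the character/token offsets match the gold trigger
    identified = (pred["start"], pred["end"]) == (gold["start"], gold["end"])
    # classified: on top of that, the predicted event type must also match
    classified = identified and pred["event_type"] == gold["event_type"]
    return identified, classified

print(trigger_correct(
    {"start": 7, "end": 7, "event_type": "Transaction.Transfer-Ownership"},
    {"start": 7, "end": 7, "event_type": "Transaction.Transfer-Ownership"},
))
# (True, True)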



Now, in order to better understand how the dynamic threshold impacts the performance of the framework, experiments were conducted with and without it: 

  • Last row ( of Table 3 ) shows performance with it. 

  • The row above shows performance without it. This comparison also shows how well the last 2 question templates perform.



To evaluate on unseen argument roles, the following experiment was conducted:

80% of the argument roles were kept for training; the remaining 20% were removed and only seen at test time.

As seen in Table 5, the BERT_QA model is substantially better than other methods of event extraction.



In order to see the impact of different question-forming templates, experiments were conducted; the differences were not very large ( as seen in Table 6 ).



But using the “in <trigger>” template consistently improved performance, which makes sense, because it indicates where the trigger is in the sentence.


Because template 3 uses the argument role descriptions from the annotation guidelines, it encodes more semantic information about the role, thus giving the best performance.




Error Analysis

Complex sentence structures seem to be problematic:

E.g.: “[She] visited the store and [Tom] did too.”

  • Tom is not extracted as an argument ( only “She” is )

E.g.: “Canadian authorities arrested two Vancouver-area men on Friday and charged them in the deaths of [329 passengers and crew members of an Air-India Boeing 747 that blew up over the Irish Sea in 1985, en route from Canada to London]”

  • The victim was not extracted in full ( 329 passengers and crew members of an Air-India Boeing 747 that blew up over the Irish Sea in 1985, en route from Canada to London )


Conclusion

Most methods go through these 3 steps:

  • Trigger detection

  • Entity recognition 

  • Argument role assignment


The presented framework skips the entity recognition stage.

So, in the sentence:


“Apple announced the launch of its new iPhone in California”


a regular framework would identify: 

  • Trigger: announced 

  • Entities: Apple, iPhone, California 

  • And then assign roles: Apple-Subject, iPhone-Object, California-Location.


The current QA framework skips directly to assigning roles, with no entity step needed.

It identifies:

  • Trigger: announced 

  • And based on a few guideline-derived question templates, it asks questions like “What was announced?” or “Where was the announcement made?”, directly identifying the semantic roles and the entities in one step.


