Conjunctions in named entities

A recent article on the LingPipe blog discussed conjoined named entities such as Johnson and Johnson and Wallace and Gromit. It suggests that one way of treating such a name is as a frozen expression. I assume that means relying on statistical measures to detect that the multi-word expression recurs often enough to be treated as a unit.

In the United Nations corpus, things can get even more interesting. Let’s look at a relatively easy example: draft resolution A/56/L.28 and Add.1.

Is this one document (one draft resolution) or two? And if two, then which two? The first one is obviously A/56/L.28. But Add.1 is not a valid document symbol on its own. It appears to be an (additive?) coreference to the first one, resolving to A/56/L.28/Add.1.
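To make the resolution step concrete, here is a minimal sketch in Python. It assumes a simplified pattern for document symbols and treats a suffix-only mention as relative to the most recent full symbol. The function name `resolve_symbols`, both regular expressions, and the inclusion of the Corr. and Rev. suffixes are my own assumptions for illustration, not a validated grammar of UN document symbols.

```python
import re

# Assumed, simplified shape of a full symbol such as A/56/L.28.
FULL_SYMBOL = re.compile(r"[A-Z]+(?:/[A-Za-z0-9.]+)+")
# Suffix-only mentions. Add. comes from the example; Corr. and Rev. are assumptions.
SUFFIX = re.compile(r"(?:Add|Corr|Rev)\.\d+")


def resolve_symbols(mention: str) -> list[str]:
    """Expand a conjoined mention into full document symbols."""
    resolved = []
    base = None  # most recent full symbol seen so far
    for token in re.split(r"\s+and\s+|,\s*", mention.strip()):
        if SUFFIX.fullmatch(token) and base is not None:
            # A bare suffix is read as relative to the preceding full symbol.
            resolved.append(f"{base}/{token}")
        elif FULL_SYMBOL.fullmatch(token):
            base = token
            resolved.append(token)
        else:
            raise ValueError(f"cannot resolve {token!r} in {mention!r}")
    return resolved


print(resolve_symbols("A/56/L.28 and Add.1"))
```

A bare suffix with no preceding full symbol raises an error rather than guessing, since there is nothing for it to refer back to.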

The answer (as good as I can make it so far) could lie in the FRBR distinction between Expression and Manifestation. A resolution is an expression of Member States’ proposals and negotiations. To some degree, it evolves over several meetings. Between discussions, however, the latest version or changes need to be reported, both to register them formally and to give the next round of discussions the latest documents to work from.

In our case, the first time the draft resolution had to be presented, it was published under A/56/L.28 (which incidentally means limited distribution document 28 of the General Assembly’s 56th regular session). So the initial Manifestation of the draft resolution became this physical document with a distinct symbol assigned.

But apart from its text, a draft resolution has a list of sponsoring Member States. That list can change as the draft resolution gains sponsors. These additional sponsors were listed in the addendum A/56/L.28/Add.1. But the addendum does not make sense without the original document, so both physical documents actually represent one logical draft resolution, which is reflected in the grammar of the text (draft resolution, not resolutions).
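One way to picture this reading is a small data model in which several physical documents (Manifestations) hang off one logical draft resolution (Expression). The sketch below is only an illustration: the class `DraftResolution`, the function `group_by_expression`, and the rule that an addendum's symbol is the base symbol followed by "/Add.N" are assumptions of mine, not an established annotation scheme.

```python
from dataclasses import dataclass, field


@dataclass
class DraftResolution:
    """One logical draft resolution (the Expression, in this reading)."""
    base_symbol: str
    # Symbols of the physical documents (Manifestations) that carry it.
    manifestations: list[str] = field(default_factory=list)


def group_by_expression(symbols: list[str]) -> dict[str, DraftResolution]:
    """Group document symbols under the draft resolution they belong to."""
    groups: dict[str, DraftResolution] = {}
    for symbol in symbols:
        # Assumption: an addendum is the base symbol plus "/Add.N".
        base = symbol.split("/Add.")[0]
        groups.setdefault(base, DraftResolution(base)).manifestations.append(symbol)
    return groups


groups = group_by_expression(["A/56/L.28", "A/56/L.28/Add.1"])
print(len(groups), groups["A/56/L.28"].manifestations)
```

Under these assumptions the two symbols collapse into a single entry, which matches the singular "draft resolution" in the text.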

What this means for named entity annotations and for recognition algorithms is hard to say, and it is something I am looking at in my PhD research.

3 thoughts on “Conjunctions in named entities”

  1. I posted a comment on the LingPipe blog entry. I agree with you that the

    draft resolution A/56/L.28 and Add.1

    refers to one object, namely a resolution, that can be voted on.
    They could have written

    draft resolution A/56/L.28 as modified by Add.1

    and it would have meant the same, but that is not necessary, because shared knowledge of the conventions for resolutions makes the “and” have the same effect.

  2. Thanks for the comment, Chris. The original article is a good place for that discussion. I will add a comment there.

  3. What does it mean to treat “Wallace and Gromit” as a unit? If Wallace and Gromit have a Grand Day Out, we can infer that Wallace had a Grand Day Out. If Wallace and Gromit lifted a piano, ate a pound of Wensleydale cheese, or played a duet, we can’t infer that Wallace lifted a piano, ate a pound of Wensleydale cheese, or played a duet. “Wallace and Gromit” does not name one thing, composed of a (clay) man and a (clay) dog, a thing that has a dog as a part. Nor is it the name of two things (which would have to be Wallace and Gromit, and neither has this name, so both don’t). “Wallace and Gromit” is a plural referring expression.
