10 November 2008

Names on Nodes: Entities

Here's a UML diagram of the latest class schema for the core Names on Nodes entities:

The white arrows indicate inheritance, i.e., "is-a" relationships. For example, a PhyloDefinition is a type of Definition. The black diamonds indicate composition, i.e., "has" relationships. For example, a Definition has any number of Anchor entities, each of which has exactly one Signifier entity.

Some comments on the major classes of entity:

Signifier.—This is what everything revolves around. A Signifier signifies a set of organisms, that is, a taxon. Signifier entities may be scientific names, specimens, character states, or taxonomic units in systematic studies.

Several Signifier entities may share the same SignifierIdentity, indicating that they are different ways of referring to the exact same thing. For example, Felis leo (ICZN) and Panthera leo (ICZN) are objective synonyms. (Subjective synonyms do not share the same identity.)
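To make the shared-identity idea concrete, here is a minimal Python sketch. It is not the actual Names on Nodes code: the `same_as` method and the `equate` helper are names I made up for illustration, and the authority is reduced to a plain label.

```python
class SignifierIdentity:
    """Opaque token; signifiers sharing one token denote the exact same thing."""


class Signifier:
    def __init__(self, authority, local_name):
        self.authority = authority            # plain label here, e.g. "ICZN"
        self.local_name = local_name
        self._identity = SignifierIdentity()  # kept private to the signifier

    def same_as(self, other):
        """True if both signifiers refer to the exact same thing."""
        return self._identity is other._identity


def equate(a, b):
    """Hypothetical helper: record that a and b are objective synonyms.

    b adopts a's identity. A fuller version would also re-point every
    other signifier that shared b's old identity.
    """
    b._identity = a._identity


felis_leo = Signifier("ICZN", "Felis leo")
panthera_leo = Signifier("ICZN", "Panthera leo")
panthera_tigris = Signifier("ICZN", "Panthera tigris")

equate(felis_leo, panthera_leo)  # objective synonyms share an identity
```

Because only the identity token changes, nothing that points at `felis_leo` or `panthera_leo` has to be rewritten when the two are equated.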

Authority.—Every Signifier is unique within an Authority. Authority entities may be publications, nomenclatural codes, personal opinions, specimen repositories, or bioinformatics files. Every Authority is associated with a unique URI (e.g., a web address, a DOI, or an ISBN).

Like Signifier entities, different Authority entities may share an identity (AuthorityIdentity). These Identity entities are hidden from other entities, so that Authority and Signifier entities can be equated or differentiated without affecting other entities in the database.
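As a rough sketch of how an Authority might enforce uniqueness and carry a hidden identity, consider the Python below. The method names (`add_signifier`, `equivalent_to`), the `same_as` parameter, and the example.org URIs are all assumptions of mine, not part of the real schema. The qualified name is shown as a simple (URI, local name) pair.

```python
class Authority:
    def __init__(self, uri, same_as=None):
        self.uri = uri
        # Hidden identity token: reuse another authority's token when the
        # two URIs are known to point at the same thing.
        self._identity = same_as._identity if same_as is not None else object()
        self._local_names = set()

    def add_signifier(self, local_name):
        """Register a local name and return its qualified name."""
        if local_name in self._local_names:
            raise ValueError(f"{local_name!r} already exists under {self.uri}")
        self._local_names.add(local_name)
        return (self.uri, local_name)

    def equivalent_to(self, other):
        return self._identity is other._identity


# Placeholder URIs: two ways of citing one (imaginary) publication.
web = Authority("https://example.org/some-paper")
alias = Authority("https://example.org/some-paper-mirror", same_as=web)
other = Authority("https://example.org/another-paper")

qualified_name = web.add_signifier("Aves")
```

Calling `web.add_signifier("Aves")` a second time raises `ValueError`, which is one simple way to keep every Signifier unique within its Authority.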

Relator.—A Relator is a set of Relation entities, each of which represents a statement about two Signifier entities, either Inclusion (i.e., a is a superset of b) or Parentage (i.e., a is immediately ancestral to b).
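Here is one way the Relator idea could be modeled in Python. This is a sketch under my own assumptions: signifiers are plain strings, the query methods `included_in` and `children_of` are invented for the example, and the node labels "n1" and "n2" are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class RelationKind(Enum):
    INCLUSION = "inclusion"  # a is a superset of b
    PARENTAGE = "parentage"  # a is immediately ancestral to b


@dataclass(frozen=True)
class Relation:
    kind: RelationKind
    a: str
    b: str


class Relator:
    """A set of Relation statements."""

    def __init__(self, relations=()):
        self.relations = set(relations)

    def included_in(self, a):
        """Signifiers stated to be subsets of a."""
        return {r.b for r in self.relations
                if r.kind is RelationKind.INCLUSION and r.a == a}

    def children_of(self, a):
        """Signifiers stated to be immediate descendants of a."""
        return {r.b for r in self.relations
                if r.kind is RelationKind.PARENTAGE and r.a == a}


relator = Relator([
    Relation(RelationKind.INCLUSION, "Panthera", "Panthera leo"),
    Relation(RelationKind.PARENTAGE, "n1", "n2"),  # hypothetical node labels
])
```

Keeping the two kinds of statement in one set means a single Relator can mix taxonomic containment with ancestor-descendant links.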

Definition.—A Definition defines a Signifier according to an Authority, and may have any number of Anchor entities, each of which tells whether a given Signifier is objectively a subset of the defined Signifier.

RankDefinition.—A RankDefinition consists of a rank and some number of internal Anchor entities. For example, under the ICZN (an Authority), Hominidae (a Signifier) is defined as the family (a rank) typified by Homo (a Signifier referenced by an internal Anchor).

PhyloDefinition.—A PhyloDefinition consists of a formula, expressed both prosaically and mathematically. For example, according to Gauthier & de Queiroz 2001 (an Authority), Aves (a Signifier) is defined as "the crown clade stemming from the most recent common ancestor of Ratitae (Struthio camelus Linnaeus 1758), Tinamidae (Tetrao [Tinamus] major Gmelin 1789), and Neognathae (Vultur gryphus Linnaeus 1758)." This Definition is specified by three internal Anchor entities, respectively referencing the species Struthio camelus, Tetrao major, and Vultur gryphus (all of which are Signifier entities).
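The two examples above can be written down as data. The Python below is only a sketch of the class relationships described in this post: the field names (`defined`, `authority`, `anchors`, `rank`, `prose`) are my guesses, signifiers and authorities are plain strings, and the mathematical form of the formula is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    signifier: str
    internal: bool = True  # True: objectively a subset of the defined signifier


@dataclass(frozen=True)
class Definition:
    defined: str        # the Signifier being defined
    authority: str      # the Authority the definition is made under
    anchors: tuple = ()


@dataclass(frozen=True)
class RankDefinition(Definition):
    rank: str = ""


@dataclass(frozen=True)
class PhyloDefinition(Definition):
    prose: str = ""     # the formula in words


hominidae = RankDefinition(
    defined="Hominidae",
    authority="ICZN",
    anchors=(Anchor("Homo"),),
    rank="family",
)

aves = PhyloDefinition(
    defined="Aves",
    authority="Gauthier & de Queiroz 2001",
    anchors=(
        Anchor("Struthio camelus"),
        Anchor("Tetrao major"),
        Anchor("Vultur gryphus"),
    ),
    prose="the crown clade stemming from the most recent common ancestor "
          "of Ratitae, Tinamidae, and Neognathae",
)
```

Both kinds of definition inherit the anchor list from `Definition`, which mirrors the "is-a" arrows in the diagram.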

Dataset.—The relations in a Dataset are based on observation or hypothesis. A Dataset entity's Authority may be a bioinformatics file, a publication, or a personal opinion. As with Signifier entities, every Dataset can be uniquely identified by a qualified name, combining the URI of its Authority and a local name.

Context.—Applying phylogenetic definitions requires a Context, which is essentially a set of Dataset entities. All Definition and DefinitionApplication entities are implicitly included under every Context.

DefinitionApplication.—This is the crux of the whole project: a PhyloDefinition can be applied automatically under a given Context.
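To give a feel for what "automatically applied" could mean, here is a toy Python sketch. It is not how Names on Nodes actually does it. It assumes a Context is a list of datasets, each dataset is a set of (parent, child) parentage pairs, and the definition is node-based (most recent common ancestor of the internal anchors, plus all descendants). The function name `apply_node_based`, the node labels "root", "n0", and "n1", and the tiny tree itself are illustrative only.

```python
def apply_node_based(specifiers, context):
    """Return the clade picked out by the specifiers under the context."""
    if not specifiers:
        raise ValueError("at least one specifier is required")

    # Merge the parentage statements of every dataset in the context.
    children, parents = {}, {}
    for dataset in context:
        for parent, child in dataset:
            children.setdefault(parent, set()).add(child)
            parents.setdefault(child, set()).add(parent)

    def closure(start, step):
        """start plus everything reachable from it by following step."""
        seen, stack = {start}, [start]
        while stack:
            for nxt in step.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # Nodes ancestral to (or equal to) every specifier.
    common = set.intersection(*(closure(s, parents) for s in specifiers))
    # "Most recent": common ancestors with no common ancestor below them.
    mrca = {n for n in common
            if not (closure(n, children) - {n}) & common}

    clade = set()
    for node in mrca:
        clade |= closure(node, children)
    return clade


# A toy context of two datasets; internal node labels are hypothetical.
dataset_1 = {("root", "n0"), ("root", "Crocodylus niloticus"),
             ("n0", "n1"), ("n0", "Vultur gryphus")}
dataset_2 = {("n1", "Struthio camelus"), ("n1", "Tetrao major")}

aves = apply_node_based(
    ["Struthio camelus", "Tetrao major", "Vultur gryphus"],
    [dataset_1, dataset_2],
)
```

Under this toy context the result contains the three specifiers plus the internal nodes "n0" and "n1", and excludes "root" and Crocodylus niloticus. Swap in a different Context and the same definition may pick out a different set, which is the whole point.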
