01 February 2009

Computer Programs as Taxonomists

After some struggling, the Names on Nodes service can finally synthesize a taxonomy from raw data. Basically, I fed it:
  1. Basic information about the nomenclatural codes (names, associated URIs, etc.).
  2. Similar basic information about a specimen collection (the Yale Peabody Museum's vertebrate paleontology collection).
  3. Similar basic information about two publications (Padian & al. 1999; Gauthier & de Queiroz 2001).
  4. Similar basic information about a website (Taxon Search: Taxon Search Archive Page - Stem Archosauria 1.0, i.e., Sereno 2005).
  5. A list of a bunch of zoological species names, including some objective synonymies (e.g., Tinamus major = Tetrao major).
  6. An entry for specimen YPM-VP 1450.
  7. A rank-based definition declaring that specimen to be the type of Ichthyornis dispar.
  8. Phylogenetic definitions for several clade names from Padian & al. 1999, Gauthier & de Queiroz 2001, and Sereno 2005.
  9. Two different phylogenies for Eumaniraptora: one at genus level, the other at species level.
  10. Instructions for correlating the operational taxonomic units in the phylogenies with formal taxonomic names.
Given all that, it was able to produce this:

Some of the tricky parts:
  • Recognizing superset-subset relations between applied phylogenetic definitions (e.g., that Aves sensu Sereno 2005 is a proper superset of Aves sensu Gauthier & de Queiroz 2001).
  • Not listing taxa redundantly (e.g., not listing Ichthyornis dispar as a direct subset of Aves sensu Sereno 2005, since it's included within Ichthyornis).
  • Applying species-based definitions to a genus-level phylogeny.
  • Finding subjective/operational synonyms.
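As a rough illustration of the first bullet, here is a minimal Python sketch, not the Names on Nodes implementation. It assumes the genus-level phylogeny from the comments is held as nested tuples, and it maps each species specifier to its genus by hand (a simplification of the "species-based definitions on a genus-level phylogeny" problem). The names `leaves` and `node_clade` are hypothetical helpers.

```python
# Genus-level tree from the comments, built as nested tuples.
# (An assumed representation; the service's own data model is not shown in the post.)
inner = (("Passer", "Vultur"), ("Struthio", "Tinamus"))
stem = inner
for genus in ("Ichthyornis", "Hesperornis", "Confuciusornis", "Archaeopteryx", "Troodon"):
    stem = (genus, stem)  # each genus is sister to everything wrapped so far
TREE = (("Dromaeosaurus", "Velociraptor"), stem)


def leaves(node):
    """Return the set of tip names under a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def node_clade(node, specifiers):
    """Tips of the smallest subtree that contains every specifier."""
    specifiers = frozenset(specifiers)
    if not specifiers <= leaves(node):
        raise ValueError("a specifier is missing from the tree")
    if not isinstance(node, str):
        for child in node:
            if specifiers <= leaves(child):
                return node_clade(child, specifiers)
    return leaves(node)


# Species specifiers replaced by their genera (a manual, assumed correlation).
aves_sereno = node_clade(TREE, {"Archaeopteryx", "Passer"})
aves_gdq = node_clade(TREE, {"Struthio", "Tinamus", "Vultur"})

print(aves_gdq < aves_sereno)  # True: proper subset on this tree
```

Comparing the two tip sets with `<` is enough here because both names are applied to the same tree; comparing applications across different phylogenies would need more care.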
The testing continues....

6 comments:

  1. Pretty awesome. What were your two different phylogenies that you used?

  2. Thanks. Whoops, meant to put those in the post. Here they are:

    ((Dromaeosaurus, Velociraptor), (Troodon, (Archaeopteryx,(Confuciusornis, (Hesperornis,(Ichthyornis, ((Passer, Vultur), (Struthio, Tinamus)))))))))

    ((archaeopteryx_lithographica, confuciusornis_sanctus), (troodon_formosus, (microraptor_gui, (dromaeosaurus_albertensis, deinonychus_antirrhopus, velociraptor_mongoliensis))))
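For readers who want to experiment with these strings, here is a small Python sketch of a parser for bare Newick topologies (names, commas, and parentheses only; no branch lengths, quoted labels, or internal node labels). `parse_newick` and `tips` are hypothetical helper names, and this is not how Names on Nodes itself reads trees.

```python
def parse_newick(text):
    """Parse a bare Newick topology into nested tuples of tip names."""
    pos = 0

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def parse_node():
        nonlocal pos
        skip_spaces()
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            skip_spaces()
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
                skip_spaces()
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
            return tuple(children)
        # Otherwise read a tip name up to the next delimiter or space.
        start = pos
        while pos < len(text) and text[pos] not in "(),;" and not text[pos].isspace():
            pos += 1
        if start == pos:
            raise ValueError(f"expected a name at position {pos}")
        return text[start:pos]

    tree = parse_node()
    skip_spaces()
    if pos < len(text) and text[pos] == ";":
        pos += 1  # an optional terminating semicolon
        skip_spaces()
    if pos != len(text):
        raise ValueError(f"unexpected trailing text at position {pos}")
    return tree


def tips(node):
    """List the tip names in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in tips(child)]


species_tree = parse_newick(
    "((archaeopteryx_lithographica, confuciusornis_sanctus), "
    "(troodon_formosus, (microraptor_gui, (dromaeosaurus_albertensis, "
    "deinonychus_antirrhopus, velociraptor_mongoliensis))))"
)
print(len(tips(species_tree)))  # 7
```

The parser raises `ValueError` on unbalanced parentheses, which is a cheap way to catch transcription slips in hand-typed trees.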

  3. While I'm at it, here are the prose definitions:

    Dromaeosauridae sensu Padian & al. 1999: "The most recent common ancestor of Dromaeosaurus and Velociraptor and all of its descendants."

    Dromaeosauridae sensu Sereno 2005: "The most inclusive clade containing Dromaeosaurus albertensis Matthew and Brown 1922 but not Troodon formosus Leidy 1856, Ornithomimus edmontonicus Sternberg 1933, Passer domesticus (Linnaeus 1758)."

    Neornithes sensu Sereno 2005: "The least inclusive clade containing Struthio camelus Linnaeus 1758 and Passer domesticus (Linnaeus 1758)."

    Aves sensu Gauthier & de Queiroz 2001: "'Aves' refers to the crown clade stemming from the most recent common ancestor of Ratitae (Struthio camelus Linnaeus 1758), Tinamidae (Tetrao [Tinamus] major Gmelin 1789), and Neognathae (Vultur gryphus Linnaeus 1758)."

    Palaeognathae sensu Gauthier & de Queiroz 2001: "'Palaeognathae' refers to the crown clade stemming from the most recent common ancestor of Tinamidae (Tetrao [Tinamus] major Gmelin 1789) and Ratitae (Struthio camelus Linnaeus 1758)."

    Aves sensu Sereno 2005: "The least inclusive clade containing Archaeopteryx lithographica Meyer 1861 and Passer domesticus (Linnaeus 1758)."

    Panpalaeognathae sensu Gauthier & de Queiroz 2001: "Therefore, we propose the name 'Panpalaeognathae' (new clade name), defined as the most inclusive clade containing both Tinamidae (Tetrao [Tinamus] major Gmelin 1789) and Ratitae (Struthio camelus Linnaeus 1758) but neither Galloanserae (Phasianus [Gallus] gallus Linnaeus 1758) nor Neoaves (Vultur gryphus Linnaeus 1758)."
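The "most inclusive clade containing X but not Y" wording can be sketched in Python as follows. This is only an illustration under stated assumptions: the species-level phylogeny is transcribed as nested tuples, external specifiers that do not occur in the tree (here Ornithomimus edmontonicus and Passer domesticus) are simply ignored, and `leaves` and `branch_clade` are hypothetical helper names rather than anything from Names on Nodes.

```python
# Species-level tree from the comments, transcribed as nested tuples.
TREE = (
    ("archaeopteryx_lithographica", "confuciusornis_sanctus"),
    ("troodon_formosus",
     ("microraptor_gui",
      ("dromaeosaurus_albertensis",
       "deinonychus_antirrhopus",
       "velociraptor_mongoliensis"))),
)


def leaves(node):
    """Return the set of tip names under a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def branch_clade(node, internal, external):
    """Tips of the largest subtree holding all internal and no external specifiers.

    Returns an empty set when no such subtree exists on this tree.
    External specifiers absent from the tree are ignored (an assumed policy).
    """
    internal, external = frozenset(internal), frozenset(external)
    if not internal <= leaves(node):
        raise ValueError("an internal specifier is missing from the tree")
    while leaves(node) & external:
        if isinstance(node, str):
            return frozenset()  # the same tip is both internal and external
        candidates = [child for child in node if internal <= leaves(child)]
        if not candidates:
            return frozenset()  # internal specifiers only unite above an external one
        node = candidates[0]  # step toward the internal specifiers
    return leaves(node)


dromaeosauridae_sereno = branch_clade(
    TREE,
    internal={"dromaeosaurus_albertensis"},
    external={"troodon_formosus", "ornithomimus_edmontonicus", "passer_domesticus"},
)
print(sorted(dromaeosauridae_sereno))
# ['deinonychus_antirrhopus', 'dromaeosaurus_albertensis',
#  'microraptor_gui', 'velociraptor_mongoliensis']
```

On this particular tree the branch-based definition picks up Microraptor gui as well, because Troodon formosus is the only external specifier present.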

  4. And here are the MathML versions of the definitions.

    Dromaeosauridae sensu Padian & al. 1999:

    <apply
     xmlns="http://www.w3.org/1998/Math/MathML">
      <csymbol definitionURL
       ="http://namesonnodes.org/2009/phylo/math::nodeClade"/>
      <csymbol definitionURL
       ="urn:isbn:0853010064::Dromaeosaurus"/>
      <csymbol definitionURL
       ="urn:isbn:0853010064::Velociraptor"/>
    </apply>


    Dromaeosauridae sensu Sereno 2005:

    <apply
     xmlns="http://www.w3.org/1998/Math/MathML">
      <csymbol definitionURL
       ="http://namesonnodes.org/2009/phylo/math::branchClade"/>
      <csymbol definitionURL
       ="urn:isbn:0853010064::Dromaeosaurus_albertensis"/>
      <apply>
        <union/>
     <csymbol definitionURL
      ="urn:isbn:0853010064::Troodon_formosus"/>
     <csymbol definitionURL
      ="urn:isbn:0853010064::Ornithomimus_edmontonicus"/>
     <csymbol definitionURL
      ="urn:isbn:0853010064::Passer_domesticus"/>
      </apply>
    </apply>


    Neornithes sensu Sereno 2005:

    <apply
     xmlns="http://www.w3.org/1998/Math/MathML">
      <csymbol definitionURL
       ="http://namesonnodes.org/2009/phylo/math::nodeClade"/>
      <csymbol definitionURL
       ="urn:isbn:0853010064::Struthio_camelus"/>
      <csymbol definitionURL
       ="urn:isbn:0853010064::Passer_domesticus"/>
    </apply>


    Aves sensu Gauthier & de Queiroz 2001:

    <math
     xmlns="http://www.w3.org/1998/Math/MathML">
      <declare>
        <ci>Ratitae</ci>
        <csymbol definitionURL
         ="urn:isbn:0853010064::Struthio_camelus"/>
      </declare>
      <declare>
        <ci>Tinamidae</ci>
     <csymbol definitionURL
      ="urn:isbn:0853010064::Tetrao_major"/>
      </declare>
      <declare>
        <ci>Neognathae</ci>
     <csymbol definitionURL
      ="urn:isbn:0853010064::Vultur_gryphus"/>
      </declare>
      <apply>
        <csymbol definitionURL
      ="http://namesonnodes.org/2009/phylo/math::nodeClade"/>
     <ci>Ratitae</ci>
     <ci>Tinamidae</ci>
     <ci>Neognathae</ci>
      </apply>
    </math>


    Palaeognathae sensu Gauthier & de Queiroz 2001:

    <math
     xmlns="http://www.w3.org/1998/Math/MathML">
      <declare>
        <ci>Ratitae</ci>
        <csymbol definitionURL
         ="urn:isbn:0853010064::Struthio_camelus"/>
      </declare>
      <declare>
        <ci>Tinamidae</ci>
     <csymbol definitionURL
      ="urn:isbn:0853010064::Tetrao_major"/>
      </declare>
      <apply>
        <csymbol definitionURL
      ="http://namesonnodes.org/2009/phylo/math::nodeClade"/>
     <ci>Ratitae</ci>
     <ci>Tinamidae</ci>
      </apply>
    </math>


    Aves sensu Sereno 2005:

    <apply
     xmlns="http://www.w3.org/1998/Math/MathML">
      <csymbol definitionURL
       ="http://namesonnodes.org/2009/phylo/math::nodeClade"/>
      <csymbol definitionURL
       ="urn:isbn:0853010064::Archaeopteryx_lithographica"/>
      <csymbol definitionURL
       ="urn:isbn:0853010064::Passer_domesticus"/>
    </apply>


    Panpalaeognathae sensu Gauthier & de Queiroz 2001:

    <math
     xmlns="http://www.w3.org/1998/Math/MathML">
      <declare>
        <ci>Ratitae</ci>
     <csymbol definitionURL
      ="urn:isbn:0853010064::Struthio_camelus"/>
      </declare>
      <declare>
        <ci>Tinamidae</ci>
     <csymbol definitionURL
      ="urn:isbn:0853010064::Tetrao_major"/>
      </declare>
      <declare>
        <ci>Galloanserae</ci>
     <csymbol definitionURL
      ="urn:isbn:0853010064::Phasianus_gallus"/>
      </declare>
      <declare>
        <ci>Neoaves</ci>
     <csymbol definitionURL
      ="urn:isbn:0853010064::Vultur_gryphus"/>
      </declare>
      <apply>
        <csymbol definitionURL
      ="http://namesonnodes.org/2009/phylo/math::branchClade"/>
        <apply>
       <union/>
       <ci>Ratitae</ci>
       <ci>Tinamidae</ci>
     </apply>
     <apply>
       <union/>
       <ci>Galloanserae</ci>
       <ci>Neoaves</ci>
     </apply>
      </apply>
    </math>
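To show how such a definition could be read mechanically, here is a short Python sketch using the standard library's `xml.etree.ElementTree`. It handles only the simple `<apply>` form (an operator `csymbol` followed by `csymbol` or `<union/>` operands), not the `<math>`/`<declare>` form, and the helpers `local_name`, `read_operand`, and `read_definition` are hypothetical names. Splitting each `definitionURL` on `::` is an assumption based on the examples above.

```python
import xml.etree.ElementTree as ET

MATHML = "{http://www.w3.org/1998/Math/MathML}"

# Dromaeosauridae sensu Sereno 2005, as given above (whitespace normalized).
DEFINITION = """
<apply xmlns="http://www.w3.org/1998/Math/MathML">
  <csymbol definitionURL="http://namesonnodes.org/2009/phylo/math::branchClade"/>
  <csymbol definitionURL="urn:isbn:0853010064::Dromaeosaurus_albertensis"/>
  <apply>
    <union/>
    <csymbol definitionURL="urn:isbn:0853010064::Troodon_formosus"/>
    <csymbol definitionURL="urn:isbn:0853010064::Ornithomimus_edmontonicus"/>
    <csymbol definitionURL="urn:isbn:0853010064::Passer_domesticus"/>
  </apply>
</apply>
"""


def local_name(url):
    """Return the part of a definitionURL after the last '::'."""
    return url.rsplit("::", 1)[-1]


def read_operand(element):
    """Return the names denoted by one operand: a csymbol or a union of them."""
    if element.tag == MATHML + "csymbol":
        return [local_name(element.get("definitionURL"))]
    is_union = (
        element.tag == MATHML + "apply"
        and len(element) > 0
        and element[0].tag == MATHML + "union"
    )
    if is_union:
        return [name for child in element[1:] for name in read_operand(child)]
    raise ValueError(f"unsupported operand: {element.tag}")


def read_definition(xml_text):
    """Return (operator name, list of operand name lists) for a simple <apply>."""
    root = ET.fromstring(xml_text.strip())
    operator, *operands = list(root)
    return local_name(operator.get("definitionURL")), [read_operand(o) for o in operands]


operator, operands = read_definition(DEFINITION)
print(operator)  # branchClade
print(operands)
```

For this definition the first operand list holds the internal specifier and the second holds the three external specifiers.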

  5. Hi Mike. I'd like to ask a silly question: what is your ultimate intention with this stuff? From my point of view this looks like a means to getting a kind of consensus tree for dinosaurs/amniotes and hence is a sort of supertree approach. I've often dreamed of doing something along those lines myself, i.e. a database structure (including published cladograms, with labelled nodes as well as lists of taxonomic validity) with scripts that automate the kinds of tasks we did for the recent dinosaur supertree. However, I am nowhere near competent enough with SQL and the like yet. I suspect that your intentions are somewhat different, but by the looks of things your stuff does most of the same tasks.

    I am therefore looking forwards to the eventual publication of this monster!

  6. Actually, the goal is not to have one consensus tree, but to let anyone pick which datasets they want to use. If you look at this post, you'll see there's something called a "Context" which refers to a set of "Dataset" objects. (A context implicitly also includes all definitions, and all applications of definitions under that context.) So Names on Nodes will allow everyone to piece together their own phylogenetic contexts. For your supertree, you'd probably just upload the NEXUS file or Newick tree string produced by your analysis and have that as the only dataset in your context. (Although you could augment it with a dataset reflecting your own taxonomic opinions for rank-based taxa, e.g., "Tyrannosaurus ICZN 4 includes Tarbosaurus bataar ICZN 4," etc.)
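The Context/Dataset idea might be pictured with a toy Python model like the one below. This is purely a sketch: the class fields, the `includes` method, and the representation of a dataset as a set of (including taxon, included taxon) pairs are all assumptions, not the actual Names on Nodes classes. The example inclusions come from the post itself (Ichthyornis dispar within Ichthyornis, which falls within Aves sensu Sereno 2005).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """A labelled set of (including_taxon, included_taxon) assertions."""
    label: str
    inclusions: frozenset


@dataclass
class Context:
    """A user-chosen collection of datasets that are pooled together."""
    datasets: list = field(default_factory=list)

    def includes(self, ancestor, descendant):
        """True if the pooled assertions place `descendant` within `ancestor`."""
        children = {}
        for dataset in self.datasets:
            for parent, child in dataset.inclusions:
                children.setdefault(parent, set()).add(child)
        stack, seen = [ancestor], set()
        while stack:  # depth-first walk down the inclusion pairs
            current = stack.pop()
            for child in children.get(current, ()):
                if child == descendant:
                    return True
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return False


phylogeny = Dataset(
    "genus-level phylogeny",
    frozenset({("Aves sensu Sereno 2005", "Ichthyornis")}),
)
opinion = Dataset(
    "rank-based taxonomic opinion",
    frozenset({("Ichthyornis", "Ichthyornis dispar")}),
)

tree_only = Context([phylogeny])
both = Context([phylogeny, opinion])

print(tree_only.includes("Aves sensu Sereno 2005", "Ichthyornis dispar"))  # False
print(both.includes("Aves sensu Sereno 2005", "Ichthyornis dispar"))       # True
```

The point of the toy is only that the answer depends on which datasets a user puts in the context, which is the behaviour described above.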

    If I could sum up my goal in one sentence, it would be: "To fulfill phylogenetic nomenclature's promise of objective application."

    This monster's taken a while. Hopefully I can put up an alpha version sometime this year.
