11 January 2009

More phylogeny-related questions that can be written as SQL

Pursuant to my last post (Using SQL to make phylogenetic queries), I've started to dream up some other questions that Names on Nodes will be able to answer:

"What are all dinosaur species that lived during the Maastrichtian Age?"


Possible SQL:

SELECT dinosaur.name
FROM node_clade(ARRAY(
SELECT *
FROM resolve_signifier_identities(ARRAY[
'urn:isbn:0853010064::Hylaeosaurus_armatus',
'urn:isbn:0853010064::Iguanodon_bernissartensis',
'urn:isbn:0853010064::Megalosaurus_bucklandii'
])
), 0) AS dinosaur_identity
JOIN signifier AS dinosaur
ON dinosaur.identity_id = dinosaur_identity
JOIN signifierannotation
ON dinosaur.id = signifierannotation.signifier_id
AND signifierannotation.role
= 'http://namesonnodes.org/quantities::years_bp'
JOIN quantity AS years_bp
ON signifierannotation.id = years_bp.id
JOIN signifier_categories
ON signifier_categories.signifier_id = dinosaur.id
WHERE years_bp.value <= 70600000
AND years_bp.value >= 65500000
AND signifier_categories.element = 'species'
GROUP BY dinosaur.name
ORDER BY dinosaur.name;

"What are the largest named subtaxa of Angiospermae?"


Possible SQL:

SELECT angiosperm.name
FROM coarsest(ARRAY(
SELECT * FROM subsets(
resolve_signifier_identity(
'urn:isbn:00800694/146::Angiospermae'),
, 0)
EXCEPT
SELECT resolve_signifier_identity(
'urn:isbn:00800694/146::Angiospermae')
), 0) AS angiosperm_identity
JOIN signifier AS angiosperm
ON angiosperm.identity_id = angiosperm_identity
GROUP BY angiosperm.name
ORDER BY angiosperm.name;

"Which catalogued specimens represent stem-mammals?"


Possible SQL:

SELECT catalogue.name, specimen.name
FROM signifier AS specimen
JOIN authority
ON specimen.authority_id = authority.id
JOIN authorityidentity AS catalogue
ON authority.identity_id = catalogue.id
JOIN signifier_categories
ON signifier_categories.signifier_id = specimen.id
AND signifier_categories.element = 'specimen'
WHERE specimen.identity_id = ANY(
SELECT total(
ARRAY(SELECT
resolve_signifier_identity(
'urn:isbn:0853010064::Mammalia')),
ARRAY(SELECT
resolve_signifier_identity(
'urn:doi:10.1051/jp4:20020447::Anthropocene')),
0)
EXCEPT
SELECT subsets(
ARRAY(SELECT resolve_signifier_identity(
'urn:isbn:0853010064::Mammalia')),
0)
)
GROUP BY catalogue.name, specimen.name
ORDER BY catalogue.name, specimen.name;

"How many anseriform species are flightless?"


Possible SQL:

SELECT COUNT(anseriform.identity_id)
AS flightless_anseriform_count
FROM signifier AS anseriform
JOIN signifier_categories
ON signifier_categories.signifier_id = anseriform.id
AND signifier_categories.element = 'species'
WHERE anseriform.identity_id = ANY(
SELECT * FROM subsets(ARRAY(
SELECT resolve_signifier_identity(
'urn:isbn:0853010064::Anseriformes')
),
0)
)
AND anseriform.identity_id = ANY(
SELECT * FROM subsets(ARRAY(
SELECT resolve_signifier_identity(
'biofile:9A75AD84C56B43AD681991EBA56D8B4E6E36B311'
|| '::data/flight/absent')
),
0)
);


Many more possibilities. Note that the "0" in the queries above means "use all available information". Other numbers can be used to restrict information to certain datasets.

No comments:

Post a Comment