While exploring TOL-200M, we found 223,284 entries that resolve to the following classification:
kingdom: Plantae
phylum: null
class: null
order: null
family: null
genus: Plantae
species: null
Example entry resolution:
{'uuid': 'e571a2d3-07f3-45fe-b372-67c0594de0c8',
 'scientific_name': 'Plantae',
 'common_name': 'plants; plants',
 'source_dataset': 'gbif',
 'source_id': '1038946915',
 'resolution_status': 'EXACT_MATCH_PRIMARY_SOURCE_ACCEPTED_AUTHOR_DISAMBIGUATION',
 'kingdom': 'Plantae',
 'phylum': '',
 'class': '',
 'order': '',
 'family': '',
 'genus': 'Plantae',
 'species': '',
 'resolution_path': 'RESOLVED',
 'resolution_strategy': 'ExactMatchPrimarySourceAcceptedAuthorDisambiguation',
 'final_query_term': 'Plantae',
 'final_query_rank': 'scientific_name',
 'final_data_source_id': 11,
 'meta_selected_record_id': None,
 'meta_candidate_count': None,
 'meta_accepted_record_id': None,
 'meta_matched_result_id': '11387779',
 'meta_matched_full_name': 'Plantae',
 'meta_author_disambiguation': 'true',
 'resolution_failure_reason': None,
 'meta_original_status': None,
 'meta_force_failed_to_input': None,
 'meta_original_attempt_key': None,
 'source_file': 'part-00000-3990aff1-4728-49f9-bd76-087ff566f4fc-c000.snappy.resolved.parquet',
 'meta_matched_current_name': None,
 'meta_synonym_matched': None,
 'meta_accepted_name': None,
 'meta_disambiguated_record_id': None}
Corresponding Input:
>>> row_dict
{'source_id': '1038946915', 'uuid': 'e571a2d3-07f3-45fe-b372-67c0594de0c8', 'scientific_name': 'Plantae', 'phylum': None, 'class': None, 'order': None, 'family': None, 'genus': None, 'species': None, 'kingdom': 'Plantae', 'common_name': 'plants; plants'}
(in source_taxa/source=gbif/part-00000-3990aff1-4728-49f9-bd76-087ff566f4fc-c000.snappy.parquet)
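The pattern in the resolved entry above (genus repeating the kingdom while every intermediate rank is empty) can be flagged with a small check. This is a minimal sketch, assuming entries are available as plain dicts with the column names shown above; `is_kingdom_as_genus` and `INTERMEDIATE_RANKS` are hypothetical names introduced for illustration and are not part of the resolver.

```python
# Hypothetical helper, not part of the resolver: flags entries whose genus
# merely repeats the kingdom while all intermediate ranks are empty.
INTERMEDIATE_RANKS = ("phylum", "class", "order", "family")


def is_kingdom_as_genus(entry):
    """Return True when genus equals kingdom and no intermediate rank is set."""
    kingdom = entry.get("kingdom") or ""
    genus = entry.get("genus") or ""
    # Treat both None and '' as "not set", since input and output differ here.
    no_intermediate = all(not entry.get(rank) for rank in INTERMEDIATE_RANKS)
    return bool(kingdom) and genus == kingdom and no_intermediate


# Rank fields copied from the resolved entry and from its input row.
resolved = {"kingdom": "Plantae", "phylum": "", "class": "", "order": "",
            "family": "", "genus": "Plantae", "species": ""}
source = {"kingdom": "Plantae", "phylum": None, "class": None, "order": None,
          "family": None, "genus": None, "species": None}

print(is_kingdom_as_genus(resolved))
print(is_kingdom_as_genus(source))
```

The check is deliberately strict about intermediate ranks so that a legitimately resolved genus with a populated family or order is not flagged.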
GNVerifier query result for "Plantae":
$ docker run --rm -i gnames/gnverifier:v1.2.5 -j 1 --format compact --capitalize --all_matches --sources 11 "Plantae" | jq
{
  "id": "827f5f3d-f332-5d4e-9ec9-6dbf1b07bdd9",
  "name": "Plantae",
  "cardinality": 1,
  "matchType": "Exact",
  "results": [
    {
      "dataSourceId": 11,
      "dataSourceTitleShort": "GBIF Backbone Taxonomy",
      "curation": "AutoCurated",
      "recordId": "11387779",
      "outlink": "https://gbif.org/species/11387779",
      "entryDate": "2024-01-11",
      "sortScore": 9.410739851747246,
      "matchedNameID": "827f5f3d-f332-5d4e-9ec9-6dbf1b07bdd9",
      "matchedName": "Plantae",
      "matchedCardinality": 1,
      "matchedCanonicalSimple": "Plantae",
      "matchedCanonicalFull": "Plantae",
      "currentRecordId": "11387779",
      "currentNameId": "827f5f3d-f332-5d4e-9ec9-6dbf1b07bdd9",
      "currentName": "Plantae",
      "currentCardinality": 1,
      "currentCanonicalSimple": "Plantae",
      "currentCanonicalFull": "Plantae",
      "taxonomicStatus": "Accepted",
      "isSynonym": false,
      "classificationPath": "Plantae|Plantae",
      "classificationRanks": "kingdom|genus",
      "classificationIds": "6|11387779",
      "editDistance": 0,
      "stemEditDistance": 0,
      "matchType": "Exact",
      "scoreDetails": {
        "cardinalityScore": 1,
        "infraSpecificRankScore": 0,
        "fuzzyLessScore": 1,
        "curatedDataScore": 0.33333334,
        "authorMatchScore": 0.14285715,
        "acceptedNameScore": 1,
        "parsingQualityScore": 1
      }
    },
    {
      "dataSourceId": 11,
      "dataSourceTitleShort": "GBIF Backbone Taxonomy",
      "curation": "AutoCurated",
      "recordId": "6",
      "outlink": "https://gbif.org/species/6",
      "entryDate": "2024-01-11",
      "sortScore": 9.410739851747246,
      "matchedNameID": "827f5f3d-f332-5d4e-9ec9-6dbf1b07bdd9",
      "matchedName": "Plantae",
      "matchedCardinality": 1,
      "matchedCanonicalSimple": "Plantae",
      "matchedCanonicalFull": "Plantae",
      "currentRecordId": "6",
      "currentNameId": "827f5f3d-f332-5d4e-9ec9-6dbf1b07bdd9",
      "currentName": "Plantae",
      "currentCardinality": 1,
      "currentCanonicalSimple": "Plantae",
      "currentCanonicalFull": "Plantae",
      "taxonomicStatus": "Accepted",
      "isSynonym": false,
      "classificationPath": "Plantae",
      "classificationRanks": "kingdom",
      "classificationIds": "6",
      "editDistance": 0,
      "stemEditDistance": 0,
      "matchType": "Exact",
      "scoreDetails": {
        "cardinalityScore": 1,
        "infraSpecificRankScore": 0,
        "fuzzyLessScore": 1,
        "curatedDataScore": 0.33333334,
        "authorMatchScore": 0.14285715,
        "acceptedNameScore": 1,
        "parsingQualityScore": 1
      }
    }
  ],
  "curation": "AutoCurated"
}
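The exact_match_primary_source_accepted_author_disambiguation.py profile incorrectly matches this case (suspect: step 5 not specific enough). GNVerifier returns two accepted "Plantae" records with identical sort scores: record 6, the kingdom itself, and record 11387779, which the GBIF Backbone Taxonomy classifies as a genus under the kingdom Plantae. The profile selected record 11387779, so the kingdom name was also written to the genus field.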
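One possible way to break such a tie, offered only as a sketch: skip candidates whose matched name also appears among their own ancestors (as in `"classificationPath": "Plantae|Plantae"`), then fall back to `sortScore`. The functions `is_self_nested` and `pick_candidate` are hypothetical and do not reflect what the profile's step 5 actually does; only the field names are taken from the GNVerifier output above.

```python
# Hypothetical tie-break, not the profile's actual logic.
def is_self_nested(result):
    """True when the matched name also occurs as one of its own ancestors."""
    path = result.get("classificationPath", "").split("|")
    return len(path) > 1 and path[-1] in path[:-1]


def pick_candidate(results):
    """Prefer candidates that are not self-nested, then the highest sortScore."""
    clean = [r for r in results if not is_self_nested(r)]
    pool = clean or results  # fall back to all results if none are clean
    return max(pool, key=lambda r: r.get("sortScore", 0.0))


# The two candidates from the query above, reduced to the relevant fields.
results = [
    {"recordId": "11387779", "sortScore": 9.410739851747246,
     "classificationPath": "Plantae|Plantae",
     "classificationRanks": "kingdom|genus"},
    {"recordId": "6", "sortScore": 9.410739851747246,
     "classificationPath": "Plantae",
     "classificationRanks": "kingdom"},
]

chosen = pick_candidate(results)
print(chosen["recordId"], chosen["classificationRanks"])
```

With this rule the kingdom record (6) would be chosen for this input. Whether a self-nested path is always wrong in the GBIF Backbone is an assumption that would need checking against other affected names.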