I thank you for your help on this, but I hope you can see I’m in a bind here. Someone wants to cite specific entries in SCAN in a publication, and it can’t be done in a reasonable fashion. I’m going to have to tell the authors to download all the data and report “collection”, “catalogNumber” and “recordId” in their paper. Then the authors have to instruct readers of the paper that, if they want to find those entries, they will have to download all the data for the taxon from SCAN, search within it, and match all three of those fields, because I still don’t have confirmation that “catalogNumber” is unique and stays with the specimen.
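The lookup those readers would have to perform can be sketched in Python. This is a minimal illustration, assuming the taxon download is a tab-delimited file whose header uses the three column names quoted above; the function names and sample values are hypothetical. It also counts repeated (collection, catalogNumber) pairs, which is one way to check whether catalogNumber is unique within a single download, though it says nothing about whether the number stays with the specimen over time.

```python
from __future__ import annotations

import csv
import io
from collections import Counter
from typing import Iterable

# Assumption: column headers are spelled exactly as in the email; a real
# SCAN download may name them differently (e.g. Darwin Core "collectionCode").
KEY_FIELDS = ("collection", "catalogNumber", "recordId")


def find_records(rows: Iterable[dict], collection: str,
                 catalog_number: str, record_id: str) -> list[dict]:
    """Return every row whose three key fields all equal the cited values."""
    wanted = (collection, catalog_number, record_id)
    return [row for row in rows
            if tuple(row.get(f, "") for f in KEY_FIELDS) == wanted]


def duplicate_catalog_numbers(rows: Iterable[dict]) -> dict[tuple[str, str], int]:
    """Report (collection, catalogNumber) pairs that occur more than once,
    i.e. cases where catalogNumber alone would not identify one specimen."""
    counts = Counter((row.get("collection", ""), row.get("catalogNumber", ""))
                     for row in rows)
    return {pair: n for pair, n in counts.items() if n > 1}


# Made-up sample rows standing in for a tab-delimited taxon download.
SAMPLE = (
    "collection\tcatalogNumber\trecordId\n"
    "UTC\tUTC12345\tabc-1\n"
    "UTC\tUTC12345\tabc-2\n"
    "OSU\tOSU00001\tdef-9\n"
)
rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
matches = find_records(rows, "UTC", "UTC12345", "abc-2")
print(len(matches))                     # 1
print(duplicate_catalog_numbers(rows))  # {('UTC', 'UTC12345'): 2}
```

Matching on all three fields is what singles out one row here; matching on catalogNumber alone would return two.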
You've nailed it. As a general rule, all aggregators have been lenient
when it comes to this very situation, preferring to erect few or no
bars for entry while simultaneously requiring their own technical
solutions to deal with the often ambiguous "UTC12345" catalogNumbers.
Some, like GBIF, use an artificial datasetKey as a seed to mint a
gbifID (functionally equivalent to a recordID), even when there
*might* be a globally unique occurrenceID (because not ALL publishers
have them & so all are dragged down to the lowest common denominator),
but then get into trouble when the dataset is republished, whole or in
part, under a new brand. The owner of the specimen data is not aware of
the downstream consequences because they were not the ones responsible
for creating the GUID. We have not fully grasped what persistent
linking to individual specimens *really* entails. "Just get the data
out and we'll deal with the messes later" is the mantra.
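To make that failure mode concrete, here is a toy sketch in Python of an aggregator that scopes the IDs it mints to a dataset key. It is not GBIF's real implementation; the class name, dataset keys and occurrenceID value are hypothetical, and the institutionCode/collectionCode/catalogNumber fallback is an assumption about how a record lacking an occurrenceID might be keyed. The specimen keeps the same occurrenceID throughout, yet republishing it under a new dataset key yields a new ID, so any link built on the old one breaks.

```python
import itertools


class ToyAggregator:
    """Hypothetical aggregator that mints integer IDs scoped to a datasetKey.

    Illustration only; not GBIF's actual gbifID algorithm.
    """

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._ids: dict = {}

    def mint(self, dataset_key: str, record: dict) -> int:
        # Use the publisher's occurrenceID when present; otherwise fall back
        # to the institutionCode/collectionCode/catalogNumber triplet.
        local_key = record.get("occurrenceID") or (
            record.get("institutionCode"),
            record.get("collectionCode"),
            record.get("catalogNumber"),
        )
        # The datasetKey is part of the lookup key, so the same specimen
        # published under a different dataset receives a brand-new ID.
        key = (dataset_key, local_key)
        if key not in self._ids:
            self._ids[key] = next(self._counter)
        return self._ids[key]


agg = ToyAggregator()
specimen = {"occurrenceID": "urn:uuid:made-up-0001", "catalogNumber": "UTC12345"}
first = agg.mint("dataset-A", specimen)   # initial publication
again = agg.mint("dataset-A", specimen)   # routine re-index: ID is stable
moved = agg.mint("dataset-B", specimen)   # republished under a new brand
print(first, again, moved)                # 1 1 2
```

Keying on the occurrenceID alone would have kept the ID stable across the move, but only for publishers that supply one.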
What you observe here, Mike, is a direct result of a decentralized network.
As an aside, I refresh all specimen data from GBIF every two weeks in
support of Bionomia. A major part of this
refresh process is to deal with the broken links that ALWAYS ensue.
There are often 50k+ such broken links & I spend 2-3 days sorting out
what happened; there are no redirects I can use. Want to guess which
field of data I often fall back on to restitch specimen record to