I would also point out that the library example is good on certain points, but it overlooks the fact that we seldom (if ever) need to cite a specific copy of a book/reference.

So, yes, the library likely has a specific identifier for each and every copy of every book/reference they have, and when they get a new book, they slap a barcode tag or mag strip into the book so they can manage their inventory. But that unique ID is, again, likely local (or regional to a consortium of libraries), not universal. This works because end users typically don’t need to refer to a particular copy of a book, e.g., “Copy 3 of ‘To Kill a Mockingbird’ housed at Centerville Public Library, branch 2”. If you did need to cite a particular book in a particular library (perhaps it has handwritten marginalia by a famous person, or was checked out on a particular date and is evidence in a murder)… you’d find yourself with much the same problem as with insect specimen records… you ‘have’ a way to cite it, but it’s simply not going to have a universal GUID, and certainly not one that is guaranteed to be persistent/resolvable over time.


On Mar 22, 2021

I thank you for your help on this, but I hope you can see I’m in a bind here. Someone wants to cite specific entries in SCAN in a publication, and it can’t be done in a reasonable fashion. I’m going to have to tell the authors to download all the data and report “collection”, “catalogNumber” and “recordId” in their paper. Then the authors have to instruct any readers of the paper that, if they want to find those entries, they will have to download all the data for the taxon from SCAN, then search within it and match all three of those fields, because I still don’t have confirmation that “catalogNumber” is unique and stays with the specimen.
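
The lookup described above could be sketched roughly as follows. This is only an illustration: it assumes the SCAN download is a CSV whose columns are literally named "collection", "catalogNumber" and "recordId" (the names used in the message above), and the function name and sample rows are hypothetical.

```python
import csv
import io


def find_cited_records(csv_file, cited):
    """Return rows whose (collection, catalogNumber, recordId) triple was cited.

    `cited` is an iterable of 3-tuples as a paper might report them.
    All three fields must match, since catalogNumber alone may not be unique.
    """
    wanted = set(cited)
    reader = csv.DictReader(csv_file)
    return [
        row
        for row in reader
        if (row["collection"], row["catalogNumber"], row["recordId"]) in wanted
    ]


# Hypothetical rows standing in for a full taxon download.
SAMPLE = """collection,catalogNumber,recordId,scientificName
UTC,UTC12345,rec-001,Apis mellifera
XYZ,UTC12345,rec-002,Apis mellifera
UTC,UTC99999,rec-003,Bombus impatiens
"""

matches = find_cited_records(
    io.StringIO(SAMPLE), [("UTC", "UTC12345", "rec-001")]
)
print([m["recordId"] for m in matches])
```

Note that the second sample row shares a catalogNumber with the first, which is why the sketch matches on the full triple rather than on catalogNumber alone.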

You've nailed it. As a general rule, all aggregators have been lenient
when it comes to this very situation, preferring to erect few or no
barriers to entry while simultaneously requiring their own technical
solutions to deal with the often ambiguous "UTC12345" catalogNumbers.
Some, like GBIF, will use an artificial datasetKey as a seed to mint a
gbifID (functionally equivalent to a recordID) even when there
*might* be a globally unique occurrenceID, because not ALL publishers
have them and so all are dragged down to the lowest common denominator.
They then get into trouble when the dataset is republished whole or in
part under a new brand. The owner of the specimen data is not aware of
the downstream consequences because they were not the ones responsible
for having created the GUID. We have not fully grasped what persistent
linking to individual specimens *really* entails. "Just get the data
out and we'll deal with the messes later" is the mantra.

What you observe here, Mike, is a direct result of a decentralized network.

As an aside, I refresh all specimen data from GBIF every two weeks in
support of Bionomia. A major part of this
refresh process is to deal with the broken links that ALWAYS ensue.
There are often 50k+ such broken links & I spend 2-3 days sorting out
what happened - there are no redirects I can use. Want to guess which
field of data I often fall back on to restitch specimen records to
collector/determiner? catalogNumber.
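
For what it's worth, that kind of fallback might look something like the sketch below. It is not the actual Bionomia refresh code; the function name and sample records are hypothetical, and it assumes each record is a dict carrying "gbifID" and "catalogNumber". A broken link is only restitched when exactly one record in the new snapshot has the same catalogNumber.

```python
from collections import defaultdict


def restitch(old_records, new_records):
    """Map each old gbifID to a gbifID in the new snapshot, or None.

    An ID that still exists maps to itself. Otherwise fall back on
    catalogNumber, but only when that match is unambiguous.
    """
    new_ids = {r["gbifID"] for r in new_records}
    by_catalog = defaultdict(list)
    for r in new_records:
        by_catalog[r["catalogNumber"]].append(r["gbifID"])

    mapping = {}
    for old in old_records:
        if old["gbifID"] in new_ids:
            mapping[old["gbifID"]] = old["gbifID"]  # link still resolves
            continue
        candidates = by_catalog.get(old["catalogNumber"], [])
        # Ambiguous or missing catalogNumbers are left for manual sorting.
        mapping[old["gbifID"]] = candidates[0] if len(candidates) == 1 else None
    return mapping


# Hypothetical snapshots from before and after a refresh.
old = [
    {"gbifID": "1", "catalogNumber": "UTC12345"},
    {"gbifID": "2", "catalogNumber": "A-1"},
    {"gbifID": "3", "catalogNumber": "B-7"},
]
new = [
    {"gbifID": "1", "catalogNumber": "UTC12345"},
    {"gbifID": "90", "catalogNumber": "A-1"},
    {"gbifID": "91", "catalogNumber": "B-7"},
    {"gbifID": "92", "catalogNumber": "B-7"},
]

print(restitch(old, new))
```

The third record stays unresolved because two new records share its catalogNumber, which is exactly the ambiguity discussed above.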