Just to complement Torsten's reply: for years now, Plazi has been promoting the extraction of specimen data from the published record and pushing them to GBIF. For some journals this is now an automated process: as soon as a publication is uploaded to TreatmentBank, its observation data are pushed to GBIF as Darwin Core Archives. This can be followed here: http://www.gbif.org/publisher/7ce8aef0-9e92-11dc-8738-b8a03c50a862 . Unfortunately, the names of recently described species only appear once GBIF has updated its taxonomic backbone. We hope this will change in the not-too-distant future, which would make observation data accessible immediately upon publication.


What would be interesting is if scientists added persistent identifiers to each published record, so that these could be extracted as well and, ideally, resolved to get to the original source of the cited specimen.


This is already the case in the Biodiversity Data Journal and in prospective publishing.


Plazi mines PDFs and extracts the data they contain, making them accessible. Part of this is done through a fully automated process, which may introduce some errors. However, these errors can easily be corrected. In a similar editing step, observation record numbers can be used to annotate observation records, which would then also be pushed to GBIF.


Currently, 17,701 articles have been mined, resulting in 164,911 taxonomic treatments and an estimated well over 10 million facts.


More details will follow during the Saturday afternoon session of the ECN meeting, in the presentations by Plazi, Pensoft, et al.







From: Entomological Collections Network Listserve [mailto:[log in to unmask]] On Behalf Of Dikow, Torsten
Sent: Tuesday, September 13, 2016 3:03 PM
To: [log in to unmask]
Subject: Re: Guidance on digitizing specimens for a project


Hi Nico,


I am publishing the material-examined lists from my taxonomic revisions, with specimens originating from several natural history collections (some of which upload their data to GBIF and some of which don’t), through an IPT instance installed at my institution. As you will see (http://collections.nmnh.si.edu/ipt/), this GBIF IPT hosts several datasets, among them the huge occurrence dataset covering the entire NMNH collections.


Wherever possible, I use the original institutional unique specimen identifier by asking the curators to send me these labels before I attach my “personal research identifier”, because every single specimen needs to satisfy the Darwin Core Triplet (institutionCode, collectionCode, unique identifier) during GBIF validation.
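
For readers less familiar with the Darwin Core Triplet, here is a minimal Python sketch of the idea: it checks that a specimen record carries all three parts and joins them into a URN-style identifier. This is not GBIF's actual validation logic. Treating catalogNumber as the "unique identifier" part, the `urn:catalog:` prefix (a pattern Darwin Core uses as an example for occurrenceID), the helper names and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass


# Hypothetical record type. Field names follow Darwin Core terms, but this is
# only a sketch of the triplet idea, not GBIF's validator.
@dataclass
class SpecimenRecord:
    institutionCode: str
    collectionCode: str
    catalogNumber: str  # assumed to be the "unique identifier" part of the triplet


def missing_triplet_parts(record: SpecimenRecord) -> list[str]:
    """Return the names of triplet fields that are empty or whitespace-only."""
    parts = {
        "institutionCode": record.institutionCode,
        "collectionCode": record.collectionCode,
        "catalogNumber": record.catalogNumber,
    }
    return [name for name, value in parts.items() if not value.strip()]


def triplet_urn(record: SpecimenRecord) -> str:
    """Join the triplet into a URN-style identifier (assumed format).

    Raises ValueError if any part is missing, reflecting the idea that a
    record without a complete triplet should not be published.
    """
    missing = missing_triplet_parts(record)
    if missing:
        raise ValueError(
            "incomplete Darwin Core Triplet: missing " + ", ".join(missing)
        )
    return ":".join(
        [
            "urn:catalog",
            record.institutionCode.strip(),
            record.collectionCode.strip(),
            record.catalogNumber.strip(),
        ]
    )


if __name__ == "__main__":
    # Hypothetical example values, not real catalogue entries.
    complete = SpecimenRecord("NMNH", "ENT", "ENT-000001")
    print(triplet_urn(complete))  # urn:catalog:NMNH:ENT:ENT-000001

    incomplete = SpecimenRecord("NMNH", "ENT", "")
    print(missing_triplet_parts(incomplete))  # ['catalogNumber']
```

The check fails loudly rather than filling in a placeholder. That matches the point above: a borrowed specimen without an institutional identifier cannot simply be given one by the researcher without breaking the triplet.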


Twelve years ago, Rudolf Meier and I lobbied for a GenBank-like depository for specimens cited in taxonomic revisions (http://dx.doi.org/10.1111/j.1523-1739.2004.00233.x). I would say that running a GBIF IPT instance at the institutional level fulfills this role and provides data to GBIF, which was originally conceived for natural history collection data only.


Miller et al. 2012 (http://dx.doi.org/10.1186/1741-7007-10-87) argue that publishers might be a better channel than individual researchers for providing published specimen occurrence data to GBIF, since most taxonomists likely do not want to deal with the extra step of uploading specimen data through a GBIF IPT. Journals published by Pensoft provide exactly this service (among many others) for taxonomists.


I will touch on this (and other topics) during my talk at the upcoming ECN meeting.


Cheers, Torsten


Thank you, all.


   Sorry I dropped the ball for a few days. I also received several interesting off-list answers.


   I think I should also try to clarify. First, for the botanists (some of them, at least): in entomology there is much less of a tradition of "creating duplicates" (of purportedly the same individual; think of branches of an oak tree here). Insect specimens overwhelmingly remain and travel "entire" (even after dissection). I hope that distinction is fair enough to most.


   Here is the conflict, stated as simply as I can. There is an institution that ultimately owns the specimens and loans them to a researcher. Then there is a researcher, not affiliated with the institution, who currently has the resources and arguably needs to "publish" the specimens via iDigBio, GBIF, etc. (as well as through other outlets such as a research journal).


   Let's assume that the owning institution really does not have the resources right now, not even to put a locally unique specimen identifier on a specimen (or it does, but there is no digital counterpart), while the researcher does. Beyond writing a kind, explanatory e-mail and figuring things out idiosyncratically, is there a more widely accepted practice for resolving this conflict? Answering "I use this or that portal that I happen to have access to, and it does this for me" is not really a generally applicable answer, right?


   If not, should we as a community (in our most hopeful moments, anyway) consider creating one or more such portals that are very open to contributions? Something like an open portal for digitizing research-relevant specimens and publishing them to iDigBio/GBIF on behalf of owner institutions that "just can't right now, sorry".






. . . . .
Torsten Dikow, Ph.D.
Research Entomologist for Diptera

w 202.633.1005 [log in to unmask]



. . . access to research data at ORCiD http://orcid.org/0000-0003-4816-2909