Jennifer Lin – 2018 March 26
Each new year brings a significant spike, an influx of new entrants, perhaps reflecting an increase in submissions at the end of the previous year. After January, volume drops dramatically and gradually rises again over the course of the year. We see smaller spikes at the March, June, and September marks. (Since this was a brief exercise, I did not dive into any formal research conducted on the nature of publishing cycles.)

### Metadata Coverage

The next question is how the population breaks down into different demographics. For this, I analyzed four key sub-populations: ORCID, funding information, license, and abstract metadata. The following graph shows the percentage of new parties (i.e., works registered at Crossref containing these metadata) across these four segments.
I ran [Karthik Ram’s script](https://gist.github.com/karthik/7e7875af0ecaa4327d3d61f550de94e0), which uses [rOpenSci’s R client](https://github.com/ropensci/rcrossref) for the [Crossref REST API](https://github.com/CrossRef/rest-api-doc/). Data are based on publication date rather than deposit date and represent all updates to the metadata record for the baseline view.

The census graph shows extensive empty space in the top half, indicating ample room for continued growth in these communities. The ORCID population is expanding the fastest, followed by license and funding. Abstracts are a minority group and quite visibly need a population boost here in Crossref-land.

This view does not capture the percentages across content types, nor does it take into account the differential rate of growth between content types (e.g., journal article, book, report, conference proceeding, dissertation, dataset, component, posted content, peer review) as the Crossref corpus has grown. ORCID, funding, and license information are available for all full content types (that is, everything except components), so the distinction matters mainly for abstracts. Abstracts are part of the metadata schema of all relevant content types, which excludes those where they do not apply: datasets, components, and peer reviews. All things considered, though, the relative impact on the total percentage of metadata deposited (or not deposited) is minuscule given the small numbers of these works.

### Calling the real demographers & cartographers

This mini-pseudo-lustrum was the result of a few hours of play. The graphs have raised more questions than answers. We welcome more serious and earnest efforts to dive into the metadata and conduct a more detailed, reliable investigation of the size, distribution, and composition of the population through our [REST API](http://github.com/CrossRef/rest-api-doc). Next month, we will roll out reports on metadata coverage based on individual members.
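As a starting point for that kind of investigation, here is a minimal Python sketch of how coverage percentages like those in the graph could be computed directly against the REST API. It is not the R script used for this post. It assumes the public `/works` endpoint with the `has-orcid`, `has-funder`, `has-license`, `has-abstract`, `from-pub-date`, and `until-pub-date` filters, and reads the count from `message.total-results`. The function names (`build_url`, `count_works`, `coverage_for_year`) are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.crossref.org/works"

# Crossref filter names for the four sub-populations discussed in this post.
COVERAGE_FILTERS = {
    "orcid": "has-orcid",
    "funding": "has-funder",
    "license": "has-license",
    "abstract": "has-abstract",
}


def build_url(year, has_filter=None):
    """Build a /works query counting records with a publication date in `year`.

    rows=0 asks for no records, only the summary that carries the total.
    """
    filters = [f"from-pub-date:{year}-01-01", f"until-pub-date:{year}-12-31"]
    if has_filter:
        filters.append(f"{has_filter}:true")
    query = urllib.parse.urlencode({"filter": ",".join(filters), "rows": 0})
    return f"{API}?{query}"


def count_works(url):
    """Fetch the URL and return the number of matching works."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["message"]["total-results"]


def percentage(part, total):
    """Share of `part` in `total` as a percentage; 0.0 when total is zero."""
    return 100.0 * part / total if total else 0.0


def coverage_for_year(year, fetch=count_works):
    """Return {segment: percent of that year's works carrying the metadata}.

    `fetch` is injectable so the arithmetic can be checked without network access.
    """
    total = fetch(build_url(year))
    return {
        name: percentage(fetch(build_url(year, flt)), total)
        for name, flt in COVERAGE_FILTERS.items()
    }
```

Calling `coverage_for_year(2017)` would issue five requests (one total plus one per segment) and return a dictionary of percentages. Like the graph above, this slices by publication date rather than deposit date. It also does not break the numbers down by content type.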
This “play” census came out of a session with Karthik Ram, one of the founders of [rOpenSci](https://ropensci.org/), as we talked about the struggle to build better tools for researchers. (rOpenSci is an exciting and influential non-profit that builds open source software for research with a community of users and developers and educates scientists about transparent research practices.) With each round of cocktails, it became clear that a critical subset of the issues boiled down to the problem of limited information about research publications.

Why, that is what Crossref does! Indeed. Publishers register their content with Crossref and provide the metadata about the works they publish. Over the past few years, we have been working with our members to broaden the coverage of the metadata as well as improve its quality. This issue is not exclusive to Crossref: [Metadata 2020](http://www.metadata2020.org/) rallies stakeholders across the research enterprise to push for change together. To represent the full breadth and depth of the scholarly communications enterprise, Crossref aims to capture the richness of what our members publish through the content they register. So publishers, powerfully represent your services and make sure [your metadata is complete and correct](/services/content-registration/) for discovery systems, indexing platforms, research evaluation systems, analytics tools, and the great number of Crossref metadata consumers far and wide.