what do huge numbers + powerful computers tell us about scholarly literature? (peer-reviewed Monday)

A little more than a month ago, I saw a reference to an article called Complexity and Social Science (by a LOT of authors). The title intrigued me, but when I clicked through, I found that it was about a different kind of complexity than I had expected.

Still, because the authors had made the pre-print available, I started to read it anyway and found myself making my way through the whole thing. The article is about what might be possible with massive amounts of data and computers powerful enough to crunch them, and specifically what might be possible for the social sciences, not just the life sciences or the physical sciences. The reason it grabbed me was this passage:

Computational social science could easily become the almost exclusive domain of private companies and government agencies. Alternatively, there might emerge a “Dead Sea Scrolls” model, with a privileged set of academic researchers sitting on private data from which they produce papers that cannot be critiqued or replicated. Neither scenario will serve the long-term public interest in the accumulation, verification, and dissemination of knowledge.

See, the paper opens by making the point that research in fields like biology and physics has been incontrovertibly transformed by the “capacity to collect and analyze massive amounts of data.” Lots and lots of people are doing stuff online every day, stuff that leaves “breadcrumbs” that can be noticed, counted, tracked and analyzed, yet the literature in the social sciences includes precious few examples of that kind of data analysis. Which isn’t to say that it isn’t happening. It is, and we know it is, but it’s the Googles and the Facebooks and the NSAs that are doing it. The quotation above gets at the implications of that.

The article is brief and well worth a scan even if you, like me, need a primer to really understand the kind of analysis they are talking about. I read it, bookmarked it, and briefly thought about writing about it here, but I couldn’t come up with the information literacy connection I wanted. There is definitely stuff there (if nowhere else, it’s in the discussion of privacy), but the connection I was looking for wasn’t there for me at that moment, so I didn’t.

But then last week, I saw this article, Clickstream Data Yields High-Resolution Maps of Science, linked in the ResearchBlogs Twitter feed (and since then at Visual Complexity, elearnspace, Stephen’s Web, Orgtheory.net, and EcoTone).

And they connect. While this specific type of inquiry isn’t one of the examples listed in the Science article, it is exactly what happens when you turn the huge amounts of available data, all of those digital breadcrumbs, into a big picture of what people are doing on the web; in this case, what they are doing when they work with the scholarly literature. And it’s a really cool picture.

The research is based on data gathered from “scholarly web portals” – from publishers, journals, aggregators and institutions.  The researchers collected nearly 1 billion interactions from these portals, and used them to develop a journal clickstream model, which was then visualized as a network.
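The basic idea of turning logged clicks into a network can be sketched in a few lines. This is a minimal Python illustration under simplifying assumptions, not Bollen et al.’s actual method: the journal names and sessions are hypothetical toy data, each session is assumed to be an already-reconstructed, ordered list of the journals one user visited, and an edge simply records how often users moved directly from one journal to another.

```python
from collections import Counter, defaultdict

# Hypothetical toy data, invented for illustration: each session is the
# ordered list of journals one user visited during a single visit.
sessions = [
    ["Journal A", "Journal B", "Journal C"],
    ["Journal A", "Journal B"],
    ["Journal B", "Journal B", "Journal C"],
    ["Journal A", "Journal C"],
]


def transition_counts(sessions):
    """Count moves from one journal to a different journal on consecutive clicks.

    Repeated clicks within the same journal are skipped, so only movement
    between journals contributes an edge to the network.
    """
    counts = Counter()
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            if src != dst:
                counts[(src, dst)] += 1
    return counts


def transition_probabilities(counts):
    """Normalize each journal's outgoing counts so they sum to 1."""
    totals = defaultdict(int)
    for (src, _), n in counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in counts.items()}


counts = transition_counts(sessions)
probs = transition_probabilities(counts)
for (src, dst), p in sorted(probs.items()):
    print(f"{src} -> {dst}: {p:.2f}")
```

Each journal’s outgoing probabilities sum to 1, so the result reads as a weighted, directed journal network: the kind of structure that could then be handed to graph-visualization software to draw a map.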

For librarians, this is interesting because it adds richness to our picture of how people, scholars, engage with the scholarly literature, capturing dimensions that traditional impact measures miss. For example, what people cite and what they actually access on the web aren’t necessarily the same thing, and a focus on citation as the only measure of significance has only ever provided part of the picture. Beyond this, as the authors point out, clickstream data allows analysis of scholarly activity in real time, while citation analysis has to wait out the months- or years-long delay of the publication cycle.

It’s also interesting in that it includes data not just from the physical or natural sciences, but from the social sciences and humanities as well.

What I also like about this, as an instruction librarian, is the picture it provides of how scholarship connects. It’s another way of providing context to students who don’t really know what disciplines are, don’t really know that there are a lot of different scholarly discourses, and don’t yet have the tools to contextualize the scholarly literature they are required to use in their work. Presenting the data as a visual network only strengthens that potential.

And finally, to pull this back to the Science article mentioned at the top: this article is open, published in an open-access journal, and I have to think that the big flurry of attention it has received in the blogs I read, blogs with no inherent disciplinary or topical connection to each other, is in part because of that.


Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabasi, A., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., & Van Alstyne, M. (2009). Computational Social Science. Science, 323 (5915), 721-723. DOI: 10.1126/science.1167742

Bollen, J., Van de Sompel, H., Hagberg, A., Bettencourt, L., Chute, R., Rodriguez, M., & Balakireva, L. (2009). Clickstream Data Yields High-Resolution Maps of Science. PLoS ONE, 4 (3). DOI: 10.1371/journal.pone.0004803
