What social media data should I use in my research? A response to Choi et al (2016).

Firstly, apologies for not blogging in quite a while; I’ve been finishing off my PhD which I’m super happy to announce I passed, with no corrections B-). It’s been a long process but I’m really proud of the finished product and I’m working on getting publications and a book out from it ASAP. Stay tuned for more news!

Secondly, and to get to the point of this post, a great article has just been published entitled “What social media data should I use in my research?: a comparative analysis of Twitter, YouTube, Reddit, and the New York Times comments”.

It’s been put out by a group of researchers from the State University of New Jersey, namely Dongho Choi, Ziad Matni, and Chirag Shah. It was presented at the 79th ASIS&T Annual Meeting in Copenhagen a few months ago (October 2016). The full link to the article can be found here.

It’s a really great article, and it is truly, truly great to see people moving towards a broader definition of social media. For far too long, Facebook and Twitter have held a relatively unchallenged monopoly over social media research. It’s easy to understand why; they are currently the most popular platforms by some distance in the western world. They also put out a staggering wealth of content to analyse and utilize. In many ways, they present perfect spaces through which to understand a range of issues, and they produce rich and detailed data.

However, thanks to the pioneering work of researchers such as Paul Hodkinson, Deborah Lupton, Sonja Utz, Rachel Kowert, Nicole Ellison, Xuan Zhao, Caleb T. Carr, and many others, digital research is again spreading out and looking at the social internet in its messy and overlapping entirety. That means embracing multiple platforms and exploring a range of spaces that contain various social elements. This should be encouraged, especially as recent statistical research from Pew (Lenhart, 2015) shows that young people are increasingly present on multiple platforms. Users are not using one platform alone; they exist in and across multiple spaces, and are increasingly using a broad array of platforms beyond Facebook and Twitter. As such, in order to understand the experiences of users online, a broader focus is needed, lest digital research be left a decade behind the progressing reality of social media for many users.

In order to understand social spaces, there is a need to understand them in their everyday embedded mundanity. That is, there’s a need to explore the reality of social media for many people. It’s great to see that one aspect of this is being embraced in digital sociology in that researchers are moving beyond a reliance upon Facebook and Twitter alone.

This leads me on to another important point, though, one that also needs to be considered, especially since the meteoric rise of ‘big data’ research. Whilst Choi et al. do highlight a range of platforms, they miss one major point. Namely, there is not only a need to consider which platforms to gather data from, but also a need to look at what counts as social media data in the first place. Whilst it is so tempting to collect the content produced on social media platforms and analyse this data, there is a desperate need in digital research to realize that this content does not accurately reflect the everyday mundane reality of social media for users.

What about the reader flicking through Twitter on the toilet? Or the commuter watching YouTube videos and reading the comments (never read the comments….)? Or the bored student closing Reddit down, only to open it back up 30 seconds later? Or the ex-partner hate-browsing Instagram? How do we gather this data? Is this type of use even worth accounting for when there is a wealth of growing data out there to analyse?

This may seem like a relatively odd aspect to highlight, but when we look into the research in this area, it becomes clear that actually making and posting content, be it posts, tweets, images, Snapchats, or Vines (remember them?), is only a small part of a user’s engagement with social media. Nonetheless, this small aspect has attracted most of the attention from online researchers, who collect this content and use it to speak on social experiences online. However, researchers such as Renee Barnes and Jonathan Bright suggest that actually producing content is a minor part of the social media experience. Whilst there is undoubtedly a wealth of data out there to analyse from user-produced content alone, we should stop and ask ourselves what it means that we are only focusing upon this produced content.

It appears that in order to consider the social uses of the internet it is crucial that the focus of research is not upon content production alone, and more importantly that other uses are acknowledged, and not seen as secondary or devalued. Though this produced content online is rich, obvious, and plentiful, this doesn’t mean that this is the ‘average’ use and experience of social media, and that we should only pay attention to these loud voices online. Though this data is shouting at us, we should take a step back and consider what the reality of social media is beyond this loud data.

As Kate Crawford (2009) points out, there is often a temptation in digital research to listen to those who speak loudly and who actively participate by producing content, but this “privileging of voice” (Crawford, 2009, 527) denies the many nuanced uses of social media beyond merely producing content. Even those who produce a lot of content will also use social media in manners beyond this alone. When attempts to account for these uses have been made, they often serve to minimize them or place them as secondary uses (Norman et al., 2015). For example, the terms ‘peripheral participants’ (Zhang & Storck, 2001) and ‘non-public participants’ (Nonnecke & Preece, 2003) have been used to describe these users. As Crawford (2009) argues of these definitions: “they continue to define this majority group by what they are not: not public, not at the center. As terms, they fail to offer a sense of what is being done, and why it is important to online participation” (Crawford, 2009, 527).

My own research revealed a large range of uses beyond content production that needed to be accounted for when considering the reality of social media for users. Below I’ve included a picture of the coding of my own data that shows some of the many uses of social media that might have been missed by focusing upon produced content alone.


These are only a few of the many uses of social media beyond content production. I’m sure you can think of more yourself. Considering them allows for a potentially deeper understanding of the reality of social media and the increasingly important role it plays in our lives. So, in response to Choi et al.’s question of what data to include in social media research, I would suggest that not only do we need to continue to move beyond Facebook and Twitter alone, but we also need to actively challenge the prioritization of voice, and pay attention to the reality of social media. This means acknowledging that content production forms a small part of the social media experience, and that we should not be so quick to focus upon this aspect alone. By focusing on it alone, we risk capturing and reporting on only a small aspect of the social media experience. At best, this ignores the wealth of experiences beyond producing data, and at worst, it means that our data is potentially invalid as it does not reflect or represent the social media experiences of users.


3 thoughts on “What social media data should I use in my research? A response to Choi et al (2016).”

  1. Pingback: Warum der Kampf gegen #hatespeech und #fakenews auf Facebook irreführend ist – und welche Alternativen sich bieten – Avada Classic

  2. Hi Harry,
    Before going further, I have to confess to following the herd and having my primary focus as Twitter, although in my defence, it was the platform identified by those who are at the centre of my research.
    Now, you make a point in here regarding the collection of data which has troubled me – the hidden, obscured, invisible aspects of participation in social media. As researchers we do indeed often gather and analyse the data produced, ignoring or sidelining the context and circumstances in which they were produced.
    I had a shot at trying to explore that (https://cpdin140.wordpress.com/about/participant-information-audio-arcs/), but without much success (it’s based on ‘think-aloud’ methods). When I discussed this with an interviewee later, they said that trying to do that simply wouldn’t have been manageable, given the way they interacted with social media. Unfortunately I’d assembled a method based on the way I access social media; the range and variety of ways that other folks are involved is going to need a much more flexible and adaptable method than my initial attempt, I suspect. More work to do.


  3. Pingback: Online news, fake news, filter bubbles, and mainstream media. Or how I learnt to stop retweeting Trump.  | Harry T Dyer
