Data collection for the propaganda project begins in earnest

Great research requires data. Here’s what this means for Propaganda, including how the data collection is evolving.

At 5 am my alarm went off, and I actually hauled my ass out of bed like it was a Regular Working Day. There was some stuff I had to do for work before the day kicked off, but it was all done before 6.30 am and I was out of the woods.

So I sat and began copying transcriptions off YouTube videos. I started this at around 0645, and didn’t stop until I was worried that the postie would turn up and I’d have to answer the door in my dressing gown. At 1030.

The propaganda project is a giant – thing – and for ages I’ve felt like I’ve been doing nothing. So I figured that if I start collecting data, then at least by the time I’m ready to begin coding it, I’ll have something to work with.

What is data collection for this project?

‘Collecting data’, for the purposes of this project, involves two things, with a third to come:

  1. Getting written artefacts of as much media as possible
  2. Capturing project-related metadata about those artefacts
  3. (Eventually) adding interview transcripts to the mix.

The first challenge is that not every YouTube video posting (whether entertainment, or news, or instruction, or commentary) actually has a transcript.

The second challenge is that Australian media is fucking awful at providing transcripts of… well, of anything much. The ABC Australia radio and podcast sites have loads of shows, and no transcripts. There are only selected transcripts of selected episodes from selected shows. So you can get some of the political and news interviews, but nothing about broader Australian culture.

The absence of transcripts is a problem in terms of accessibility. But it’s also a problem for researchers (or readers who are up before everyone else and don’t have headphones for example). In order to bring news or social commentary into the mix, you have to transcribe it.

How I started to use Google Voice to transcribe videos

Transcribing anything is expensive, especially at volume, even if you use a fabulous service like

For example, my company is a Rev partner. We use them a lot! (And recommend them a lot!) But even at $1 USD, transcribing high volumes gets pricey. Right now, $1 USD is close to $1.50 AUD plus conversion fees, so even though it’s leagues cheaper than a local service, it’s enough to put it squarely outside my current budget.

And this, boys and girls, is how come I started drilling the internet for alternative techniques.

Of all the methods available, by far the smartest so far is using Google’s voice-to-text (the Voice typing tool) inside Google Docs. It’s far easier than transcribing by hand, but what you gain in speed and ease you sometimes sacrifice in accuracy and punctuation.

Or action.

Google Voice suspends itself sometimes, like it’s waiting for something it wants to work on. For example, getting Google Voice to transcribe Alex Jones is an interesting exercise on its own. It’s most likely coincidence, but every time Jones started talking about censorship issues with big tech companies (including Google), Google Voice recorded actual gobbledegook, or just didn’t do anything. When he was talking about searching legislation, it was fine. My inner tinfoil-hat-wearer loved watching it and speculating on that, even though a singular instance is meaningless. Ha!

However, being able to record transcripts of the un-transcripted, for free, is a massive win given the project is unfunded. If you have spare capital and you’d like to ease the pressure (or speed up the process!) you’re welcome to throw me a donation, which I’ll use for transcriptions from Rev.

Why do I need transcripts anyway?

I need transcripts because, in order to conduct any meaningful qualitative analysis, you have to have a documented artefact. If you don’t have it written, you can’t create a coding schema; and without a coding schema, you have no analysis.

The propaganda project will be built on a multivariate analysis of a high volume of primary and secondary artefacts from across all facets of society. They will cover all perspectives, from the most extreme on both ends of the continuum, to the most moderate. They will include podcasts, vodcasts, news, radio, TV, commentary of all kinds, films both fiction and non-fiction, music, magazines, books both fiction and non-fiction, policy, legislation, curricula, and advertising of all kinds.

The earliest conceptualisation of sample size

While looking for guidance on sample size in qualitative analysis, given it’s been almost 15 years since my last research-based degree, I found Jim Macnamara’s Media content analysis: Its uses, benefits and best practice methodology, which was published in the Asia Pacific Public Relations Journal.

In the paper, Macnamara talks about the very serious reasons why qualitative researchers who use media content analysis don’t seek statistical significance. It’s because the methodology is so big – in the sense that it has so many parts to consider, and so much depth – and takes so much time, that gaining anything close to statistical significance is almost impossible. It’s why so many qualitative researchers use small sample sizes.

The propaganda project can’t afford small sample sizes. Because it’s examining something that functions at the level of social creation and social programming, it must include samples from as many parts of social life as possible. If it doesn’t, then the study is hamstrung right from the beginning.

This is how I decided that capturing 50 specimens from each source is probably enough. With just 10 sources, I’ll already have 500 specimens. This then led to the design of the metadata captured alongside each specimen: it will enable a close examination of “balance” (if feasible) in terms of source types, channel types, and content types.
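To make the arithmetic and the “balance” idea concrete, here is a minimal sketch in Python. It is only an illustration, not the project’s actual tooling: the `Specimen` record, its field names, and the example values are all hypothetical. The only things taken from the plan above are the target of 50 specimens per source and the three kinds of metadata (source type, channel type, content type).

```python
from collections import Counter
from dataclasses import dataclass

SPECIMENS_PER_SOURCE = 50  # per-source target from the plan above


@dataclass(frozen=True)
class Specimen:
    # Hypothetical field names; the plan only names the three "types".
    source: str
    source_type: str
    channel_type: str
    content_type: str


def target_total(num_sources: int, per_source: int = SPECIMENS_PER_SOURCE) -> int:
    """Total specimens if every source reaches its target."""
    return num_sources * per_source


def balance(specimens: list[Specimen], field: str) -> Counter:
    """Count specimens by one metadata field, to eyeball balance."""
    return Counter(getattr(s, field) for s in specimens)


# Made-up example records, purely for illustration.
sample = [
    Specimen("Source A", "independent", "YouTube", "commentary"),
    Specimen("Source B", "public broadcaster", "radio", "news"),
    Specimen("Source B", "public broadcaster", "podcast", "news"),
]

print(target_total(10))                 # 10 sources at 50 each
print(balance(sample, "content_type"))  # tally per content type
```

Running the same tally for `source_type` and `channel_type` would show at a glance where the sample is lopsided, assuming the metadata is captured consistently for every specimen.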

There is much more work to do in terms of defining other qualities of the sample, such as timeframes and issues, and so on. But at least it’s gotten started. While I was doing it, an idea struck me to include non-Western monocultures like Japan, so that the framework can be tested against other social environments.

But it’s already a gigantic thing, so we’ll see.

The day literally disappeared – but now I’m actually doing something!

Half the day disappeared in no time. And by 2.30 pm, which is about the time of day when I record patrons-only podcasts about the process, I was feeling dual pains: one, not enough time; two, immense excitement.

At the very least, now I feel like I am actually doing something!

Reading is doing, thinking is doing, but nothing is as doing-oriented as capturing data.

It reminds me of how much work is actually involved in research.

It’s astonishing how much work is actually involved in rigorous research-based writing. Every time I do it, it surprises me.

It reminds me of Hofstadter’s Law:

It always takes longer than you expect, even when you take into account Hofstadter’s Law.

Hofstadter, Douglas. Gödel, Escher, Bach.

So what’s next?

I’m finding that the propaganda project, which I consider essential in terms of my life’s work, isn’t exactly stretching my capabilities. The next task for me as a creator is really finding the balance in the creative work.

There are two projects that desperately require work – The Integration Project and Ultimatum. The former needs to be substantively rewritten; the latter requires recruitment attention and sound design development. Both of them have to move forwards to some sort of conclusion within the next three months, or I’ll be annoyed at myself!

This means that the next stage is to work out where everything fits around my week; what do I absolutely need to do every day, and why; and create a rationalisation for the week.

A plan, if you will.

Doing a million things isn’t difficult if you have a plan and a schedule, and up until now I’ve had very little of either for my creative work. Perhaps it’s time to treat my hobbies like my job, and level up their organisation.

As I said to one of my advisors this week: I like to pretend I don’t need to be organised, but the truth is that I really fucking love it.
