Categories
Research

Examining far-right memory practices by use of digital methods

“I believe that when combining digital methods with the humanities, a lot of really great research can be done.”

Phillip Stenmann Baun, Aarhus University

Phillip Stenmann Baun, PhD student at the Department of Global Studies (Aarhus University), has a background in History and now works within the interdisciplinary field of memory studies and digital methods. As an offshoot of his master’s thesis, Phillip’s PhD project, “Reject degeneracy; Remember tradition!” A Study of Far-Right Digital Memory Practices, examines uses of the past in far-right communication on digital media by means of digital methods and Natural Language Processing (NLP) techniques. Based on large amounts of data from the “politically incorrect” forum on 4chan.org, the project aims to examine how memory and history inform the contemporary far-right imaginary.

Thoughts on interdisciplinarity

I am proud to call myself ‘interdisciplinary’ – I work within the field of ‘memory studies’, a field that draws its strength from many different disciplines, and I definitely see the interdisciplinary approach as very productive.

Phillip Stenmann Baun, Aarhus University

According to Phillip, a fundamental prerequisite for the project’s methodology is the combination of digital methods with a traditional hermeneutic approach: by using digital methods, the project can examine large amounts of data that would otherwise be inaccessible. However, the critical humanist cannot be completely replaced by computational methods, for example when designing the algorithms and interpreting the results:

How do I manage to operationalize concepts such as ‘memory’, ‘culture’, or ‘identity’ into something that a machine can read? I find such questions extremely interesting, because it is precisely here that the role of historians and others within the humanities really becomes relevant; it is in fact humanities scholars who need to both ask and answer these questions in order to actually understand and interpret the empirical material.

Phillip Stenmann Baun, Aarhus University

As such, the dialogue between digital methods and the humanities is essential throughout the whole process, not just when analysing the data. For example, Phillip’s first encounter with classification algorithms and topic modelling during his master’s thesis led to an increased awareness of the importance of semantics when developing computational models:

When searching for specific historical entities, I need to consider carefully how they are represented lexically in my material.

Phillip Stenmann Baun, Aarhus University

Both topic modelling and sentiment analysis, i.e. computational linguistic analysis of affective states, will be central to Phillip’s further research on far-right memory and crucial for identifying words and expressions within the dataset related to the collective heading ‘memory’.
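Phillip’s actual models are not described here, but the lexicon-matching end of such a pipeline can be sketched in a few lines of standard-library Python. Real research would use trained topic and sentiment models; the seed lexicons and example posts below are invented purely for illustration:

```python
from collections import Counter
import re

# Hypothetical seed lexicons; a real study would derive these from
# topic models and validated sentiment dictionaries.
MEMORY_TERMS = {"tradition", "heritage", "ancestors", "history"}
POSITIVE = {"great", "proud", "glorious"}
NEGATIVE = {"degenerate", "decline", "ruined"}

def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def memory_mentions(posts):
    """Count memory-related terms across a list of posts."""
    counts = Counter()
    for post in posts:
        counts.update(t for t in tokenize(post) if t in MEMORY_TERMS)
    return counts

def sentiment_score(post):
    """Naive lexicon sentiment: (#positive - #negative) tokens."""
    tokens = tokenize(post)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

posts = [
    "Remember the glorious tradition of our ancestors.",
    "Modern culture is degenerate and in decline.",
]
print(memory_mentions(posts))
print([sentiment_score(p) for p in posts])
```

The point of such a toy pass is only to flag candidate posts; the interpretive work of deciding what counts as a ‘memory’ term remains, as Phillip notes, a humanities question.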

Approaching UCloud from a humanities perspective

UCloud played a key role for Phillip when working on his master’s thesis, especially in the initial phase when he needed to interpret the performance of his predictive models. Optimizing and fine-tuning the so-called parameters of algorithmic models, especially when working with a lot of data, usually involves heavy amounts of computational power. Without UCloud, this process would have taken much longer.

Working within the sphere of digital methods, however, was – and still is – challenging for a researcher with a background in the humanities, though assistance and easy accessibility have made the process much simpler:

I am really an outsider regarding everything concerning digital methods. When I was first introduced to UCloud, I was still ‘on the outside’ regarding many of these things. But the interface is fairly straightforward even though I had difficulties knowing where to begin at first.

Phillip Stenmann Baun, Aarhus University

In general, Phillip stresses that much value can be gained from implementing digital methods in the humanities if researchers are open to new computational developments within a traditionally very analogue field:

It’s a bit of a shame that we are not more ‘digitally literate’ within the humanities and do not apply programmatic thinking more in both teaching and research. In some ways, many of us are still sceptical of computational developments and may consider digital methods less valid than traditional hermeneutic interpretation. This scepticism can only be overcome by showing the potential benefits of marrying digital methods with humanities research, where the strengths of each side get to complement the other.

Phillip Stenmann Baun, Aarhus University

With his newly started PhD project on far-right memory, a project heavily reliant on a digital approach, Phillip is part of a growing number of humanities researchers who are paving the way for future interdisciplinary developments within this field, and an inspiration to researchers and students alike.

Front office support for 4chan data

For this project, the Interactive HPC Front Office at Aarhus University, CHCAA, assisted by collecting all 4chan posts from the /pol/ (“politically incorrect”) board for the previous two years. Due to prior requests, the CHCAA front office had already created an API crawler for 4chan that ensured access to the required data.

The 4chan API (https://github.com/4chan/4chan-API) only provides historical data for a very limited timespan, so to be able to analyse a larger timespan of data we created an API crawler that, on a daily basis, fetches all new threads (posts and replies) since the last fetch and stores them in a JSON file.

Peter Vahlstrup, CHCAA-frontoffice, Aarhus University
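The front office’s crawler itself is not public, but a minimal sketch of the daily fetch described above, built on the read-only endpoints documented in the 4chan-API repository, might look like the following. The board name and output path are placeholders, and scheduling the daily run is left to e.g. cron:

```python
import json
import time
import urllib.request

API = "https://a.4cdn.org"

def thread_list_url(board):
    """Endpoint listing all live thread numbers on a board."""
    return f"{API}/{board}/threads.json"

def thread_url(board, no):
    """Endpoint returning one thread with its opening post and replies."""
    return f"{API}/{board}/thread/{no}.json"

def fetch_json(url):
    """Fetch and decode one JSON endpoint."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def crawl_board(board, out_path):
    """Fetch every live thread on `board` and store the snapshot as JSON."""
    pages = fetch_json(thread_list_url(board))
    snapshot = {"fetched_at": time.time(), "threads": []}
    for page in pages:
        for t in page["threads"]:
            snapshot["threads"].append(fetch_json(thread_url(board, t["no"])))
            time.sleep(1)  # stay well within the API's rate limit
    with open(out_path, "w") as f:
        json.dump(snapshot, f)
```

A daily run such as `crawl_board("pol", "pol-2022-01-01.json")` then accumulates one snapshot file per day, which is what makes longer timespans analysable despite the API’s short history.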

Accessing the 4chan data poses no obstacles: 4chan offers both an endpoint for archived threads, i.e. threads that can no longer be commented on because they have been “pushed off the last page of a board” (see https://github.com/4chan/4chan-API/blob/master/pages/Archive.md), and an endpoint for fetching live threads. Each endpoint, however, has its pros and cons.

Using the archived (and therefore locked) threads endpoint ensures that all replies are included, but it has the downside of reduced information about the author of each post/reply. Using the live threads endpoint makes it harder to retrieve all replies. Ideally, a thread should be fetched just before it is “pushed off the last page of a board”, because at that point it is no longer possible to comment on it. Every time a thread is commented on, it is bumped to the top of the board until it reaches the bump limit, after which it is automatically locked. (Not all threads reach the limit, though, so the bump limit cannot be used on its own as a measure of when to collect a thread.) This means that if we use the live threads endpoint, we need to fetch data more often to get all threads with all replies.

Peter Vahlstrup, CHCAA-frontoffice, Aarhus University

During the project, [CHCAA-frontoffice] experimented with both endpoints and, as foreseen, found the archived threads endpoint the easier to work with. However, the live threads endpoint is also workable if you fetch everything every 12–24 hours and subsequently delete all older versions of the same thread, so that only the most recent version, with the most up-to-date data, is preserved.
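That final deduplication step is simple enough to sketch. The snapshot layout assumed below (a fetch timestamp paired with a thread dict in the 4chan API’s `{"posts": [...]}` shape, where the first post’s `no` identifies the thread) is an illustration, not the front office’s actual storage format:

```python
def deduplicate(snapshots):
    """Keep only the most recent version of each thread.

    `snapshots` is an iterable of (fetched_at, thread_dict) pairs, where
    thread_dict follows the 4chan API layout {"posts": [...]} and the
    first post's "no" field identifies the thread.
    """
    latest = {}
    for fetched_at, thread in snapshots:
        no = thread["posts"][0]["no"]
        if no not in latest or fetched_at > latest[no][0]:
            latest[no] = (fetched_at, thread)
    return {no: thread for no, (_, thread) in latest.items()}

snapshots = [
    (1, {"posts": [{"no": 42, "replies": 3}]}),
    (2, {"posts": [{"no": 42, "replies": 7}]}),  # later fetch, more replies
    (1, {"posts": [{"no": 99, "replies": 0}]}),
]
result = deduplicate(snapshots)
print(result[42]["posts"][0]["replies"])  # 7
```

Because a live thread only ever gains posts until it is locked, keeping the latest snapshot per thread number is safe: no earlier version can contain replies the latest one lacks.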

Categories
Research

Detecting text reuse in H.C. Andersen’s work

(…) In 2019, senior researcher Ejnar Stig Askgaard from Odense City Museums began comparing Hans Christian Andersen’s notes, written between approximately 1833 and 1875, with his 162 fairy tales, novels and autobiographies. This led to the discovery that Hans Christian Andersen liked to use symbols such as cross marks or deletions in his notes to indicate that a note had been reused in his fairytales.

For Detecting text reuse in H.C. Andersen’s work, Berg wanted to find out where each note had been reused. Earlier research had managed to manually identify where 278 notes had been reused in Hans Christian Andersen’s published work, but this had been a time-consuming effort, taking many months of work.

As 861 of the notes had been digitalized in addition to Hans Christian Andersen’s published work, Berg was able to apply digital methods to solve his problem. He contacted Zhiru Sun, Assistant Professor at the Department of Design and Communication at SDU, who used Natural Language Processing (NLP) techniques to find similarities between the notes and Hans Christian Andersen’s work. Using the Python application on UCloud, this method generated a number of tables indicating how similar a specific note was to a specific fairytale.
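The excerpt does not specify which similarity measure was used, but such note-to-tale tables are commonly built with TF-IDF weighting and cosine similarity. A minimal standard-library sketch of that approach follows; the texts are invented placeholders, not Andersen’s actual notes:

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def tfidf_vectors(docs):
    """TF-IDF vectors (smoothed with +1 so shared terms keep some weight)."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    df = Counter()
    for counts in tokenized:
        df.update(counts.keys())
    idf = {t: math.log(len(docs) / df[t]) + 1 for t in df}
    return [{t: c * idf[t] for t, c in counts.items()} for counts in tokenized]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

notes = ["the little mermaid longed for the world above",
         "a tin soldier stood firm"]
tales = ["far out at sea the little mermaid dreamed of the world above",
         "the steadfast tin soldier stood firm on one leg"]

vecs = tfidf_vectors(notes + tales)
note_vecs, tale_vecs = vecs[:len(notes)], vecs[len(notes):]
# Rows = notes, columns = tales: each cell scores one note against one tale.
table = [[round(cosine(nv, tv), 2) for tv in tale_vecs] for nv in note_vecs]
print(table)
```

Each row’s largest value points to the tale a note most plausibly fed into, which is exactly the kind of table that can shortcut months of manual comparison, leaving the researcher to verify the top candidates.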

This is an excerpt. Click here for the full story.

Categories
Research Teaching

Digital Humanities

Researchers within the field of humanities are typically not heavy users of HPC (High Performance Computing) or cloud computing. However, a book, once digitalized, is actually quite a large data set. Zhiru Sun, Assistant Professor at the Department of Design and Communication, tells us how she has been helping researchers from the Faculty of Humanities at SDU solve their research problems through digital methods, and how computing resources such as UCloud, also called DeiC Interactive HPC, can be a highly viable option if your project e.g. involves looking for patterns and similarities in digitalized texts.

This is an excerpt. Click here for the full story.

Categories
Research

First call for compute time on the national HPC facilities is now open

If you are a researcher or PhD student at a Danish university, you can now apply for access to compute time on the national HPC facilities, including the Danish part of EuroHPC LUMI. The call is open to all research areas.

Applications are open for access to compute resources on the national HPC facilities. This also applies to the Danish part of EuroHPC LUMI.

This is an excerpt. Click here for the full story.

Categories
Research

HPC and Social Sciences

Professor (WSR) Oliver Baumann from the Department of Business & Management at SDU tells us how he uses supercomputing for his research and gives us his take on how researchers from social sciences, who are beginning to reach a limit with their own computers, can benefit from Interactive HPC.

In a current project, Baumann collaborates with two American colleagues to study resource allocation in hierarchical organizations…

This is an excerpt. Click here for the full story.

Categories
Publication Research

DaCy: A Unified Framework for Danish NLP

A new set of Danish deep learning models for natural language processing (NLP) was trained on UCloud. Danish NLP has in recent years seen considerable improvements with the addition of multiple new datasets and models. However, at present, there is no coherent framework for applying state-of-the-art models to Danish. We present DaCy: a unified framework for Danish NLP built on spaCy. DaCy uses efficient multitask models which obtain state-of-the-art performance on named entity recognition, part-of-speech tagging, and dependency parsing.

This is an excerpt. Click here for the full story.

Categories
Event Research Workshop

Improve your research impact: Metadata for Machines Workshop

One Danish research group can get the unique opportunity to make their metadata machine-actionable in a free-of-charge event held over two half-day sessions. The concept is developed by researchers, for researchers.

The aim of the Metadata for Machines (M4M) workshop is to work practically on improving your metadata and making it machine-actionable, thus complying with the Findable, Accessible, Interoperable and Reusable (FAIR) principles.

This is an excerpt. Click here for the full story.

Categories
Publication Research

When no news is bad news – Detection of negative events from news media content

During the first wave of Covid-19, information decoupling could be observed in the flow of news media content. The corollary of the content alignment within and between news sources experienced by readers (i.e., all news transformed into Corona news) was that the novelty of news content went down as the media focused monotonically on the pandemic event. This all-important Covid-19 news theme turned out to be quite persistent…

This is an excerpt. Click here for the full story.