Categories
Interactive HPC UCloud Use-case

DeiC Interactive HPC is indispensable in the development of Danish AI language models

By Jasper Riis-Hansen and Line Ejby Sørensen, Center for Humanities Computing (CHC), Aarhus University

DeiC Interactive HPC – UCloud plays a central role in the Danish Foundation Models (DFM) project, which is part of the government's strategic initiative on artificial intelligence.

Danish Foundation Models (DFM) is supported by the Ministry of Digital Affairs as part of the government's strategic initiative on artificial intelligence, which aims to ensure that Denmark has access to advanced, specially adapted language models that can be used across a range of sectors, including healthcare, public administration, education, and private industry.

A shared digital environment

The DFM project unites Danish universities, research institutions, and industry partners in a joint effort to set new standards for ethically responsible and inclusive AI language technologies.

The project is a collaboration between Aarhus University, the University of Copenhagen, the University of Southern Denmark, and the Alexandra Institute. DeiC Interactive HPC – UCloud plays a central role in the project by providing high data security, scalable computing power, and, not least, an accessible, secure, national cloud platform for collaboration between the project partners.

"UCloud forms the basis for an important step in the digitalisation of research, because the platform both provides easy access to computing power, which makes scalable data analysis and modelling simpler, and constitutes a secure environment for handling sensitive data. The platform also makes it easy to collaborate across institutions and lets us manage data access as needed. This is particularly relevant for the DFM project, where many partners participate at different levels of the project."
Postdoc Kenneth Enevoldsen

Data security and computing power

Because AI models are often trained on sensitive data, it is crucial that data processing complies with both the GDPR and Danish security standards. UCloud is ISO 27001-certified and developed specifically to meet both Danish and EU requirements for secure data processing.

"In the DFM project, we handle very large amounts of data from diverse sources, including sensitive data that the models are trained on, and this places great demands on data security. UCloud is therefore an important tool in the project, precisely because of its high data security and access to scalable computing power."
Postdoc Kenneth Enevoldsen

Although DFM also uses some of the European supercomputers, such as LUMI in Finland and Leonardo in Italy, the project's day-to-day operations depend heavily on UCloud. Besides serving as a stepping stone to high-performance computing, UCloud also functions as a secure and user-friendly platform with a wide selection of easily accessible applications that are essential for the daily research, collaboration, data processing, and innovation across the project's broad range of disciplines.

Critical infrastructure for Danish AI development

DFM's lead researchers, Kristoffer Nielbo and Peter Schneider-Kamp, emphasise that the robust digital research environment of DeiC Interactive HPC – UCloud constitutes critical infrastructure for their research. It streamlines workflows, strengthens collaboration, and accelerates the development of both language and AI technologies.

"Without UCloud, the DFM project would have had to establish this type of digital infrastructure from scratch, at considerable cost in both time and money. The platform's role in the DFM project clearly demonstrates how robust, collaboration-oriented digital research environments are fundamental to Denmark's AI strategies."

Danish Foundation Models (DFM) is a collaborative project between Aarhus University, the University of Copenhagen, the University of Southern Denmark, and the Alexandra Institute.

The project is supported by the Ministry of Digital Affairs with a grant of DKK 30.7 million and aims to develop advanced language models with open access and transparent development processes.

The language models are specially adapted to Danish and other Scandinavian languages and cultures, and are intended for use in a wide range of sectors, including healthcare, public administration, education, and private industry.

DFM aims to establish a new standard for ethically responsible, inclusive, and transparent AI language technology, for the benefit of both Danish society and the research community.

Read more: Danish Foundation Models, Ministry of Digital Affairs press release

Categories
Interactive HPC Supercomputing UCloud Use-case

DeiC Interactive HPC offers integration of advanced Quantum Computing Applications

Recently, two advanced quantum computing applications were deployed on DeiC Interactive HPC: the NVIDIA CUDA Quantum Platform and the NVIDIA cuQuantum Appliance.

These applications reflect the continued commitment to offering cutting-edge technologies to Interactive HPC users.

“With these new applications, DeiC Interactive HPC is at the forefront of bringing quantum computing into practical, real-world use,” says Emiliano Molinaro, leader of research support at the SDU eScience Center. “The platform is now uniquely equipped to support the development of quantum algorithms and simulations, offering an unprecedented level of computational power and flexibility.”

We hope that DeiC Interactive HPC’s deployment of these NVIDIA applications will be useful for a wide array of users, from academic researchers to industry professionals, seeking to explore the uncharted territories of quantum computing. It represents not only an enhancement of DeiC Interactive HPC’s offerings but also a significant contribution to the Danish quantum computing ecosystem.

Check out the full story on the SDU eScience website.

Categories
Interactive HPC Supercomputing UCloud Use-case

Supercomputing for computational linguistics and (social) media data

Supercomputing has long been associated with areas such as physics, engineering, and data science. However, humanities researchers at Aarhus University are increasingly turning to supercomputing, which allows them to delve into unexplored territories and discover new insights.
From analysing historical archives to simulating ancient civilizations to analysing social media data, supercomputing offers unique opportunities to generate insights and advance knowledge in the humanities.

In this article series, we highlight three cases with humanities researchers from Aarhus University that illustrate the varied ways in which supercomputing is being used in humanities research.


While many studies are based on historical data, the research of Rebekah Baglini, Associate Professor in Linguistics at the Interacting Minds Centre, Aarhus University, is an excellent example of supercomputing applied to recent data in the humanities.

She employs supercomputing in her current projects involving the collection, processing, and annotation of large-scale media data from traditional and social media sources. By examining this diverse range of data, Rebekah Baglini investigates causal inference and causal reasoning from a linguistic perspective. Her research involves the application of semantic model theory and computational methods to uncover insights in linguistics.

“I aim to develop computationally assisted methods to identify trends in the discursive and informational landscape around topics concerning media dynamics, public health and science communication, crisis and risk messaging, as well as the emergence of mis- and dis-information”. 

Rebekah Baglini, Associate Professor in Linguistics, Aarhus University

In addition to her linguistic investigations, Rebekah Baglini also strives to enhance the existing computational language models for multilingual natural language processing (NLP), with a particular focus on under-resourced languages.   

Humanities researchers should know the affordances of High-Performance Computing  

Rebekah’s pursuits demonstrate the continuous progress of digital humanities and the ongoing efforts to enhance existing language models, ultimately leading to a deeper understanding in the field of humanities.   

“My earlier work involved smaller language corpora and didn’t require HPC resources. However, as my projects grew in scale, involving large corpus creation, the relevance of supercomputing increased. I recognise that not all projects require HPC. However, it is useful for researchers to gain training in the affordances of HPC, parallel compute, and large models so they know what’s possible, and can potentially take on projects of larger scale or make use of state-of-the-art resources for data processing, modelling, and simulation.”  

Rebekah Baglini, Associate Professor in Linguistics, Aarhus University

This explains why NLP and Computational Linguistics have become integral to Rebekah Baglini’s teaching, enabling her to offer students practical exposure to working with extensive datasets and large language models, fostering hands-on learning opportunities. She emphasises that there is a significant learning curve when delving into the realm of supercomputing. 

“There has definitely been a learning curve involved in the transition from locally maintained clusters to the cloud based Interactive HPC platform, particularly because it is also a somewhat new service without comprehensive documentation, and my affiliation with Center for Humanities Computing at Aarhus University has been a valuable resource as there is a great deal of collective experience and knowledge to draw on in the community”.  

Rebekah Baglini, Associate Professor in Linguistics, Aarhus University

Rebekah has used the DeiC Interactive HPC system for storing and analysing news and social media data in the national research project HOPE, which monitored Scandinavian user behaviour during Covid-19.

Today, she uses the system in her own AUFF Starting Grant project, CROSS: Causal Reasoning and Online Science Scepticism, to train language models to identify and analyse emerging narratives that undermine or counteract verified messaging on scientific findings and public health recommendations.


You have just read the third and final case in our series on Interactive HPC usage in humanities.
Through these compelling cases it becomes evident that supercomputing in humanities research is transforming traditional approaches, empowering researchers to uncover new insights and deepen our understanding of the field.  It opens doors to interdisciplinary collaborations and expands the possibilities for data analysis and modelling, ultimately shaping the future of digital humanities. 

Check out the other two cases: Katrine Frøkjær Baunvig on creating a Grundtvig artificial intelligence using HPC, and Iza Romanowska on utilizing agent-based models in archaeological data.

Categories
Research Interactive HPC Supercomputing Uncategorised Use-case

Utilizing agent-based models in archaeological data   

Supercomputing has long been associated with areas such as physics, engineering, and data science. However, humanities researchers at Aarhus University are increasingly turning to supercomputing, which allows them to delve into unexplored territories and discover new insights.
From analysing historical archives to simulating ancient civilizations to analysing social media data, supercomputing offers unique opportunities to generate insights and advance knowledge in the humanities.

In this article series, we highlight three cases with humanities researchers from Aarhus University that illustrate the varied ways in which supercomputing is being used in humanities research.


Iza Romanowska is an assistant professor at Aarhus University, working at the Aarhus Institute of Advanced Studies, where she studies complex ancient societies.

To overcome the challenges of limited data from these ancient societies, researchers have started utilizing agent-based models (ABMs), sometimes enabled by supercomputing. ABMs are computational models that simulate the behaviour and interactions of individual entities, known as agents, within a specified environment or system. Each agent in the model is typically programmed with a set of rules or algorithms that control its behaviour, decision-making processes, and interactions with other agents and the environment.
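
The agent-plus-rules structure described above can be sketched in a few lines of Python. The `Trader` agent and its one-unit trading rule below are purely illustrative inventions for this article, not taken from Romanowska's actual models:

```python
import random

class Trader:
    """Minimal agent: each step, takes one unit of goods from a random partner."""
    def __init__(self, goods=10):
        self.goods = goods

    def step(self, agents):
        partner = random.choice(agents)
        # simple interaction rule: transfer one unit if the partner has any
        if partner is not self and partner.goods > 0:
            partner.goods -= 1
            self.goods += 1

def run_model(n_agents=50, n_steps=100, seed=0):
    """Run one simulation and return the final goods distribution."""
    random.seed(seed)
    agents = [Trader() for _ in range(n_agents)]
    for _ in range(n_steps):
        for agent in agents:
            agent.step(agents)
    return [a.goods for a in agents]

wealth = run_model()
print(min(wealth), max(wealth))  # goods are only transferred, so the total stays 500
```

Even a toy rule like this produces an emergent distribution of wealth that can then be compared against archaeological evidence, which is the comparison step described in the quote below.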

ABM is a valuable tool in archaeology that allows us to simulate and analyse the behaviours and interactions of individuals or groups in past societies, and the use of ABM allows comparison of the model against real archaeological data.

Assistant Professor Iza Romanowska

In one of Iza Romanowska’s studies, agent-based modelling (ABM) made it possible for her and her colleagues to explore the Roman economy in the context of long-distance trade, using ceramic tableware to understand the distribution patterns and buying strategies of traders in the Eastern Mediterranean between 200 BC and AD 300.  

The potential of supercomputing in humanities becomes particularly evident when studying such societies with only limited data as experienced by archaeologists and historians. Iza Romanowska explains that the availability of data is limited in her field compared to other disciplines, stating that while social scientists studying more contemporary populations have access to abundant amounts of data such as the number of traders, transactions, and values, “we have none of this information.” Therefore, the use of HPC has been essential for her research.  

ABM as methodological tool necessitates running the simulation many times, and by many, I mean eight hundred thousand times, and that is possible with a laptop… if one plans to be doing their Ph.D. for 500 years. Supercomputing is bigger, faster, better without any qualitative change in terms of the research.

Assistant Professor Iza Romanowska

Using a high-performance computer like the DeiC Interactive HPC system enhances the scalability and speed of ABMs, allowing researchers to gain deeper insights into the behavior and outcomes of complex systems. The DeiC Interactive HPC facility hosts out-of-the-box tools, like NetLogo, for working with ABM. Researchers can also use ABM frameworks for Python or R in one of the many development apps like JupyterLab or Coder.  
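
Because each simulation run is independent, the hundreds of thousands of replications Romanowska describes are embarrassingly parallel, which is exactly what an HPC system exploits. A minimal sketch of such a parameter sweep using Python's standard `multiprocessing` module (the toy `simulate` function and its parameters are assumptions for illustration):

```python
import random
from multiprocessing import Pool

def simulate(params):
    """One toy ABM run; returns an inequality statistic for a
    (seed, trade probability) setting."""
    seed, trade_prob = params
    rng = random.Random(seed)
    wealth = [10] * 50
    for _ in range(200):
        for i in range(len(wealth)):
            j = rng.randrange(len(wealth))
            if i != j and wealth[j] > 0 and rng.random() < trade_prob:
                wealth[j] -= 1
                wealth[i] += 1
    return max(wealth) - min(wealth)

if __name__ == "__main__":
    # Sweep seeds x parameter values; on an HPC node with many cores the
    # same pattern scales to very large numbers of runs.
    grid = [(seed, p) for seed in range(8) for p in (0.2, 0.5, 0.8)]
    with Pool(processes=4) as pool:
        results = pool.map(simulate, grid)
    print(len(results))  # one summary statistic per run
```

Tools like NetLogo's BehaviorSpace provide the same sweep pattern without writing the scheduling code yourself.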

Supercomputing and coding as research tools advance humanities research 

While humanities data in general is plentiful and can be analysed effectively, Iza Romanowska finds that there is a gap in understanding the underlying processes that generate the observed patterns, resulting in underdeveloped explanatory frameworks. Her point is that the lack of formal tools for theory building and testing remains a major disciplinary issue. 

“Within humanities including archaeology and history, data analysis is well-established. However, there’s a kind of fundamental disciplinary problem in that we don’t have or use many computational tools for theory building and theory testing. Supercomputing as a tool for the humanities can contribute to filling this gap and strengthen theory building, and ultimately it can advance the field of humanities research.”

Assistant Professor Iza Romanowska

Iza Romanowska believes that more people in humanities should learn to code to take advantage of the possibilities offered by their data. She suggests that supercomputing can be a natural progression from this. While many humanities researchers may not feel like they need supercomputing, perhaps they are simply not asking questions that could benefit from high-performance computing (HPC). 

I would especially encourage junior researchers in the humanities to embrace supercomputing. It never hurts to acquire a skill, and many of these tools are becoming so easily available that it’s almost a shame to not use them.


You have just read the second of three cases in our series on Interactive HPC usage in humanities.
Through these compelling cases it becomes evident that supercomputing in humanities research is transforming traditional approaches, empowering researchers to uncover new insights and deepen our understanding of the field.  It opens doors to interdisciplinary collaborations and expands the possibilities for data analysis and modelling, ultimately shaping the future of digital humanities. 

Stay tuned for our third case featuring Rebekah Baglini representing her field of linguistics, and check out the first case featuring Katrine Frøkjær Baunvig and the creation of a Grundtvig artificial intelligence using HPC.

Categories
Research Interactive HPC Supercomputing UCloud Use-case

Creating a Grundtvig-artificial intelligence using HPC

Beyond Tradition
Unveiling the Uses of Supercomputing in Humanities. 

Supercomputing has long been associated with areas such as physics, engineering, and data science. However, humanities researchers at Aarhus University are increasingly turning to supercomputing, which allows them to delve into unexplored territories and discover new insights.
From analysing historical archives to simulating ancient civilizations to analysing social media data, supercomputing offers unique opportunities to generate insights and advance knowledge in the humanities.

In this article series, we highlight three cases with humanities researchers from Aarhus University that illustrate the varied ways in which supercomputing is being used in humanities research. 


Katrine Frøkjær Baunvig, head of the Grundtvig Center at Aarhus University, has used supercomputing as a methodological approach, and it has led her to non-trivial conclusions that significantly impact our understanding of 19th-century nation builder and prominent pastor N.F.S. Grundtvig's vast body of works and his immense influence on Danish culture.

In order to conduct a certain type of text mining based on so-called word embeddings, she has created a Grundtvig artificial intelligence, enabling a comprehensive analysis of his more than 1,000 works and 8 million words and resulting in unprecedented insights.
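
As a rough illustration of what a word embedding captures, the sketch below builds simple co-occurrence vectors from a toy corpus and compares words by cosine similarity. Real embedding work trains a model such as word2vec on the full corpus; the Danish tokens here are invented examples, not actual Grundtvig text:

```python
import math
from collections import Counter

def cooccurrence_vectors(sentences, window=2):
    """Map each word to its co-occurrence counts -- a crude stand-in for
    the association structure a trained word embedding provides."""
    vecs = {}
    for sent in sentences:
        for i, word in enumerate(sent):
            context = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            vecs.setdefault(word, Counter()).update(context)
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# invented toy corpus -- not actual Grundtvig data
corpus = [
    ["folk", "aand", "liv"],
    ["folk", "aand", "lys"],
    ["kirke", "tro", "liv"],
]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["liv"], vecs["lys"]), 2))  # words sharing contexts score high: 0.71
```

Scaling this idea from three toy sentences to 8 million words is precisely where HPC resources come in.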

Grundtvig’s worldview: analysed by Katrine Frøkjær Baunvig in the upcoming paper “Benign Structures. The Worldview of Danish National Poet, Pastor, and Politician N.F.S. Grundtvig”.

This approach has ushered in a completely new era in Grundtvig research, according to Katrine Frøkjær Baunvig. She dismisses the criticism of digital humanities sceptics who argue that word embedding fails to consider the surrounding context of words. 

“This type of rejection is prevalent only among researchers who have not taken the time to understand or familiarize themselves with the current state and level of the research. When creating a word embedding, I obtain a vast mapping of a given word’s extensive association structure. Therefore, I can clearly discern different semantic focal points and contexts where the word appears in Grundtvig’s body of work. This is precisely what allows me to gain an overview.” 

Katrine Frøkjær Baunvig, Head of the Grundtvig Center at Aarhus University

Katrine Frøkjær Baunvig opted to form a research partnership with the Center for Humanities Computing at Aarhus University. Her best advice for other researchers going into supercomputing in the humanities is to team up with the right people.  

“Stepping into the world of supercomputing requires an approach to work processes that, in my opinion, represents a new trend in the humanities, namely, interdisciplinary collaborations and team-based publishing. Someone takes care of what is typically called the domain expert area – in this case, knowledge of Grundtvig’s authorship – while others handle the more technical aspects of execution.”

Katrine Frøkjær Baunvig, Head of the Grundtvig Center at Aarhus University

She also emphasises the importance of comprehending the workings of the tools to better harness the power of supercomputing.  

“Even if you may not be able to train your algorithm yourself, it can be very practical to devote time and energy to obtain an operational understanding of the steps involved in creating a Grundtvig-artificial intelligence and the various types of applications such an intelligence can be used for.”

Katrine Frøkjær Baunvig, Head of the Grundtvig Center at Aarhus University

Grundtvig’s use of colour terms confirms a claim he made in a letter to Ingemann: that one cannot paint Christ with colour. A point unfolded in another upcoming paper by Katrine Frøkjær Baunvig, “‘Med Farver kan man ingen Christus male’. En komputationel udforskning af farvebrugen i Grundtvigs forfatterskab”.

With years of experience in using supercomputing in her research, Katrine plans to continue using it and encourages others to do so when it seems fit. Especially in times where humanities research is often dismissed as lacking scientific rigor, Katrine Frøkjær Baunvig sees an opportunity to make an impact.  With a keen sense of responsibility to bring her field forward, she is determined to prove that humanities research can be just as methodical and rigorous as research in any other discipline.  

“Researchers who have pioneering eagerness should explore supercomputing as it can give them a head start by venturing into “blue ocean” territory.” 

Katrine Frøkjær Baunvig, Head of the Grundtvig Center at Aarhus University

Katrine Frøkjær Baunvig has used the DeiC Interactive HPC system for a range of NLP tasks, such as linguistic normalisation of historical Danish, semantic representation learning and inference, and, finally, historical chatbot development based on a custom large language model for Danish.


You have just read the first of three cases in our series on Interactive HPC usage in humanities.
Through these compelling cases it becomes evident that supercomputing in humanities research is transforming traditional approaches, empowering researchers to uncover new insights and deepen our understanding of the field.  It opens doors to interdisciplinary collaborations and expands the possibilities for data analysis and modelling, ultimately shaping the future of digital humanities. 

Stay tuned for our second and third cases featuring Iza Romanowska and Rebekah Baglini, representing their fields of archaeology and linguistics.

Categories
Research Interactive HPC Supercomputing UCloud Use-case

UCloud as a complementary HPC tool within theoretical particle physics

Though supercomputers form the key basis of his research, UCloud has been a valuable, complementary tool for Tobias and his colleagues and will most likely continue to be so in future work as well.

Postdoc Tobias Tsang works within the broader research field of theoretical particle physics. As part of the Centre for Cosmology and Particle Physics Phenomenology (CP3-Origins) at the University of Southern Denmark, his research more specifically concerns quantum field theory and quantum chromodynamics (QCD), including how fundamental particles such as protons and neutrons interact with each other:

My research aims to provide high precision predictions based solely on the theory of the Standard Model – the best-known understanding of the interaction of fundamental (i.e. not containing ‘smaller constituents’) particles. This is done via very large-scale numerical simulations using the most powerful supercomputers around the world.

Postdoc Tobias Tsang, Centre for Cosmology and Particle Physics Phenomenology (CP3-Origins), University of Southern Denmark

Experience and achievements

More traditional mathematical methods that can be written down with pen and paper do not apply to research on quantum chromodynamics. Tobias’ research therefore relies on Monte Carlo methods, which are applied to compute statistical field theories of simple particle systems. Though this type of research is done using very large supercomputers, Tobias has recurrently used UCloud for exploratory studies on smaller volumes of data:
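
The Metropolis Monte Carlo algorithm at the heart of such simulations can be illustrated on a toy system. The sketch below samples a 1D Ising chain, a drastically simplified stand-in for the lattice field theories Tsang works with; all parameters are illustrative choices, not values from his research:

```python
import math
import random

def ising_chain_mc(n_sites=50, beta=0.5, n_sweeps=2000, seed=1):
    """Metropolis sampling of a 1D Ising chain with periodic boundaries.
    Returns the average |magnetisation| per site over the second half of
    the run (the first half is discarded as thermalisation)."""
    rng = random.Random(seed)
    spins = [1] * n_sites
    mags = []
    for _ in range(n_sweeps):
        for i in range(n_sites):
            # energy cost of flipping spin i (coupling J = 1);
            # spins[i - 1] wraps around via Python's negative indexing
            dE = 2 * spins[i] * (spins[i - 1] + spins[(i + 1) % n_sites])
            # Metropolis accept/reject step
            if dE <= 0 or rng.random() < math.exp(-beta * dE):
                spins[i] = -spins[i]
        mags.append(abs(sum(spins)) / n_sites)
    tail = mags[n_sweeps // 2:]
    return sum(tail) / len(tail)

m = ising_chain_mc()
print(m)  # a value between 0 and 1
```

Production lattice QCD replaces this chain with a 4D lattice of quantum fields and the single loop with highly parallel updates, which is why the "10,000 cores in parallel" runs Tsang mentions below need dedicated supercomputers.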

When doing large scale simulations, we sometimes do it on something called ‘10,000 cores in parallel’, and clearly this is not something we can easily do on a resource like UCloud. But for the small exploratory studies, UCloud is a nice resource in the sense that it is available; you don’t have to sit here on a hot day and burn your laptop to death – you can send it to UCloud and run it there. I think this is kind of the point where I have used UCloud the most; for small exploratory studies and some of the projects that don’t need a huge amount of computer time but still a significant portion.

Postdoc Tobias Tsang

Though UCloud has served as a supplemental rather than a key tool in Tobias’ work together with the CP3-Origins research centre, he describes it as a nice complement to other HPC resources:

“I don’t think UCloud will ever be the only resource we use. But this is also the design of it; UCloud is not meant to be a huge machine, it is meant to be an available resource that is easy to use and that gives you a playground to set up things really from scratch where you can test things out and run smaller jobs and analyses. In that sense, it is quite complementary to a lot of the things we normally work with. For exploratory studies and for code testing, UCloud will definitely remain very useful.”

Postdoc Tobias Tsang

In one specific project, carried out at SDU a few years back as a collaboration between CP3 and IMADA (the Department of Mathematics and Computer Science), the vast majority of samples were generated on UCloud, and a significant amount of data production and measurements were also carried out there [1]. UCloud needs, however, to be considered part of a whole, according to Tobias:

“It is not that one particular machine made it possible; we would otherwise have found another machine to run it on. But UCloud provided us with a nice set up where we could just use local resources without having to go through big grant applications to get computer time.”

Postdoc Tobias Tsang

Pros and cons

In terms of time optimization, UCloud has also been a game changer for Tobias:

One of the nice things about UCloud compared to other machines is the wall clock time: quite often, for larger clusters, depending on the cluster though, you are very much restricted by the queue policies. So, there are some clusters where you have a maximum run time of 4 hours, and if you happen to run a small job that is longer than this, then you can’t – you have to always tailor your job to fit exactly and to make the maximum use of it. On UCloud you have a 200-hour wall clock. This is very helpful as for a lot of these things that have to run sequentially, you might not need a huge resource, you just need to have a long enough time span to actually do it.

Postdoc Tobias Tsang

Though UCloud slowed the work process down a bit in the beginning, as everything had to be installed and set up, this downside was quickly resolved and outweighed by the benefits:

“Once you get used to it, you can kind of equalize the work process to what you would have on a cluster where everything is just readily installed.”

Postdoc Tobias Tsang

Despite pros and cons, Tobias describes UCloud as a flexible system:

The fact that UCloud is really just a virtual machine has both positive and negative sides. The positive side is that you are really free to do whatever you want to do; you can install everything and you don’t have any restrictions that you would have on larger clusters where you can’t easily install software, or you can’t install it into the parts where you want to install it. On larger clusters, you are typically limited by the compilers that are already there. So, from that point of view, UCloud, at least to me, seems like a more flexible system. The downside is that you have to install everything; you can’t just quickly run something, you kind of have to constantly install everything from scratch.

Postdoc Tobias Tsang

Last but not least, Tobias stresses the interaction with the UCloud front office as a major benefit that has helped the research group significantly, especially compared to other clusters with a much longer response time:

One of the nice things with UCloud as a general system is that every time something didn’t work, we got a really quick email back. Any questions we raised were answered quickly, so it was never something that kept us stuck for weeks or months – typically things were resolved in a very timely time scale. And things that we actively suggested as nice features or things that we thought were missing on UCloud were likewise addressed.

Postdoc Tobias Tsang

[1]  Della Morte, Jaeger, Sannino, Tsang and Ziegler, “One Flavour QCD as an analogue computer for SUSY”, PoS LATTICE2021 (2022) 225, https://doi.org/10.22323/1.396.0225

Categories
Research Interactive HPC UCloud Use-case

National Health Data Science Sandbox for Training and Research

UCloud is not just an ideal platform for the individual researcher who wants interactive access to HPC resources or an easy way to collaborate with national or international partners. It is also highly suitable for teaching. Jennifer Bartell and Samuele Soraggi, who are both working on the project National Health Data Science Sandbox for Training and Research, share their experiences with using UCloud.

National “sandbox” platform

The growing amounts of data in all research fields offer researchers new opportunities for scientific breakthroughs. In the case of health science, the use of large amounts of data has great potential to improve our health care – it can, for example, expand our ability to understand and diagnose diseases. One of the constraints of using health data is that many datasets (e.g. person-specific health records or genomics data) are sensitive from a patient privacy perspective and governed by strict access and usage guidelines. This can be a major challenge, in particular for students or researchers who are just learning best practices in handling health data while also developing data science skills.

Go to SDU eScience for full story

Categories
Research Use-case

Examining far-right memory practices by use of digital methods

“I believe that when combining digital methods with the humanities, a lot of really great research can be done.”

Phillip Stenmann Baun, Aarhus University

PhD student at the Department of Global Studies (Aarhus University), Phillip Stenmann Baun, has a background in History and is now working within the interdisciplinary field of memory studies and digital methods. As an offshoot of work done during his master’s thesis, Phillip’s PhD project, “Reject degeneracy; Remember tradition!” A Study of Far-Right Digital Memory Practices, examines uses of the past in far-right communication on digital media through the use of digital methods and Natural Language Processing (NLP) techniques. Based on large amounts of data from the “politically incorrect” forum on 4chan.org, the aim of the project is to examine how memory and history inform the contemporary far-right imaginary.

Thoughts on interdisciplinarity

I am proud to call myself ‘interdisciplinary’ – I work within the field of ‘memory studies’, a field that draws its strength from many different disciplines, and I definitely see the interdisciplinary approach as very productive.

Phillip Stenmann Baun, Aarhus University

According to Phillip, a fundamental prerequisite for the project’s methodology is the combination of digital methods with a traditional hermeneutic approach: by using digital methods, the project can examine large amounts of data that would otherwise be inaccessible. However, the critical humanist cannot be completely replaced by computational methods, for example when designing the algorithms and interpreting the results:

How do I manage to operationalize concepts such as ‘memory’, ‘culture’, or ‘identity’ into something that a machine can read? I find such questions extremely interesting, because it is precisely here that the role of historians and others within the humanities really becomes relevant; it is in fact the humanities who need to both ask and answer these questions in order to actually understand and interpret the empirical material.

Phillip Stenmann Baun, Aarhus University

As such, the dialogue between digital methods and the humanities is essential throughout the whole process, not just when analysing the data. For example, Phillip’s first encounter with classification algorithms and topic modelling during his master’s thesis led to an increased awareness of the importance of semantics when developing computational models:

When searching for specific historical entities, I need to consider carefully how they are represented lexically in my material.

Phillip Stenmann Baun, Aarhus University

Both topic modelling and sentiment analysis, i.e. computational linguistic analysis of affective states, will be central to Phillip’s further research on far-right memory and crucial for identifying words and expressions in the dataset related to the collective heading ‘memory’.
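In its simplest form, sentiment analysis of the kind mentioned here can be sketched as lexicon-based scoring: counting how many tokens of a text appear in positive versus negative word lists. The word lists below are invented placeholders, not the lexicons or models used in the project; research-grade work would use a validated lexicon or a trained classifier.

```python
# Minimal, illustrative lexicon-based sentiment scoring.
# The word lists are placeholder assumptions, not real research lexicons.
POSITIVE = {"great", "proud", "tradition"}
NEGATIVE = {"degeneracy", "reject", "shame"}

def sentiment(text: str) -> int:
    """Number of positive tokens minus number of negative tokens."""
    tokens = text.lower().split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
```

A score below zero flags a predominantly negative passage; the same token-matching idea underlies the search for ‘memory’-related expressions, just with a thematic word list instead of an affective one.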

Approaching UCloud from a humanities perspective

UCloud played a key role for Phillip when working on his master’s thesis, especially in the initial phase, when he needed to interpret the performance of his predictive models. Optimizing and fine-tuning the parameters of algorithmic models, especially when working with large amounts of data, usually requires substantial computational power. Without UCloud, this process would have taken much longer.

Working within the sphere of digital methods, however, was – and still is – challenging for a researcher with a background in the humanities, though assistance and easy accessibility have made the process much simpler:

I am really an outsider regarding everything concerning digital methods. When I was first introduced to UCloud, I was still ‘on the outside’ regarding many of these things. But the interface is fairly straightforward even though I had difficulties knowing where to begin at first.

Phillip Stenmann Baun, Aarhus University

In general, Phillip stresses that much value can be gained from implementing digital methods in the humanities, if researchers are open to the new computational developments within a traditionally very analogue field:

It’s a bit of a shame that we are not more ‘digitally literate’ within the humanities and do not apply programmatic thinking more in both teaching and research. In some ways, many of us are still sceptical of the computational development and may consider digital methods less qualified than traditional hermeneutic interpretation. This scepticism can only be overcome by showing the potential benefits of marrying digital methods with humanities research, where the strengths of each side get to complement the other.

Phillip Stenmann Baun, Aarhus University

With his newly started Ph.D. project on far-right memory, heavily reliant on a digital approach, Phillip is part of a growing number of humanities researchers currently paving the way for future interdisciplinary developments in the field, and an inspiration to researchers and students alike.

Front office support for 4chan data

For this project, the Interactive HPC Front Office at Aarhus University, CHCAA, assisted by collecting all posts from 4chan’s “politically incorrect” board (/pol/) for the previous two years. Owing to prior requests, the CHCAA front office had already created an API crawler for 4chan that ensured access to the required data.

The 4chan API (https://github.com/4chan/4chan-API) only provides historical data for a very limited timespan, so to be able to analyse a larger timespan we created an API crawler that, on a daily basis, fetches all new threads (posts and replies) since the last fetch and stores them in a JSON file.

Peter Vahlstrup, CHCAA-frontoffice, Aarhus University
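A daily crawl of this kind can be sketched in a few lines. The endpoint URLs follow the public 4chan API documentation linked above; everything else – the function names, the one-second delay, and the single-file JSON layout – is an illustrative assumption, not CHCAA’s actual implementation.

```python
# Sketch of a daily 4chan crawler: fetch threads not yet seen, store as JSON.
# Endpoints follow https://github.com/4chan/4chan-API; layout is illustrative.
import json
import time
import urllib.request

BOARD = "pol"
API = "https://a.4cdn.org"

def get_json(url: str):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

def fetch_new_threads(seen: set) -> list:
    """Fetch every live thread on the board not stored in a previous run."""
    threads = []
    for page in get_json(f"{API}/{BOARD}/threads.json"):
        for t in page["threads"]:
            if t["no"] in seen:
                continue
            # Full thread: the original post plus all replies.
            threads.append(get_json(f"{API}/{BOARD}/thread/{t['no']}.json"))
            seen.add(t["no"])
            time.sleep(1)  # stay well within the API's rate limits
    return threads

def store(threads: list, path: str) -> None:
    """Write one day's harvest to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(threads, fh, ensure_ascii=False)
```

Run once a day (e.g. from cron), the `seen` set carried over between runs gives exactly the “all new threads since the last fetch” behaviour described in the quote.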

There are no obstacles to accessing the 4chan data, as 4chan offers both an endpoint for archived threads – threads which can no longer be commented on because they have been “pushed off the last page of a board” (https://github.com/4chan/4chan-API/blob/master/pages/Archive.md) – and an endpoint for fetching live threads. There are, however, pros and cons to both endpoints.

Using the archived (and therefore locked) threads endpoint ensures that all replies are included, but it has the downside of reduced information about the author of each post or reply. The downside of the live threads endpoint is retrieving all replies: ideally, a thread should be fetched just before it is “pushed off the last page of a board”, because at that point it can no longer be commented on. Every time a thread is commented on, it is bumped to the top of the board, until it reaches the bump limit and is then automatically locked. (Not all threads reach the limit, though, so the bump limit alone cannot be used to decide when to collect a thread.) This means that if we use the live threads endpoint, we need to fetch data more often to get all threads with all their replies.

Peter Vahlstrup, CHCAA-frontoffice, Aarhus University

During the project, the CHCAA front office experimented with both endpoints. As foreseen, the archived threads endpoint is the easiest to work with, but they found the live threads endpoint workable as well when everything is fetched every 12–24 hours and all older versions of the same thread are subsequently deleted, so that only the most recent version with the most up-to-date data is preserved.
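The pruning step just described – keeping only the newest snapshot of each repeatedly fetched thread – can be sketched in a few lines. The record layout here (a thread number `no` plus a `fetched` timestamp) is an illustrative assumption about how snapshots might be tagged, not the front office’s actual schema.

```python
# Deduplicate repeated snapshots of live threads: for each thread number,
# keep only the most recently fetched version. Record layout is assumed.
def keep_latest(snapshots: list) -> list:
    """Return one record per thread number: the most recently fetched."""
    latest = {}
    for snap in snapshots:
        no = snap["no"]
        if no not in latest or snap["fetched"] > latest[no]["fetched"]:
            latest[no] = snap
    return list(latest.values())
```

Because later snapshots of a live thread are supersets of earlier ones (new replies are only appended until the thread locks), discarding the older versions loses nothing.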


Detecting text reuse in H.C. Andersen’s work

(…) In 2019, senior researcher Ejnar Stig Askgaard from Odense City Museums began comparing Hans Christian Andersen’s notes, written between approximately 1833 and 1875, with the 162 fairy tales, novels and autobiographies. This led to the discovery that Andersen liked to use symbols such as cross marks or deletions in his notes to indicate that a note had been reused in his fairy tales.

For Detecting text reuse in H.C. Andersen’s work, Berg wanted to find out where each note had been reused. Earlier research had manually identified where 278 notes had been reused in Andersen’s published work, but this had been a time-consuming effort, taking many months of work.

As 861 of the notes had been digitised in addition to Andersen’s published work, Berg was able to apply digital methods to the problem. He contacted Zhiru Sun, Assistant Professor at the Department of Design and Communication at SDU, who used Natural Language Processing (NLP) methods to find similarities between the notes and Andersen’s work. Using the Python application on UCloud, this generated a number of tables indicating how similar a specific note is to a specific fairy tale.
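A note-to-tale similarity table of this kind can be sketched as follows. The excerpt does not specify which NLP technique was used; plain bag-of-words cosine similarity serves here as a minimal stand-in, and the texts are invented placeholders, not real Andersen material.

```python
# Build a similarity table: one row per note, one column per tale.
# Bag-of-words cosine similarity is a minimal stand-in for the
# (unspecified) NLP method; all texts are illustrative placeholders.
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Cosine similarity between two texts on raw word counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

notes = ["the ugly duckling swims alone",
         "a steadfast tin soldier stands guard"]
tales = ["the duckling was ugly but became a swan",
         "the steadfast tin soldier loved the dancer"]

# table[i][j]: how similar note i is to tale j.
table = [[cosine(n, t) for t in tales] for n in notes]
```

The cell with the highest value in each row then points to the tale in which that note was most plausibly reused; in practice, TF-IDF weighting or sentence embeddings would give more robust scores than raw counts.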

This is an excerpt of the full story.


Digital Humanities

Researchers within the humanities are typically not heavy users of HPC (High Performance Computing) or cloud computing. However, a book, once digitised, is actually quite a big data set. Zhiru Sun, Assistant Professor at the Department of Design and Communication, tells us how she has been helping researchers from the Faculty of Humanities at SDU solve their research problems through digital methods, and how computing resources such as UCloud, also called DeiC Interactive HPC, can be a highly viable option if your project involves, for example, looking for patterns and similarities in digitised texts.

This is an excerpt of the full story.