Categories
Conference DeiC Event HPC Interactive HPC Research Supercomputing UCloud

Video use case: HPC enlightens researchers in social sciences and humanities about human behavior

Sociologist Rolf Lyneborg Lund has used DeiC Interactive HPC to train an image AI that can help us understand how people perceive the concepts of “good” and “bad” neighbourhoods.

Visit deic.dk to view the video use case from the 2023 DeiC Conference.

Categories
HPC Interactive HPC Research Supercomputing

State-of-the-art GPUs for AI available through DeiC Interactive HPC

AI companies around the world are scrambling to get their hands on NVIDIA’s latest and most powerful GPU, the H100. The biggest customers include OpenAI, Microsoft and Google. Now, 16 NVIDIA H100 GPUs have landed at SDU, ready to be integrated into the DeiC Interactive HPC system. With the arrival of these four servers, each with four H100 GPUs, Danish researchers will be able to access the same hardware coveted by some of the biggest tech companies in the world.

Go to story

Image: NVIDIA Hopper H100 GPU. Credit: NVIDIA

Categories
DeiC HPC Interactive HPC Research Supercomputing Uncategorized

Utilizing agent-based models with archaeological data

Supercomputing has long been associated with areas such as physics, engineering, and data science. However, researchers in the humanities at Aarhus University are increasingly turning to supercomputing, which allows them to delve into unexplored territories and discover new insights.
From analysing historical archives to simulating ancient civilizations to analysing social media data, supercomputing offers unique opportunities to generate insights and advance knowledge in the humanities.

In this article series, we highlight three cases with humanities researchers from Aarhus University that illustrate the varied ways in which supercomputing is being used in humanities research.


Iza Romanowska is an assistant professor at Aarhus University, working at the Aarhus Institute of Advanced Studies, where she studies complex ancient societies.

To overcome the challenges of limited data from these ancient societies, researchers have started utilizing agent-based models (ABMs), sometimes enabled by supercomputing. ABMs are computational models that simulate the behaviour and interactions of individual entities, known as agents, within a specified environment or system. Each agent in the model is typically programmed with a set of rules or algorithms that control its behaviour, decision-making processes, and interactions with other agents and the environment.
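
The mechanics described above can be illustrated with a deliberately tiny sketch: a population of hypothetical trader agents, each following a single exchange rule. This is an illustrative toy in Python, not a model from the research discussed here.

```python
import random

class Trader:
    """An agent holding a stock of goods, with one simple behavioural rule."""
    def __init__(self, goods):
        self.goods = goods

    def step(self, partner):
        # Rule: hand one unit to the partner if we currently hold more.
        if self.goods > partner.goods:
            self.goods -= 1
            partner.goods += 1

def run(num_agents=50, steps=1000, seed=42):
    """Create agents and let randomly chosen pairs interact repeatedly."""
    random.seed(seed)
    agents = [Trader(random.randint(0, 10)) for _ in range(num_agents)]
    for _ in range(steps):
        a, b = random.sample(agents, 2)
        a.step(b)
    return [t.goods for t in agents]

stocks = run()
```

Even a toy rule like this produces an emergent, population-level pattern (here, holdings level out over time) that can then be compared against observed data, which is the core of the ABM approach.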

ABM is a valuable tool in archaeology that allows us to simulate and analyse the behaviours and interactions of individuals or groups in past societies, and the use of ABM allows comparison of the model against real archaeological data.

Assistant Professor Iza Romanowska

In one of Iza Romanowska’s studies, agent-based modelling (ABM) made it possible for her and her colleagues to explore the Roman economy in the context of long-distance trade, using ceramic tableware to understand the distribution patterns and buying strategies of traders in the Eastern Mediterranean between 200 BC and AD 300.  

The potential of supercomputing in the humanities becomes particularly evident when studying societies for which, as archaeologists and historians know well, only limited data survive. Iza Romanowska explains that the availability of data is limited in her field compared to other disciplines: while social scientists studying more contemporary populations have access to abundant data such as the number of traders, transactions, and values, “we have none of this information.” The use of HPC has therefore been essential for her research.

ABM as methodological tool necessitates running the simulation many times, and by many, I mean eight hundred thousand times, and that is possible with a laptop… if one plans to be doing their Ph.D. for 500 years. Supercomputing is bigger, faster, better without any qualitative change in terms of the research.

Assistant Professor Iza Romanowska

Using a high-performance computer like the DeiC Interactive HPC system enhances the scalability and speed of ABMs, allowing researchers to gain deeper insights into the behavior and outcomes of complex systems. The DeiC Interactive HPC facility hosts out-of-the-box tools, like NetLogo, for working with ABM. Researchers can also use ABM frameworks for Python or R in one of the many development apps like JupyterLab or Coder.  
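
Because each simulation run is independent, the hundreds of thousands of runs mentioned above are embarrassingly parallel, which is exactly what an HPC system exploits. A minimal sketch of such a parameter sweep (hypothetical model and parameters, using only Python’s standard library, not Romanowska’s actual setup):

```python
import random
from multiprocessing import Pool

def simulate(params):
    """One ABM run; returns a summary statistic for one parameter setting."""
    seed, trade_prob = params
    rng = random.Random(seed)
    wealth = [rng.randint(0, 10) for _ in range(100)]
    for _ in range(500):
        if rng.random() < trade_prob:
            i, j = rng.sample(range(100), 2)
            if wealth[i] > wealth[j]:  # transfer from richer to poorer
                wealth[i] -= 1
                wealth[j] += 1
    return trade_prob, max(wealth) - min(wealth)

if __name__ == "__main__":
    # Replicate each parameter setting with several seeds, in parallel.
    grid = [(seed, p) for p in (0.2, 0.5, 0.9) for seed in range(8)]
    with Pool() as pool:
        results = pool.map(simulate, grid)
```

On a laptop `Pool()` uses a handful of cores; on an HPC node the same pattern scales to all available cores, and a scheduler can fan the grid out across many nodes.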

Supercomputing and coding as research tools advance humanities research 

While humanities data in general is plentiful and can be analysed effectively, Iza Romanowska finds that there is a gap in understanding the underlying processes that generate the observed patterns, resulting in underdeveloped explanatory frameworks. Her point is that the lack of formal tools for theory building and testing remains a major disciplinary issue. 

“Within the humanities, including archaeology and history, data analysis is well-established. However, there’s a kind of fundamental disciplinary problem in that we don’t have or use many computational tools for theory building and theory testing. Supercomputing as a tool for the humanities can contribute to filling this gap and strengthen theory building, and ultimately it can advance the field of humanities research.”

Assistant Professor Iza Romanowska

Iza Romanowska believes that more people in humanities should learn to code to take advantage of the possibilities offered by their data. She suggests that supercomputing can be a natural progression from this. While many humanities researchers may not feel like they need supercomputing, perhaps they are simply not asking questions that could benefit from high-performance computing (HPC). 

I would especially encourage junior researchers in the humanities to embrace supercomputing. It never hurts to acquire a skill, and many of these tools are becoming so easily available that it’s almost a shame to not use them.


You have just read the second of three cases in our series on Interactive HPC usage in humanities.
Through these compelling cases it becomes evident that supercomputing in humanities research is transforming traditional approaches, empowering researchers to uncover new insights and deepen our understanding of the field.  It opens doors to interdisciplinary collaborations and expands the possibilities for data analysis and modelling, ultimately shaping the future of digital humanities. 

Stay tuned for our third case, featuring Rebekah Baglini and her field of linguistics, and check out the first case, featuring Katrine Frøkjær Baunvig and the creation of a Grundtvig artificial intelligence using HPC.

Categories
call HPC Research Supercomputing UCloud

Apply for HPC resources

Researchers at a Danish university have various options for gaining access to computing power at both Danish and international HPC facilities. Front office personnel, please inform your users that the fall call H1-2024 is now open for applications for access to the e-resources.

Information about the call is found on DeiC’s website.

Categories
HPC Interactive HPC Research Supercomputing UCloud

Creating a Grundtvig-artificial intelligence using HPC

Beyond Tradition
Unveiling the Uses of Supercomputing in Humanities. 

Supercomputing has long been associated with areas such as physics, engineering, and data science. However, researchers in the humanities at Aarhus University are increasingly turning to supercomputing, which allows them to delve into unexplored territories and discover new insights.
From analysing historical archives to simulating ancient civilizations to analysing social media data, supercomputing offers unique opportunities to generate insights and advance knowledge in the humanities.

In this article series, we highlight three cases with humanities researchers from Aarhus University that illustrate the varied ways in which supercomputing is being used in humanities research. 


Katrine Frøkjær Baunvig, head of the Grundtvig Center at Aarhus University, has used supercomputing as a methodological approach, and it has led her to non-trivial conclusions that significantly impact our understanding of 19th-century nation builder and prominent pastor N.F.S. Grundtvig’s vast body of works and his immense influence on Danish culture.

To conduct a certain type of text mining based on so-called word embeddings, she has created an artificial intelligence of Grundtvig, enabling a comprehensive analysis of his more than 1,000 works and 8 million words, resulting in unprecedented insights.
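
Word embeddings build on the distributional idea that a word is characterised by the company it keeps. A minimal sketch of that idea, using raw co-occurrence counts on an invented toy corpus (pure Python; the actual Grundtvig pipeline is far more sophisticated):

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy stand-in corpus; the real analysis covers some 8 million words.
corpus = [
    "folk oplysning skole liv".split(),
    "folk aand liv skole".split(),
    "kirke tro aand liv".split(),
]

# Count co-occurrences within a +/-2 word window.
cooc = defaultdict(Counter)
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if i != j:
                cooc[w][sent[j]] += 1

def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def neighbours(word):
    """Rank all other words by similarity of their association profiles."""
    return sorted(
        ((cosine(cooc[word], cooc[w]), w) for w in cooc if w != word),
        reverse=True,
    )
```

A trained embedding model replaces the raw counts with dense learned vectors, but the output is the same in spirit: for any word, a ranked map of its association structure across the corpus.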

Grundtvig’s worldview: analysed by Katrine Frøkjær Baunvig in the upcoming paper ”Benign Structures. The Worldview of Danish National Poet, Pastor, and Politician N.F.S. Grundtvig”.

This approach has ushered in a completely new era in Grundtvig research, according to Katrine Frøkjær Baunvig. She dismisses the criticism of digital humanities sceptics who argue that word embedding fails to consider the surrounding context of words. 

“This type of rejection is prevalent only among researchers who have not taken the time to understand or familiarize themselves with the current state and level of the research. When creating a word embedding, I obtain a vast mapping of a given word’s extensive association structure. Therefore, I can clearly discern different semantic focal points and contexts where the word appears in Grundtvig’s body of work. This is precisely what allows me to gain an overview.” 

Katrine Frøkjær Baunvig, Head of the Grundtvig Center at Aarhus University

Katrine Frøkjær Baunvig opted to form a research partnership with the Center for Humanities Computing at Aarhus University. Her best advice for other researchers going into supercomputing in the humanities is to team up with the right people.  

“Stepping into the world of supercomputing requires an approach to work processes that, in my opinion, represents a new trend in the humanities, namely, interdisciplinary collaborations and team-based publishing. Someone takes care of what is typically called the domain expert area – in this case, knowledge of Grundtvig’s authorship – while others handle the more technical aspects of execution.

Katrine Frøkjær Baunvig, Head of the Grundtvig Center at Aarhus University

She also emphasises the importance of comprehending the workings of the tools to better harness the power of supercomputing.  

“Even if you may not be able to train your algorithm yourself, it can be very practical to devote time and energy to obtain an operational understanding of the steps involved in creating a Grundtvig-artificial intelligence and the various types of applications such an intelligence can be used for.”

Katrine Frøkjær Baunvig, Head of the Grundtvig Center at Aarhus University
Grundtvig’s use of colour terms confirms his claim, written to Ingemann, that one cannot paint Christ with colour. The point is unfolded in another upcoming paper, “‘Med Farver kan man ingen Christus male’: En komputationel udforskning af farvebrugen i Grundtvigs forfatterskab”, by Katrine Frøkjær Baunvig.

With years of experience in using supercomputing in her research, Katrine plans to continue using it and encourages others to do so when it seems fit. Especially in times where humanities research is often dismissed as lacking scientific rigor, Katrine Frøkjær Baunvig sees an opportunity to make an impact.  With a keen sense of responsibility to bring her field forward, she is determined to prove that humanities research can be just as methodical and rigorous as research in any other discipline.  

“Researchers who have pioneering eagerness should explore supercomputing as it can give them a head start by venturing into “blue ocean” territory.” 

Katrine Frøkjær Baunvig, Head of the Grundtvig Center at Aarhus University

Katrine Frøkjær Baunvig has used the DeiC Interactive HPC system for a range of NLP tasks such as linguistic normalisation of historical Danish, semantic representation learning and inference, and finally, historical chatbot development based on a custom Large Language Model for Danish.


You have just read the first of three cases in our series on Interactive HPC usage in humanities.
Through these compelling cases it becomes evident that supercomputing in humanities research is transforming traditional approaches, empowering researchers to uncover new insights and deepen our understanding of the field.  It opens doors to interdisciplinary collaborations and expands the possibilities for data analysis and modelling, ultimately shaping the future of digital humanities. 

Stay tuned for our second and third cases, featuring Iza Romanowska and Rebekah Baglini, representing their fields of archaeology and linguistics.

Categories
HPC Interactive HPC Research Supercomputing UCloud

UCloud as a complementary HPC tool within theoretical particle physics

Though supercomputers form the key basis of his research, UCloud has been a valuable, complementary tool for Tobias and his colleagues and will most likely continue to be so in future work as well.

Postdoc Tobias Tsang works within the broader research field of theoretical particle physics. As part of the Centre for Cosmology and Particle Physics Phenomenology (CP3-Origins) at the University of Southern Denmark, his research more specifically concerns quantum field theory and quantum chromodynamics (QCD), including how fundamental particles interact with each other and bind into protons and neutrons:

My research aims to provide high precision predictions based solely on the theory of the Standard Model – the best-known understanding of the interaction of fundamental (i.e. not containing ‘smaller constituents’) particles. This is done via very large-scale numerical simulations using the most powerful supercomputers around the world.

Postdoc Tobias Tsang, Centre for Cosmology and Particle Physics Phenomenology (CP3-Origins) at the University of Southern Denmark

Experience and achievements

More traditional mathematical methods, the kind that can be written down with pen and paper, do not apply to research on quantum chromodynamics. As such, Tobias’ research relies on Monte Carlo methods, which are applied to compute statistical field theories of simple particle systems. Though this type of research is done using very large supercomputers, Tobias has repeatedly used UCloud for exploratory studies on smaller volumes of data:

When doing large scale simulations, we sometimes do it on something called ‘10,000 cores in parallel’, and clearly this is not something we can easily do on a resource like UCloud. But for the small exploratory studies, UCloud is a nice resource in the sense that it is available; you don’t have to sit here on a hot day and burn your laptop to death – you can send it to UCloud and run it there. I think this is kind of the point where I have used UCloud the most; for small exploratory studies and some of the projects that don’t need a huge amount of computer time but still a significant portion.

Postdoc Tobias Tsang

Though UCloud has served as a supplemental rather than a key tool in Tobias’ work together with the CP3-Origins research centre, he describes it as a nice complement to other HPC resources:

“I don’t think UCloud will ever be the only resource we use. But this is also the design of it; UCloud is not meant to be a huge machine, it is meant to be an available resource that is easy to use and that gives you a playground to set up things really from scratch where you can test things out and run smaller jobs and analyses. In that sense, it is quite complementary to a lot of the things we normally work with. For exploratory studies and for code testing, UCloud will definitely remain very useful.”

Postdoc Tobias Tsang

In one specific project, carried out at SDU a few years back as a collaboration between CP3 and IMADA (the Department of Mathematics and Computer Science), the vast majority of samples were generated on UCloud, and a significant amount of data production and measurements were also carried out there [1]. UCloud needs, however, to be considered part of a whole, according to Tobias:

“It is not that one particular machine made it possible; we would otherwise have found another machine to run it on. But UCloud provided us with a nice set up where we could just use local resources without having to go through big grant applications to get computer time.”

Postdoc Tobias Tsang

Pros and cons

In terms of time optimization, UCloud has also been a game changer for Tobias:

One of the nice things about UCloud compared to other machines is the wall clock time: quite often, for larger clusters, depending on the cluster though, you are very much restricted by the queue policies. So, there are some clusters where you have a maximum run time of 4 hours, and if you happen to run a small job that is longer than this, then you can’t – you have to always tailor your job to fit exactly and to make the maximum use of it. On UCloud you have a 200-hour wall clock. This is very helpful as for a lot of these things that have to run sequentially, you might not need a huge resource, you just need to have a long enough time span to actually do it.

Postdoc Tobias Tsang

Though UCloud slowed the work process down a bit in the beginning as everything had to be installed and set up, this downside was quickly resolved and overshadowed by the benefits: 

“Once you get used to it, you can kind of equalize the work process to what you would have on a cluster where everything is just readily installed.”

Postdoc Tobias Tsang

Despite pros and cons, Tobias describes UCloud as a flexible system:

The fact that UCloud is really just a virtual machine has both positive and negative sides. The positive side is that you are really free to do whatever you want to do; you can install everything and you don’t have any restrictions that you would have on larger clusters where you can’t easily install software, or you can’t install it into the parts where you want to install it. On larger clusters, you are typically limited by the compilers that are already there. So, from that point of view, UCloud, at least to me, seems like a more flexible system. The downside is that you have to install everything; you can’t just quickly run something, you kind of have to constantly install everything from scratch.

Postdoc Tobias Tsang

Last but not least, Tobias stresses the interaction with the UCloud front office as a major benefit that has helped the research group significantly, especially compared to other clusters with a much longer response time:

One of the nice things with UCloud as a general system is that every time something didn’t work, we got a really quick email back. Any questions we raised were answered quickly, so it was never something that kept us stuck for weeks or months – typically things were resolved in a very timely time scale. And things that we actively suggested as nice features or things that we thought were missing on UCloud were likewise addressed.

Postdoc Tobias Tsang

[1]  Della Morte, Jaeger, Sannino, Tsang and Ziegler, “One Flavour QCD as an analogue computer for SUSY”, PoS LATTICE2021 (2022) 225, https://doi.org/10.22323/1.396.0225

Categories
Interactive HPC Research UCloud

National Health Data Science Sandbox for Training and Research

UCloud is not just an ideal platform for the individual researcher who wants interactive access to HPC resources or an easy way to collaborate with national or international partners. It is also highly suitable for teaching. Jennifer Bartell and Samuele Soraggi, who are both working on the project National Health Data Science Sandbox for Training and Research, share their experiences with using UCloud.

National “sandbox” platform

The growing amounts of data in all research fields offer researchers new opportunities and possibilities for scientific breakthrough. In the case of health science, the use of large amounts of data has great potential to improve our health care – it can e.g. expand our ability to understand and diagnose diseases. One of the constraints of using health data is that many datasets (e.g. person-specific health records or genomics data) are sensitive from a patient privacy perspective and governed by strict access and usage guidelines. This can be a major challenge in particular for students or researchers who are just learning best practices in handling health data while also developing data science skills.

Go to SDU eScience for full story

Categories
Conference Data Management DeiC Event HPC Interactive HPC Research Supercomputing

Registration for the DeiC Conference 2022 is open

Registration is now open for this year’s DeiC conference, which focuses on maturity and alignment.

The main theme of the conference is “Alignment and Maturity: Implementing Research Infrastructure Solutions”.

The programme is divided into four tracks: data management, supercomputing (HPC), networks and services, and security. Within each track, the focus will be on ‘maturity’ and ‘alignment’, as well as strategies for solving challenges within the research infrastructure.

See the programme and register for the conference.

Categories
Research

Examining far-right memory practices by use of digital methods

“I believe that when combining digital methods with the humanities, a lot of really great research can be done.”

Phillip Stenmann Baun, Aarhus University

Phillip Stenmann Baun, a PhD student at the Department of Global Studies (Aarhus University), has a background in History and is now working within the interdisciplinary field of memory studies and digital methods. As an offshoot of work done during his master’s thesis, Phillip’s PhD project, “Reject degeneracy; Remember tradition!” A Study of Far-Right Digital Memory Practices, examines uses of the past in far-right communication on digital media through digital methods and Natural Language Processing (NLP) techniques. Based on large amounts of data from the “politically incorrect” board on 4chan.org, the aim of the project is to examine how memory and history inform the contemporary far-right imaginary.

Thoughts on interdisciplinarity

I am proud to call myself ‘interdisciplinary’ – I work within the field of ‘memory studies’, a field that draws its strength from many different disciplines, and I definitely see the interdisciplinary approach as very productive.

Phillip Stenmann Baun, Aarhus University

According to Phillip, a fundamental prerequisite for the project’s methodology is the combination of digital methods with a traditional hermeneutic approach: by using digital methods, the project can examine large amounts of data that would otherwise be inaccessible. However, the critical humanist cannot be completely replaced by computational methods, for example when designing the algorithms and interpreting the results:

How do I manage to operationalize concepts such as ‘memory’, ‘culture’, or ‘identity’ into something that a machine can read? I find such questions extremely interesting, because it is precisely here that the role of historians and others within the humanities really becomes relevant; it is in fact the humanities who need to both ask and answer these questions in order to actually understand and interpret the empirical material.

Phillip Stenmann Baun, Aarhus University

As such, the dialogue between digital methods and the humanities is essential during the whole process, not just when analysing the data; for example, Phillip’s initial meeting with classification algorithms and topic modelling when working on his master’s thesis led to an increased awareness of the importance of semantics when developing computational models:

When searching for specific historical entities, I need to consider carefully how they are represented lexically in my material.

Phillip Stenmann Baun, Aarhus University

Both topic modelling and sentiment analysis, i.e. computational linguistic analysis of affective states, will be central for Phillip’s further research on far-right memory and crucial for identifying words and expressions within the dataset related to the collective heading ‘memory’.
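
At its simplest, lexicon-based sentiment analysis scores a text by the affect-laden words it contains. A minimal sketch with an invented toy lexicon (the project itself would use a validated lexicon or trained models):

```python
# Invented toy lexicon mapping words to affect scores.
LEXICON = {"glory": 1, "tradition": 1, "great": 1,
           "degeneracy": -1, "decline": -1, "reject": -1}

def sentiment(text):
    """Average lexicon score of the matched words; 0.0 if none match."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0
```

Scaled up, the same per-text score can be aggregated over millions of forum posts to chart affective states around memory-related vocabulary.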

Approaching UCloud from a humanities perspective

UCloud played a key role for Phillip when working on his master’s thesis, especially in the initial phase when he needed to interpret the performance of his predictive models. Optimizing and fine-tuning the so-called parameters of algorithmic models, especially when working with a lot of data, usually involves heavy amounts of computational power. Without UCloud, this process would have taken much longer.

Working within the sphere of digital methods, however, was, and still is, challenging for a researcher with a background in the humanities, though assistance and easy accessibility have made the process much simpler:

I am really an outsider regarding everything concerning digital methods. When I was first introduced to UCloud, I was still ‘on the outside’ regarding many of these things. But the interface is fairly straightforward even though I had difficulties knowing where to begin at first.

Phillip Stenmann Baun, Aarhus University

In general, Phillip stresses that much value can be gained from implementing digital methods in the humanities, if researchers are open towards the new computational developments within a traditionally very analogue field:

It’s a bit of a shame that we are not more ‘digitally literate’ within the humanities and do not apply programmatic thinking more in both teaching and research. In some ways, many of us are still sceptical regarding the computational development and may consider digital methods as less qualified than traditional hermeneutic interpretation. This scepticism can only be overcome by showing the potential benefits in marrying digital methods with humanities research, where the strengths of each side get to complement the other.

Phillip Stenmann Baun, Aarhus University

With his newly started PhD project on far-right memory, a project heavily reliant on a digital approach, Phillip is part of an increasing number of researchers within the humanities who are currently paving the way for future interdisciplinary developments within this field, and he is an inspiration to researchers and students alike.

Front office support for 4chan data

For this project, the Interactive HPC Front Office at Aarhus University, CHCAA, assisted by collecting all 4chan posts from the /pol/ (“politically incorrect”) board for the previous two years. In this case, the CHCAA front office had already, due to prior requests, created an API crawler for 4chan that ensured access to the required data.

The 4chan API (https://github.com/4chan/4chan-API) only provides historical data for a very limited timespan, so to be able to analyze a larger timespan of data we created an API crawler that, on a daily basis, fetches all new threads (posts and replies) since the last fetch and stores them in a JSON file.

Peter Vahlstrup, CHCAA front office, Aarhus University

There are no obstacles to accessing the 4chan data, as 4chan offers both an endpoint for archived threads (threads that can no longer be commented on because they have been “pushed off the last page of a board” [https://github.com/4chan/4chan-API/blob/master/pages/Archive.md]) and an endpoint for fetching live threads, but there are pros and cons to both endpoints.

Using the archived (and therefore locked) threads endpoint ensures that all replies are included, but it has the downside of reduced information about the author of the post or reply. Using the live threads endpoint has a downside in retrieving all replies: ideally, the thread should be fetched just before it is “pushed off the last page of a board”, because at that point it is no longer possible to comment on the thread. Every time a thread is commented on, it is bumped to the top of the board until it reaches the bump limit, after which it is automatically locked. (Not all threads reach the limit, though, so the bump limit cannot be used as a measurement alone for when to collect the thread.) This means that if we use the live threads endpoint, we need to fetch data more often to get all threads with all replies.

Peter Vahlstrup, CHCAA front office, Aarhus University

During the project, [the CHCAA front office] experimented with both endpoints and, as foreseen, the archived threads endpoint is the easiest to work with. However, they found that the live threads endpoint is also workable when you fetch everything every 12–24 hours and subsequently delete all other versions of the same thread, so that only the most recent version with the most up-to-date data is preserved.
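
The endpoints discussed above can be sketched with 4chan’s read-only JSON API (URL patterns as documented in the linked 4chan-API repository; the actual CHCAA crawler’s code is not shown here and may differ):

```python
import json
import urllib.request

BASE = "https://a.4cdn.org"

def thread_url(board, thread_no):
    """URL of a single thread (post and replies) as JSON."""
    return f"{BASE}/{board}/thread/{thread_no}.json"

def archive_url(board):
    """URL of the list of archived (locked) thread numbers for a board."""
    return f"{BASE}/{board}/archive.json"

def fetch_json(url):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def crawl_archived(board, out_path, limit=5):
    """Fetch a few archived threads and store them in one JSON file."""
    thread_nos = fetch_json(archive_url(board))[:limit]
    threads = [fetch_json(thread_url(board, no)) for no in thread_nos]
    with open(out_path, "w") as f:
        json.dump(threads, f)
```

A daily live-thread crawl would instead poll every live thread and keep only the newest copy of each thread number, mirroring the deduplication strategy described above.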

Categories
Research

Detecting text reuse in H.C. Andersen’s work

(…) In 2019, senior researcher Ejnar Stig Askgaard from Odense City Museums began comparing Hans Christian Andersen’s notes, written between approximately 1833 and 1875, with his 162 fairy tales, novels and autobiographies. This led to the discovery that Hans Christian Andersen liked to use symbols such as cross marks or deletions in his notes to indicate that a note had been reused in his fairytales.

For Detecting text reuse in H.C. Andersen’s work, Berg wanted to find out where each note had been reused. Earlier research had managed to manually identify where 278 notes had been reused in Hans Christian Andersen’s published work, but this had been a time-consuming effort, taking many months of work.

As 861 of the notes had been digitised in addition to Hans Christian Andersen’s published work, Berg was able to apply digital methods to solve his problem. He contacted Zhiru Sun, Assistant Professor at the Department of Design and Communication at SDU, who used Natural Language Processing (NLP) methods to find similarities between the notes and Hans Christian Andersen’s work. Using the Python application on UCloud, this approach generated a number of tables indicating how similar a specific note is to a specific fairytale.
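
A note-to-tale similarity table of the kind described can be produced with a standard bag-of-words cosine similarity. A minimal sketch with invented snippets (the study’s actual NLP pipeline is not detailed here):

```python
from collections import Counter
from math import sqrt

def cosine_sim(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented snippets standing in for digitised notes and tales.
notes = {"note1": "den lille pige med svovlstikkerne froes"}
tales = {"tale_a": "den lille pige gik med svovlstikkerne i haanden",
         "tale_b": "kejseren havde slet ingen klaeder paa"}

# One row per note, one column per tale: a note-to-tale similarity table.
table = {n: {t: round(cosine_sim(ntext, ttext), 2)
             for t, ttext in tales.items()}
         for n, ntext in notes.items()}
```

In practice one would add TF-IDF weighting and normalisation of 19th-century spelling, but the output shape is the same: a table scoring each note against each published work.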

This is an excerpt. Click here for the full story.