Categories
Conference Data Management DeiC Event

DeiC Conference 2022 – Oct. 26th-27th

This year’s DeiC conference takes place on the 26th and 27th of October!

Mark your calendar for the 26th – 27th of October 2022, for the DeiC conference! This time the conference will be held at Comwell Kolding.

The conference is the event of the year in the field of e-infrastructure, particularly for research and education. The focus of the conference will be data management, supercomputing, net & services, and security.

More information will follow in the coming months, as registration opens and the preliminary program is released.

For updates and more information, visit DeiC.

Categories
Research

Examining far-right memory practices by use of digital methods

“I believe that when combining digital methods with the humanities, a lot of really great research can be done.”

Phillip Stenmann Baun, Aarhus University

Phillip Stenmann Baun, a PhD student at the Department of Global Studies (Aarhus University), has a background in History and now works within the interdisciplinary field of memory studies and digital methods. As an offshoot of work done during his master’s thesis, Phillip’s PhD project, “Reject degeneracy; Remember tradition!” A Study of Far-Right Digital Memory Practices, examines uses of the past in far-right communication on digital media through digital methods and Natural Language Processing (NLP) techniques. Based on large amounts of data from the “politically incorrect” forum on 4chan.org, the project aims to examine how memory and history inform the contemporary far-right imaginary.

Thoughts on interdisciplinarity

I am proud to call myself ‘interdisciplinary’ – I work within the field of ‘memory studies’, a field that draws its strength from many different disciplines, and I definitely see the interdisciplinary approach as very productive.

Phillip Stenmann Baun, Aarhus University

According to Phillip, a fundamental prerequisite for the project’s methodology is the combination of digital methods with a traditional hermeneutic approach: by using digital methods, the project can examine large amounts of data that would otherwise be inaccessible. However, the critical humanist cannot be completely replaced by computational methods, for example when designing the algorithms and interpreting the results:

How do I manage to operationalize concepts such as ‘memory’, ‘culture’, or ‘identity’ into something that a machine can read? I find such questions extremely interesting, because it is precisely here that the role of historians and others within the humanities really becomes relevant; it is in fact the humanities who need to both ask and answer these questions in order to actually understand and interpret the empirical material.

Phillip Stenmann Baun, Aarhus University

As such, the dialogue between digital methods and the humanities is essential during the whole process, not just when analysing the data. For example, Phillip’s first encounter with classification algorithms and topic modelling, when working on his master’s thesis, led to an increased awareness of the importance of semantics when developing computational models:

When searching for specific historical entities, I need to consider carefully how they are represented lexically in my material.

Phillip Stenmann Baun, Aarhus University

Both topic modelling and sentiment analysis, i.e. computational linguistic analysis of affective states, will be central for Phillip’s further research on far-right memory and crucial for identifying words and expressions within the dataset related to the collective heading ‘memory’.

Approaching UCloud from a humanities perspective

UCloud played a key role for Phillip when working on his master’s thesis, especially in the initial phase when he needed to interpret the performance of his predictive models. Optimizing and fine-tuning the so-called parameters of algorithmic models, especially when working with a lot of data, usually involves heavy amounts of computational power. Without UCloud, this process would have taken much longer.

Working within the sphere of digital methods, however, was – and still is – challenging for a researcher with a background in the humanities, though assistance and easy accessibility have made the process much simpler:

I am really an outsider regarding everything concerning digital methods. When I was first introduced to UCloud, I was still ‘on the outside’ regarding many of these things. But the interface is fairly straightforward even though I had difficulties knowing where to begin at first.

Phillip Stenmann Baun, Aarhus University

In general, Phillip stresses that much value can be gained from implementing digital methods in the humanities, if researchers are open towards the new computational developments within a traditionally very analogue field:

It’s a bit of a shame that we are not more ‘digitally literate’ within the humanities and do not apply programmatic thinking more in both teaching and research. In some ways, many of us are still sceptical regarding the computational development and may consider digital methods as less qualified than traditional hermeneutic interpretation. This scepticism can only be overcome by showing the potential benefits in marrying digital methods with humanities research, where the strengths of each side get to complement the other.

Phillip Stenmann Baun, Aarhus University

With his newly started PhD project on far-right memory, a project heavily reliant on a digital approach, Phillip is part of an increasing number of researchers within the humanities who are currently paving the way for future interdisciplinary developments within this field, and he is an inspiration to researchers and students alike.

Front office support for 4chan data

For this project, the Interactive HPC Front Office at Aarhus University (CHCAA) assisted by collecting all 4chan posts from the /pol/ board for the previous two years. Due to prior requests, the CHCAA front office had already created an API crawler for 4chan that ensured access to the required data.

The 4chan API (https://github.com/4chan/4chan-API) only provides historical data for a very limited timespan, so to be able to analyze a larger timespan of data we created an API crawler that, on a daily basis, fetches all new threads (posts and replies) since the last fetch and stores them in a JSON file.

Peter Vahlstrup, CHCAA-frontoffice, Aarhus University
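
The daily fetch described above could look roughly like the sketch below. It is not the CHCAA crawler, whose code is not shown here. It assumes the `threads.json` and `thread/{no}.json` endpoints documented in the 4chan-API repository, and the function names, the output file naming and the one-second pause between requests are all illustrative choices.

```python
import json
import time
import urllib.error
import urllib.request
from pathlib import Path

# Host of the read-only JSON API described in the 4chan-API repository.
API_ROOT = "https://a.4cdn.org"


def threads_modified_since(thread_list, last_fetch):
    """Return the numbers of threads changed after `last_fetch` (UNIX time).

    `thread_list` is the parsed body of `/{board}/threads.json`: a list of
    pages, each holding a `threads` list with `no` and `last_modified` keys.
    """
    return [
        thread["no"]
        for page in thread_list
        for thread in page.get("threads", [])
        if thread.get("last_modified", 0) > last_fetch
    ]


def fetch_json(url):
    """Download one URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def crawl_once(board, last_fetch, out_dir):
    """Hypothetical daily job: store every thread changed since `last_fetch`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    fetched_at = int(time.time())
    thread_list = fetch_json(f"{API_ROOT}/{board}/threads.json")
    for number in threads_modified_since(thread_list, last_fetch):
        try:
            thread = fetch_json(f"{API_ROOT}/{board}/thread/{number}.json")
        except urllib.error.HTTPError:
            continue  # the thread disappeared between listing and fetching
        target = out_dir / f"{number}_{fetched_at}.json"
        target.write_text(json.dumps(thread), encoding="utf-8")
        time.sleep(1.0)  # assumed polite pause between requests
    return fetched_at  # pass this value as `last_fetch` on the next run
```

Calling `crawl_once("pol", last_fetch, "data/pol")` once a day, with `last_fetch` taken from the previous run, would approximate the routine in the quote.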

There are no obstacles to accessing the 4chan data, as 4chan offers both an endpoint for archived threads, which contains threads that can no longer be commented on because they have been “pushed off the last page of a board” [https://github.com/4chan/4chan-API/blob/master/pages/Archive.md], and an endpoint for fetching live threads. There are, however, pros and cons to both endpoints.

Using the archived, and therefore locked, threads endpoint ensures that all replies are included, but it has the downside of reduced information about the author of the post/reply. The live threads endpoint, in turn, makes it harder to retrieve all replies. Ideally, a thread should be fetched just before it is “pushed off the last page of a board”, because at that point it is no longer possible to comment on the thread. Every time a thread is commented on, it is bumped to the top of the board until it reaches the bump limit and is then automatically locked. (Not all threads reach the limit, though, so the bump limit alone cannot be used as a measure of when to collect the thread.) This means that if we use the live threads endpoint, we need to fetch data more often to get all threads with all replies.

Peter Vahlstrup, CHCAA-frontoffice, Aarhus University

During the project [CHCAA-frontoffice] experimented with both endpoints and, as foreseen, the archived threads endpoint is the easiest to work with, but they found that the live threads endpoint is also possible to work with when you fetch everything every 12 – 24 hours and subsequently delete all other versions of the same thread so only the most recent version with the most up-to-date data is preserved.
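
The clean-up step for the live threads endpoint, keeping only the most recent version of each thread, can be sketched as follows. The snapshot layout (a dict with `no`, `fetched_at` and `posts` keys) is a hypothetical one chosen for illustration, not the format used by the CHCAA front office.

```python
def keep_latest(snapshots):
    """Keep only the most recently fetched snapshot of each thread.

    `snapshots` is an iterable of dicts with the (assumed) keys
    `no` (thread number), `fetched_at` (UNIX time) and `posts`.
    Returns a dict mapping thread number to its newest snapshot.
    """
    latest = {}
    for snapshot in snapshots:
        current = latest.get(snapshot["no"])
        if current is None or snapshot["fetched_at"] > current["fetched_at"]:
            latest[snapshot["no"]] = snapshot
    return latest


# Made-up example: thread 101 was fetched twice, thread 102 once.
snapshots = [
    {"no": 101, "fetched_at": 1_000, "posts": ["op"]},
    {"no": 102, "fetched_at": 1_000, "posts": ["op"]},
    {"no": 101, "fetched_at": 2_000, "posts": ["op", "reply"]},
]
deduplicated = keep_latest(snapshots)
```

Because a later fetch of a live thread contains the replies seen in earlier fetches plus any new ones, keeping the newest snapshot per thread number should lose nothing as long as the thread was fetched before it left the board.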

Categories
Event Workshop

Participant reflections on data(Tinget) and using UCloud

At the end of 2021, students and staff from Aarhus University and University of Copenhagen with an interest in digital methods, data wrangling, and text and data mining were once again invited to join the annual datasprint organised by The University Libraries at The Royal Danish Library (Det Kgl. Bibliotek).

With the purpose of developing competencies within the field of digital humanities, the datasprint focused on the importance of open political data and the potential of text and data mining in this context.

Large historical data sets were made available to the participants as raw material to explore using the cloud based Interactive High Performance Computing service, UCloud, developed for Danish Universities. A hybrid group of staff from Center for Humanities Computing Aarhus (CHCAA) and students from Information Science, Aarhus University participated in the datasprint in Aarhus (November 18th and 19th) and gained experience with applying UCloud in their work with large datasets.

Benefits of UCloud

High Performance Computing systems (HPC), colloquially referred to as ‘super computers’, are characterised by their immense amount of computing power that far surpasses the abilities of regular desktop computers.

With the cloud based service UCloud, though, complex HPC systems are made accessible for researchers and students even when working with large datasets on laptops.

According to the participants from CHCAA, Aarhus University, one main advantage of working with UCloud at the datasprint was efficiency, as it provides more computing power and works faster than similar systems. The ability to process large amounts of data in a relatively short time is also described as a significant feature of UCloud, alongside its intuitive interface and easy error recovery.

The value of UCloud in the datasprint

UCloud was an important tool at the datasprint in Aarhus, as the topic of the datasprint involved a considerable amount of data, namely the complete collection of Folketinget’s proceedings from 1953 to 2021.

A notable challenge of working with the large dataset from the Danish parliament was that only contemporary data from the 2000s onwards had already been categorised into subjects, a challenge that the participants from our hybrid group sought to solve in order to improve the conditions for analysing the dataset.

By creating a new classifier for the older data that lacks subject categories, the dataset will become more accessible and available for further analyses:

We’re working with only 20 subjects, so it is very generic … like economy, labour, foreign affairs.

– Jan Kostkan, Center for Humanities Computing, Aarhus University

A broader comprehension of the dataset from Folketinget can thus be gained, and the group found a way to categorise the proceedings making them available for further analyses by experts with subject-matter knowledge, for example historians.

Evaluating the datasprint

UCloud thus served as a valuable tool at the datasprint in Aarhus this November. All four participants agree that UCloud offers significant advantages when it comes to working with large datasets, as in the datasprint, mainly because UCloud has more computing power and works faster than other systems.

One specific quality of UCloud emphasised by the participants is its ability to support collaborative work, as the system makes it easy to work with others, even at a distance. Apart from minor issues in the user interface, UCloud is generally commended for its usability, even for beginners, and both students and staff from the group stress the potential of including UCloud in teaching.

Read more about the Data(Tinget) datasprint

Categories
Event Workshop

Organizer reflections on data(Tinget) and using UCloud

For the majority of researchers and students of the humanities, digital methods are far from standard procedure, and this is exactly what initiatives such as the datasprints organised and financed by The Royal Danish Library hope to change.

The value of digital methods in the humanities is gradually becoming clearer across disciplines. However, as programming and coding seem far from the traditional methods of the humanities, work still has to be done to fully integrate digital approaches in both research and teaching across the humanities.

Making use of digital methods opens new opportunities for working with large amounts of data and identifying connections across material – something that would simply be impossible without the integration of digital methods into the humanities.

The vision behind data(Tinget)

At the end of 2021, two datasprints focusing on the value of open political data and digital competencies were organised in Aarhus (November 18th-19th) and Copenhagen (December 2nd and 3rd). Due to a close collaboration between DeiC Facility for Interactive HPC and The Royal Danish Library, the cloud-based HPC (High Performance Computing) service UCloud developed for Danish Universities presented itself as a pertinent topic for the 2021-datasprints. More specifically, the participants were asked to explore parliamentary proceedings from the Danish Parliament (Folketinget) from 1953 to 2021 by use of UCloud at the datasprints.

The purpose of the datasprints was thus twofold: creating awareness of the value of open political data, and developing the digital competencies of the students and staff participating from Aarhus University and University of Copenhagen. UCloud played a significant part in the latter, as the datasprints involved considerable amounts of data, despite initial concerns among the organisers:

What worried us the most during the preparations was how difficult it would be to get the participants connected to UCloud. And if they would be able to use it at all. It went completely pain-free though; a few emails and fairly simple clicks on UCloud (full disclosure – it wasn’t me who had to click, so of course it was simple to me). And then it was up and running. Only real challenge was a semi bad internet connection on the first day in Copenhagen. And when they [the participants, ed.] got access – all problems were gone, and everything went smoothly!

– Christian B. Knudsen, The Royal Danish Library

UCloud as a key figure

As soon as the participants were familiar with UCloud, some of the benefits of working with the HPC service became clear to participants as well as organisers. Per Møldrup-Dalum, one of the organisers of the datasprint (currently working as data manager for Center for Humanities Computing Aarhus/CHCAA), specifically emphasises UCloud as a pivotal tool at the datasprints:

Imagine the hassle when students, researchers, journalists, etc. show up with an equal number of different laptop computers. Some are old, others new, some running Windows, others Macs or Linux. Some attendees have no problem discerning between different Python versions, while others have never heard of Python or R or installing arbitrary software on their computer. Now, all these people need to have the same version of e.g. RStudio, R, Python and software to work with computer code. To get that to work could require a complete datasprint in itself.

– Per Møldrup-Dalum, The Royal Danish Library/CHCAA

All of these technical obstacles, however, were completely erased thanks to UCloud:

Now, enter UCloud. There we control everything and can ensure that it all just works — from the get-go! On top of that, we don’t have to worry that much about data size or computational resources. It’s all win-win.

– Per Møldrup-Dalum, The Royal Danish Library/CHCAA

As these evaluations show, UCloud holds major potential, not only in the context of these specific datasprints, but for developing digital skills across the humanities on a broader scale; the cloud-based HPC service, UCloud, simplifies the working process and makes collaborative work much more manageable. Hopefully, events such as the datasprints organised by The Royal Danish Library will have a sustained impact on researchers as well as students whose interest in digital methods and UCloud specifically can further the development and integration of digital methods across the humanities in the future.

Read more about the Data(Tinget) datasprint in Aarhus and the use of UCloud from the participants’ point of view.

Categories
Research

Detecting text reuse in H.C. Andersen’s work

(…) In 2019, senior researcher Ejnar Stig Askgaard from Odense City Museums began comparing Hans Christian Andersen’s notes, written between approximately 1833 and 1875, with the 162 fairy tales, novels and autobiographies. This led to the discovery that Hans Christian Andersen liked to use symbols such as cross marks or deletions in his notes to indicate that the note had been reused in his fairy tales.

For Detecting text reuse in H.C. Andersen’s work, Berg wanted to find out where each note had been reused. Earlier research had managed to manually identify where 278 notes had been reused in Hans Christian Andersen’s published work, but this had been a time-consuming effort, taking many months of work.

As 861 of the notes had been digitised in addition to Hans Christian Andersen’s published work, Berg was able to apply digital methods to solve his problem. He contacted Zhiru Sun, Assistant Professor at the Department of Design and Communication at SDU, who used Natural Language Processing to find similarities between the notes and Hans Christian Andersen’s work. Using the Python application on UCloud, this method generated a number of tables, which indicated how similar a specific note is to a specific fairy tale.
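
To give an idea of what such a similarity table can look like, here is a minimal sketch that scores each note against each work using cosine similarity over word counts. The method actually used in the project is not described in this excerpt, so the technique, the function names and the example texts below are assumptions made purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_table(notes, works):
    """Return {note_id: {work_id: score}} for every note/work pair."""
    note_vectors = {key: Counter(tokenize(text)) for key, text in notes.items()}
    work_vectors = {key: Counter(tokenize(text)) for key, text in works.items()}
    return {
        note_id: {
            work_id: cosine(note_vector, work_vector)
            for work_id, work_vector in work_vectors.items()
        }
        for note_id, note_vector in note_vectors.items()
    }


# Made-up placeholder texts, not taken from the Andersen material.
notes = {"note_1": "the little mermaid sea"}
works = {
    "work_a": "the little mermaid lived in the sea",
    "work_b": "a tin soldier",
}
table = similarity_table(notes, works)
```

A higher score in `table["note_1"]` points to the work that shares more vocabulary with the note. Real text-reuse detection would typically need more than raw word overlap, for example handling of spelling variants and paraphrase.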

This is an excerpt. Click here for the full story.

Categories
Research Teaching

Digital Humanities

Researchers within the humanities are typically not heavy users of HPC (High Performance Computing) or cloud computing. However, a book, once digitised, is actually quite a big data set. Zhiru Sun, Assistant Professor at the Department of Design and Communication, tells us how she has been helping researchers from the Faculty of Humanities at SDU solve their research problems through digital methods, and how computing resources such as UCloud, also called DeiC Interactive HPC, can be a highly viable option if your project involves, for example, looking for patterns and similarities in digitised texts.

This is an excerpt. Click here for the full story.

Categories
Event Workshop

One-year-in workshop – Status of DeiC Interactive HPC, UCloud

On Monday the 31st of January, the partner universities (AU, AAU, and SDU) met for a workshop to take stock of the first year with UCloud – DeiC Interactive HPC.

One year of Interactive HPC

One year after going live, the effect and esteem of UCloud across national users can be (partly) analysed, and with more than 2,800 users, 30,000 jobs run and 400 projects started, it seems that UCloud has been well received and has proven a welcome service for a wide range of users. The numbers also show that DeiC Interactive HPC/UCloud is mainly used during working hours, as intended, given the interactive element of the platform.

Though three out of four users are affiliates of one of the three partner universities, a growing number of users from University of Copenhagen and Copenhagen Business School employ UCloud in their research.

Outreach will continue in order to gain more users from all the Danish universities; nonetheless, UCloud has reached a considerable number of users – including many female users, which is normally a challenge for HPC systems.

Center Director, Claudio Pica

The new website interactivehpc.dk will also play a part in future outreach to ensure even more users in the years to come.

Future developments on UCloud

As an intuitive and interactive platform, UCloud was developed to assist and support researchers’ needs for both computing and data management.

In general, the UCloud service support for users has been credited with high satisfaction rates; however, this is still an area with room for improvement.

Center Director, Claudio Pica

Future development of the service in terms of software and UCloud functionalities was thus a central focus of the agenda when the consortium met at the end of January to reflect on this first year’s outcome.
More specifically, the objective of the meeting was to discuss 1) how to improve the UCloud support, 2) how to improve communication about UCloud on different platforms, and 3) how to develop the service, i.e. software and functionalities.

The national HPC landscape

Ambitions are high for service support, development and documentation alike, and the consortium will continue improving UCloud and its accompanying services. UCloud also went through a major update (DeiC Project 5) in January 2022, preparing the platform to become the National Integration Portal.

DeiC Interactive HPC is part of a national HPC landscape, and all HPC facilities are now available except for Accelerated HPC, which is expected in 2022. The main objective of the landscape is to improve the e-infrastructure within Danish research and education.

The consortium behind DeiC Interactive HPC is a collaboration between Aalborg University, Aarhus University and University of Southern Denmark (SDU), including a partnership with The Royal Danish Library. Interactive HPC was launched in the fall of 2020 with the purpose of encouraging and improving computing, storage, and network infrastructure across Danish education and research environments.
The HPC service UCloud, developed by the consortium, plays a decisive role for the consortium’s primary objective concerning the improvement of national e-infrastructure.

More workshops are planned in the near future, bringing the partnering universities SDU, Aalborg University and Aarhus University even closer together in their joint effort to provide the best research infrastructure across Danish education and research environments.

Categories
UCloud status

Faster startup times for Virtual Machines

On the 1st of February, a new and improved experience for virtual machines was launched on the UCloud platform. This means that launching a virtual machine will only take a few seconds. Previously, UCloud users had to wait a few business days before a virtual machine could be created due to a manual step in the approval process.

We are also planning to expand the offer of virtual machines with more types of GPU enabled machines and different software.

This is an excerpt. Click here for the full story.

Categories
Event

Major UCloud update

In November 2020, the Danish universities coordinated by DeiC committed to working closely together to provide a joint platform to access all the DeiC national services such as the national HPC facilities. The project known as DeiC Project 5 will create the National Integration Portal based on the UCloud software infrastructure. As part of the DeiC Project 5, we are now introducing a major update of UCloud, preparing the platform to allow multiple providers to expose their services.

As part of the call for National HPC services, a consortium consisting of DTU, AU and SDU was awarded DeiC Project 5 by the DeiC Board. The goal of the project is to build the National Integration Portal, and it will extend the UCloud software to provide the additional functionality that the portal requires.

This is an excerpt. Click here for the full story.

Categories
Research

First call for compute time on the national HPC facilities is now open

If you are a researcher or PhD student at a Danish university, you can now apply for access to compute time on the national HPC facilities, including the Danish part of EuroHPC LUMI. The call is open to all research areas.

Applications are now open for access to computing resources on the national HPC facilities. This also applies to the Danish part of EuroHPC LUMI.

This is an excerpt. Click here for the full story