Recently, a group of research software engineers from Danish universities came together and formed a new cross-university network: Danish RSE.
Having crossed paths in various university collaborations, such as Digital Literacy and Digital Curriculum, we saw a need to come together and establish a more formal group where the experiences and challenges of working as a Research Software Engineer could be discussed. Furthermore, we saw a need for a forum for sharing knowledge and solutions and for raising awareness of the difficulties within the field.
Peter Vahlstrup, Research Software Engineer, Aarhus University
That led to the establishment of the Danish RSE network, which is open to all research software engineers at Danish universities who fit the profile. For now, the focus is on the Humanities and Social Sciences, and the network is supported by Dighumlab. Dighumlab will act as the point of contact for RSEs in Denmark and for other RSE networks in the Nordic countries. The first Danish RSE workshop was held this summer, and the plan is to hold more workshops and to keep in touch through a dedicated Slack channel for RSE-specific issues.
Values and goals of Danish RSE
The RSE network aims to foster best practices for RSE units in terms of service, support, teaching and engineering by sharing knowledge and experiences. The recent workshop dealt with issues such as the acknowledgement of research software engineering in publications, education and career management. These sessions resulted in the formulation of the values and goals of the Danish RSE network:
Danish RSE supports and works towards enabling diverse RSE career paths
Danish RSE supports and works towards recognition of engineering tasks
Danish RSE supports and works towards Open Science and Open Source
RSE career paths and co-authorship
One of the major topics at the RSE workshop was that RSEs across universities seem to be largely regarded as support staff, and that their specialised skills go unrecognised. As an RSE, you need a very high level of technical expertise combined with an in-depth understanding of research within a field such as the Humanities or Social Sciences.
Talented RSEs are hard to recruit and retain, and the Danish RSE network hopes to encourage universities to recognise the value of their RSEs by enabling diverse career paths that match their skills. The word diverse is worth noting, because there is no “one size fits all” when it comes to RSEs. Some will want a technical position with the possibility of advancement. Others will want to do research themselves and go in an entirely different direction.
If you are not interested in becoming a researcher yourself, you are stuck at a level of salary and recognition that cannot compete with offers from the private sector. On the other hand, if you are interested in doing research, RSEs are not necessarily recognised as contributors deserving of co-authorship. We need to ensure recognition equal to that of researchers and establish specialised technical positions that give RSEs career paths recognising their skills and expertise.
Peter Vahlstrup, Research Software Engineer, Aarhus University
We’re approaching the end of the second year with DeiC Interactive HPC – and there are now 4000 users on UCloud!
During the first year with DeiC Interactive HPC, UCloud reached more than 2000 users. We’re glad that the interest in the platform has continued to grow throughout the second year.
Since we discontinued our support for mounting local folders onto UCloud using WebDAV, we have been searching for a way to let UCloud users work with their files locally without having to re-upload them after every change. We are happy to announce that we now have a new solution that makes it possible to synchronize your local files with your UCloud file storage.
UCloud has been a game changer for Ross Deans Kristensen-McLachlan, Assistant Professor of Cognitive Science and Humanities Computing at Aarhus University, who teaches at the crossroads of cultural studies and data science.
In short, the benefits of UCloud in teaching boil down to a much more trouble-free teaching process without unnecessary technical issues, allowing teachers and students alike to focus on the substance of their work.
Benefits when teaching in UCloud
One of the major benefits is the computational resources available in terms of having more computing power, allowing students to focus on state-of-the-art work.
Assistant Professor Ross Deans Kristensen-McLachlan
Ross has been teaching two elective Cultural Data Science bachelor’s courses as well as a master’s-level course on Natural Language Processing (NLP) for students of Cognitive Science. There is a clear before and after for the two elective courses, which formerly ran on a local server: as more than 25 students typically needed access to the server, this naturally required a lot of time and energy. As a result, the time actually available for state-of-the-art work was limited, but with UCloud this kind of downtime has been reduced significantly. This has also lowered barriers that could otherwise make students new to computational methods lose interest in the field.
Another major benefit, according to Ross, is that UCloud allows all students to work from the same starting point and reduces possible imbalances between students with brand new computers and students with older computer models:
One thing about UCloud that I actually think is quite important is that it kind of democratises access to resources.
Assistant Professor Ross Deans Kristensen-McLachlan
In terms of teaching, several tangible benefits allow teachers and students alike to concentrate on the substantial content of their courses. Some challenges do arise in class, though these are typically minor, such as small issues when integrating with GitHub.
UCloud and the humanities
When teaching the elective courses on Cultural Data Science, Ross has encountered humanities students with no background in computational methods whatsoever. This, however, turned out to be an advantage, as the students were typically open-minded and able to adapt quickly:
Because UCloud has eliminated a lot of former technical obstacles and barriers, students can focus on learning good programming practices and the results of their research. It allows us to focus on the task at hand. The students don’t have to know how the backend works; they don’t have to be computer scientists – they are humanities students and should be able to think about humanities objects (texts, visuals etc.) using computational methods.
Assistant Professor Ross Deans Kristensen-McLachlan
As such, UCloud is “a means to an end”, Ross emphasises. Though computational background knowledge is of course far from irrelevant, the objective for the Cultural Data Science courses has been to educate the students to think critically when working with computational methods:
We are not just looking at data science methods and applying them uncritically. We try to use the students’ main expertise and encourage them to apply their subject knowledge to think critically about their results when working with computational methods. Determining the notion of ‘genre’ with a classification model, for example, urges the students to think critically about the notion itself: is it even something we can determine from text alone?
Assistant Professor Ross Deans Kristensen-McLachlan
Overall, the students on Ross’ courses have been extremely positive about UCloud, even though some were sceptical to begin with. Two kinds of reactions characterised the students’ reception of UCloud: one group embraced it fully from the start, while others came to accept it as a necessary (and useful) tool.
Collaborative teaching resources
Among teachers from the Department of Linguistics, Cognitive Science, and Semiotics at Aarhus University, UCloud has furthermore improved internal coherence across the department, to the benefit of both students and teachers. As most teachers have moved all their material onto UCloud, students no longer have to use one set of tools for one course and a different set for another.
Besides the many teaching-related benefits to be gained from UCloud, Ross further emphasises the ongoing dialogue between users of UCloud and the team who maintain it:
They are very responsive to suggestions. Over the past year it’s (UCloud) become even more fully featured in terms of what you can do with it, and I don’t see that stopping any time soon.
Assistant Professor Ross Deans Kristensen-McLachlan
One potential improvement of UCloud, Ross suggests, could be the implementation of some sort of outreach program in order to get even more people to gain from the benefits:
UCloud gets rid of all the annoying things. As far as I can see there are only benefits – the minor issues are vastly outnumbered by the benefits.
Assistant Professor Ross Deans Kristensen-McLachlan
As a researcher or PhD student at a Danish university, you can now apply for resources at the national HPC centers, including the Danish part of EuroHPC LUMI.
The second call for applications for regular access to resources on the national HPC centers is now open. This includes applications for the Danish part of EuroHPC LUMI. The call is open for applications from all research areas.
As part of the use of the national e-infrastructures, DeiC issues calls for applications for the national resources. Projects are granted resources on application, based on an assessment of research quality and technical feasibility.
The applications are evaluated by the appointed e-resource committee, and the grants are approved by the DeiC board.
The deadline for applications is midnight on 4 October 2022, and the resources will be available for use from 1 January 2023.
Registration is now open for this year’s DeiC conference, which focuses on maturity and alignment.
The main theme of the conference is ”Alignment and Maturity: Implementing Research Infrastructure Solutions”.
The program is divided into four tracks: data management, supercomputing (HPC), net and services, and security. Within each track, the focus will be on ’maturity’ and ’alignment’, as well as on strategies for solving problems within research infrastructure.
This year’s DeiC conference takes place on 26 and 27 October!
Mark your calendar for the DeiC conference on 26–27 October 2022! This time, the conference will be held at Comwell Kolding.
The conference is the event of the year in the field of e-infrastructure, particularly for research and education. The focus of the conference will be data management, supercomputing, net & services, and security.
More information will follow in the coming months as registration opens and the preliminary program is released.
“I believe that when combining digital methods with the humanities, a lot of really great research can be done.”
Phillip Stenmann Baun, Aarhus University
Phillip Stenmann Baun, a PhD student at the Department of Global Studies (Aarhus University), has a background in History and now works within the interdisciplinary field of memory studies and digital methods. An offshoot of work done for his master’s thesis, Phillip’s PhD project, “‘Reject degeneracy; Remember tradition!’ A Study of Far-Right Digital Memory Practices”, examines uses of the past in far-right communication on digital media using digital methods and Natural Language Processing (NLP) techniques. Based on large amounts of data from the “Politically Incorrect” board (/pol/) on 4chan.org, the project aims to examine how memory and history inform the contemporary far-right imaginary.
Thoughts on interdisciplinarity
I am proud to call myself ‘interdisciplinary’ – I work within the field of ‘memory studies’, a field that draws its strength from many different disciplines, and I definitely see the interdisciplinary approach as very productive.
Phillip Stenmann Baun, Aarhus University
According to Phillip, a fundamental prerequisite for the project’s methodology is the combination of digital methods with a traditional hermeneutic approach: by using digital methods, the project can examine large amounts of data that would otherwise be inaccessible. However, the critical humanist cannot be completely replaced by computational methods, for example when designing the algorithms and interpreting the results:
How do I manage to operationalize concepts such as ‘memory’, ‘culture’, or ‘identity’ into something that a machine can read? I find such questions extremely interesting, because it is precisely here that the role of historians and others within the humanities really becomes relevant; it is in fact the humanities who need to both ask and answer these questions in order to actually understand and interpret the empirical material.
Phillip Stenmann Baun, Aarhus University
As such, the dialogue between digital methods and the humanities is essential throughout the whole process, not just when analysing the data. For example, Phillip’s first encounter with classification algorithms and topic modelling while working on his master’s thesis made him more aware of the importance of semantics when developing computational models:
When searching for specific historical entities, I need to consider carefully how they are represented lexically in my material.
Phillip Stenmann Baun, Aarhus University
Both topic modelling and sentiment analysis, i.e. the computational linguistic analysis of affective states, will be central to Phillip’s further research on far-right memory and crucial for identifying words and expressions within the dataset that relate to the collective heading ‘memory’.
Approaching UCloud from a humanities perspective
UCloud played a key role for Phillip when working on his master’s thesis, especially in the initial phase when he needed to interpret the performance of his predictive models. Optimizing and fine-tuning the parameters of algorithmic models, especially when working with a lot of data, usually requires a great deal of computational power. Without UCloud, this process would have taken much longer.
Working within the sphere of digital methods, however, was, and still is, challenging for a researcher with a background in the humanities, though assistance and easy accessibility have made the process much simpler:
I am really an outsider regarding everything concerning digital methods. When I was first introduced to UCloud, I was still ‘on the outside’ regarding many of these things. But the interface is fairly straightforward even though I had difficulties knowing where to begin at first.
Phillip Stenmann Baun, Aarhus University
In general, Phillip stresses that much value can be gained from implementing digital methods in the humanities, if researchers are open towards the new computational developments within a traditionally very analogue field:
It’s a bit of a shame that we are not more ‘digitally literate’ within the humanities and do not apply programmatic thinking more in both teaching and research. In some ways, many of us are still sceptical of computational developments and may consider digital methods less qualified than traditional hermeneutic interpretation. This scepticism can only be overcome by showing the potential benefits of marrying digital methods with humanities research, where the strengths of each side complement the other.
Phillip Stenmann Baun, Aarhus University
With his newly started PhD project on far-right memory, a project heavily reliant on a digital approach, Phillip is one of a growing number of researchers within the humanities who are paving the way for future interdisciplinary developments in the field, and he is an inspiration to researchers and students alike.
Front office support for 4chan data
For this project, the Interactive HPC Front Office at Aarhus University, CHCAA, assisted by collecting all 4chan posts from the /pol/ board for the previous two years. In this case, owing to earlier requests, the CHCAA front office had already created an API crawler for 4chan that ensured access to the required data.
The 4chan API (https://github.com/4chan/4chan-API) only provides historical data for a very limited timespan, so to be able to analyse data over a longer period, we created an API crawler that, on a daily basis, fetches all new threads (posts and replies) since the last fetch and stores them in a JSON file.
Peter Vahlstrup, CHCAA-frontoffice, Aarhus University
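The daily crawl described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CHCAA crawler itself: it assumes the public read-only endpoints `https://a.4cdn.org/pol/threads.json` (the board's thread list with `last_modified` timestamps) and `https://a.4cdn.org/pol/thread/{no}.json` as documented in the 4chan-API repository, and the state file, output file and function names are hypothetical. Each snapshot is appended as one JSON object per line, a choice made in this sketch so that new data can simply be appended.

```python
import json
import time
import urllib.error
import urllib.request
from pathlib import Path

# Assumed endpoints, based on the 4chan-API documentation; verify before use.
API_BASE = "https://a.4cdn.org"
BOARD = "pol"


def fetch_json(url: str):
    """Download and decode one JSON document (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def threads_to_fetch(threads_pages: list, last_fetch: int) -> list:
    """Return numbers of threads modified since the previous crawl.

    `threads_pages` is the decoded threads.json payload: a list of pages,
    each with a "threads" list whose entries carry "no" and "last_modified".
    """
    return [
        thread["no"]
        for page in threads_pages
        for thread in page["threads"]
        if thread["last_modified"] > last_fetch
    ]


def crawl_once(state_file: Path, out_file: Path) -> None:
    """One hypothetical daily run: fetch new or updated threads and append them."""
    now = int(time.time())  # UNIX time, same unit as "last_modified"
    state = json.loads(state_file.read_text()) if state_file.exists() else {"last_fetch": 0}
    pages = fetch_json(f"{API_BASE}/{BOARD}/threads.json")
    with out_file.open("a", encoding="utf-8") as out:
        for no in threads_to_fetch(pages, state["last_fetch"]):
            try:
                thread = fetch_json(f"{API_BASE}/{BOARD}/thread/{no}.json")
            except urllib.error.HTTPError:
                continue  # thread disappeared between listing and fetching
            record = {"no": no, "fetched_at": now, "posts": thread["posts"]}
            out.write(json.dumps(record) + "\n")
            time.sleep(1)  # throttle requests to respect 4chan's API rules
    state_file.write_text(json.dumps({"last_fetch": now}))
```

Because a thread that receives new replies is fetched again on the next run, this approach stores several versions of the same thread over time, which later have to be reduced to the most recent one.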
There are no obstacles to accessing the 4chan data, as 4chan offers both an endpoint for archived threads, containing threads that can no longer be commented on because they have been “pushed off the last page of a board” [https://github.com/4chan/4chan-API/blob/master/pages/Archive.md], and an endpoint for fetching live threads. There are, however, pros and cons to both endpoints.
Using the endpoint for archived, and therefore locked, threads ensures that all replies are included, but it has the downside of providing less information about the author of each post or reply. The live threads endpoint, on the other hand, makes it harder to retrieve all replies. Ideally, a thread should be fetched just before it is “pushed off the last page of a board”, because after that it is no longer possible to comment on it. Every time a thread is commented on, it is bumped to the top of the board, until it reaches the bump limit and is then automatically locked. (Not all threads reach the limit, though, so the bump limit cannot be used as the sole measure of when to collect a thread.) This means that if we use the live threads endpoint, we need to fetch data more often to get all threads with all replies.
Peter Vahlstrup, CHCAA-frontoffice, Aarhus University
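The timing problem can be expressed as a simple selection rule over the board's thread list. The sketch below is a hypothetical heuristic, not the front office's actual logic: it assumes the threads.json payload described in the 4chan-API documentation (pages numbered by `page`, thread entries carrying `no` and `replies`), flags every thread on the last page as about to be pushed off, and, because the bump limit cannot be used alone, also flags threads whose reply count has reached a `bump_limit` value that you would need to look up for the board in question.

```python
def fetch_now(threads_pages: list, bump_limit: int) -> set:
    """Select live threads worth fetching now (illustrative heuristic only).

    A thread on the last page is about to be pushed off the board, and a
    thread at the bump limit will no longer be bumped; neither signal alone
    catches every case, so the union of both is returned.
    """
    if not threads_pages:
        return set()
    last_page = max(threads_pages, key=lambda page: page["page"])
    selected = {thread["no"] for thread in last_page["threads"]}
    for page in threads_pages:
        for thread in page["threads"]:
            if thread["replies"] >= bump_limit:
                selected.add(thread["no"])
    return selected
```

In practice, a rule like this only helps if the thread list itself is polled often enough that no thread slips off the last page between two polls.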
During the project, the CHCAA front office experimented with both endpoints. As expected, the archived threads endpoint was the easier one to work with, but they found that the live threads endpoint is also workable if you fetch everything every 12–24 hours and subsequently delete all other versions of the same thread, so that only the most recent version with the most up-to-date data is preserved.
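Keeping only the most recent version of each thread is a small deduplication step. Assuming, hypothetically, that each fetched snapshot is stored as one JSON line with a thread number `no` and a `fetched_at` timestamp (field names chosen for this sketch, not taken from the 4chan API), it could look like this:

```python
import json
from pathlib import Path


def keep_latest(snapshots: list) -> list:
    """Keep only the most recently fetched snapshot of each thread."""
    latest = {}
    for snap in snapshots:
        current = latest.get(snap["no"])
        if current is None or snap["fetched_at"] > current["fetched_at"]:
            latest[snap["no"]] = snap
    return sorted(latest.values(), key=lambda snap: snap["no"])


def compact_file(path: Path) -> None:
    """Rewrite a JSON Lines file so it holds only the newest version per thread."""
    lines = path.read_text(encoding="utf-8").splitlines()
    snapshots = [json.loads(line) for line in lines if line.strip()]
    path.write_text(
        "".join(json.dumps(snap) + "\n" for snap in keep_latest(snapshots)),
        encoding="utf-8",
    )
```

Running `compact_file` after each 12–24 hour fetch keeps the stored data set small, at the cost of rewriting the file each time.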
At the end of 2021, students and staff from Aarhus University and the University of Copenhagen interested in digital methods, data wrangling, and text and data mining were once again invited to join the annual datasprint organised by The University Libraries at The Royal Danish Library (Det Kgl. Bibliotek).
With the purpose of developing competencies within the field of digital humanities, the datasprint focused on the importance of open political data and the potential of text and data mining in this context.
Large historical datasets were made available to the participants as raw material to explore using UCloud, the cloud-based Interactive High Performance Computing service developed for Danish universities. A hybrid group of staff from the Center for Humanities Computing Aarhus (CHCAA) and students from Information Science, Aarhus University, participated in the datasprint in Aarhus (November 18th and 19th) and gained experience with applying UCloud in their work with large datasets.
Benefits of UCloud
High Performance Computing (HPC) systems, colloquially referred to as ‘supercomputers’, are characterised by an immense amount of computing power that far surpasses the abilities of regular desktop computers.
With the cloud-based service UCloud, however, complex HPC systems become accessible to researchers and students, even when they work with large datasets from their laptops.
According to the participants from CHCAA, Aarhus University, one main advantage of working with UCloud at the datasprint was its efficiency, as it provides more computing power and works faster than similar systems. The ability to process large amounts of data in a relatively short time is also described as a significant feature of UCloud, alongside its intuitive interface and easy error recovery.
The value of UCloud in the datasprint
UCloud was an important tool at the datasprint in Aarhus, as the topic of the datasprint involved a considerable amount of data: the complete collection of Folketinget’s proceedings from 1953 to 2021.
A notable challenge when working with the large dataset from the Danish parliament was that only recent data, from the 2000s onwards, had already been categorised by subject. The participants from our hybrid group sought to solve this challenge in order to improve the conditions for analysing the dataset.
By creating a new classifier for the older data lacking subject categories, the group aims to make the whole dataset more accessible and available for further analyses:
We’re working with only 20 subjects, so it is very generic … like economy, labour, foreign affairs.
– Jan Kostkan, Center for Humanities Computing, Aarhus University
A broader comprehension of the dataset from Folketinget can thus be gained, and the group found a way to categorise the proceedings making them available for further analyses by experts with subject-matter knowledge, for example historians.
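The article does not say which model the group built. As a stand-in, the sketch below shows the general idea with a minimal multinomial Naive Bayes classifier with add-one smoothing: it is trained on proceedings that already carry a subject label and then predicts a subject for unlabelled ones. The class and function names are hypothetical, the three subject labels are taken from the quote above, and the training sentences are toy data; the real task involved around 20 subjects and Danish-language text.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    """Lowercase whitespace tokenisation; a real pipeline would do much more."""
    return text.lower().split()


class NaiveBayesSubjects:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list, subjects: list) -> "NaiveBayesSubjects":
        self.doc_counts = Counter(subjects)          # documents per subject
        self.word_counts = defaultdict(Counter)      # word frequencies per subject
        for text, subject in zip(texts, subjects):
            self.word_counts[subject].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {s: sum(c.values()) for s, c in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.doc_counts.values())
        best_subject, best_score = None, -math.inf
        for subject, n in self.doc_counts.items():
            # log prior plus smoothed log likelihood of each known word
            score = math.log(n / n_docs)
            denom = self.total_words[subject] + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:
                    score += math.log((self.word_counts[subject][word] + 1) / denom)
            if score > best_score:
                best_subject, best_score = subject, score
        return best_subject
```

In this setting, one would fit the model on the categorised proceedings from the 2000s onwards and call `predict` on each older speech; libraries such as scikit-learn provide more robust equivalents (for example `MultinomialNB`).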
Evaluating the datasprint
UCloud thus served as a valuable tool at the datasprint in Aarhus this November. All four participants agree that UCloud offers significant advantages when working with large datasets such as the one in the datasprint, mainly because it has more computing power and works faster than other systems.
One quality of UCloud that the participants emphasise is its support for collaborative work, as the system makes it easy to work with others, even at a distance. Apart from minor issues in the user interface, UCloud is generally commended for its usability, even for beginners, and both students and staff from the group stress the potential of including UCloud in teaching.
For the majority of researchers and students of the humanities, digital methods are far from standard procedure, and this is exactly what initiatives such as the datasprints organised and financed by The Royal Danish Library hope to change.
The value of digital methods in the humanities is gradually becoming clearer across disciplines. However, as programming and coding seem far removed from the traditional methods of the humanities, work remains to be done to fully integrate digital approaches into both research and teaching across the humanities.
Making use of digital methods opens new opportunities for working with large amounts of data and identifying connections across material – something that would simply be impossible without the integration of digital methods into the humanities.
The vision behind data(Tinget)
At the end of 2021, two datasprints focusing on the value of open political data and digital competencies were organised in Aarhus (November 18th-19th) and Copenhagen (December 2nd and 3rd). Due to a close collaboration between DeiC Facility for Interactive HPC and The Royal Danish Library, the cloud-based HPC (High Performance Computing) service UCloud developed for Danish Universities presented itself as a pertinent topic for the 2021-datasprints. More specifically, the participants were asked to explore parliamentary proceedings from the Danish Parliament (Folketinget) from 1953 to 2021 by use of UCloud at the datasprints.
The purpose of the datasprints was thus twofold: to create awareness of the value of open political data, and to develop the digital competencies of the students and staff participating from Aarhus University and the University of Copenhagen. UCloud played a significant part in the latter, as the datasprints involved considerable amounts of data, despite the organisers’ initial concerns:
What worried us the most during the preparations was how difficult it would be to get the participants connected to UCloud, and whether they would be able to use it at all. It went completely pain-free, though: a few emails and some fairly simple clicks on UCloud (full disclosure: it wasn’t me who had to click, so of course it was simple to me). And then it was up and running. The only real challenge was a semi-bad internet connection on the first day in Copenhagen. And once they [the participants, ed.] got access, all problems were gone, and everything went smoothly!
– Christian B. Knudsen, The Royal Danish Library
UCloud as a key figure
As soon as the participants were familiar with UCloud, some of the benefits of working with the HPC service became clear to participants and organisers alike. Per Møldrup-Dalum, one of the organisers of the datasprint (currently working as data manager for the Center for Humanities Computing Aarhus, CHCAA), specifically emphasises UCloud as a pivotal tool at the datasprints:
Imagine the hassle when students, researchers, journalists, etc. show up with an equal number of different laptop computers. Some are old, others new, some running Windows, others Macs or Linux. Some attendees have no problem discerning between different Python versions, while others have never heard of Python or R or installing arbitrary software on their computer. Now, all these people need to have the same version of e.g. RStudio, R, Python and software to work with computer code. To get that to work could require a complete datasprint in itself.
– Per Møldrup-Dalum, The Royal Danish Library/CHCAA
All of these technical obstacles, however, were completely erased thanks to UCloud:
Now, enter UCloud. There we control everything and can ensure that it all just works — from the get-go! On top of that, we don’t have to worry that much about data size or computational resources. It’s all win-win.
– Per Møldrup-Dalum, The Royal Danish Library/CHCAA
As these evaluations show, UCloud holds major potential, not only in the context of these specific datasprints, but for developing digital skills across the humanities on a broader scale; the cloud-based HPC service, UCloud, simplifies the working process and makes collaborative work much more manageable. Hopefully, events such as the datasprints organised by The Royal Danish Library will have a sustained impact on researchers as well as students whose interest in digital methods and UCloud specifically can further the development and integration of digital methods across the humanities in the future.