This webinar demonstrates how to perform automated text classification using pre-trained, open-source large language models (LLMs). The use case is classification of text according to sentiment, but the general workflow is applicable to many other text classification tasks.
The webinar demonstrates how to:
Retrieve text data from online sources
Store and prepare the data for analysis
Set up an LLM text classifier
Perform the text classification
Use a web app for human validation of the classification result
Time stamps:
00:00: Introduction and agenda
02:04: Terminology and UCloud workflow outline
08:30: Best practices in web scraping
11:30: Setting up the UCloud workflow
18:30: Scraping and storing the data
27:06: Setting up the classification pipeline
42:15: Classifying the data and analyzing the results
57:27: Creating a web app for result validation by humans
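The classification and human-validation steps listed above can be sketched in a few lines of Python. This is a minimal sketch, not the code shown in the webinar: the label set, the prompt wording, and the function names (`build_prompt`, `parse_label`, `classify`, `agreement`) are assumptions for illustration, and `generate` stands for any callable that sends a prompt to an LLM and returns its text reply.

```python
from typing import Callable, Iterable

LABELS = ("positive", "negative", "neutral")  # assumed label set


def build_prompt(text: str) -> str:
    """Build a zero-shot prompt asking for exactly one sentiment label."""
    options = ", ".join(LABELS)
    return (
        f"Classify the sentiment of the text as one of: {options}.\n"
        f"Text: {text}\n"
        "Answer with a single word."
    )


def parse_label(reply: str) -> str:
    """Map a free-text model reply to a known label, or 'unknown'."""
    cleaned = reply.strip().lower()
    for label in LABELS:
        if cleaned.startswith(label):
            return label
    return "unknown"


def classify(texts: Iterable[str], generate: Callable[[str], str]) -> list[str]:
    """Classify each text by prompting the model and parsing its reply."""
    return [parse_label(generate(build_prompt(t))) for t in texts]


def agreement(model_labels: list[str], human_labels: list[str]) -> float:
    """Share of items where the human validator agrees with the model."""
    if len(model_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not model_labels:
        return 0.0
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(model_labels)


def fake_generate(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return "Positive." if "love" in prompt else "Negative"
```

In practice `fake_generate` would be replaced by a call to whichever open-source model is used, and the human labels passed to `agreement` would come from the validation web app.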
UCloud played an important role in Sofia Topcu Madsen’s research by helping her analyse large-scale time-use and consumption data across seven countries faster and more efficiently, supporting her work to gain insights into how people spend their time and money in everyday life.
Sofia Topcu Madsen is a former PhD fellow at Aalborg University, Department of Sustainability and Planning. As part of her PhD project, Getting the Data Right, she investigates how people in seven low- and middle-income countries spend time and money on different everyday activities, and how socio-economic factors shape what people are able to do.
The project looks at countries including Uganda, Kenya, Tanzania, India, Sri Lanka, Argentina and Mongolia. Based on large-scale time-use and consumption data, Sofia analyses activities such as transport and household work and explores how time and money spent on everyday activities vary across different socio-economic groups.
Working with large datasets
Sofia’s analyses are based on different types of detailed data. One example is time-use diaries, where participants report what they do throughout the day in 10-minute intervals. For some countries, the datasets are very large. In the Indian dataset alone, Sofia works with around half a million observations.
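To make the diary format concrete: a day recorded in 10-minute intervals has 24 × 60 / 10 = 144 slots per participant. The short Python sketch below turns one such diary into minutes per activity. The activity names and the list-of-strings layout are assumptions for illustration, not the structure of the datasets used in the project.

```python
from collections import Counter

SLOT_MINUTES = 10
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES  # 144 slots in a full day


def minutes_per_activity(diary: list[str]) -> dict[str, int]:
    """Sum the minutes spent on each activity in a one-day diary."""
    if len(diary) != SLOTS_PER_DAY:
        raise ValueError(f"expected {SLOTS_PER_DAY} slots, got {len(diary)}")
    counts = Counter(diary)
    return {activity: n * SLOT_MINUTES for activity, n in counts.items()}


# Hypothetical diary: 8 h sleep, 1 h transport, 3 h household work, 12 h other.
example_diary = (
    ["sleep"] * 48 + ["transport"] * 6 + ["household"] * 18 + ["other"] * 72
)
totals = minutes_per_activity(example_diary)
```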
This makes the computational work demanding. At first, Sofia tried running the analyses on her own computer, but the datasets were too large and the analyses too time-consuming.
“On my own computer, the analyses could take several days to run. With UCloud, it became much faster and more manageable,” Sofia explains.
UCloud gave her access to more computing power, making it possible to run large analyses more efficiently and rerun them when corrections or adjustments were needed. This became important throughout the project, as even small changes in the data setup or model specification could require the analyses to be run again.
“UCloud has been a huge help. I do not think I would have been able to complete this part of the project in the same way without access to it,” Sofia says.
Using Stata and R on UCloud
To conduct the analysis, Sofia used Stata and R on UCloud. Stata and R are tools that researchers use to work with data and carry out statistical analyses.
In Sofia’s project, the tools were used to run regression analyses, where she examines how different factors may be connected — for example how education, income or gender may relate to the amount of time and money people spend on transport, household work or other everyday activities.
She used SUR methods, which make it possible to run several related analyses at the same time. This was relevant because the project looks at activities across a full 24-hour day, where time spent on one activity can be connected to time spent on another. Similarly, money spent on products supporting one activity limits money for other products.
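For readers unfamiliar with the method, SUR most commonly stands for seemingly unrelated regressions: one regression equation per outcome, estimated jointly because the error terms of the same respondent may be correlated across equations. The article does not give the project's model specification, so the notation below is a generic textbook sketch. The symbols (M activities, respondents i and j, outcomes y, regressors X) are introduced here for illustration, and the last line applies only if the M activities together cover the full day.

```latex
% Generic SUR system: one equation per activity m, estimated jointly.
\begin{align}
  y_m &= X_m \beta_m + \varepsilon_m, \qquad m = 1, \dots, M, \\
  \operatorname{Cov}(\varepsilon_{mi}, \varepsilon_{m'i}) &= \sigma_{mm'}, \qquad
  \operatorname{Cov}(\varepsilon_{mi}, \varepsilon_{m'j}) = 0 \quad (i \neq j), \\
  \sum_{m=1}^{M} y_{mi} &= 1440 \ \text{minutes for each respondent } i .
\end{align}
```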
Running several analyses at the same time with large datasets requires a lot of computing power and would have been very time-consuming to do on a normal computer. By using UCloud, Sofia could run the analyses faster and more efficiently, saving her a lot of time.
Contributing to research on sustainable development
Sofia’s research is connected to the UN Sustainable Development Goals by exploring how everyday activities can be used as indicators of broader social and economic conditions.
How people spend their time can tell us something important about everyday life, inequality and opportunities. For example, a joint perspective on time and money spent on transport, household work, or leisure can reveal how resources, responsibilities, and opportunities are distributed across different population groups.
“Time can also be understood as a resource. Looking at how people spend their time gives us another way to understand poverty, inequality and sustainable development,” Sofia explains.
A platform that was easy to get started with
Sofia describes UCloud as easy to access and use, especially once the workflow was in place. She also highlights the support as an important part of the experience.
“The support has been very effective. I have received quick answers to my questions, and that has been a big help,” she says.
Sofia received support from Aalborg University’s local Front Office. Each Danish university has its own Front Office, where researchers can get help with access, use and questions related to UCloud.
Would use UCloud again
Sofia is now employed at the University of Copenhagen, Department of Food and Resource Economics, and she can see herself using UCloud again in future research.
“I would definitely use UCloud again,” she says.
Because UCloud is available to researchers affiliated with Danish universities, Sofia can also continue to use the platform in future research projects at the University of Copenhagen.
A newly inaugurated supercomputing facility expands access to advanced computational resources for researchers and students across Denmark via UCloud.
A new supercomputing facility has been inaugurated at Alsion in Sønderborg, marking an important expansion of Denmark’s digital research infrastructure. The system, named Bitten, will support research and teaching in areas such as artificial intelligence, data analytics, and advanced computing – and will be accessible to users across the country through UCloud.
By providing access via UCloud, the system becomes available within a familiar environment already used by thousands of researchers and students. Users can access advanced computing resources directly through a browser-based interface, selecting tools and applications much like in an app store — without needing to manage or understand the underlying infrastructure. This significantly lowers the barrier to entry, making high-performance computing available not only to specialists, but across disciplines.
The new setup also simplifies the user experience. Infrastructure that was previously distributed across locations is now consolidated in a single data centre, making it easier to navigate and work with computing resources.
Expanding access to advanced computing
The addition of new supercomputing capacity increases what researchers can do — and who can do it. By removing technical and practical barriers, the platform enables more researchers and students to work with large datasets, advanced models, and computational methods, accelerating the path from idea to insight, and from research to real-world application.
Just as importantly, this is made possible within a Danish digital infrastructure where data, software, and computation remain under national control. As reliance on large-scale data and AI continues to grow, questions of data governance, security, and control are becoming increasingly central — not only from a technical perspective, but as a strategic priority for research and innovation.
“Access to computing power is only part of the equation,” said Professor Claudio Pica, Director of the SDU eScience Center representing the consortium behind UCloud. “It is equally important that researchers can work within a trusted environment where data and workflows remain under national control. UCloud makes it possible to combine advanced computing with that level of trust and transparency.”
A shared platform for research and innovation
UCloud is already widely used across Danish universities, supporting a broad range of disciplines and use cases, and now serves more than 23,000 users across research fields. With the addition of new high-performance resources, the platform continues to evolve to meet the growing computational needs of research and education.
The platform also supports startups and spin-out companies by providing access to advanced AI and data analytics, improving opportunities to develop, test, and scale new solutions. In this way, the infrastructure contributes not only to research, but to innovation and competitiveness more broadly.
UCloud is developed and operated in close collaboration between the partners in the Interactive HPC Consortium, consisting of the University of Southern Denmark, Aarhus University, and Aalborg University — reflecting a long-term joint effort across Danish universities to build shared digital research infrastructure.
Part of a broader national effort
The new supercomputing facility, Bitten, is part of Denmark’s national research infrastructure and reflects ongoing collaboration between universities, industry, and technology providers. Access through UCloud plays a central role in ensuring that this investment benefits a broad user base across the country.
The facility is also designed with energy efficiency in mind. Developed in collaboration between SDU, Danfoss, and HPE, the system uses advanced liquid cooling with full heat recovery, allowing excess heat to be reused in the local district heating system. This positions the facility as an example of how digital infrastructure can actively support the energy system — turning data centres from energy consumers into integrated, value-adding components of local infrastructure.
A new data centre for UCloud will open soon. As part of the transition, there will be a period of downtime at the end of April/beginning of May and, for some users, a need to move data.
The data centre is being established through a collaboration between SDU, Danfoss and Hewlett Packard Enterprise (HPE) and will host the hardware behind DeiC Interactive HPC – UCloud. At the same time, the infrastructure is being upgraded with new and more powerful hardware.
The UCloud infrastructure will therefore be consolidated in a single data centre instead of being distributed between SDU and AAU. The transition may affect you as a user. Here is an overview of what you should be aware of.
Short downtime during the migration
When the system is migrated to the new data centre at the end of April 2026, all services will be temporarily unavailable until the migration is complete. The new data centre is expected to be fully operational in early May 2026. Exact dates and further details are available in the UCloud documentation.
Moving data
If your data are stored in SDU/K8s, they will be moved automatically to the new data centre during the migration. You do not need to take any action.
⚠️ Special note for users with data in Aalborg (AAU) ⚠️
If you have data in AAU/K8s or AAU/VM, they will not be moved automatically. These data must be transferred to SDU/K8s before 27 April 2026, otherwise they will be lost. Please note that transferring data can take time, so it is a good idea to plan accordingly.
As part of the transition to the new system, all existing compute allocations will expire on 30 April 2026. This happens automatically as part of the migration to the new infrastructure.
Procedures may vary between universities
The procedure for new allocations on the new data centre is determined by your university, and the allocation of compute resources is handled by the university’s local DeiC front office.
More computing power – and a simpler system to use
In the new data centre, UCloud will receive new and more powerful hardware, giving researchers better opportunities to work with large datasets, advanced computations and AI.
At the same time, it will become easier to choose the right resources. Where several different machine types were previously available, users will now simply choose between two types of compute:
CPU – for standard computations
GPU – for tasks such as AI and advanced data processing
This simpler structure makes it quicker and clearer to apply for and use computing power, allowing researchers to focus more on their analyses and less on technical choices.
The new data centre has also been designed with a focus on energy-efficient operation. In collaboration with Danfoss and HPE, advanced cooling and heat recovery systems are used that make it possible to reuse excess heat and reduce energy consumption.
The initiative forms part of ProjectZero in Sønderborg, which is working to make the area’s energy system CO₂-neutral by 2029.
A new open-source application is now available on UCloud, designed for students, researchers, and educators working with complex data and artificial intelligence. RAGFlow, an engine built on Retrieval-Augmented Generation (RAG), combines powerful language models with your own academic materials, offering an intelligent way to search, explore, and interact with content.
Whether you’re conducting a literature review, developing a teaching assistant, or building a domain-specific chatbot, RAGFlow provides an intuitive pipeline that transforms unstructured documents into a searchable, AI-ready knowledge base. But RAGFlow is more than just question-answering. It supports the creation of custom workflows and intelligent agents, enabling advanced interactions, data processing, and tool integration – all within a flexible and transparent environment.
What can you do with RAGFlow?
RAGFlow helps large language models (LLMs) generate accurate answers based on real data – not just pre-trained knowledge. It’s built to close the gap between raw academic material and useful insight.
RAGFlow is designed with both beginners and advanced users in mind. At its simplest, you can just upload documents and start asking questions. The interface guides you through the basics, so you can get useful results straight away.
As your needs grow, you can delve deeper into advanced features such as custom chunking, retrieval tests, datasets, and programmable workflows. Comprehensive documentation and tutorials are available, allowing you to learn at your own pace and expand your use of the platform over time.
Key Features:
Data Ingestion & Chunking: Upload PDFs, text files, webpages and more. RAGFlow automatically breaks them into manageable parts.
Embedding & Indexing: These chunks are converted into vector representations so they can be searched by meaning, not just keywords.
Smart Retrieval: When you ask a question, the system finds the most relevant information.
Contextual Generation: An LLM uses this context to generate well-informed responses.
Cited Sources: All answers come with grounded citations, showing where the information came from — supporting transparency and academic rigour.
This process improves the quality of responses and significantly reduces the risk of hallucinated or misleading answers.
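The chunking, indexing, and retrieval steps described above can be illustrated with a small, self-contained Python sketch. It is not RAGFlow's implementation or API: the function names are hypothetical, the two documents are made up, and plain word counts stand in for the learned embeddings a real system would use. The generation step is left out; the retrieved chunk and its source name are what would be handed to the LLM as context and citation.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 12) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lower-cased word counts (assumption, not a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Return (source, chunk, vector) triples so answers can cite their source."""
    return [
        (name, piece, embed(piece))
        for name, text in documents.items()
        for piece in chunk(text)
    ]


def retrieve(index, question: str, k: int = 1) -> list[tuple[str, str]]:
    """Return the k chunks most similar to the question, with their sources."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[2]), reverse=True)
    return [(source, text) for source, text, _ in ranked[:k]]


# Made-up documents for illustration only.
docs = {
    "cooling.txt": "The data centre uses liquid cooling with heat recovery.",
    "diary.txt": "Participants record activities in ten minute intervals.",
}
index = build_index(docs)
hits = retrieve(index, "How is the data centre cooling handled?")
```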
From Search to Workflow: Introducing Agents
Beyond document search, RAGFlow also allows you to build and customise your own AI-powered agents. These agents can search, analyse, and use tools on your behalf – forming a pipeline tailored to your specific research needs.
So, what is an agent?
Think of an agent as a specialised AI assistant. You might create one to retrieve data from a source, another to analyse it, and a third to generate a written summary or report. These agents can be chained together into a programmable pipeline – a step-by-step flow where each agent passes its output to the next.
For example, you could build a research assistant that:
Searches for academic papers on a topic
Extracts and summarises the most relevant findings
Runs basic statistical analysis
Outputs the results as a draft report
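The idea of chaining agents, where each one passes its output to the next, can be sketched in plain Python. This is only an illustration of the pattern under stated assumptions: in RAGFlow itself agents are configured through the application, and the names `run_pipeline`, `search`, `summarise`, `analyse`, and `report`, as well as their toy behaviour, are hypothetical stand-ins for the four steps in the example above.

```python
from typing import Any, Callable

Agent = Callable[[Any], Any]


def run_pipeline(agents: list[tuple[str, Agent]], initial: Any) -> tuple[Any, list[str]]:
    """Run agents in order; each receives the previous agent's output."""
    value = initial
    trace = []
    for name, agent in agents:
        value = agent(value)
        trace.append(f"{name}: {value!r}")  # keep every stage inspectable
    return value, trace


def search(topic: str) -> list[str]:
    """Pretend to search for papers on a topic."""
    return [f"Paper A on {topic}", f"Paper B on {topic}"]


def summarise(papers: list[str]) -> list[int]:
    """Pretend each paper yields one numeric finding (here: title length)."""
    return [len(p) for p in papers]


def analyse(findings: list[int]) -> float:
    """Basic statistic over the findings: the mean."""
    return sum(findings) / len(findings)


def report(mean: float) -> str:
    """Format the result as a one-line draft report."""
    return f"Draft report: mean finding = {mean:.1f}"


result, trace = run_pipeline(
    [("search", search), ("summarise", summarise), ("analyse", analyse), ("report", report)],
    "time use",
)
```

Keeping a trace of every intermediate output mirrors the transparency point made in the text: each stage can be inspected and re-run on its own.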
Unlike typical ‘black-box’ AI tools, which conceal their inner workings, RAGFlow provides full transparency, allowing you to understand exactly how your AI operates. You can inspect, adjust, and understand every stage – from document chunking to embedding, retrieval, and agent reasoning. It’s a flexible and reproducible platform where your agents can be saved, re-run, or even shared with colleagues.
Why use RAGFlow on UCloud?
RAGFlow is available directly on UCloud. This offers several key advantages:
Academic Use Cases: Build assistants for teaching, research discovery, or even entire knowledge bases for your institute or research centre.
No Installation Required: Launch RAGFlow on UCloud with everything preconfigured and ready to use.
Flexible AI Model Support: Choose from models hosted on Hugging Face, Ollama, or take advantage of GPU-accelerated inference with vLLM – all accessible via an API key.
Easy Document Management: Upload and manage a wide range of formats, including PDFs, scanned documents, spreadsheets, and HTML.
Shortly before Christmas 2025, DeiC and the Interactive HPC Consortium entered into a new five-year agreement on the national HPC service, with an annual budget of DKK 10 million. At the same time, the DeiC Board decided to invest a one-off amount of DKK 4 million in expanded GPU capacity. This additional investment has made it possible to significantly upgrade the facility. The upgrade strengthens opportunities for research projects working with large datasets and AI, while also providing better opportunities to use GPU resources more broadly.
In spring 2026, the consortium will establish a new energy-efficient data centre that will house the hardware for DeiC Interactive HPC. Danfoss and HPE are participating in the establishment of the data centre.
“Through the new agreement, we are strengthening a shared national research infrastructure that gives researchers across Denmark easy access to advanced computing power. This is an important step for both digitalisation and digital sovereignty in Danish research. At the same time, the collaboration between the universities demonstrates that we can jointly develop solutions that are both technologically strong and sustainable – not least through the establishment of an energy-efficient data centre,” says Professor Claudio Pica, coordinator of the Interactive HPC Consortium.
Collaboration and development in focus
With the new contracts, DeiC and the Interactive HPC Consortium continue and expand a strong and trusted partnership. The long-term nature of the agreements and the strengthened financial framework reflect a solid collaboration and a shared commitment to developing the solution in close dialogue with users and DeiC’s professional forums. The aim is a solution that evolves in line with researchers’ needs and priorities.
At the same time, the agreements build on the service’s high level of security and strengthen the already well-established framework for reporting on usage and operations. Finally, the partners have established clear shared expectations regarding the handling of future development work and system integrations.
Acting Head of HPC at DeiC, Rune Gamborg Ørum, sees the new agreement as an important step forward:
“Beyond the opportunities offered by the new facility itself, I am pleased that our joint work on the contracts has also created a clear collaborative structure around the service. This will make a real difference for users, as it allows us to work together to support the many different ways researchers use DeiC Interactive HPC.”
In February 2026, the partners met in Aarhus for a joint workshop focusing on putting the agreement into practice and ensuring a well-functioning collaboration on technical development, user support and training activities in the coming years. Among other things, participants discussed how existing users can transition smoothly to the new data centre and the new GPU resources.
DeiC Interactive HPC
DeiC Interactive HPC provides interactive and user-friendly supercomputing for researchers at Danish universities. The service is based on the UCloud platform, operated by the Interactive HPC Consortium consisting of the University of Southern Denmark (SDU), Aarhus University (AU) and Aalborg University (AAU).
Today, DeiC Interactive HPC has approximately 22,000 users among students and researchers and supports a wide range of research fields. Through DeiC Interactive HPC, DeiC and the universities provide researchers across the country with access to scalable computing power via UCloud.
In the Master’s programme in Cognitive Science at Aarhus University, UCloud plays a central role in teaching Natural Language Processing (NLP). For instructor and PhD student Mina Almasi, the platform is essential in enabling students to work hands-on with complex models – regardless of the limitations of their own computers.
From Theory to Hands-On Learning
In a white classroom in Nobelparken, Mina stands in front of 15 students. On the screen behind her, lines of Python code appear in neat, symmetrical rows as she explains which code libraries the students need to access.
In her teaching, she uses the Coder Python application in UCloud because the course is based on Python programming. But the choice of platform is not just about software – it is about giving students the opportunity to translate theory into practice.
According to Mina, NLP teaching previously tended to remain at a more theoretical level, due to limited access to both models and the computing power needed to test theories in practice, especially when it came to large language models (LLMs). With UCloud, students can now work directly with LLMs and make use of powerful GPUs and CPUs. This allows them to test theories themselves and experiment hands-on with the tools they are learning about.
“We still teach the theory, but now we can also have students use the tools in practice. They can code on their own and gain insight into how a large language model works by working directly with it through UCloud,” she explains.
A Standardised Setup that Democratises the Classroom
Another advantage of using UCloud in NLP teaching is that the platform ensures equal access for all students, regardless of the computer they own.
“There is a kind of democratisation of the classroom, because you don’t need the latest computer. You can use a five-year-old machine to run very heavy tasks that the newest tools in Natural Language Processing require,” she explains.
At the same time, the standardised setup makes teaching more seamless. All students work with the same standard configuration in UCloud, so any issues that arise are the same for everyone. This creates a shared sense of problem-solving, as challenges can be addressed collectively rather than handled individually by students on their own. As Mina puts it:
“Instead of stopping the lesson to solve individual problems, the problems become collective and an opportunity for learning for everyone. If we have a software issue – for example, a Python library version that is outdated or incompatible – it affects everyone, and we can solve it together.”
Preparing Students for Working Life
For Mina, using UCloud also helps prepare students for the reality that awaits them after graduation. According to her, many of the students who go on to IT positions will likely use cloud computing platforms rather than coding on local machines. In this way, the teaching becomes direct preparation for future job tasks and gives students experience with the technologies they will encounter in practice.
Advice for Other Instructors
Mina has used UCloud since her bachelor’s degree and finds that the platform makes teaching both smoother and more engaging.
“I recommend that other instructors make use of the platform. You just have to get started – but feel free to ask colleagues for advice on how they use it. Get some inspiration, because UCloud is a fantastic tool. It can do a great many things, but like other systems, it can feel a bit overwhelming at first, so it’s a good idea to get some guidance along the way before you begin.”
You can now apply for compute time on UCloud. DeiC has opened the first 2026 call for applications for access to Denmark’s national HPC facilities – and Interactive HPC – UCloud is part of this call.
So if your research needs extra compute resources on UCloud, now is the time to apply. The calls open only twice a year, so consider applying in this round. Researchers (and PhD students) at Danish universities can apply.
We are pleased to announce that even more people have discovered how UCloud can support their research – and that we have now reached an important milestone of 20,000 users.
The 20,000 users include both students and researchers, which is a truly remarkable number for a national supercomputing platform. UCloud is among the most successful High-Performance Computing platforms in Europe and stands out with a user base that is both significantly larger and more diverse than that of other – and often larger – supercomputing facilities across Europe. This applies both in terms of the number of users and the representation across research fields, levels of experience, and academic backgrounds.
Making High-Performance Computing accessible to all
UCloud was developed with a clear goal: to make High-Performance Computing accessible to all researchers affiliated with a Danish university – regardless of research area, experience, or academic discipline. We are therefore proud to see that the platform is widely used by both researchers and students across disciplines and levels.
We are pleased to help enable important research by providing the computing power needed to make research work easier and more time-efficient. At the same time, UCloud makes it possible to process sensitive data on a secure platform. Our platform is an important contribution to research that creates real value in the world beyond academia.
A milestone driven by collaboration
This milestone is the result of many years of focused work to create a platform that combines user-friendliness with high performance. It is also the result of the strong consortium collaboration between the University of Southern Denmark, Aarhus University, and Aalborg University, which jointly develop UCloud and continuously expand the platform to meet researchers’ needs.
A new data center to support the growing user base
To support the rapidly growing number of users on the platform, a new data center to host UCloud’s next-generation hardware was recently established. This expansion strengthens UCloud’s capacity and ensures that the platform is future-proof and ready to support both current and upcoming users. You can read more about the expansion here: University Collaboration Strengthens National Research Infrastructure with New Green Data Center.
We look forward to welcoming even more users to UCloud – and to continuing to support Danish research with modern, scalable, and secure supercomputing infrastructure.
CVAT, Computer Vision Annotation Tool, is an interactive video and image annotation tool, designed to facilitate the annotation of video and image data and accelerate the creation of high-quality datasets for computer vision tasks. CVAT is available on the UCloud platform, in the Application Store.
The webinar will show how to use CVAT on UCloud to:
Label and annotate data with the help of AI and OpenCV tools, including:
Use of cvat-cli
Running built-in models for detection and auto-annotation
Use of GPUs with built-in models for faster annotation
Adding custom models (e.g. YOLO)
Efficiently manage large visual datasets with MinIO:
Allow CVAT to directly pull images from your UCloud MinIO buckets for annotation and export annotated data back, reducing manual imports/exports and ensuring data availability.
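Since YOLO is named above as an example of a custom model, it may help to see the label format such models are commonly trained on: one text line per object, holding a class id followed by the box centre, width, and height, all normalised by the image size. The Python sketch below converts a pixel bounding box to that format. The function names and the example box are assumptions for illustration, and the sketch does not use CVAT or cvat-cli.

```python
def to_yolo(
    box: tuple[float, float, float, float], image_w: int, image_h: int
) -> tuple[float, float, float, float]:
    """Convert a pixel box (x_min, y_min, x_max, y_max) to normalised
    YOLO values (x_center, y_center, width, height)."""
    x_min, y_min, x_max, y_max = box
    if not (0 <= x_min < x_max <= image_w and 0 <= y_min < y_max <= image_h):
        raise ValueError("box must lie inside the image and have positive size")
    return (
        (x_min + x_max) / 2 / image_w,
        (y_min + y_max) / 2 / image_h,
        (x_max - x_min) / image_w,
        (y_max - y_min) / image_h,
    )


def yolo_line(class_id: int, box, image_w: int, image_h: int) -> str:
    """Format one annotation as a line of a YOLO label file."""
    values = to_yolo(box, image_w, image_h)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


# Hypothetical annotation: class 0, box from (100, 50) to (300, 250) in a 400x500 image.
line = yolo_line(0, (100, 50, 300, 250), image_w=400, image_h=500)
```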
Using UCloud allows users to create fully reproducible and secure workflows that leverage high-performance computing resources. These features are often necessary for large datasets and accurate computer vision tasks.
Target audience: Researchers across all departments, particularly those who require high-precision data labeling, and anyone interested in AI.