Webinar recording: Fine-Tuning and Deploying Large Language Models

In this video, we guide you through the complete pipeline for fine-tuning large language models (LLMs) for specialised tasks, such as medical question answering, using NVIDIA NeMo Framework and NVIDIA Triton Inference Server.

  • Prepare and preprocess open-source datasets for fine-tuning.
  • Apply Parameter-Efficient Fine-Tuning (PEFT) using LoRA with NVIDIA NeMo Framework.
  • Deploy optimised LLMs using NVIDIA Triton Inference Server and TensorRT-LLM.
  • Generate a synthetic Q&A dataset using Label Studio connected to a live inference backend.
  • Fine-tune and evaluate your customised LLM for domain-specific applications.
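
As an illustration of the first step in the list above, the following is a minimal sketch of turning question-answer pairs into a JSON Lines file for supervised fine-tuning. It is not taken from the workshop notebooks. The "input"/"output" field names, the prompt template, and the helper names (to_record, split_records, write_jsonl) are assumptions made for this example, so check the notebooks for the format they actually use.

```python
import json
import random


def to_record(question: str, answer: str) -> dict:
    """Map one Q&A pair to a prompt/completion record.

    The "input"/"output" keys and the prompt template are assumptions
    for this sketch, not a format confirmed by the workshop material.
    """
    prompt = f"Question: {question.strip()}\nAnswer:"
    return {"input": prompt, "output": " " + answer.strip()}


def split_records(records, val_fraction=0.1, seed=0):
    """Shuffle deterministically and split into (train, validation)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def write_jsonl(records, path):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Placeholder pairs for illustration only; a real run would load an
    # open-source Q&A dataset instead.
    pairs = [
        ("What does PEFT stand for?", "Parameter-Efficient Fine-Tuning."),
        ("What does LoRA stand for?", "Low-Rank Adaptation."),
        ("What does LLM stand for?", "Large language model."),
    ]
    records = [to_record(q, a) for q, a in pairs]
    train, val = split_records(records, val_fraction=0.34)
    write_jsonl(train, "train.jsonl")
    write_jsonl(val, "validation.jsonl")
```

Using a fixed seed keeps the train/validation split reproducible between runs, which makes later evaluation results easier to compare.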

All workflows will be executed inside a UCloud project environment with access to GPU resources.

Target audience: Machine learning practitioners, researchers, and engineers interested in LLM customisation, domain adaptation, or scalable model deployment.

Technical level: Intermediate to advanced.

Notebooks: https://github.com/emolinaro/ucloud-workshop-28-05-2025