In this video, we guide you through the complete pipeline for fine-tuning large language models (LLMs) for specialised tasks, such as medical question answering, using the NVIDIA NeMo Framework and Triton Inference Server. You will learn how to:
- Prepare and preprocess open-source datasets for fine-tuning (a data-formatting sketch follows this list).
- Apply Parameter-Efficient Fine-Tuning (PEFT) using LoRA with the NVIDIA NeMo Framework (see the LoRA sketch below).
- Deploy optimised LLMs using NVIDIA Triton Inference Server and TensorRT-LLM (see the client sketch below).
- Generate a synthetic Q&A dataset using Label Studio connected to a live inference backend (see the generation-loop sketch below).
- Fine-tune and evaluate your customised LLM for domain-specific applications.
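As a preview of the data-preparation step, here is a minimal sketch that converts raw Q&A records into the JSONL layout NeMo's SFT/PEFT data loaders typically consume. The input record fields (`question`, `context`, `answer`) are hypothetical, and the `input`/`output` key names are assumptions based on NeMo's default prompt template; the workshop notebooks define the exact format used.

```python
import json

# Hypothetical raw records: the field names here are assumptions for illustration.
raw_records = [
    {
        "question": "What is the first-line treatment for type 2 diabetes?",
        "context": "Clinical guidelines recommend metformin as initial therapy.",
        "answer": "Metformin is the usual first-line treatment.",
    },
]

# NeMo's SFT/PEFT loaders commonly read JSONL with "input"/"output" keys
# (the exact keys depend on the prompt template configured for the run).
with open("medical_qa_train.jsonl", "w") as f:
    for rec in raw_records:
        example = {
            "input": f"Context: {rec['context']}\nQuestion: {rec['question']}",
            "output": rec["answer"],
        }
        f.write(json.dumps(example) + "\n")
```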
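To make the PEFT step concrete, below is a minimal PyTorch sketch of the idea behind LoRA: the pretrained weights stay frozen while a trainable low-rank update B·A, scaled by alpha/r, is added on top. This illustrates the technique only; it is not NeMo's implementation, which applies LoRA adapters for you through its PEFT configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights and bias
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

# Only the low-rank factors are trained, cutting trainable parameters drastically:
layer = LoRALinear(nn.Linear(4096, 4096))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 65536 vs ~16.8M frozen
```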
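Once the model is deployed, clients talk to it over HTTP. The sketch below assumes a Triton server on localhost exposing its "generate" endpoint for a TensorRT-LLM pipeline; the model name (`ensemble`) and the `text_input`/`max_tokens`/`text_output` field names are assumptions that depend on your model configuration, so adjust them to match your deployment.

```python
import requests

# Assumed endpoint: Triton's HTTP generate extension is served at
# /v2/models/<model_name>/generate; TensorRT-LLM pipelines commonly
# expose "text_input" and "max_tokens" as model inputs.
TRITON_URL = "http://localhost:8000/v2/models/ensemble/generate"

payload = {
    "text_input": "Question: What are common symptoms of anaemia?\nAnswer:",
    "max_tokens": 128,
    "temperature": 0.2,
}

response = requests.post(TRITON_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["text_output"])  # output field name also depends on the model config
```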
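Synthetic data generation in the workshop is driven through Label Studio, but the underlying loop is simple: send seed passages to the live inference backend, collect drafted questions and answers, and save them as tasks for human review. This sketch reuses the assumed endpoint from above; the prompt wording and seed passages are illustrative only, while the top-level "data" key matches Label Studio's JSON task import format.

```python
import json
import requests

TRITON_URL = "http://localhost:8000/v2/models/ensemble/generate"  # assumed endpoint, as above

def generate(prompt: str, max_tokens: int = 200) -> str:
    """Query the live inference backend (field names assumed, see the client sketch)."""
    resp = requests.post(
        TRITON_URL, json={"text_input": prompt, "max_tokens": max_tokens}, timeout=60
    )
    resp.raise_for_status()
    return resp.json()["text_output"]

# Hypothetical seed passages; in practice these come from your domain corpus.
contexts = ["Metformin is recommended as first-line therapy for type 2 diabetes."]

with open("synthetic_qa.jsonl", "w") as f:
    for ctx in contexts:
        question = generate(f"Write one exam-style question about this passage:\n{ctx}\nQuestion:")
        answer = generate(f"Passage: {ctx}\nQuestion: {question}\nAnswer:")
        # Each record can be imported into Label Studio as a task for human review.
        f.write(json.dumps({"data": {"context": ctx, "question": question, "answer": answer}}) + "\n")
```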
All workflows will be executed inside a UCloud project environment with access to GPU resources.
Target audience: Machine learning practitioners, researchers, and engineers interested in LLM customisation, domain adaptation, or scalable model deployment.
Technical level: Intermediate to Advanced.
Notebooks: https://github.com/emolinaro/ucloud-workshop-28-05-2025