In this video, we guide you through the complete pipeline for fine-tuning large language models (LLMs) for specialised tasks, such as medical question answering, using the NVIDIA NeMo Framework and Triton Inference Server. You will learn how to:
- Prepare and preprocess open-source datasets for fine-tuning (a data-formatting sketch follows this list).
- Apply Parameter-Efficient Fine-Tuning (PEFT) using LoRA with the NVIDIA NeMo Framework (see the LoRA sketch below).
- Deploy optimised LLMs using NVIDIA Triton Inference Server and TensorRT-LLM (see the client sketch below).
- Generate a synthetic Q&A dataset using Label Studio connected to a live inference backend (see the generation-loop sketch below).
- Fine-tune and evaluate your customised LLM for domain-specific applications.
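As a preview of the data-preparation step, here is a minimal sketch that converts raw Q&A records into the JSONL layout NeMo's SFT/PEFT data loaders typically consume. The input record fields (`question`, `context`, `answer`) are hypothetical, and the `input`/`output` key names are assumptions based on NeMo's default prompt template; the workshop notebooks define the exact format used.

```python
import json

# Hypothetical raw records: the field names here are assumptions for illustration.
raw_records = [
    {
        "question": "What is the first-line treatment for type 2 diabetes?",
        "context": "Clinical guidelines recommend metformin as initial therapy.",
        "answer": "Metformin is the usual first-line treatment.",
    },
]

# NeMo's SFT/PEFT loaders commonly read JSONL with "input"/"output" keys
# (the exact keys depend on the prompt template configured for the run).
with open("medical_qa_train.jsonl", "w") as f:
    for rec in raw_records:
        example = {
            "input": f"Context: {rec['context']}\nQuestion: {rec['question']}",
            "output": rec["answer"],
        }
        f.write(json.dumps(example) + "\n")
```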
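To make the PEFT step concrete, below is a minimal PyTorch sketch of the idea behind LoRA: the pretrained weights stay frozen while a trainable low-rank update B·A, scaled by alpha/r, is added on top. This illustrates the technique only; it is not NeMo's implementation, which applies LoRA adapters for you through its PEFT configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights and bias
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

# Only the low-rank factors are trained, cutting trainable parameters drastically:
layer = LoRALinear(nn.Linear(4096, 4096))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 65536 vs ~16.8M frozen
```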
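Once the model is deployed, clients talk to it over HTTP. The sketch below assumes a Triton server on localhost exposing its "generate" endpoint for a TensorRT-LLM pipeline; the model name (`ensemble`) and the `text_input`/`max_tokens`/`text_output` field names are assumptions that depend on your model configuration, so adjust them to match your deployment.

```python
import requests

# Assumed endpoint: Triton's HTTP generate extension is served at
# /v2/models/<model_name>/generate; TensorRT-LLM pipelines commonly
# expose "text_input" and "max_tokens" as model inputs.
TRITON_URL = "http://localhost:8000/v2/models/ensemble/generate"

payload = {
    "text_input": "Question: What are common symptoms of anaemia?\nAnswer:",
    "max_tokens": 128,
    "temperature": 0.2,
}

response = requests.post(TRITON_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["text_output"])  # output field name also depends on the model config
```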
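Synthetic data generation in the workshop is driven through Label Studio, but the underlying loop is simple: send seed passages to the live inference backend, collect drafted questions and answers, and save them as tasks for human review. This sketch reuses the assumed endpoint from above; the prompt wording and seed passages are illustrative only, while the top-level "data" key matches Label Studio's JSON task import format.

```python
import json
import requests

TRITON_URL = "http://localhost:8000/v2/models/ensemble/generate"  # assumed endpoint, as above

def generate(prompt: str, max_tokens: int = 200) -> str:
    """Query the live inference backend (field names assumed, see the client sketch)."""
    resp = requests.post(
        TRITON_URL, json={"text_input": prompt, "max_tokens": max_tokens}, timeout=60
    )
    resp.raise_for_status()
    return resp.json()["text_output"]

# Hypothetical seed passages; in practice these come from your domain corpus.
contexts = ["Metformin is recommended as first-line therapy for type 2 diabetes."]

with open("synthetic_qa.jsonl", "w") as f:
    for ctx in contexts:
        question = generate(f"Write one exam-style question about this passage:\n{ctx}\nQuestion:")
        answer = generate(f"Passage: {ctx}\nQuestion: {question}\nAnswer:")
        # Each record can be imported into Label Studio as a task for human review.
        f.write(json.dumps({"data": {"context": ctx, "question": question, "answer": answer}}) + "\n")
```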
All workflows will be executed inside a UCloud project environment with access to GPU resources.
Target audience: Machine learning practitioners, researchers, and engineers interested in LLM customisation, domain adaptation, or scalable model deployment.
Technical level: Intermediate to Advanced.
Notebooks: https://github.com/emolinaro/ucloud-workshop-28-05-2025