Implementing Open-Source LLMs in Research Computing: From Model Selection to On-Premises Deployment (Part 1 of 2)
This is a two-part series; the description below covers both sessions.
The rapid evolution of open-source large language models presents academic researchers with unprecedented opportunities to deploy powerful AI capabilities while maintaining complete control over sensitive data and computational workflows. This tutorial addresses the critical challenge facing research computing centers: how to navigate the complex landscape of open-source LLMs to make informed decisions about model selection, hardware requirements, and deployment strategies that align with existing HPC infrastructure and research objectives.
The session begins with a systematic exploration of leading open-source model families, including DeepSeek's reasoning-optimized variants, Meta's Llama ecosystem, Google's efficiency-focused Gemma models, and specialized options such as MedGemma for healthcare applications. Attendees will learn to evaluate models on benchmark performance, memory requirements, and domain-specific capabilities, and to reason about the critical relationship between parameter count, precision format (FP16/FP8/FP4), and GPU memory allocation; a back-of-the-envelope version of that calculation appears in the first sketch below.

Beyond model selection, participants will explore the open-source framework ecosystem, comparing GUI tools such as LM Studio for rapid prototyping against high-performance inference engines such as vLLM and SGLang, which are suited to SLURM integration and multi-GPU scaling (see the second sketch below).

The session concludes with an introduction to Retrieval-Augmented Generation (RAG) pipelines, which enable researchers to query private document collections; a minimal pipeline is sketched last. This approach is a key differentiator from commercial LLM services: data privacy and institutional control are preserved while domain-specific knowledge extends the model's capabilities.
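To make the parameter/precision/memory relationship concrete, here is a minimal weights-only estimate. This is a sketch under stated assumptions: it ignores the KV cache, activations, and runtime overhead, which add a substantial margin in practice.

```python
# Weights-only GPU memory estimate for an LLM checkpoint.
# Assumption: KV cache, activations, and framework overhead are excluded;
# real deployments need a healthy margin on top of this figure.

BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate memory (GB) needed just to hold the model weights."""
    # billions of parameters x bytes per parameter = gigabytes
    return params_billions * BYTES_PER_PARAM[precision]

# Example: a 70B-parameter model at each precision format.
for fmt in ("FP16", "FP8", "FP4"):
    print(f"70B @ {fmt}: ~{weight_memory_gb(70, fmt):.0f} GB")
# -> ~140 GB (FP16), ~70 GB (FP8), ~35 GB (FP4)
```

The arithmetic explains, for example, why a 70B model that overflows a single 80 GB GPU at FP16 can fit after FP8 quantization, or be sharded across two GPUs with tensor parallelism.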
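For the inference-engine side, a minimal vLLM sketch, with assumptions called out: the model name is a placeholder for any Hugging Face-compatible checkpoint, and the tensor-parallel degree should match the GPUs a SLURM job allocates (in practice the script would be wrapped in an sbatch file).

```python
# Minimal offline-inference sketch with vLLM.
# Assumptions: the model name is a placeholder, and tensor_parallel_size
# should match the number of GPUs allocated to the SLURM job.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # any HF-compatible checkpoint
    tensor_parallel_size=2,                    # shard weights across 2 GPUs
)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize the case for on-premises LLMs."], params)
print(outputs[0].outputs[0].text)
```

The same engine can instead expose an OpenAI-compatible HTTP endpoint (`vllm serve`), which is the serving mode the RAG sketch below assumes.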
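Finally, to illustrate the RAG idea end to end: retrieve a relevant document from a private collection, then generate an answer against a locally served model. Everything here is illustrative rather than prescriptive: the TF-IDF retriever stands in for the embedding model and vector database a production pipeline would use, and the endpoint URL and model name assume a local vLLM server at localhost:8000.

```python
# Minimal RAG sketch: retrieve a relevant local document, then ask a
# locally served model to answer using it. The base_url and model name
# assume a vLLM OpenAI-compatible server is already running (placeholders).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from openai import OpenAI

docs = [
    "Grant proposals are due on the first Friday of each quarter.",
    "Each A100 node on the cluster exposes 80 GB of GPU memory.",
    "MedGemma is tuned for medical text and imaging tasks.",
]
question = "How much GPU memory does an A100 node have?"

# Retrieval: a simple lexical index (a production pipeline would swap this
# for a neural embedding model plus a vector database).
vectorizer = TfidfVectorizer()
doc_vecs = vectorizer.fit_transform(docs)
scores = cosine_similarity(vectorizer.transform([question]), doc_vecs)
best_doc = docs[scores.argmax()]

# Augmentation + generation: the prompt carries the retrieved context, and
# the request never leaves the institution's own hardware.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whatever the server loaded
    messages=[{"role": "user",
               "content": f"Context:\n{best_doc}\n\nQuestion: {question}"}],
)
print(response.choices[0].message.content)
```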
Target Audience: Academic researchers, HPC administrators, and research computing professionals seeking to understand open-source LLM deployment options, model selection criteria, and integration strategies for existing research computing infrastructure, with an emphasis on data privacy, cost control, and workflow compatibility.