Overview
This meeting is the second annual in-person gathering of the GPU Innovators User Community. At our inaugural meeting last year in Atlanta, we packed the room with passionate discussions about NVIDIA's Grace Hopper and Blackwell architectures, the powerhouse hardware that makes today's AI revolution possible. This year, we're taking the next logical step in our community's journey: moving from metal to models.
TL;DR
We'll tackle the questions keeping you up at night—which models fit your research needs, what hardware specifications actually matter, and how to deploy these systems using the open-source tools you already trust.
Lunch is served afterwards at the National Blues Museum down the street.
Intended Audience
This talk is designed for research computing professionals, HPC administrators, academic researchers, and AI enthusiasts seeking practical guidance on deploying open-source LLMs, selecting appropriate models, and integrating them into existing research computing infrastructure.
Session Description
The rapid evolution of open-source large language models presents researchers with unprecedented opportunities to deploy powerful AI capabilities while maintaining complete control over sensitive data and computational workflows. This tutorial addresses the critical challenge facing research computing centers: how to navigate the complex landscape of open-source LLMs to make informed decisions about model selection, hardware requirements, and deployment strategies that align with existing HPC infrastructure and research objectives.
The session begins with a systematic exploration of leading open-source model families, including DeepSeek's reasoning-optimized variants, Meta's Llama ecosystem, Google's efficiency-focused Gemma models, and specialized options like MedGemma for healthcare applications. Attendees will learn to evaluate models based on benchmark performance, memory requirements, and domain-specific capabilities, and to understand how parameter count and precision format (FP16/FP8/FP4) together determine GPU memory requirements. Beyond model selection, participants will explore the open-source framework ecosystem, comparing GUI tools like LM Studio for rapid prototyping with high-performance CLI engines like vLLM and SGLang, which are suited to Slurm integration and multi-GPU scaling. The session concludes with an introduction to Retrieval-Augmented Generation (RAG) pipelines, which enable researchers to query private document collections. This approach is a key differentiator from commercial LLM services: it keeps data private and under institutional control while extending model capabilities with domain-specific knowledge.
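As a rough illustration of the relationship between parameter count, precision format, and GPU memory mentioned above, here is a minimal sketch in Python. It counts only the memory needed for the model weights (2 bytes per parameter at FP16, 1 at FP8, 0.5 at FP4). The 70-billion-parameter model, the 80 GB GPU, and the 20% overhead allowance for KV cache and activations are assumptions chosen for illustration, not figures from the session; real overhead depends heavily on context length, batch size, and the serving engine.

```python
# Bytes needed to store one parameter in each precision format.
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_on_gpu(num_params: float, precision: str, gpu_memory_gb: float,
                overhead_fraction: float = 0.2) -> bool:
    """Check whether weights plus an ASSUMED overhead allowance fit in GPU memory.

    overhead_fraction is a hypothetical placeholder for KV cache, activations,
    and runtime buffers; measure it for your own workload.
    """
    needed_gb = weight_memory_gb(num_params, precision) * (1 + overhead_fraction)
    return needed_gb <= gpu_memory_gb


if __name__ == "__main__":
    num_params = 70e9      # hypothetical 70-billion-parameter model
    gpu_memory_gb = 80.0   # hypothetical single GPU with 80 GB of memory
    for precision in BYTES_PER_PARAM:
        weights = weight_memory_gb(num_params, precision)
        fits = fits_on_gpu(num_params, precision, gpu_memory_gb)
        print(f"{precision}: {weights:.0f} GB of weights, fits on one GPU: {fits}")
```

Under these assumptions, halving the precision halves the weight footprint, which is why a model that needs several GPUs at FP16 may fit on a single card at FP4.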
If you attended this talk at PEARC25, you might still find this content worth your while—a LOT has changed since then!
Free Lunch
After the seminar, please join us just steps away at the National Blues Museum for a free BBQ lunch, drinks and live music!
Blues, BBQ & Beer Event