
2 posts tagged with "llm-d release news!"


llm-d 0.2: Our first well-lit paths (mind the tree roots!)

· One min read
Robert Shaw
Director of Engineering, Red Hat
Clayton Coleman
Distinguished Engineer, Google
Carlos Costa
Distinguished Engineer, IBM

Our 0.2 release delivers three well-lit paths to accelerate deploying large-scale inference on Kubernetes - better load balancing, lower latency with disaggregation, and native vLLM support for very large Mixture of Experts models like DeepSeek-R1.

We’ve also enhanced our deployment and benchmarking tooling, incorporating lessons from real-world infrastructure deployments and addressing key antipatterns. This release gives llm-d users, contributors, researchers, and operators clearer guides for efficient use in tested, reproducible scenarios.

[rest of post to follow]

Announcing the llm-d community!

· 11 min read
Robert Shaw
Director of Engineering, Red Hat
Clayton Coleman
Distinguished Engineer, Google
Carlos Costa
Distinguished Engineer, IBM


llm-d is a Kubernetes-native, high-performance distributed LLM inference framework - a well-lit path for anyone to serve at scale, with the fastest time-to-value and competitive performance per dollar for most models across most hardware accelerators.

With llm-d, users can operationalize gen AI deployments with a modular, high-performance, end-to-end serving solution that leverages the latest distributed inference optimizations like KV-cache aware routing and disaggregated serving, co-designed and integrated with the Kubernetes operational tooling in Inference Gateway (IGW).
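To give a flavor of what KV-cache aware routing means in practice, here is a minimal, purely illustrative Python sketch of prefix-affinity routing: requests that share a prompt prefix are steered to the replica most likely to still hold that prefix's KV cache, so prefill work can be reused. This is not llm-d's or IGW's actual scheduler code; the replica names, block size, and bookkeeping below are simplified assumptions made for the example.

```python
"""Illustrative sketch of KV-cache aware (prefix-affinity) routing.

Not llm-d's real implementation: block size, replica names, and the
in-memory ownership table are assumptions made for this example.
"""
import hashlib
from collections import defaultdict

BLOCK_SIZE = 64  # characters per "block"; real systems hash token blocks


class PrefixAwareRouter:
    def __init__(self, replicas):
        self.replicas = list(replicas)
        # block hash -> replicas believed to hold that block's KV cache
        self.block_owners = defaultdict(set)
        # crude per-replica load signal (never decremented in this sketch)
        self.inflight = defaultdict(int)

    def _block_hashes(self, prompt):
        """Hash each successive prefix block of the prompt."""
        hashes, running = [], hashlib.sha256()
        for i in range(0, len(prompt), BLOCK_SIZE):
            running.update(prompt[i:i + BLOCK_SIZE].encode())
            hashes.append(running.hexdigest())
        return hashes

    def route(self, prompt):
        """Pick the replica with the longest cached prefix; break ties by load."""
        hashes = self._block_hashes(prompt)
        matched = {r: 0 for r in self.replicas}
        for depth, h in enumerate(hashes, start=1):
            owners = self.block_owners[h]
            if not owners:
                break  # no replica has cached beyond this depth
            for r in owners:
                matched[r] = depth
        best = max(self.replicas, key=lambda r: (matched[r], -self.inflight[r]))
        # Assume `best` will now hold KV cache for the whole prompt.
        for h in hashes:
            self.block_owners[h].add(best)
        self.inflight[best] += 1
        return best


# Usage: two requests sharing a long system prompt land on the same replica.
router = PrefixAwareRouter(["vllm-replica-0", "vllm-replica-1"])
system = "You are a helpful assistant. " * 10
print(router.route(system + "Summarize this document."))
print(router.route(system + "Translate this sentence."))
```

In a real deployment the scheduler would score on token-block hashes and live cache and load telemetry from the serving engines rather than character blocks and an in-memory table, but the affinity idea sketched above is the same.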