AI & Machine Learning

Baseten

4.42

provides GPU infrastructure for deploying and scaling machine learning models in production with high performance.

Visit Website

Baseten was founded in 2019 by Tuhin Srivastava, Amir Haghighat, and Philip Howes in San Francisco. The company offers a model inference platform that helps ML teams deploy models to production with optimized GPU utilization and low latency.

Baseten raised over $60 million, including a $40 million Series B led by IVP and Spark Capital in 2024. The company’s growth has been driven by the explosion in demand for LLM inference infrastructure.

The platform’s core product is Truss, an open-source framework for packaging and serving ML models. Baseten handles autoscaling, GPU allocation, and traffic management, letting ML engineers focus on models rather than infrastructure. The platform supports all major model types, from small classification models to 70B+ parameter LLMs.

Baseten differentiates on performance — their infrastructure is optimized for low-latency, high-throughput inference. They offer features like model chaining (running multiple models in sequence), A/B testing, and detailed observability for production ML systems.

Customers include notable AI companies and enterprises running models at scale. The platform processes billions of inference requests and has become a preferred choice for teams that need production-grade ML serving without building their own infrastructure. Baseten employs around 60 people and focuses exclusively on the model deployment and serving layer of the AI stack.