Groq was founded in 2016 by Jonathan Ross in Mountain View, California. Ross previously helped create the Tensor Processing Unit (TPU) at Google and left to build a new type of AI chip focused specifically on inference speed.
Groq has raised over $1 billion in total, including a $640 million Series D round in mid-2024 at a $2.8 billion valuation. Investors include BlackRock, Samsung Catalyst Fund, and Tiger Global.
Groq’s Language Processing Unit (LPU) uses a deterministic architecture that eliminates the scheduling overhead and memory bottlenecks found in GPU-based inference. The result is remarkably fast token generation — Groq can serve models like Llama 3 70B at over 300 tokens per second per user, making AI responses feel nearly instantaneous.
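To put that throughput in perspective, here is a minimal back-of-the-envelope sketch. The 300 tokens/second figure comes from the text above; the response lengths and the 50 tokens/second comparison rate are hypothetical examples for illustration.

```python
# Rough streaming-latency math at the ~300 tokens/second per-user
# rate cited above (a hypothetical slower rate shown for contrast).
TOKENS_PER_SECOND = 300  # figure cited in the text

def response_time_seconds(num_tokens: int, tps: float = TOKENS_PER_SECOND) -> float:
    """Time to stream num_tokens at a sustained rate of tps tokens/second."""
    return num_tokens / tps

# A 300-token answer streams in about one second at Groq's cited rate;
# at a hypothetical 50 tokens/second it would take around six seconds.
print(response_time_seconds(300))       # 1.0
print(response_time_seconds(300, 50))   # 6.0
```

The difference between one second and six is roughly the gap between a response that feels instantaneous and one where the user visibly waits, which is why the per-user token rate is the headline metric here.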
The company’s GroqCloud platform offers API access to popular open-source models at speeds that have attracted significant developer attention. When Groq launched its public demo in early 2024, the speed difference compared to GPU-based APIs was immediately noticeable and went viral on social media.
Groq serves developers, enterprises, and AI companies that need low-latency inference for real-time applications like conversational AI, coding assistants, and autonomous agents. The company is building out its own data center infrastructure and partners with cloud providers. Groq employs around 300 people and is expanding manufacturing capacity to meet growing demand.