Galileo launches 'Agentic Evaluations' to fix AI agent errors before they cost you



Galileo, a San Francisco-based startup, is betting that the future of artificial intelligence depends on trust. Today, the company launched a new product, Agentic Evaluations, to address a growing challenge in the world of AI: making sure the increasingly complex systems known as AI agents actually work as intended.

AI agents—autonomous systems that perform multi-step tasks like generating reports or analyzing customer data—are gaining traction across industries. But their rapid adoption raises a crucial question: How can companies verify these systems remain reliable after deployment? Galileo’s CEO, Vikram Chatterji, believes his company has found the answer.

“Over the last six to eight months, we started to see some of our customers trying to adopt agentic systems,” said Chatterji in an interview. “Now LLMs can be used as a smart router to pick and choose the right API calls towards actually completing a task. Going from just generating text to actually completing a task was a very big chasm that was unlocked.”

A diagram showing how Galileo evaluates AI agents at three key stages: tool selection, error detection and task completion. (Credit: Galileo)

AI agents show promise, but enterprises demand accountability

Customers including Cisco and Ema (a startup founded by Coinbase's former chief product officer) have already adopted Galileo's platform. These companies use AI agents to automate tasks ranging from customer support to financial analysis, and they report significant productivity gains.

“A sales representative who’s trying to do outreach and outbounds would otherwise use maybe a week of their time to do that, versus with some of these AI-enabled agents, they’re doing that within two days or less,” Chatterji explained, highlighting the return on investment for enterprises.

Galileo’s new framework evaluates tool selection quality, detects errors in tool calls, and tracks overall session success. It also monitors essential metrics for large-scale AI deployment, including costs and latency.
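To show what these evaluation stages measure, here is a minimal Python sketch of how similar metrics could be computed from logged agent traces. This is not Galileo's API. The `ToolCall` and `Session` records, the `expected_tool` reference label and the `evaluate` function are hypothetical. The sketch assumes each tool call was labeled with the tool a correct agent should have picked, and that each session records whether the overall task was completed.

```python
from dataclasses import dataclass, field
from statistics import mean


# Hypothetical trace records; not Galileo's data model.
@dataclass
class ToolCall:
    chosen_tool: str      # tool the agent actually invoked
    expected_tool: str    # assumed reference label from a hand-labeled test set
    errored: bool         # whether the call raised or returned an error
    cost_usd: float
    latency_s: float


@dataclass
class Session:
    calls: list[ToolCall] = field(default_factory=list)
    task_completed: bool = False  # session-level outcome


def evaluate(sessions: list[Session]) -> dict[str, float]:
    """Aggregate tool-level and session-level metrics (hypothetical helper)."""
    calls = [c for s in sessions for c in s.calls]
    if not calls:
        raise ValueError("need at least one tool call to evaluate")
    return {
        # Tool selection quality: fraction of calls that used the expected tool.
        "tool_selection_accuracy": sum(c.chosen_tool == c.expected_tool for c in calls) / len(calls),
        # Error detection: fraction of tool calls that failed.
        "tool_error_rate": sum(c.errored for c in calls) / len(calls),
        # Task completion: fraction of sessions that finished the task.
        "session_success_rate": sum(s.task_completed for s in sessions) / len(sessions),
        # Operational metrics mentioned in the article: cost and latency.
        "total_cost_usd": sum(c.cost_usd for c in calls),
        "mean_latency_s": mean(c.latency_s for c in calls),
    }


# Illustrative sample data (made up for this sketch).
sample = [
    Session([ToolCall("search", "search", False, 0.002, 1.2),
             ToolCall("crm_lookup", "crm_lookup", True, 0.001, 0.8)],
            task_completed=False),
    Session([ToolCall("calculator", "search", False, 0.001, 0.4)],
            task_completed=True),
]

metrics = evaluate(sample)
if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Keeping tool-level and session-level ratios separate matters. In the sample data, the second session completed its task even though its only call used the wrong tool, and a session success rate alone would hide that.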

A dashboard showing how Galileo evaluates AI agents at three key stages: tool selection, error detection and task completion. (Credit: Galileo)

$68 million in funding fuels Galileo’s push into enterprise AI

The launch builds on Galileo’s recent momentum. The company raised $45 million in Series B funding led by Scale Venture Partners last October, bringing its total funding to $68 million. Industry analysts project the market for AI operations tools could reach $4 billion by 2025.

The stakes are high as AI deployment accelerates. Studies show even advanced models like GPT-4 can hallucinate about 23% of the time during basic question-and-answer tasks. Galileo’s tools help enterprises identify these issues before they impact operations.

“Before we launch this thing, we really, really need to know that this thing works,” Chatterji said, describing customer concerns. “The bar is really high. So that’s where we gave them this tool chain, such that they could just use our metrics as the basis for these tests.”

Addressing AI hallucinations and enterprise-scale challenges

The company’s focus on reliable, production-ready solutions positions it well in a market increasingly concerned with AI safety. For technical leaders deploying enterprise AI, Galileo’s platform provides essential guardrails for ensuring AI agents perform as intended while controlling costs.

As enterprises expand their use of AI agents, performance monitoring tools become crucial infrastructure. Galileo’s latest offering aims to help businesses deploy AI responsibly and effectively at scale.

“2025 will be the year of agents. It is going to be very prolific,” Chatterji noted. “However, what we’ve also seen is a lot of companies that are just launching these agents without good testing is leading to negative implications… The need for proper testing and evaluations is more than ever before.”


