
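A week in AI agents is like a year in traditional software. Here's everything that happened in AI agents this week, from Ramp, Agno, AgentOps, NVIDIA, AutoGen, Context Suite, Replit, Nebius, Firebase, Pipedream, Trae, and more. 🧵 (Save for later)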

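2/ @nvidia unveiled groundbreaking research that lets users get instant answers to encyclopedia-length questions. The technique will let agents track months of conversation or review millions of lines of computer code.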

5/ The @AgentOpsAI blog v2.0 is now live! 🖇 It brings you one step closer to the world of agent observability, infrastructure, and operations. @n_sri_laasya

6/ @Firebase is advancing agentic AI development with Firebase Studio. 🚀

7/ Meet @contextsuite, the first AI office suite. Humans spend 2.5 trillion hours a year on office work. Context can one-shot most of it. @josephsemrai

9/ @kevinlu625 launched Orchids, the world's first AI tool that lets you chat with AI to build apps and websites that don't look or feel "AI-generated".

10/ @Trae_ai has open-sourced Trae-Agent. You can now "git clone" it and "cd trae-agent"! 🔥

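11/ If you want to dramatically speed up your shipping iterations, you have to use Playwright MCP and tell your agent how to use it in your AGENT(.)MD (or Cursor/Claude/Gemini rules). @ryancarson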

12/ Build a customer support ticket agent with structured outputs using Google's Agent Development Kit (ADK), with 100% open-source code. 🤝 @Saboo_Shubham_ @AgentOpsAI natively supports Google ADK.

13/ @tryramp: the first step into agent orchestration. @diegozaks An all-in-one finance operations platform that saves businesses time and money. Trusted by 40,000+ teams.

14/ This Claude MCP AI agent replaces your $200K+ operations team. It audited @aryanXmahajan's entire business, found 12 bottlenecks, and built 5 production-ready n8n agents.

15/ @mckaywrigley shared his 1-hour tutorial on how to use Claude Code for note-taking and research. 📝

16/ @JulianGoldieSEO shares this new AI operating system 🤯

17/ @JulianGoldieSEO tested every AI website builder, and there is only one he would actually use: MiniMax.

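20/ @nebiusaistudio blog: Agents 101, rolling out production-grade AI agents at scale 🤖 All powered by Nebius AI Studio: 30+ open-source models, fast inference, affordable tiers, and seamless plug-and-play compatibility. Thanks for mentioning @AgentOpsAI!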

21/ "Reliability is the name of the game for agents, and it's unlikely to be solved purely at the model layer for the foreseeable future." @anaganath

Reliability is the name of the game for agents, and it's unlikely to be solved purely at the model layer for the foreseeable future. This is creating green shoots for infrastructure builders, with a few interesting trends starting to emerge:

1. Simulation as CI for agents:
a) The most valuable piece of data today is trajectory data, i.e. collections of task (P) -> {t1, t2, ... tk} mappings. With more trajectory data, agents can be improved with techniques like RFT.
b) Since these trajectories can be quite specific to a company's underlying data (D), you need to be able to actually simulate the behavior of agents within your environment vs. rely on 3P trajectory data.

So, how might you do this?
- Maintain an agent and MCP registry for an enterprise, and a staging environment. Bootstrap a metadata layer that contains the objective of each agent, the tools it has access to, the scope of each agent vis-à-vis each tool, etc. Your SDK may need to generate MCP servers on the fly for certain internal applications.
- Execute scenarios in staging for each agent by providing prompt / task variations, inspecting the tool calls produced and evaluating performance against a multi-objective reward function (e.g. performance against the objective, minimization of tool invocations).
- A critical component is accurately providing quantifiable reward functions for each agent that unlock high-fidelity evals and close the loop for reliable CI.
- All of this needs to be productized: easy-to-adopt infrastructure that developers can extend, but with batteries included.

You can start to see a new paradigm forming: not unit tests for code, but simulation harnesses for agents.

What happens when you get trajectory data?

2. Enterprises will move to "context lakes":
- An evolving, queryable memory layer that serves as a hub for agent trajectories enriched by enterprise data stored in the delta lake / SNOW. A potent mix of a knowledge base, a semantic cache, and an execution log.
- Extremely fast reads for inference-time retrieval that supports high QPS.
- As mentioned in a prior post, the semantic cache (really interesting opportunity for startups) will cluster task-trajectory pairs (e.g., via k-means), enabling fast retrieval and "result fusing" during planning or tool selection.

Agents will dip into the context lake constantly. High QPS, low-latency context fetch will become as important as fast embedding search is today.

3. Agent authentication becomes a first-class concern:
- Traditional OAuth and API key models break down when agents act on behalf of users and themselves, across long-lived sessions.
- You need a framework for agent identity, delegation, and scoping, one that supports things like tool-level permissions, task-bound credentials, and delegation graphs.

We're entering an era where testing software means simulating behavior, querying software means retrieving context, and securing software means authenticating autonomous agents.
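
To make the "simulation as CI" idea a bit more concrete, here is a minimal Python sketch of the scenario-execution step: run an agent over task variations, inspect the tool calls it produced, and score each trajectory with a multi-objective reward. Everything named here (`Trajectory`, `reward`, `run_scenarios`, `toy_agent`, the 0.8/0.2 weights, the 0.7 pass threshold) is a hypothetical illustration and not something from the post; a real harness would call the actual agent in a staging environment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Trajectory:
    """One task -> tool-call sequence produced by an agent in staging."""
    task: str
    tool_calls: List[str]
    objective_met: bool


def reward(traj: Trajectory, max_calls: int = 10,
           w_objective: float = 0.8, w_efficiency: float = 0.2) -> float:
    """Hypothetical multi-objective reward.

    Combines success on the objective with a bonus for using
    fewer tool invocations (capped at max_calls).
    """
    objective_score = 1.0 if traj.objective_met else 0.0
    used = min(len(traj.tool_calls), max_calls)
    efficiency_score = 1.0 - used / max_calls
    return w_objective * objective_score + w_efficiency * efficiency_score


def run_scenarios(agent: Callable[[str], Trajectory],
                  task_variations: List[str],
                  threshold: float = 0.7) -> Dict[str, object]:
    """Run every task variation and gate on the mean reward, CI-style."""
    if not task_variations:
        raise ValueError("need at least one task variation")
    trajectories = [agent(task) for task in task_variations]
    scores = [reward(t) for t in trajectories]
    mean_reward = sum(scores) / len(scores)
    return {
        "scores": scores,
        "mean_reward": mean_reward,
        "passed": mean_reward >= threshold,
    }


def toy_agent(task: str) -> Trajectory:
    """Stand-in for a real agent: always looks up the order,
    and issues a refund only when the task mentions one."""
    calls = ["lookup_order"]
    if "refund" in task:
        calls.append("issue_refund")
    return Trajectory(task=task, tool_calls=calls, objective_met=True)


if __name__ == "__main__":
    report = run_scenarios(toy_agent,
                           ["refund order 42", "where is order 42?"])
    print(report)
```

The weighting is a design choice, not a recommendation: putting most of the weight on the objective keeps an agent from being rewarded for doing nothing cheaply, while the efficiency term still separates two successful trajectories by how many tools they touched.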

22/ @jxnlco shared why your coding agent no longer needs RAG, and what happened to RAG. 💭

23/ @AgentOpsAI is ready to start onboarding for our agent hosting product. If you want to productionize your agents, DM me. 📩 @braelyn_ai @AlexReibman @ssslomp