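A week in AI agents is like a year in traditional software. Here's everything that happened in AI agents this week, from Ramp, Agno, AgentOps, NVIDIA, AutoGen, Context Suite, Replit, Nebius, Firebase, Pipedream, Trae, and more. 🧵 (Bookmark for later)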

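2/ @nvidia unveiled breakthrough research that will let users get instant answers to encyclopedia-length questions. The technique will let agents track months of conversation or review millions of lines of computer code.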

5/ @AgentOpsAI Blog v2.0 is now live! 🖇 It brings you one step closer to the world of agent observability, infrastructure, and operations. @n_sri_laasya

6/ @Firebase is advancing agentic AI development with Firebase Studio. 🚀

7/ Meet @contextsuite, the first AI office suite. Humans spend 2.5 trillion hours a year on office work. Context can one-shot most of it. @josephsemrai

9/ @kevinlu625 launched Orchids, the world's first AI tool that lets you chat with AI to build apps and websites that don't look or feel "AI-generated".

10/ @Trae_ai open-sourced Trae-Agent. You can now "git clone" it and "cd trae-agent"! 🔥

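11/ If you want to dramatically speed up your iteration speed for shipping, you have to use the Playwright MCP and tell your agent how to use it in your AGENT(.)MD (or Cursor/Claude/Gemini rules). @ryancarson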

12/ Build a customer support ticket agent with structured outputs using the Google Agent Development Kit (ADK), with 100% open-source code. 🤝 @Saboo_Shubham_ @AgentOpsAI natively supports Google ADK.

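13/ @tryramp: the first step toward agent orchestration. @diegozaks An all-in-one finance operations platform that saves businesses time and money. Trusted by 40,000+ teams.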

14/ This Claude MCP AI agent replaces your $200K+ operations team. It audited @aryanXmahajan's entire business, found 12 bottlenecks, and built 5 production-ready n8n agents.

15/ @mckaywrigley shared his 1-hour tutorial on how to use Claude Code for note-taking and research. 📝

16/ @JulianGoldieSEO shares this new AI operating system 🤯

17/ @JulianGoldieSEO tested every AI website builder, and there is only one he would actually use: MiniMax.

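20/ @nebiusaistudio blog: Agents 101, launching production-grade AI agents at scale 🤖 All powered by Nebius AI Studio: 30+ open-source models, fast inference, cost-efficient tiers, and seamless plug-and-play compatibility. Thanks for including @AgentOpsAI!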

21/ "Reliability is the name of the game for agents, and it's unlikely to be solved purely at the model layer for the foreseeable future.." @anaganath

Reliability is the name of the game for agents, and it's unlikely to be solved purely at the model layer for the foreseeable future. This is creating green shoots for infrastructure builders, with a few interesting trends starting to emerge:

1. Simulation as CI for agents:
a) The most valuable piece of data today is trajectory data, i.e. collections of task (P) -> {t1, t2... tk} mappings. With more trajectory data, agents can be improved with techniques like RFT.
b) Since these trajectories can be quite specific to a company's underlying data (D), you need to be able to actually simulate the behavior of agents within your environment vs. rely on 3P trajectory data.

So, how might you do this?
- Maintain an agent and MCP registry for an enterprise, and a staging environment. Bootstrap a metadata layer that contains the objective of each agent, the tools it has access to, the scope of each agent vis-à-vis each tool, etc. Your SDK may need to generate MCP servers on the fly for certain internal applications.
- Execute scenarios in staging for each agent by providing prompt / task variations, inspecting the tool calls produced, and evaluating performance against a multi-objective reward function (e.g. performance against the objective, minimization of tool invocations).
- A critical component is accurately providing quantifiable reward functions for each agent that unlock high-fidelity evals and close the loop for reliable CI.
- All of this needs to be productized: easy-to-adopt infrastructure that developers can extend, but with batteries included.

You can start to see a new paradigm forming: not unit tests for code, but simulation harnesses for agents.

What happens when you get trajectory data?

2. Enterprises will move to "context lakes":
- An evolving, queryable memory layer that serves as a hub for agent trajectories enriched by enterprise data stored in the delta lake / SNOW. A potent mix of a knowledge base, a semantic cache, and an execution log.
- Extremely fast reads for inference-time retrieval that supports high QPS.
- As mentioned in a prior post, the semantic cache (really interesting opportunity for startups) will cluster task-trajectory pairs (e.g., via k-means), enabling fast retrieval and "result fusing" during planning or tool selection.

Agents will dip into the context lake constantly. High QPS, low-latency context fetch will become as important as fast embedding search is today.

3. Agent authentication becomes a first-class concern:
- Traditional OAuth and API key models break down when agents act on behalf of users and themselves, across long-lived sessions.
- You need a framework for agent identity, delegation, and scoping: one that supports things like tool-level permissions, task-bound credentials, and delegation graphs.

We're entering an era where testing software means simulating behavior, querying software means retrieving context, and securing software means authenticating autonomous agents.
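
To make the "simulation as CI" idea concrete, here is a minimal Python sketch of a scenario run scored by a multi-objective reward (objective success minus a penalty for tool invocations). It is not from the quoted post: the names Trajectory, reward, run_scenarios, and stub_agent, the 0.3 penalty weight, and the 0.6 pass threshold are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    task: str              # the prompt / task variation (P)
    tool_calls: List[str]  # the tool calls the agent produced (t1..tk)
    succeeded: bool        # whether the agent met its objective


def reward(traj: Trajectory, max_calls: int = 10, call_weight: float = 0.3) -> float:
    """Hypothetical multi-objective reward: success minus a tool-call penalty."""
    objective = 1.0 if traj.succeeded else 0.0
    # Penalty grows with the number of tool calls, capped at max_calls.
    penalty = call_weight * min(len(traj.tool_calls), max_calls) / max_calls
    return objective - penalty


def run_scenarios(agent: Callable[[str], Trajectory],
                  tasks: List[str],
                  threshold: float = 0.6) -> dict:
    """Run each task variation, score the trajectory, and gate on the mean score."""
    if not tasks:
        raise ValueError("at least one task variation is required")
    scores = [reward(agent(task)) for task in tasks]
    mean = sum(scores) / len(scores)
    return {"scores": scores, "mean": mean, "passed": mean >= threshold}


def stub_agent(task: str) -> Trajectory:
    # Stand-in for a real agent running in staging (assumption, not a real API).
    calls = ["search_tickets", "refund_order"] if "refund" in task else ["search_tickets"]
    return Trajectory(task=task, tool_calls=calls, succeeded=True)


report = run_scenarios(stub_agent, ["refund order 42", "find ticket 7"])
print(report["passed"], round(report["mean"], 3))
```

In a real setup the stub would be replaced by the agent under test in staging, and the "passed" flag would gate the CI run. The hard part the post points to, defining a quantifiable reward per agent, is exactly what this sketch hand-waves.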

22/ @jxnlco shared why your coding agents no longer need RAG, and what happened to RAG. 💭

23/ @AgentOpsAI is ready to start onboarding for our agent hosting product. If you want to productionize your agents, DM me. 📩 @braelyn_ai @AlexReibman @ssslomp