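AI Weekly: May 2026, Week 1

Period: 2026-04-27 to 2026-05-03 (Beijing time)
Key theme this issue: agentic workflows are moving from the "demo capability" stage to the "shippable product" stage. Big vendors keep competing on three high-value workflows (coding, research, design), while the open-source camp focuses on three main threads: long context, tool use, and multimodal.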

🔥 This Week's Top Stories

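1. OpenAI releases GPT-5.5, keeping its focus on agentic coding and computer use

OpenAI released GPT-5.5 on April 23 and brought GPT-5.5 / GPT-5.5 Pro to the API on April 24. The official positioning is explicit: the model is better at coding, research, data analysis, tool use, and computer workflows, while keeping latency close to that of GPT-5.4.

What to watch: this is not just a "somewhat stronger" model upgrade; it further strengthens the ability to break down an ambiguous task and finish it on its own.
What it means for developers: AI coding agents, automated research assistants, and cross-tool workflows stand to benefit further.
Source: https://openai.com/index/introducing-gpt-5-5/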

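2. Google Cloud Next '26 moves the enterprise AI narrative up to the "Agent Platform" layer

At Cloud Next '26, Google announced the Gemini Enterprise Agent Platform, the Gemini Enterprise app, Agentic Data Cloud, the TPU 8 series, and more. The focus is not any single model but how enterprises build, govern, deploy, and run agents.

What to watch: Google is no longer selling only models; it is selling a full agent stack.
What it means for enterprises: the low-code Agent Studio, Agent Inbox, and cross-cloud data access bring agents closer to being real components of business systems.
Source: https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/google-cloud-next-26-recap/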

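3. "AI shopping agents" enter a platform power struggle: Amazon sues Perplexity

One of the most noteworthy discussions this week is not a new model benchmark but whether platforms will allow AI agents to complete transactions on behalf of users. OpenTools reports that Amazon is suing Perplexity in an attempt to stop its shopping agent from inserting itself between Amazon and consumers.

What to watch: this could determine whether agents can truly become "agents of the user" or are limited to a recommendation layer.
Trend signal: in 2026, AI competition has moved from "who answers questions better" to "who owns the transaction entry point and the right to execute."
Source: https://opentools.ai/news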

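4. Open-source long-context competition keeps accelerating: DeepSeek-V4 shapes 1M context into an agent-friendly form

An analysis post on Hugging Face notes that DeepSeek-V4-Pro / V4-Flash do more than grow the context window to 1M tokens. More importantly, they optimize the KV cache, attention structure, tool-call schema, and cross-turn reasoning retention for long-running agent tasks.

What to watch: long context is no longer just a marketing number; it is starting to serve long-running agents in practice.
What it means for the ecosystem: open-source models keep closing in on the closed-source camp as "deployable agent foundations."
Source: https://huggingface.co/blog/deepseekv4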

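5. Claude Design launches as AI expands from "writing code" to "producing designs and prototypes"

Anthropic launched Claude Design, which generates designs, prototypes, slides, and one-pagers through conversation and can hand off to Claude Code for implementation. It is powered by Claude Opus 4.7, and the positioning is direct: faster design exploration, prototype validation, and handoff.

What to watch: design workflows have become a new main battleground for large-model vendors.
What it means for teams: handoffs between PMs, designers, marketing teams, and developers could be shortened noticeably.
Source: https://www.anthropic.com/news/claude-design-anthropic-labs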

🛠️ New Tools / Product Launches

1. GPT-5.5 / GPT-5.5 Pro

OpenAI's most important release this week. It emphasizes agentic coding, browser / tool use, and complex task completion, and is now available in ChatGPT, Codex, and the API.

Source: https://openai.com/index/introducing-gpt-5-5/

2. Deep Research Max

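Google launched the next generation of Deep Research / Deep Research Max. Built on Gemini 3.1 Pro, it supports MCP, connections to files and private data sources, and native chart generation, making it more of an "orchestratable research agent" than a plain summarization tool.

Source: https://blog.google/innovation-and-ai/models-and-research/gemini-models/next-generation-gemini-deep-research/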

3. Gemini Enterprise Agent Platform

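Google launched an end-to-end agent platform for enterprises, including Agent Studio, model access, governance, and extensibility, squarely targeting enterprise agent development and runtime scenarios.

Source: https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/google-cloud-next-26-recap/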

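4. Gemini file generation and export

Gemini can now generate files such as PDF, Word, Excel, Docs, Sheets, Slides, and Markdown directly in a conversation, turning "chat output" into deliverable artifacts.

Source: https://blog.google/innovation-and-ai/products/gemini-app/generate-files-in-gemini/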

5. Claude Design

Anthropic is pushing Claude into the design and prototyping tool layer, with support for visual mockups, interactive prototypes, PPTX / PDF / HTML export, and handoff to Claude Code.

Source: https://www.anthropic.com/news/claude-design-anthropic-labs

6. Cursor SDK (public beta)

Cursor released a TypeScript SDK that lets teams build programmatic coding agents directly on Cursor's runtime, sandbox, and model capabilities, and connect them to local, cloud, or self-hosted workflows.

Source: https://cursor.com/blog/typescript-sdk

7. Mistral Vibe Remote Agents + Mistral Medium 3.5

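Mistral moved its coding agent from the local terminal to parallel execution in the cloud, and released Medium 3.5 as the default model, strengthening long-running coding / productivity tasks.

Source: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5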

8. DeepSeek-V4-Pro / DeepSeek-V4-Flash

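One of the most noteworthy agent foundations from the open-source camp this week: 1M context, an MoE architecture, and optimizations for tool calling and long-trajectory tasks.

Source: https://huggingface.co/blog/deepseekv4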

9. NVIDIA Nemotron 3 Nano Omni

NVIDIA released a new omni-modal understanding model covering documents, audio, video, GUI, and general multimodal reasoning, clearly aimed at document / media / computer-use agents.

Source: https://huggingface.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

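10. Advanced Account Security (ChatGPT / Codex)

Not a model upgrade, but a very practical product update this week: OpenAI now offers stronger account protection for high-risk users, including mechanisms such as passkeys / security keys.

Source: https://openai.com/index/advanced-account-security/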

📊 Model Updates

GPT-5.5 (OpenAI)

Focus areas: agentic coding, computer use, research, tool use
Characteristics: stronger task autonomy, lower token consumption, available in the API
Source: https://openai.com/index/introducing-gpt-5-5/

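Claude Opus 4.7 (Anthropic)

Focus areas: advanced software engineering, long-task consistency, visual resolution, and creative quality
Characteristics: Anthropic explicitly emphasizes improved stability in complex coding workflows and multi-step tasks
Source: https://www.anthropic.com/news/claude-opus-4-7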

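Deep Research / Deep Research Max, powered by Gemini 3.1 Pro

Focus areas: autonomous research, MCP, private data access, chart generation
Characteristics: model capabilities are being packaged as a "research agent service" rather than a bare API
Source: https://blog.google/innovation-and-ai/models-and-research/gemini-models/next-generation-gemini-deep-research/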

DeepSeek-V4-Pro / V4-Flash

Focus areas: 1M context, carrying agent trajectories, tool-use stability
Characteristics: lower FLOPs and a smaller KV cache, better suited to deploying long-running agents
Source: https://huggingface.co/blog/deepseekv4

NVIDIA Nemotron 3 Nano Omni

Focus areas: document intelligence, audio/video understanding, GUI / computer use
Characteristics: an open-weight omni-modal model that further strengthens enterprise document and multimedia scenarios
Source: https://huggingface.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

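💡 Trends Worth Watching

1. Agent competition shifts from "model scores" to the "full delivery pipeline"

The clearest change this week is that vendors are no longer shipping only models; they are shipping agent runtimes, platforms, SDKs, sandboxes, handoff, permissions, and governance. Cursor SDK, Gemini Enterprise Agent Platform, and Mistral Remote Agents all point in the same direction.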

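2. Three high-value workflows are now consensus: Coding / Research / Design

Coding: GPT-5.5, Claude Opus 4.7, Cursor, Mistral Vibe
Research: Deep Research Max, long-context models
Design: Claude Design, Gemini file generation

This shows that AI's main battleground is no longer the "chat experience" but high-frequency knowledge work.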

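3. Long context is turning from a spec-sheet claim into agent infrastructure

DeepSeek-V4's core value is not just 1M context but making long tasks actually run, run reliably, and run cheaply. Whoever can drive down the cost of context will be better suited to the agent era.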

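4. Multimodal agents keep heating up

Nemotron 3 Nano Omni, Claude Design, and Google's chart / file / visual generation capabilities all point the same way: future agents will not be just text in / text out; they will need to understand documents, images, audio, and video, and produce deliverable content.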

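5. Platform and channel control will be the next AI friction point

Cases like Amazon vs Perplexity show that the hard part may not be whether a model can "buy it for you" but whether the platform is willing to let an agent complete the purchase on your behalf. Payments, identity, risk control, and platform access will become major issues in the next phase.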

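Closing Thoughts

This week's common theme is clear: AI is moving up from "able to answer" to "able to execute, deliver, and plug into enterprise systems." If 2025 was about who is smarter, 2026 looks more like a contest over who can turn agents into systems that can actually show up for work.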
