How to keep Playwright from being detected

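You are on the scraping side, and you want the anti-scraping side (the website) to be unable to tell that you are using Playwright, especially in headed mode with a personal profile window (launchPersistentContext + userDataDir). The core ideas are:

Remove the traces Playwright injects by default (__pwInitScripts, __playwright__binding__, CDP leaks, navigator.webdriver, and so on);
Make the browser behave as much as possible like a real person using Chrome with their own profile.
There is no 100% guarantee (top-tier anti-bot systems such as Cloudflare Bot Management, DataDome, and Akamai keep evolving), but the methods below, current as of 2026, can lower the detection rate considerably; many setups pass test pages such as CreepJS, Sannysoft, and rebrowser-bot-detector.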

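Top recommendation: switch to Patchright (Python package shown here; arguably the strongest drop-in replacement as of 2026)
Patchright is a patched, "undetected" build of the official Playwright. It specifically fixes Playwright-specific traces such as CDP leaks, Console.enable, and Runtime.enable. It keeps the original API and supports headed mode, persistent contexts, and personal profiles.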

Install (one time only):
pip install patchright
patchright install chromium  # or: patchright install chrome, to use real Google Chrome

Usage example (a direct replacement for the original playwright import that keeps your personal profile window):
from patchright.sync_api import sync_playwright

with sync_playwright() as p:
    # Personal profile + headed mode (keeps your cookies, extensions, and login state)
    context = p.chromium.launch_persistent_context(
        user_data_dir="./my_chrome_profile",  # path to your profile directory
        headless=False,  # headed
        viewport={"width": 1280, "height": 720},
        args=[
            "--disable-blink-features=AutomationControlled",  # already built into Patchright
        ],
    )
    page = context.new_page()
    page.goto("https://example.com")  # replace with the target site
    # ... your scraping logic
    # context.close()  # closing the context saves the profile

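Why it works:

It patches out global variables such as __pwInitScripts and __playwright__binding__;
It avoids the Console.enable and Runtime.enable leaks;
It has --disable-blink-features=AutomationControlled built in and cleans up other automation flags;
It is reported to pass tests against Cloudflare, DataDome, Akamai, Kasada, Fingerprint.com, CreepJS, and others.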

Node.js users: use rebrowser-playwright (GitHub: rebrowser/rebrowser-playwright). It is also a drop-in replacement with similar results.

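Advanced: connect over CDP to a real Chrome that is already running (the cleanest, injection-free approach)
Instead of letting Playwright launch the browser, first start a real Chrome (with your personal profile) manually or with another tool, then let Playwright "take over" through CDP. This way Playwright injects almost none of its own scripts.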

Steps:

1. Start Chrome manually (or with SeleniumBase):
google-chrome --remote-debugging-port=9222 --user-data-dir=./my_profile --no-first-run
2. Connect with Playwright:
from patchright.sync_api import sync_playwright  # patchright is still recommended here

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("http://localhost:9222")
    context = browser.contexts[0]  # the default context, backed by your personal profile
    page = context.pages[0] if context.pages else context.new_page()
    # your scraping logic ...
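
Before connecting, it can help to confirm that Chrome is actually listening on the debugging port. The sketch below is one way to do that, assuming the port 9222 from the command above and Chrome's `/json/version` HTTP endpoint; the helper names `parse_devtools_version` and `fetch_devtools_version` are made up for this illustration and are not part of Playwright or Patchright.

```python
import json
import urllib.request


def parse_devtools_version(payload: str) -> dict:
    """Pull the browser name and WebSocket URL out of a /json/version response body."""
    data = json.loads(payload)
    return {
        "browser": data.get("Browser", ""),
        "ws_url": data.get("webSocketDebuggerUrl", ""),
    }


def fetch_devtools_version(endpoint: str = "http://localhost:9222",
                           timeout: float = 3.0) -> dict:
    """Query the DevTools HTTP endpoint.

    Raises urllib.error.URLError if nothing is listening on the port,
    which usually means Chrome was not started with --remote-debugging-port.
    """
    with urllib.request.urlopen(f"{endpoint}/json/version", timeout=timeout) as resp:
        return parse_devtools_version(resp.read().decode("utf-8"))
```

Calling `fetch_devtools_version()` right before `connect_over_cdp` gives a clearer error than a failed connection attempt, and an empty `ws_url` suggests the endpoint is not a Chrome debugging port.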

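Advantage: this avoids Playwright's launch-time traces and fits the headed personal-profile window you are already used to. Many 2025-2026 tutorials recommend this "hijack a real browser" approach.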

If you stay on stock Playwright: manual patches plus the stealth plugin are required (second choice)

pip install playwright playwright-stealth

from playwright.sync_api import sync_playwright
import playwright_stealth

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()  # or use launch_persistent_context
    page = context.new_page()

    # Key step: apply stealth.
    # stealth_sync is the playwright-stealth 1.x API; newer releases may expose a different entry point.
    playwright_stealth.stealth_sync(page)

    # Extra manual patches for the detection variables mentioned earlier.
    # Register the init script before the first page.goto() so it runs on every navigation.
    page.add_init_script("""
        delete Object.getPrototypeOf(navigator).webdriver;
        Object.defineProperty(window, '__pwInitScripts', { get: () => undefined });
        Object.defineProperty(window, '__playwright__binding__', { get: () => undefined });
    """)

However, stock Playwright plus stealth is no longer as strong as Patchright and is less effective against top-tier anti-bot systems.

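Behavioral camouflage that every setup needs (whichever option you choose)
Even with the technical traces cleaned up, anti-bot systems still look at behavior:

Random human-like actions: use page.mouse.move() to simulate curved movement, random pauses, and slight jitter.
Random delays: after each action, call page.wait_for_timeout(random.randint(800, 2500)) (prefix it with await in the async API).
Real proxies: residential IPs with rotation (BrightData, Oxylabs, Smartproxy).
Consistent fingerprint: use the same user_data_dir with a fixed viewport, language, and time zone; avoid changing the UA frequently.
Getting past challenges: integrate a CAPTCHA solving service (2Captcha, Capsolver) or automatic Turnstile handling.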
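
The first two items in the list above can be sketched roughly as follows. This is an illustrative sketch, not a tuned model of human motion: the helper names `human_delay_ms`, `curved_path`, and `move_like_human`, the quadratic Bezier curve, and the jitter and offset ranges are all assumptions chosen for the example. Only `page.mouse.move()` and `page.wait_for_timeout()` are Playwright calls.

```python
import random


def human_delay_ms(low: int = 800, high: int = 2500) -> int:
    """Pick a random pause length in milliseconds (range taken from the text above)."""
    return random.randint(low, high)


def curved_path(start, end, steps: int = 25, jitter: float = 1.5, rng=None):
    """Return steps + 1 points along a quadratic Bezier curve from start to end.

    The control point is the midpoint shifted by a random offset, which bends
    the path; interior points also get a small random jitter.
    """
    rng = rng or random.Random()
    (x0, y0), (x1, y1) = start, end
    cx = (x0 + x1) / 2 + rng.uniform(-100, 100)
    cy = (y0 + y1) / 2 + rng.uniform(-100, 100)
    points = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        if 0 < i < steps:  # keep the endpoints exact
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        points.append((x, y))
    return points


def move_like_human(page, start, end) -> None:
    """Walk the mouse along a curved path on a sync-API page, then pause."""
    for x, y in curved_path(start, end):
        page.mouse.move(x, y)
    page.wait_for_timeout(human_delay_ms())
```

A call such as `move_like_human(page, (100, 200), (640, 360))` would go before a click on the target element. Passing a seeded `random.Random` as `rng` makes a path reproducible while debugging.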

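Test whether it worked
Open detection pages with your script, for example the ones named earlier (CreepJS, Sannysoft, rebrowser-bot-detector).

If the results are mostly green, the setup is stealthy enough for most websites.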
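
As a quick local complement to those pages, the traces named at the top of this guide can be probed directly. The following is a rough sketch: `PROBE_JS`, `leaked_signals`, and `check_page` are hypothetical names for this example, and the probe covers only three signals, far fewer than a real detector. Depending on the driver, `page.evaluate` may run in an isolated context rather than the page's main world, so treat a clean result as a hint, not proof.

```python
# JavaScript evaluated in the page: each field is True when the trace is present.
PROBE_JS = """() => ({
    webdriver: navigator.webdriver === true,
    pwInitScripts: typeof window.__pwInitScripts !== 'undefined',
    pwBinding: typeof window.__playwright__binding__ !== 'undefined',
})"""


def leaked_signals(probe: dict) -> list:
    """Return the sorted names of all probe fields that came back truthy."""
    return sorted(name for name, present in probe.items() if present)


def check_page(page) -> list:
    """Run the probe on a sync-API page and list any traces it found."""
    return leaked_signals(page.evaluate(PROBE_JS))
```

An empty list from `check_page(page)` means none of these three traces was visible from where the probe ran.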

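Priority summary (for your headed + personal profile scenario):

1. Patchright first (simplest, best results)
2. CDP connection to a real Chrome
3. Stock Playwright + stealth + manual patches (fallback)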

