GEO / AI Search
Definition
An open web archive, run by the nonprofit Common Crawl, that captures billions of pages in each roughly monthly crawl and publishes them as a free public corpus. Filtered subsets of it underpin many foundation model training datasets. Its crawler identifies itself with the CCBot user-agent.
Related terms
GPTBot
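OpenAI's web crawler, used to collect public web content that may be used to train its models. Identified by a User-Agent string containing GPTBot (for example GPTBot/1.0; the version number changes over time). Blocking it in robots.txt opts a site out of that training crawl. It does not by itself remove the site from ChatGPT answers, because OpenAI documents separate agents (OAI-SearchBot and ChatGPT-User) for search and user-initiated fetches.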
ClaudeBot
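Anthropic's web crawler for Claude, used to collect public web content that may be used for model training. Identified by a User-Agent string containing ClaudeBot (for example ClaudeBot/1.0); the older anthropic-ai token still appears in many robots.txt files. Blocking it in robots.txt opts a site out of that crawl. Allowing it does not guarantee citation in Claude-generated answers.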
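As a rough illustration of how the three user-agents above interact with robots.txt, the sketch below uses Python's standard urllib.robotparser to check which crawlers may fetch a given URL. The robots.txt body, the example.com URLs, and the crawler_access helper are hypothetical and only there to show the mechanism; real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body, not taken from any real site.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""


def crawler_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each user-agent token, whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# CCBot is blocked site-wide, GPTBot only under /private/,
# and ClaudeBot falls through to the wildcard group.
access = crawler_access(
    ROBOTS_TXT,
    ["CCBot", "GPTBot", "ClaudeBot"],
    "https://example.com/blog/post",
)
print(access)
```

Running the same check against a live site would mean calling set_url() and read() on the parser instead of parse(), which fetches the site's actual robots.txt.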
We run GEO / AI Search engagements every week. A 20-minute call gets you a read on whether Common Crawl is the right next move for your stack.