# This robots.txt was initially based on https://seirdy.one/robots.txt # Opt out of online advertising so malware that injects ads on the # site won't get paid. The ads.txt file contains a standard # placeholder to forbid any compliant ad networks from paying for # ad placement on this domain. User-Agent: Adsbot Disallow: / Allow: /ads.txt Allow: /app-ads.txt # "By allowing us access, you enable the maximum number of # advertisers to confidently purchase advertising space on your pages. # Our comprehensive data insights help advertisers understand the # suitability and context of your content, ensuring that their ads # align with your audience's interests and needs. This alignment # leads to improved user experiences, increased engagement, and # ultimately, higher revenue potential for your publication." # (https://www.peer39.com/crawler-notice) # --> fuck off. User-agent: peer39_crawler User-Agent: peer39_crawler/1.0 Disallow: / # The next three borrowed from https://www.videolan.org/robots.txt # "This robot collects content from the Internet for the sole purpose # of helping educational institutions prevent plagiarism. [...] we # compare student papers against the content we find on the Internet # to see if we can find similarities." # (http://www.turnitin.com/robot/crawlerinfo.html) # --> fuck off. User-Agent: TurnitinBot Disallow: / # "NameProtect engages in crawling activity in search of a wide range # of brand and other intellectual property violations that may be of # interest to our clients." # (http://www.nameprotect.com/botinfo.html) # --> fuck off. User-Agent: NPBot Disallow: / # "iThenticate is a new service we have developed to combat the # piracy of intellectual property and ensure the originality of # written work for publishers, non-profit agencies, corporations, and # newspapers." # (http://www.slysearch.com/) # --> fuck off. User-Agent: SlySearch Disallow: / # "BLEXBot assists internet marketers to get information on the link # structure of sites and their interlinking on the web, to avoid any # technical and possible legal issues and improve overall online # experience." # (http://webmeup-crawler.com/) # --> fuck off. User-Agent: BLEXBot Disallow: / # "Providing Intellectual Property professionals with superior brand # protection services by artfully merging the latest technology with # expert analysis." # (https://www.checkmarknetwork.com/spider.html/) # "The Internet is just way to big to effectively police alone." # (ACTUAL quote) # --> fuck off. User-agent: CheckMarkNetwork/1.0 (+https://www.checkmarknetwork.com/spider.html) Disallow: / # "Stop trademark violations and affiliate non-compliance in paid # search. Automatically monitor your partner and affiliates’ online # marketing to protect yourself from harmful brand violations and # regulatory risks. We regularly crawl websites on behalf of our # clients to ensure content compliance with brand and regulatory # guidelines." # (https://www.brandverity.com/why-is-brandverity-visiting-me) # --> fuck off. User-agent: BrandVerity/1.0 Disallow: / # "Pipl assembles online identity information from multiple # independent sources to create the most complete picture of a digital # identity and connect it to real people and their offline identity # records. When all the fragments of online identity data are # collected, connected, and corroborated, the result is a more # trustworthy identity." # --> fuck off. User-agent: PiplBot Disallow: / # Gen-AI data scrapers # OpenAI # --> fuck off. User-agent: ChatGPT-User User-agent: GPTBot Disallow: / # Official way to opt-out of Google's generative AI training: # User-agent: Google-Extended Disallow: / # Official way to opt-out of LLM training by Apple # User-agent: Applebot-Extended Disallow: / # Anthropic-AI crawler posted guidance after a long period of crawling # without opt-out documentation: # User-agent: ClaudeBot Disallow: / # "FacebookBot crawls public web pages to improve language models for # our speech recognition technology." # # --> fuck off. User-Agent: FacebookBot Disallow: / # More of the same shit and misc # --> fuck off. User-agent: Amazonbot User-agent: anthropic-ai User-agent: Bytespider Disallow: / User-agent: AhrefsBot User-agent: Clickagy User-agent: SemrushBot User-agent: SemrushBot-BA User-agent: SemrushBot-COUB User-agent: SemrushBot-CT User-agent: SemrushBot-SI User-agent: SemrushBot-SWA User-agent: SiteAuditBot User-agent: SplitSignalBot Disallow: /