Web Archive is a non-profit online service that has long been archiving websites in a virtually permanent manner, so that anyone can analyze the history and changes of a specific website.
I am an advocate of the right to be forgotten and, above all, of being in control of one’s own data. I also use the Web Archive engine often for my work, so I am well aware of the drawbacks of this type of service. Although the underlying idea is interesting, I do not like that the service goes to great lengths to make it difficult to opt out.
There is in fact a convenient mechanism for telling search engines (crawlers) how to treat a specific website: the robots.txt file allows each website to define what may be recorded and what may not. However, Web Archive disregards it.
Let’s see, then, how to be partially ignored by Web Archive as of today. I say partially because the engine appears to keep visiting pages that are not searchable.
First, the robots.txt file must nevertheless be configured to instruct Web Archive to ignore the website:
User-agent: archive.org_bot
Disallow: /
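Before publishing the rule, it can be worth checking that it parses the way you expect. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_blocked and the example URL are assumptions made for illustration. It only confirms how a standards-following parser reads the rule, not that Web Archive will honor it.

```python
from urllib.robotparser import RobotFileParser

# The rule from this post, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url.

    Hypothetical helper: it parses the text locally, without any network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_blocked(ROBOTS_TXT, "archive.org_bot", "https://www.example.com/"))
    print(is_blocked(ROBOTS_TXT, "SomeOtherBot", "https://www.example.com/"))
```

The second call uses a made-up crawler name to show that the rule targets only archive.org_bot and leaves other user agents unaffected.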
Next, add the verify.txt file to the root of the website with the following content:
please remove from archive.org
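If the site is deployed from a local directory, both files can be prepared in one step. This is a rough sketch, assuming the web root is an ordinary folder on disk; the function name write_opt_out_files is hypothetical. It appends the rule to an existing robots.txt rather than overwriting it, so rules for other crawlers are kept.

```python
from pathlib import Path

# File contents taken from this post.
ROBOTS_RULE = "User-agent: archive.org_bot\nDisallow: /\n"
VERIFY_TEXT = "please remove from archive.org\n"


def write_opt_out_files(webroot: Path) -> None:
    """Add the archive.org_bot rule to robots.txt and create verify.txt."""
    robots = webroot / "robots.txt"
    existing = robots.read_text(encoding="utf-8") if robots.exists() else ""

    # Only append the rule if it is not already present.
    if "User-agent: archive.org_bot" not in existing:
        if existing and not existing.endswith("\n"):
            existing += "\n"
        # A blank line separates the new group from any earlier rules.
        separator = "\n" if existing else ""
        robots.write_text(existing + separator + ROBOTS_RULE, encoding="utf-8")

    (webroot / "verify.txt").write_text(VERIFY_TEXT, encoding="utf-8")


if __name__ == "__main__":
    # Assumption: the current directory is the web root.
    write_opt_out_files(Path("."))
```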
Finally, send an email to [email protected] requesting the deletion of the domain and its associated data from Web Archive:
I am NAME SURNAME, owner of EXAMPLE.COM. I am officially requesting the immediate removal of my site from all archive.org products. The "User-agent: archive.org_bot Disallow: /" code present in our robots.txt file is not being honored. It can be seen at:
https://www.example.com/robots.txt
I am requesting removal of EXAMPLE.COM from all stored dates, including today, and all days going forward. I have been the sole owner of this domain since inception. I have sent this message from my private address, but you can reply to any address hosted at the domain which should be removed. I have also placed a confirmation message at the following link:
https://www.example.com/verify.txt
Thank you for your prompt attention.
DMCA Notice:
I am the site owner and sole copyright holder for each of the domains cited above. This letter is official notification under Section 512(c) of the Digital Millennium Copyright Act ("DMCA"), and I seek the removal of the aforementioned infringing material from your servers. Archive.org does not have any right or permission to reproduce, sell or display my websites in any way, shape or form. I am providing this notice in good faith and with the reasonable belief that rights I own are being infringed. Under penalty of perjury I certify that the information contained in the notification is both true and accurate, and I am the copyright owner and therefore have the authority to act on behalf of the owner of the copyright(s) involved. Thank you for your prompt assistance with this matter.
NAME SURNAME
EXAMPLE.COM
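To avoid leaving a placeholder in the letter by mistake, the substitution can be scripted. The sketch below is only an illustration: build_request, the subject line, and the short TEMPLATE_EXCERPT are assumptions, and the recipient is passed in as a parameter rather than hard-coded. In practice you would paste the full letter above as the template.

```python
from email.message import EmailMessage

# Abbreviated stand-in for the letter; use the full text in practice.
TEMPLATE_EXCERPT = (
    "I am NAME SURNAME owner of EXAMPLE.COM.\n"
    "https://www.example.com/robots.txt\n"
    "https://www.example.com/verify.txt\n"
)


def build_request(template: str, full_name: str, domain: str,
                  sender: str, recipient: str) -> EmailMessage:
    """Fill the placeholders and wrap the letter in an email message."""
    body = (
        template
        .replace("NAME SURNAME", full_name)
        .replace("EXAMPLE.COM", domain.upper())    # upper-case mentions
        .replace("example.com", domain.lower())    # URLs
    )
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    # Hypothetical subject line, not taken from the original post.
    message["Subject"] = f"Removal request for {domain.lower()}"
    message.set_content(body)
    return message
```

The returned message can be reviewed with print(message) and then sent with whatever mail client or SMTP setup you already use.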
You should receive a confirmation of the deletion shortly.
To be clear, the world will not be better or safer after this action, but it is still an action worth considering. More and more services today crawl any public data, even for commercial purposes: generative AIs, search engines, surveillance services (Clearview AI) and cybercrime come to mind.