Archiving

Not to be confused with Compression.

  • Internet Archive Wayback Machine – huge foundation, slow servers.
  • Archive.today – owner is weird sometimes. Uses the WARC format; a saved page can be downloaded as a ZIP.
    • Saves web pages to other domains apparently associated with the project: archive.is, archive.vn.

To save a local HTML file as plain text, use a text-mode browser such as lynx, links2 or w3m and dump its output:

lynx -dump -display_charset UTF-8 input.html > output.txt
w3m -dump -o display_charset=UTF-8 input.html > output.txt

(source)
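
To convert a whole directory of saved pages, the two commands above could be wrapped in a small script. This is only a sketch: the function names txt_name, dump_html and dump_all are made up for illustration, and it assumes the pages end in .html and that lynx or w3m is installed.

```shell
#!/bin/sh
# Hypothetical helper: swap a trailing .html suffix for .txt
# (a name without .html simply gets .txt appended).
txt_name() {
    printf '%s\n' "${1%.html}.txt"
}

# Dump one HTML file as text to stdout, using whichever browser is installed.
dump_html() {
    if command -v lynx >/dev/null 2>&1; then
        lynx -dump -display_charset UTF-8 "$1"
    elif command -v w3m >/dev/null 2>&1; then
        w3m -dump -o display_charset=UTF-8 "$1"
    else
        echo "neither lynx nor w3m found" >&2
        return 1
    fi
}

# Convert every .html file in the current directory to a .txt file next to it.
dump_all() {
    for f in ./*.html; do
        [ -e "$f" ] || continue          # the glob matched nothing
        out=$(txt_name "$f")
        # Remove the partial output file if the dump fails.
        dump_html "$f" > "$out" || { rm -f "$out"; return 1; }
    done
}
```

Source the script and call dump_all from the directory that holds the pages. lynx is tried first only because it is listed first above; swap the order if w3m gives better output for your pages.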

  • Last modified: 2021-09-18 21:32