Archiving
Not to be confused with Compression.
Webpages
- print as PDF
- wget
- Archiving websites with wget (Petekeen.net)
- Hyperfiler – save as a single HTML file.
- ArchiveWeb.page – Browser application to save files as web archive bundles.
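A minimal sketch of the wget approach listed above, assuming a small site that may be crawled. The function name archive_site and the one-second wait are illustrative choices, not taken from the linked article; the flags themselves are standard GNU wget options.

```shell
# Hypothetical helper: mirror one site into the current directory for offline reading.
archive_site() {
    # --mirror           recursive download with timestamping
    # --convert-links    rewrite links so the local copy works offline
    # --adjust-extension save pages with a matching .html extension
    # --page-requisites  also fetch the images, CSS and scripts a page needs
    # --no-parent        never ascend above the starting directory
    # --wait=1           pause one second between requests (assumed politeness value)
    wget --mirror --convert-links --adjust-extension \
         --page-requisites --no-parent --wait=1 "$1"
}
```

Usage would look like `archive_site "https://example.com/"`, which writes the copy into a directory named after the host.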
online tools
- Internet Archive Wayback Machine – huge foundation, slow servers.
- Archive.today – owner is weird sometimes. Uses the WARC format; a ZIP can be downloaded.
- saves web pages to other domains apparently associated with the project: archive.is, archive.vn
convert HTML to TXT
Use a text-mode browser such as lynx, links2 or w3m and dump its output:
lynx -dump -display_charset UTF-8 input.html > output.txt
w3m -dump -o display_charset=UTF-8 input.html > output.txt
(source)
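For more than one file, the lynx command above can be wrapped in a loop. This is only a sketch: the function name html_to_txt is made up for illustration, and it assumes each input file name ends in .html.

```shell
# Hypothetical helper: convert each given .html file to a .txt file next to it.
html_to_txt() {
    for f in "$@"; do
        # ${f%.html} strips the .html suffix, so page.html becomes page.txt
        lynx -dump -display_charset UTF-8 "$f" > "${f%.html}.txt"
    done
}
```

Calling `html_to_txt *.html` would convert every HTML file in the current directory; the w3m command can be substituted inside the loop in the same way.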