====== Archiving ======

//Not to be confused with [[compress|Compression]].//

===== Webpages =====

  * print as PDF
  * https://github.com/webrecorder/warcit
  * https://webrecorder.net
  * https://guides.lib.vt.edu/webarchiving/openwarc
  * wget
    * [[https://www.petekeen.net/archiving-websites-with-wget|Archiving websites with wget]] (Petekeen.net)
  * [[https://codeberg.org/chowderman/hyperfiler|Hyperfiler]] – saves a page as a single HTML file.
  * [[https://archiveweb.page/|ArchiveWeb.page]] – browser application that saves pages as web archive bundles.

==== Online tools ====

  * [[https://web.archive.org/save|Internet Archive Wayback Machine]] – run by a huge foundation, but the servers are slow.
  * [[https://archive.today|Archive.today]] – the owner is weird sometimes. Uses the WARC format; a ZIP can be downloaded.
    * Saves web pages to other domains apparently associated with the project: archive.is, archive.vn

==== Convert HTML to TXT ====

Use a text-mode browser like ''lynx'', ''links2'' or ''w3m'' and dump its output:

  lynx -dump -display_charset UTF-8 input.html > output.txt
  w3m -dump -o display_charset=UTF-8 input.html > output.txt

([[https://www.abeautifulsite.net/downloading-a-list-of-urls-automatically|source]])
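
To convert a whole folder of saved pages instead of a single file, the ''lynx'' command can be wrapped in a loop. This is only a minimal sketch: the function name ''html_to_txt'' is made up for illustration, and it assumes ''lynx'' is installed and that the files end in ''.html''.

```shell
#!/bin/sh
# Sketch: convert every *.html file in a directory to plain text with lynx.
# html_to_txt is a hypothetical helper name, not part of lynx itself.
html_to_txt() {
    dir=${1:-.}                      # directory to scan; defaults to the current one
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue      # skip the literal pattern when nothing matches
        # Write page.txt next to page.html
        lynx -dump -display_charset=UTF-8 "$f" > "${f%.html}.txt"
    done
}
```

Calling ''html_to_txt ~/archive'' would then write ''page.txt'' next to each ''page.html'' in that folder (the path is just an example). Existing ''.txt'' files with the same name are overwritten.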