====== Archiving ======
//Not to be confused with [[compress|Compression]].//

===== Webpages =====
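  * print to PDF from the browser's print dialog
  * [[https://github.com/webrecorder/warcit|warcit]] – convert directories and files into WARC web archives.
  * [[https://webrecorder.net|Webrecorder]] – open-source tools for creating and replaying web archives.
  * [[https://guides.lib.vt.edu/webarchiving/openwarc|Opening WARC files]] (Virginia Tech Libraries guide)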
  * ''wget''
    * [[https://www.petekeen.net/archiving-websites-with-wget|Archiving websites with wget]] (Petekeen.net)
  * [[https://codeberg.org/chowderman/hyperfiler|Hyperfiler]] – save as a single HTML file.
  * [[https://archiveweb.page/|ArchiveWeb.page]] – browser extension (also available as a desktop app) that archives pages as you browse and exports them as WARC/WACZ web archive files.
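A minimal sketch of mirroring a site with ''wget'' for offline reading. The flag set is a common choice for this job, not necessarily the one used in the linked Petekeen.net article, and the function name ''archive_site'' is hypothetical. Since wget 1.14, ''--warc-file=NAME'' can additionally record the crawl as a WARC file (wget then warns that timestamping is disabled).

<code bash>
# Hypothetical helper: mirror the site below URL $1 into the current directory.
archive_site() {
  local url="$1"
  # --mirror:           recursive download, infinite depth, timestamping
  # --page-requisites:  also fetch the CSS, images and scripts each page needs
  # --convert-links:    rewrite links so the local copy works offline
  # --adjust-extension: save HTML/CSS with .html/.css extensions
  # --no-parent:        never ascend above the starting directory
  wget --mirror --page-requisites --convert-links --adjust-extension \
       --no-parent "$url"
}

# Usage (example.com is a placeholder):
# archive_site "https://example.com/docs/"
</code>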

==== online tools ====
  * [[https://web.archive.org/save|Internet Archive Wayback Machine]] – backed by a large non-profit foundation, but its servers are slow.
  * [[https://archive.today|Archive.today]] – the owner sometimes behaves oddly. Uses the WARC format; snapshots can be downloaded as ZIP files.
    * snapshots are also served under other domains apparently associated with the project: archive.is, archive.vn
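A minimal sketch for triggering a Wayback Machine capture from the shell. It assumes that an anonymous request to ''https://web.archive.org/save/<URL>'' still starts a "Save Page Now" capture; the service may rate-limit, refuse, or require a login, so the sketch only prints the final HTTP status and URL for you to check. The function name ''wayback_save'' is hypothetical.

<code bash>
# Hypothetical helper: ask the Wayback Machine to save URL $1.
# Assumption: GET https://web.archive.org/save/<URL> triggers a capture.
wayback_save() {
  local url="$1"
  # -s: silent, -L: follow redirects (ideally to the new snapshot),
  # -o /dev/null: discard the body, -w: print final status code and URL
  curl -sL -o /dev/null -w '%{http_code} %{url_effective}\n' \
       "https://web.archive.org/save/$url"
}

# Usage (placeholder URL):
# wayback_save "https://example.com/"
</code>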

==== convert HTML to TXT ====
Use a text-mode browser such as ''lynx'', ''links2'' or ''w3m'' and dump the rendered page to a file:
<code bash>
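# Each command renders input.html as plain text; use whichever browser is installed.
lynx -dump -display_charset=UTF-8 input.html > output.txt
w3m -dump -o display_charset=UTF-8 input.html > output.txt
links2 -dump input.html > output.txt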
</code>

([[https://www.abeautifulsite.net/downloading-a-list-of-urls-automatically|source]])
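To convert a whole directory of saved pages at once, the dump commands can be wrapped in a loop. A minimal sketch, assuming ''lynx'' or ''w3m'' is installed; the function name ''html_dir_to_txt'' is hypothetical. Each ''page.html'' gets a ''page.txt'' next to it.

<code bash>
# Hypothetical helper: convert every *.html file in directory $1
# (default: current directory) to a .txt file with the same base name.
html_dir_to_txt() {
  local dir="${1:-.}" f
  for f in "$dir"/*.html; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if command -v lynx >/dev/null 2>&1; then
      lynx -dump -display_charset=UTF-8 "$f" > "${f%.html}.txt"
    elif command -v w3m >/dev/null 2>&1; then
      w3m -dump -o display_charset=UTF-8 "$f" > "${f%.html}.txt"
    else
      echo "html_dir_to_txt: neither lynx nor w3m found" >&2
      return 1
    fi
  done
}

# Usage:
# html_dir_to_txt ~/saved-pages
</code>

Passing ''"$dir"/*.html'' keeps a path prefix (e.g. ''./page.html''), so the browsers treat the argument as a local file rather than a hostname.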