MHTifier and Notes

Needed to hack MHT (MHTML) files under unreal circumstances, trying to inject some sense (viz. JavaScript) into the output of a "legacy" BI thingy… Nu.

Prior art

It's an ancient (not to say obsolete) format, so you'd expect some…

  1. mht-rip-0.8.c: 2010, in C.
  2. mhtconv: libmht-0.1, 2009, in C.
  3. spackager-0.5.5 (GitHub): 2012-04, and in Python!

But, anyway, I ended up writing my own.
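For context on what "writing my own" involves: an MHT file is a MIME multipart/related message, so Python's stdlib email package does most of the heavy lifting. Below is a minimal sketch of the unpacking half. It is not MHTifier's actual code, and the function name unpack_mht, the SAMPLE_MHT document and the "part-N" fallback key are all hypothetical.

```python
import email
from email import policy
from typing import Dict

# Hypothetical, hand-made MHT document, used only to exercise the sketch.
SAMPLE_MHT = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/related; boundary="XYZ"; type="text/html"\r\n'
    b'\r\n'
    b'--XYZ\r\n'
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b'Content-Transfer-Encoding: quoted-printable\r\n'
    b'Content-Location: index.html\r\n'
    b'\r\n'
    b'<html><body><img src=3D"logo.png"></body></html>\r\n'
    b'--XYZ\r\n'
    b'Content-Type: image/png\r\n'
    b'Content-Transfer-Encoding: base64\r\n'
    b'Content-Location: logo.png\r\n'
    b'\r\n'
    b'iVBORw0KGgo=\r\n'
    b'--XYZ--\r\n'
)


def unpack_mht(raw: bytes) -> Dict[str, bytes]:
    """Split an MHT document into {Content-Location: decoded payload bytes}."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    resources: Dict[str, bytes] = {}
    for n, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue  # the multipart/related container has no payload of its own
        # Parts without a Content-Location get a made-up "part-N" key.
        location = str(part.get("Content-Location") or f"part-{n}")
        # decode=True undoes the Content-Transfer-Encoding (base64,
        # quoted-printable); charsets are NOT decoded, so text stays bytes.
        resources[location] = part.get_payload(decode=True)
    return resources
```

With the sample, unpack_mht(SAMPLE_MHT) yields two entries, "index.html" (with the quoted-printable =3D turned back into =) and "logo.png" (the 8-byte PNG signature). Writing these out as files, or patching a script tag into the HTML, is then plain file I/O; keeping everything as bytes sidesteps the charset question until you actually need str.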


Created a repo and pushed it to GitHub:

  1. $ git init
  2. $ git add
  3. $ git commit -m "Created repo, committing code and initial doc."
  4. $ git remote add origin
  5. $ git push

(Actually, the procedure was much uglier: all this non-fast-forward annoyance…)


Enhancements or possible bugs that I've had no time to address:

  1. Cleanest would've been to use stdin/stdout, but that turned out inconvenient, annoying even, so I added command-line options.
  2. Performance of Python's stdlib email module (premature optimization?):
    email.message_from_bytes() takes the whole input as one bytes object, while the docs describe the FeedParser API as "conducive to incremental parsing of email messages, such as would be necessary when reading the text of an email message from a source that can block", so I guess it'd be more efficient to have it read stdin directly, rather than buffering.
  3. Encodings (ASCII, UTF-8) and de/coding were painful, and are probably still buggy.
  4. Base64-encoded binaries: my editor, Geany, chokes on them, I think when wrapping these long lines?
  5. Verify index.html is present.

Et cetera…
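On item 2: a sketch, assuming the input arrives as a binary stream such as sys.stdin.buffer, of feeding the parser chunk by chunk through email.feedparser.BytesFeedParser instead of first collecting everything for email.message_from_bytes(). The name parse_incrementally and the chunk_size default are hypothetical.

```python
from email import policy
from email.feedparser import BytesFeedParser
from email.message import EmailMessage
from typing import BinaryIO


def parse_incrementally(stream: BinaryIO, chunk_size: int = 64 * 1024) -> EmailMessage:
    """Parse an MHT/MIME byte stream by feeding it to the parser chunk by chunk.

    The caller never builds one big bytes object of the whole input, but the
    parsed message tree (payloads included) still ends up entirely in memory.
    """
    parser = BytesFeedParser(policy=policy.default)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # EOF
            break
        parser.feed(chunk)  # partial lines are buffered until completed
    return parser.close()  # root message (an EmailMessage under policy.default)
```

Usage would be along the lines of msg = parse_incrementally(sys.stdin.buffer). Worth knowing: email.message_from_binary_file() already reads its file in chunks and feeds a FeedParser internally, so that one-liner gets the same effect. Either way the saving is only the single up-front copy of the raw input, not the parsed tree.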

Firefox and Chromium

  1. UnMHT :: Add-ons for Firefox
  2. Chrome Gets MHTML Support - Save and View Files in .mht Format

The real world is a special case