Bookmarklet

Developing bookmarklet(s) for the Open Knesset (Hebrew) project. Lots of ideas, plans… A simple first version I’m calling OKify (Hebrew) will linkify all familiar strings on a page. Later, embed OKpop, which would open popups instead of just linking, etc.

Trials and tribulations. Here’s a record of the development process, lots of issues solved (or not), ideas that came up… Written during Hasadna’s weekly hacking sessions, a hackathon weekend or two, pair programming with Oz, and getting help from many. ;o)

Bookmarklet

What to put in a bookmarklet?

  1. Bookmarklets are limited in size, but there’s no standard — depends on browser. So most bookmarklets just bootstrap a script from somewhere.
  2. So? Latency. Code in the bookmarklet itself doesn’t need to be downloaded. Kind of a cache. Except, our code is still under development and evolves quickly, and it’s much easier to update hosted code than to have users replace their bookmarklet.
  3. How to prevent HTTP caching to force reloading the code? Some always bypass caches by adding a random string to the URL. We’re serving static files (JavaScript, CSS, maybe images), and I guess Apache will do a good job of handling HTTP caching: it sends validators based on the file system timestamps, so browsers can revalidate cheaply.
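
For reference, the cache-bypassing trick from item 3 looks roughly like this. A minimal sketch: bustCache and its stamp argument are names made up here, and the "_" parameter name is only a convention.

```javascript
// Hypothetical helper: append a throwaway query parameter so that caches
// treat every request as a new URL.
function bustCache(url, stamp) {
  // Callers may pass their own stamp (handy for testing); default is the clock.
  var value = stamp === undefined ? Date.now() : stamp;
  var separator = url.indexOf("?") === -1 ? "?" : "&";
  return url + separator + "_=" + encodeURIComponent(value);
}
```

Usage would be s.src = bustCache("http://example.com/foo.min.js"). For static files served with proper validators this is probably overkill, since it defeats caching entirely.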

Bootstrapping

  1. How to load our script? Essentially, inserting a script element with our code’s URL:
    var s = document.createElement("script");
    s.src = "http://example.com/foo.min.js";
    // The first child of <html> is normally <head>.
    document.documentElement.childNodes[0].appendChild(s);
  2. Seems document.head works, usually, too. Why bother inserting into head? Remove the script node after loading? Maybe I should post about all this separately.
  3. There’s a loader in CoffeeScript (GitHub), but it is so trivial that the extra compilation step seems more trouble than…
  4. CoffeeMarklet makes more sense, if/when we’d want more than just a bootstrap. But, “perpetual beta”: our code will probably keep changing, so better to reload it every time.
  5. SOP: is it going to be a problem?
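
The loading snippet above, wrapped into a function, might look like this. A sketch only: loadScript is a hypothetical name, and the document is passed in as doc so that nothing depends on hidden globals.

```javascript
// Sketch of a bootstrap loader. `doc` is normally `document`.
function loadScript(doc, url) {
  var s = doc.createElement("script");
  s.src = url;
  // Prefer <head>; fall back to the root element where head is unavailable.
  var parent = doc.head || doc.getElementsByTagName("head")[0] || doc.documentElement;
  parent.appendChild(s);
  return s;
}
```

Returning the node leaves the caller free to remove it once the script has run.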

CoffeeScript

  1. Translate CS to JS on the server side, or in the browser, or does the browser just speak CS?

Development

  1. Loading from localhost while developing? Works: telling the bookmarklet to load from localhost, with s.src = "http://localhost/okify.js", since I’ve a Web server running locally.
    $ coffee --compile --output /var/www/ okify.coffee

    But this breaks when running the bookmarklet within an HTTPS site: the browser blocks the plain-HTTP script as mixed content.

  2. Loading from a file, s.src = "file:///var/www/okify.js", doesn’t work: “Not allowed to load local resource: file:///var/www/okify.js”.
  3. Refreshing/reloading: injecting a script into the DOM runs it again, overwriting the “global” variable @okify (window.okify), but pollutes the DOM; I’m running the bookmarklet repeatedly during development… So?
  4. Logging to the console: eg:
    console.log jQuery

    prints:

    function (a,b){return new c.fn.init(a,b)} — in localhost/okify.js:15
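
One way to keep the DOM tidy while re-running the bookmarklet over and over: tag the script node with an id and drop the previous one first. A sketch under assumptions: reinject and the id value are invented here, and removing the node does not undo whatever the earlier run already did to the page.

```javascript
// Hypothetical helper for repeated runs during development: remove the
// script node left by the previous run before adding a fresh one.
function reinject(doc, url, id) {
  var old = doc.getElementById(id);
  if (old && old.parentNode) {
    old.parentNode.removeChild(old);
  }
  var s = doc.createElement("script");
  s.id = id;
  s.src = url;
  (doc.head || doc.documentElement).appendChild(s);
  return s;
}
```

For example: reinject(document, "http://localhost/okify.js", "okify-script").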

jQuery et al

  1. Ben Alman’s jQuery Bookmarklet Generator wraps any code with a jQuery loader and generates a bookmarklet from all this! Online, even.
  2. Detect if already loaded? If we weren’t loading it from a bookmarklet, then, eg:
    <script>window.jQuery || document.write('<script src="//ajax.googleapis.com/ajax/libs/jquery/1.7/jquery.min.js">\x3C/script>')</script>
  3. Minimum version?
  4. CDN? Google’s, for very good reasons!
  5. Avoid conflicts? (Note that due to using JSONP our code spans closures.)
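
Detecting an already loaded jQuery and checking its version could go something like this. jQuery.fn.jquery holds the version string; the helper names and the idea of passing window in as win are my own assumptions.

```javascript
// Compare dotted version strings numerically, e.g. "1.10.2" against "1.7".
function versionAtLeast(actual, minimum) {
  var a = String(actual).split(".");
  var m = String(minimum).split(".");
  for (var i = 0; i < Math.max(a.length, m.length); i++) {
    var x = parseInt(a[i], 10) || 0;   // missing parts count as 0
    var y = parseInt(m[i], 10) || 0;
    if (x !== y) return x > y;
  }
  return true;
}

// Hypothetical check: is a new enough jQuery already on the page?
function hasUsableJQuery(win, minimum) {
  var jq = win.jQuery;
  return !!(jq && jq.fn && jq.fn.jquery && versionAtLeast(jq.fn.jquery, minimum));
}
```

If hasUsableJQuery(window, "1.7") is false and we load our own copy, jQuery.noConflict(true) hands that copy back and restores whatever $ and jQuery the page had before.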

Dependencies

  1. Obviously, we need jQuery, and possibly other scripts. How to chain-load them? How to manage dependencies? Popular solutions: RequireJS, LABjs (GitHub)…
  2. If it’s just for jQuery, of which a suitable version was maybe loaded already, then it’s inefficient to load a loader, but if we ever wanted additional libraries, a complex dependencies graph, then we’d better.
  3. Chaining: wait for either script.onload or script.onreadystatechange to execute (not load!) our code.
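
Without a loader library, the chaining from item 3 can be sketched by hand. loadChain is a made-up name, doc stands for document, and the readyState values are the ones old Internet Explorer reports for script elements.

```javascript
// Sketch: load scripts one after another, then call `done`.
function loadChain(doc, urls, done) {
  if (urls.length === 0) { done(); return; }
  var s = doc.createElement("script");
  var fired = false;
  function next() {
    if (fired) return;                 // both handlers may fire for one script
    fired = true;
    s.onload = s.onreadystatechange = null;
    loadChain(doc, urls.slice(1), done);
  }
  s.onload = next;
  // Old IE has no onload for scripts; it reports readyState instead.
  s.onreadystatechange = function () {
    if (s.readyState === "loaded" || s.readyState === "complete") next();
  };
  s.src = urls[0];
  (doc.head || doc.documentElement).appendChild(s);
}
```

Our own code would go in done, so it only executes once every dependency has loaded.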

Build

  1. How to compile our CoffeeScript code into a bookmarklet, ie, a URL using the “javascript:” scheme? Bookmarklet Crunchinator does it online. I’d want something for the command line, too, so I can script the flow.
  2. Grunt…
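
The core of such a build step is small. A sketch, assuming the input is already compiled JavaScript; toBookmarklet is a hypothetical name, not an existing tool.

```javascript
// Hypothetical build step: wrap compiled JavaScript in a function and
// percent-encode it into a "javascript:" URL.
function toBookmarklet(source) {
  var wrapped = "(function(){" + source + "})();";
  return "javascript:" + encodeURIComponent(wrapped);
}
```

Run under Node and fed the compiler’s output, this would print something that can be pasted into a bookmark. The wrapping function keeps our variables out of the page’s global scope.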

Familiar names

  1. Tastypie provides an API for AJAX, renders lots of data into beautiful JSON. But, too much data; eg, a query for all MKs’ data returns 140KiB, when all we need are their names and URLs, probably less than 10KiB. Despite the hackathon excitement and commotion, Meir, in a typical feat, hacked Tastypie or something to allow filtering the JSON the API sends, culling our response from 140KiB to 23KiB.
  2. How to cache the returned JSON? Hardcoded in the script?
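
Whatever the API sends, the script only needs a name-to-URL lookup. A sketch of reducing a response to that: Tastypie wraps results in an objects array, but the field names name and absolute_url below are guesses, not the real Open Knesset schema.

```javascript
// Reduce an API response to the name -> URL lookup that linkifying needs.
// Assumed shape: { objects: [ { name: "...", absolute_url: "..." }, ... ] }.
function toLookup(response) {
  var lookup = {};
  var objects = response.objects || [];
  for (var i = 0; i < objects.length; i++) {
    lookup[objects[i].name] = objects[i].absolute_url;
  }
  return lookup;
}
```

JSON.stringify(toLookup(response)) would then give a compact string, small enough to consider hardcoding into the script.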

JSONP

  1. We can’t use XHR because of SOP. So it’s either CORS or JSONP. Or XHR2? (XHR2 is the one that can do CORS, but only if the server sends the right headers.)
  2. JSONP gave me lots of grief… but it was past midnight, after 14 hours of another Open Knesset hackathon at The Hub (Tel Aviv). Couldn’t think straight. The minute I walked out, esprit d’escalier, I knew what was wrong. An “@” fixed it. ;o)
    1. How does JSONP work? Browser creates a separate script (or fakes it) so it can have a different origin assigned?
    2. Both jsFiddle and JS generated from CoffeeScript wrap code in closures, so when the browser invokes the returned JS (JSON wrapped in a function call) the function’s definition can’t be found, and the call fails completely silently! (jQuery says “when data is retrieved from remote servers (which is only possible using the script or jsonp data types), the error callbacks and global events will never be fired”.)
  3. jQuery adds a random URL query parameter to prevent HTTP caching. We don’t want that!
  4. [Embed a jsFiddle here to play around with the AJAX API?]
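
As far as I understand it, JSONP needs no special origin handling: script elements are simply allowed to load from anywhere, and the response is ordinary JavaScript that calls a function we name. Done by hand, it might look like the sketch below. The callback parameter name, the okifyJsonp prefix and passing window and document in as win and doc are all assumptions.

```javascript
// Sketch of JSONP by hand. Assumes the server wraps its JSON in a call to
// the function named by the `callback` query parameter.
var jsonpCounter = 0;
function jsonp(win, doc, url, onData) {
  var name = "okifyJsonp" + (jsonpCounter++);   // hypothetical naming scheme
  var s = doc.createElement("script");
  // The callback must be a property of the global object: the response
  // script runs in global scope and cannot see names inside a closure.
  win[name] = function (data) {
    win[name] = undefined;                      // one-shot callback
    if (s.parentNode) s.parentNode.removeChild(s);
    onData(data);
  };
  s.src = url + (url.indexOf("?") === -1 ? "?" : "&") + "callback=" + name;
  (doc.head || doc.documentElement).appendChild(s);
  return name;
}
```

This is what the “@” fixed in CoffeeScript: at the top level, @handler compiles to this.handler, which is reachable from the response script, while a plain handler stays hidden in the closure. With jQuery, the $.ajax options cache: true and jsonpCallback keep the URL stable, so HTTP caching can work.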

Walking the DOM and linkifying

  1. Can’t just search & replace over entire document or DOM, because we only want to match (visible) text, not stuff in HTML attributes, scripts, comments, etc.
    1. Eg, with jQuery, shouldn’t replace with html() so as to avoid touching HTML tags themselves.
    2. But text(), if node contains children, wipes out the HTML markup completely. Can only use it on text nodes.
    3. Also, html() destroys and re-creates DOM nodes so that “event handlers and other bound data would be lost”.
  2. Was harder than usual, but eventually I found existing solutions.
    1. Gary Haran’s uses innerHTML to render a document fragment and then replaces the text node with fragment’s children. And avoids replacements inside blacklisted elements. Pure DOM API, no jQuery.
    2. Ben Alman’s…
    3. Alexander Dickson’s (unescaped RegExps?)…
    4. James Padolsey’s…
  3. Seems jQuery doesn’t traverse text nodes as conveniently as DOM elements? Do we use the DOM API, childNodes[] (a property, not method), then? Or jQuery’s contents()?
    function walk(e) { // e is a jQuery wrapped element.
    	e.contents().each(function () {
    		if (this.nodeType == 3) // TEXT_NODE
    			this.nodeValue = linkify(this.nodeValue);
    		else if (this.nodeType == 1) // ELEMENT_NODE
    			walk($(this)); // Recursively. Wrapped so can use contents().
    	});
    }
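  4. This didn’t work: nodeValue is plain text, so any markup returned by linkify shows up literally instead of becoming links. Neither does if (this.nodeType == 3) $(this).html( linkify(this.nodeValue)); since jQuery’s html() only acts on element nodes.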
  5. What to do about other nodeTypes?
  6. Are there other places (elements) we don’t want to go (into)? Scripts?
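
What seems to work instead is replacing each text node with real nodes: text nodes for the plain parts, anchor elements for the matches. A sketch, not the code we ended up with: segment and linkifyTextNode are invented names, pattern is assumed to be a global RegExp of known names, and urls maps each name to its link.

```javascript
// Split text into plain and linked pieces.
function segment(text, pattern, urls) {
  var pieces = [];
  var last = 0;
  var m;
  pattern.lastIndex = 0;
  while ((m = pattern.exec(text)) !== null) {
    if (m[0] === "") { pattern.lastIndex++; continue; }   // skip empty matches
    if (m.index > last) pieces.push({ text: text.slice(last, m.index) });
    pieces.push({ text: m[0], href: urls[m[0]] });
    last = m.index + m[0].length;
  }
  if (last < text.length) pieces.push({ text: text.slice(last) });
  return pieces;
}

// Replace one text node with text nodes and <a> elements built from the
// pieces, instead of assigning markup to nodeValue.
function linkifyTextNode(doc, node, pattern, urls) {
  var pieces = segment(node.nodeValue, pattern, urls);
  var linked = pieces.filter(function (p) { return p.href; });
  if (linked.length === 0) return;                        // nothing to do
  var fragment = doc.createDocumentFragment();
  for (var i = 0; i < pieces.length; i++) {
    var textNode = doc.createTextNode(pieces[i].text);
    if (pieces[i].href) {
      var a = doc.createElement("a");
      a.href = pieces[i].href;
      a.appendChild(textNode);
      fragment.appendChild(a);
    } else {
      fragment.appendChild(textNode);
    }
  }
  node.parentNode.replaceChild(fragment, node);
}
```

In walk, the text node branch would call linkifyTextNode(document, this, pattern, urls), and the element branch should probably skip a, script, style and textarea.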

RegExp

  1. Data must always be sanitized! And special (regular expressions) characters escaped. See discussion of techniques on Simon Willison’s blog.
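
Escaping names before building a pattern out of them can be sketched as follows. The escaping expression is the commonly used one; namesPattern is a hypothetical helper and assumes a non-empty list of names.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one global pattern matching any of the names, longest first so
// that "Ann Lee" wins over "Ann".
function namesPattern(names) {
  var sorted = names.slice().sort(function (a, b) { return b.length - a.length; });
  return new RegExp(sorted.map(escapeRegExp).join("|"), "g");
}
```

Without the escaping, a name such as "J. Doe" would also match "Jx Doe", because the dot matches any character.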

Finally, it works!

Here’s an early version. Ouch, except no HTTPS. Yet.

GitHub

Django


