BSON→JSON

bson2json: just a quick & dirty BSON→JSON converter, plus rudimentary schema analysis, for preparing to migrate a MongoDB dump to LevelDB.

BSON

  1. Dump files are just concatenated serialized objects. The BSON format is amenable to efficient streaming, so we simply iterate over the "records" by piping an ordinary file stream into a BSON parser and stringifying each object to JSON:
    fs = require 'fs'
    BSONStream = require 'bson-stream' # the npm 'bson-stream' module exports a Transform-stream parser.
    fs.createReadStream process.argv[2] # CLI: expect path.
    	.pipe new BSONStream
    	.on 'data',(o)->
    		console.log JSON.stringify o
    
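  2. For example, run on a single dump file (hypothetical filename):
    ./bson2json.coffee dump/users.bson > users.json
    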

Schema

  1. Collect (and count) variations in object shape, to gain insight into the variety of "documents" stored in that particular file.
    # Map signatures to counts; a signature is the sorted list of an object's keys, stringified to JSON.
    schema={}
    .on 'data',(o)->
    	ks=(Object.keys o).sort()
    	s=JSON.stringify ks
    	schema[s] or=0
    	schema[s]++
    .on 'end',->
    	console.log 'Schema variations:',(Object.keys schema).length,schema
    
  2. Also show the union of keys across all schema variations (combined into a complete sketch in item 3 below):
    used={}
    .on 'data',(o)->
    	…
    	for k in ks # ks: the sorted key list from the 'data' handler above.
    		used[k]=true # (JS Set API sucks.)
    .on 'end',->
    	console.log 'Keys used:',Object.keys used
    
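  3. Putting the fragments together, the whole bson2json.coffee might read roughly like this (a sketch under the assumptions above: the 'bson-stream' module, plus a coffee shebang so the shell loop below can execute it directly; not the verbatim original):
    #!/usr/bin/env coffee
    fs = require 'fs'
    BSONStream = require 'bson-stream'
    
    schema={} # signature (JSON of the sorted key list) → count
    used={}   # union of all keys seen
    
    fs.createReadStream process.argv[2]
    	.pipe new BSONStream
    	.on 'data',(o)->
    		console.log JSON.stringify o # the conversion itself
    		ks=(Object.keys o).sort()
    		s=JSON.stringify ks
    		schema[s] or=0
    		schema[s]++
    		used[k]=true for k in ks
    	.on 'end',->
    		console.log 'Schema variations:',(Object.keys schema).length,schema
    		console.log 'Keys used:',Object.keys used
    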

EOF and callbacks?

  1. Node guarantees the process won't exit before all queued callbacks have executed (illustrated in the sketch below).
  2. But…?
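    Illustrating item 1 with a generic Node sketch (an assumed example, not part of bson2json): an open read stream keeps the event loop alive, so its 'end' callback always runs before the process exits.
    fs = require 'fs'
    stream = fs.createReadStream '/etc/hosts' # hypothetical input file
    stream.on 'data', (chunk) -> # consume the data; empty handler
    stream.on 'end', -> console.log 'all data callbacks have run'
    process.on 'exit', -> console.log 'process exits only after the queue drains'
    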

Multiple files

  1. Out of laziness (we did say quick & dirty), we just looped over all the files with a shell script:
    #!/bin/sh
    for f in *.bson
    do ./bson2json.coffee "$f" > "${f%.bson}.json"
    done
    
  2. So the files are processed sequentially, one at a time, in whatever order the shell's glob expansion of *.bson yields (typically alphabetical).
  3. In a more invested migration script we'd iterate over the dump directory recursively instead (see the sketch below).
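    A rough sketch of such a walk (hypothetical and synchronous for simplicity; not part of the current tool):
    fs   = require 'fs'
    path = require 'path'
    
    # Recursively collect every *.bson file under dir and hand its path to cb.
    walk = (dir, cb) ->
    	for name in fs.readdirSync dir
    		p = path.join dir, name
    		if (fs.statSync p).isDirectory()
    			walk p, cb
    		else if /\.bson$/.test name
    			cb p
    
    walk process.argv[2], (file) -> console.log file # e.g. just list what was found
    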

--
The real world is a special case