for any info/changes follow me: @nickmilon

mongoUtils.helpers module

some helper functions and classes

exception mongoUtils.helpers.MongoUtilsError[source]

Bases: exceptions.Exception

Base class for all MongoUtils exceptions.

exception mongoUtils.helpers.CollectionExists(collection_name='')[source]

Bases: mongoUtils.helpers.MongoUtilsError

mongoUtils.helpers.col_stats(collection_obj, indexDetails=True, scale=1024)[source]

collection statistics; scale defaults to 1024 (KiloBytes), pass it 2 ** 20 for MegaBytes or 2 ** 30 for GigaBytes

mongoUtils.helpers.coll_validate(coll_obj, scandata=False, full=False)[source]

wrapper around MongoDB's validate command (see validate)

mongoUtils.helpers.coll_range(coll_obj, field_name='_id')[source]

returns (minimum, maximum) value of a field

  • coll_obj: a pymongo collection object
  • field_name: (str) name of field (defaults to _id)
>>> coll_range(db.muTest_tweets_users, 'id_str')
(u'1004509039', u'999314042')
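Note that in the example above the "minimum" looks numerically larger than the "maximum": id_str holds strings, so MongoDB compares them lexicographically. A quick pure-Python illustration of the same ordering:

```python
# id_str values are strings, so comparison is lexicographic:
# '1...' < '5...' < '9...' regardless of numeric magnitude.
vals = ['1004509039', '999314042', '523829696790663168']
print(min(vals), max(vals))  # 1004509039 999314042
```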
mongoUtils.helpers.coll_chunks(collection, field_name='_id', chunk_size=100000)[source]

Provides an iterator with range query arguments for scanning a collection in batches equal to chunk_size; for optimization reasons the first chunk's size is chunk_size + 1. It is similar to the undocumented mongoDB splitVector command; try it in the mongo console:
db.runCommand({splitVector: "mongoUtilsTests.muTest_tweets", keyPattern: {_id: 1}, maxChunkSizeBytes: 1000000})
Our implementation appears to be faster than splitVector when tried on a collection of ~300 million documents.

  • collection: (obj) a pymongo collection instance

  • field_name: (str) the collection field to use, defaults to _id; this field must be indexed (otherwise the operation will be slow) and every document must have a value for it

  • chunk_size: (int or float) (defaults to 100000)
    • if int requested number of documents in each chunk
    • if float (< 1.0) percent of total documents in collection i.e if 0.2 means 20%
Returns: an iterator yielding a tuple (chunk number, query specification dictionary) for each chunk
>>> rt = coll_chunks(db.muTest_tweets, 'id_str', 400)
>>> for i in rt: print i
(0, {'id_str': {'$lte': u'523829721985851392', '$gte': u'523829696790663168'}})
(1, {'id_str': {'$lte': u'523829751329611777', '$gt': u'523829721985851392'}})
(2, {'id_str': {'$lte': u'523829763937681408', '$gt': u'523829751329611777'}})
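The chunking logic can be sketched in pure Python over a sorted list of key values (a simplified illustration, not the library's actual implementation; the real function issues sorted find() queries instead of materializing the keys):

```python
# Simplified illustration of coll_chunks over an in-memory sorted list
# of indexed key values; emits (chunk number, query spec) tuples like
# the example output above.  The first chunk is inclusive on both ends
# ($gte/$lte); later chunks use $gt so ranges do not overlap.
def chunk_specs(sorted_keys, field_name='_id', chunk_size=3):
    chunks, lower = [], None
    for n, i in enumerate(range(0, len(sorted_keys), chunk_size)):
        upper = sorted_keys[min(i + chunk_size - 1, len(sorted_keys) - 1)]
        spec = {'$lte': upper}
        if lower is None:
            spec['$gte'] = sorted_keys[0]   # first chunk: inclusive lower bound
        else:
            spec['$gt'] = lower             # later chunks: exclusive lower bound
        chunks.append((n, {field_name: spec}))
        lower = upper
    return chunks

for chunk in chunk_specs(list('abcdefgh'), 'id_str'):
    print(chunk)
```

Each emitted spec can be passed straight to find() so separate workers can scan disjoint ranges in parallel.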
mongoUtils.helpers.coll_update_id(coll_obj, doc, new_id)[source]

updates a document's _id by inserting a new doc and then removing the old one


Very dangerous if you don't know what you are doing; use it at your own risk.
Never use it in production.
Also be careful with shallow copies.
  • coll_obj: a pymongo collection
  • doc: document_to rewrite with a new_id
  • new_id: value of new id

Returns a tuple:
  • tuple[0]: True if the operation was successful, otherwise False
  • tuple[1]: the Exception if unsuccessful, or the delete results on success
  • tuple[2]: the insert results if successful
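The insert-then-remove idea (and the return tuple shape described above) can be sketched against an in-memory stand-in for a collection; this is an illustrative simplification, not the library's code:

```python
# In-memory sketch of coll_update_id: insert a deep copy under the new
# _id, then remove the old document.  docs_by_id stands in for a
# collection as a {_id: document} dict.
import copy

def update_id(docs_by_id, doc, new_id):
    new_doc = copy.deepcopy(doc)   # deep copy: the docs warn about shallow copies
    new_doc['_id'] = new_id
    try:
        if new_id in docs_by_id:
            raise KeyError('duplicate _id: %r' % (new_id,))
        docs_by_id[new_id] = new_doc       # step 1: insert under the new _id
        del docs_by_id[doc['_id']]         # step 2: remove the old document
        return (True, 'delete ok', 'insert ok')
    except Exception as exc:
        return (False, exc, None)

col = {1: {'_id': 1, 'text': 'hello'}}
print(update_id(col, col[1], 2)[0])  # True
```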

mongoUtils.helpers.coll_copy(collObjFrom, collObjTarget, filter_dict={}, create_indexes=False, dropTarget=False, write_options={}, verbose=10)[source]

copies a collection using unordered bulk inserts, similar to the now-deprecated copyTo

  • collObjFrom: source collection
  • collObjTarget: destination collection
  • filter_dict: a pymongo query dictionary to specify which documents to copy (defaults to {})
  • create_indexes: creates same indexes on destination collection if True
  • dropTarget: drop target collection before copy if True (otherwise appends to it)
  • write_options: operation options (use {'w': 0} for non-critical copies)
  • verbose: if > 0 prints progress statistics at verbose percent intervals
mongoUtils.helpers.db_capped_create(db, coll_name, sizeBytes=10000000, maxDocs=None, autoIndexId=True)[source]

create a capped collection

mongoUtils.helpers.db_convert_to_capped(db, coll_name, sizeBytes=1073741824)[source]

converts a collection to capped

mongoUtils.helpers.db_capped_set_or_get(db, coll_name, sizeBytes=1073741824, maxDocs=None, autoIndexId=True)[source]

sets or converts a collection to capped. autoIndexId must be True for replication, so it should be True except on a standalone mongod or when the collection belongs to the local db
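Under the hood these helpers correspond to MongoDB's create (with capped: true) and convertToCapped server commands; a sketch of the command documents involved (the helper names here are illustrative):

```python
# Command documents behind db_capped_create and db_convert_to_capped
# (see MongoDB's create and convertToCapped commands).
def capped_create_cmd(coll_name, size_bytes=10000000, max_docs=None):
    cmd = {'create': coll_name, 'capped': True, 'size': size_bytes}
    if max_docs is not None:
        cmd['max'] = max_docs   # optional cap on document count
    return cmd

def convert_to_capped_cmd(coll_name, size_bytes=2 ** 30):
    return {'convertToCapped': coll_name, 'size': size_bytes}

print(capped_create_cmd('events', max_docs=1000))
```

With pymongo these documents would be issued via db.command(...).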

mongoUtils.helpers.client_schema(client, details=1, verbose=True)[source]

returns, and optionally prints, a mongo schema listing the databases and collections in use

  • client: a pymongo client instance
  • details: (int) level of details to print/return
  • verbose: (bool) if True prints results
class mongoUtils.helpers.muBulkOps(collection, ordered=True, ae_n=0, ae_s=0, dwc=None)[source]

Bases: object

a wrapper around BulkOperationBuilder that provides some automation

New in version 1.0.6.

  • ae_n: (int) auto-execute every n operations (defaults to 0, which disables count-based auto execution)
  • ae_s: (int) auto-execute after this many seconds since start or last execute; useful when we want to ensure that collection data are relatively fresh (defaults to 0, which disables time-based auto execution)
  • dwc: (dict or None) default write concern to use on auto-execute; DO NOT pass a WriteConcern object, just a plain dict i.e. {'w': 1}
frmt_stats = '{:s}db:{:s} collection:{:s} cnt_operations_executed:{:16,d} cnt_operations_pending:{:6,d}'
__init__(collection, ordered=True, ae_n=0, ae_s=0, dwc=None)[source]

Initialize a new BulkOperationBuilder instance.

execute(write_concern=None, recreate=True)[source]

executes any pending operations that still exist; call it explicitly (e.g. on error or before shutdown) to flush outstanding operations
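The auto-execute behaviour described by ae_n and ae_s can be sketched independently of pymongo; this toy class only models the trigger logic, not the real bulk API:

```python
# Toy model of muBulkOps' auto-execute triggers: flush after every ae_n
# operations, or after ae_s seconds since the last flush.
import time

class AutoBulk(object):
    def __init__(self, execute_cb, ae_n=0, ae_s=0):
        self.execute_cb = execute_cb   # called with the pending count on flush
        self.ae_n, self.ae_s = ae_n, ae_s
        self.pending = 0
        self.last_exec = time.time()

    def insert(self, doc):
        self.pending += 1
        if (self.ae_n and self.pending >= self.ae_n) or \
           (self.ae_s and time.time() - self.last_exec >= self.ae_s):
            self.execute()

    def execute(self):
        if self.pending:
            self.execute_cb(self.pending)
        self.pending = 0
        self.last_exec = time.time()

flushed = []
bulk = AutoBulk(flushed.append, ae_n=3)
for n in range(7):
    bulk.insert({'n': n})
bulk.execute()          # flush the remainder, as the docs advise
print(flushed)          # [3, 3, 1]
```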

class mongoUtils.helpers.muCollection(database, name, create=False, codec_options=None, read_preference=None, write_concern=None, **kwargs)[source]

Bases: pymongo.collection.Collection

just a plain pymongo collection with some extra features; it is safe to cast an existing pymongo collection to this class by:

>>> a_pymongo_collection_instance.__class__ = muCollection
stats(indexDetails=True, scale=1024)[source]

collection statistics (see col_stats())


see coll_index_names()

validate(scandata=False, full=False)[source]

see coll_validate()

class mongoUtils.helpers.muDatabase(client, name, codec_options=None, read_preference=None, write_concern=None)[source]

Bases: pymongo.database.Database

just a plain pymongo Database with some extra features; it is safe to cast an existing pymongo database to this class by:

>>> a_pymongo_database_instance.__class__ = muDatabase
Returns: database statistics
collstats(details=2, verbose=True)[source]
Returns: database collections statistics
Returns: server status
capped_create(colName, sizeBytes=10000000, maxDocs=None, autoIndexId=True)[source]
convert_to_capped(colName, sizeBytes=1073741824)[source]
capped_set_or_get(colName, sizeBytes=1073741824, maxDocs=None, autoIndexId=True)[source]

see db_capped_set_or_get().


returns names of all collections whose names start with the specified prefixes

Parameter: startingwith (list): prefixes, i.e. ["tmp_", "del_"]

drops all collections whose names start with the specified prefixes

Parameter: startingwith (list): prefixes, i.e. ["tmp_", "del_"]
js_fun_add(fun_name, fun_str)[source]

adds a js function to the database; to use the function from the mongo shell you must first execute db.loadServerScripts();

  • fun_name (string): a name for this function
  • fun_str (string): js function string
js_fun_add_default(file_name, fun_name)[source]
Returns: all user js functions installed on the server
mongoUtils.helpers.pp_doc(doc, indent=4, sort_keys=False, verbose=True)[source]

pretty print a bson document

class mongoUtils.helpers.AuxTools(collection=None, db=None, client=None)[source]

Bases: object

a class supporting generation of sequence numbers using the counters-collection technique


a counters collection guarantees a unique incremental id value even in a multiprocess/multithreading environment, but if this id is used for insertions, insertion order is not 100% guaranteed to correspond to it. If insertion order is critical, use the Optimistic Loop technique instead.

  • collection: (obj optional) a pymongo collection object

  • db: (obj optional) a pymongo database object

  • client: (obj optional) a pymongo MongoClient instance
    • all parameters are optional but exactly one must be provided
    • if collection is None, the collection db['AuxCol'] will be used
    • if db is None, the collection client['AuxTools']['AuxCol'] will be used
__init__(collection=None, db=None, client=None)[source]

resets sequence

sequence_set(seq_name, val=1)[source]

sets sequence's current value to val; if the sequence doesn't exist it is created


returns sequence's current value for a particular name

sequence_next(seq_name, inc=1)[source]

increments sequence's current value by inc; if the sequence doesn't exist, sets its initial value to inc
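The counters-collection technique boils down to one atomic upsert with $inc per call; the sketch below builds the filter/update documents and simulates the server-side semantics in memory (FakeCounters and the field name 'val' are illustrative assumptions, not the library's internals):

```python
# Each sequence is one document {_id: seq_name, 'val': n}; sequence_next
# is a single atomic find_one_and_update, which is why the value stays
# unique and incremental across processes/threads.
def sequence_next_spec(seq_name, inc=1):
    """Build the filter/update documents for the atomic upsert; with
    pymongo you would pass these to coll.find_one_and_update(...,
    upsert=True, return_document=ReturnDocument.AFTER)."""
    return {'_id': seq_name}, {'$inc': {'val': inc}}

class FakeCounters(object):
    """In-memory stand-in showing the upsert semantics (MongoDB applies
    each $inc atomically on the server)."""
    def __init__(self):
        self.docs = {}

    def find_one_and_update(self, flt, upd):
        doc = self.docs.setdefault(flt['_id'], {'_id': flt['_id'], 'val': 0})
        doc['val'] += upd['$inc']['val']
        return doc

counters = FakeCounters()
f, u = sequence_next_spec('tweets')
print(counters.find_one_and_update(f, u)['val'])  # 1
print(counters.find_one_and_update(f, u)['val'])  # 2
```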

class mongoUtils.helpers.SONDot(data=None, **kwargs)[source]

Bases: bson.son.SON

A SON class that can handle dot notation to access its members (useful when parsing JSON content)

>>> son = SONDot([('foo', 'bar'), ('son2', SON([('son2foo', 'son2Bar')]))])
>>> son.son2.son2foo
'son2Bar'


don't use dot notation for write operations, i.e. assignments such as = 'bar' (they will fail silently!)
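The dot-notation idea can be sketched with a plain dict subclass (the real class subclasses bson.son.SON; DotDict here is just an illustration):

```python
# Minimal dot-notation access: __getattr__ falls back to item lookup
# and wraps nested dicts so chained access (d.a.b) also works.
class DotDict(dict):
    def __getattr__(self, name):
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name)
        return DotDict(value) if isinstance(value, dict) else value

d = DotDict({'foo': 'bar', 'son2': {'son2foo': 'son2Bar'}})
print(d.son2.son2foo)  # son2Bar
```

Note that keys shadowed by real dict attributes (e.g. 'keys', 'items') cannot be reached this way, one reason to keep dot notation for reads only.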

mongoUtils.helpers.parse_js(file_path, function_name, replace_vars=None)[source]
helper function to get a js function string from a file containing js functions;
useful if we want to call js functions from python, as in mongoDB map-reduce.
The function must be named starting in the first column and must end with '}' in the first column (see relevant functions in the js directory)
  • file_path: (str) full path name
  • function_name: (str) name of the function
  • replace_vars: (optional) a tuple to replace %s variables in the function

Returns: the js function as a string
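The first-column parsing rule described above can be sketched as follows, parsing from a string instead of a file (extract_js_function is an illustrative name, not the library's API):

```python
# Capture from 'function <name>' at column 0 up to the first '}' at
# column 0, per the convention stated in the docs.
def extract_js_function(js_text, function_name):
    out, capturing = [], False
    for line in js_text.splitlines():
        if not capturing and line.startswith('function ' + function_name):
            capturing = True
        if capturing:
            out.append(line)
            if line.startswith('}'):
                break
    return '\n'.join(out)

js = 'function foo(x) {\n  return x + 1;\n}\nfunction bar() {\n}\n'
print(extract_js_function(js, 'foo'))
```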

mongoUtils.helpers.parse_js_default(file_name, function_name, replace_vars=None)[source]

fetches a js function from file_name in the default js directory (see parse_js())

mongoUtils.helpers.geo_near_point_q(geo_field, Long_Lat, query={}, minDistance=None, maxDistance=None)[source]

geo near point query constructor

  • geo_field: (str) name of geo indexed field (i.e. location)
  • Long_Lat: (tuple or list) [longitude, latitude]
  • query: (dict) other query specifications to be combined with the geo query (defaults to {})
  • minDistance: minimum distance in meters (defaults to None)
  • maxDistance: maximum distance in meters (defaults to None)

Returns: the query dictionary updated with geo specs
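The resulting query presumably follows MongoDB's $near / $geometry syntax; a hedged sketch of such a constructor (the library's actual output may differ in detail):

```python
# Build a {field: {'$near': {...}}} query merged with any extra
# query terms, following MongoDB's 2dsphere $near operator.
def geo_near_point_query(geo_field, long_lat, query=None,
                         min_distance=None, max_distance=None):
    near = {'$geometry': {'type': 'Point', 'coordinates': list(long_lat)}}
    if min_distance is not None:
        near['$minDistance'] = min_distance   # meters
    if max_distance is not None:
        near['$maxDistance'] = max_distance   # meters
    q = dict(query or {})
    q[geo_field] = {'$near': near}
    return q

q = geo_near_point_query('location', (23.72, 37.97), max_distance=500)
print(q)
```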