for any info/changes follow me: @nickmilon

mongoUtils.importsExports module

Classes used to import/export data to mongoDB

mongoUtils.importsExports.import_workbook(workbook, db, fields=None, ws_options={'dt_python': True}, stats_every=1000)

Saves all of a workbook’s sheets to a db. Consider using the ImportXls class instead, which is more flexible but imports only a single sheet.

Parameters:

see ImportXls class

Example:
>>> from pymongo import MongoClient
>>> from mongoUtils import _PATH_TO_DATA
>>> from mongoUtils.importsExports import import_workbook
>>> db = MongoClient().test
>>> res = import_workbook(_PATH_TO_DATA + "example_workbook.xlsx", db)
>>> res
[{'rows': 368, 'db': 'test', 'collection': 'weather'}, {'rows': 1007, 'db': 'test', 'collection': 'locations'}]
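
Going by the example output above, the return value is a list with one dict per imported sheet. The sketch below post-processes such a result, using the values shown above as literal data. It does not call mongoUtils, and `summarize` is a hypothetical helper name, not part of the module.

```python
# Result copied from the example output above; no database is contacted here.
res = [
    {'rows': 368, 'db': 'test', 'collection': 'weather'},
    {'rows': 1007, 'db': 'test', 'collection': 'locations'},
]


def summarize(results):
    """Return (total row count, list of 'db.collection' names)."""
    total = sum(r['rows'] for r in results)
    # Extra keys such as 'rows' are simply ignored by str.format.
    names = ['{db}.{collection}'.format(**r) for r in results]
    return total, names


total_rows, namespaces = summarize(res)
print(total_rows, namespaces)
```
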
class mongoUtils.importsExports.Import(collection, drop_collection=True, stats_every=10000)

Bases: object

Generic class for importing into a mongoDB collection; successors should use or extend this class.

Parameters:
  • db: a pymongo database object that will be used for output

  • collection: a pymongo collection object that will be used for output

  • drop_collection: (defaults to True)
    • True drops output collection on init before writing to it
    • False appends to output collection
  • stats_every: int, print import stats every stats_every rows, or 0 to disable stats (defaults to 10000)

format_stats = '|{db:16s}|{collection:16s}|{rows:15,d}|'
format_stats_header = '...................................................\n| db | collection | rows |\n...................................................'
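
To see what the `format_stats` template produces, the sketch below applies `str.format` to it with sample values taken from the `import_workbook` example. Only the template string comes from the class; the sample values are illustrative and nothing here calls mongoUtils.

```python
# Template copied from the format_stats attribute above.
format_stats = '|{db:16s}|{collection:16s}|{rows:15,d}|'

# Strings are left-aligned and padded to 16 characters; the row count is
# right-aligned in 15 characters with a thousands separator.
line = format_stats.format(db='test', collection='locations', rows=1007)
print(line)
```

Each rendered line is 51 characters wide: three padded columns plus four `|` separators.
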
__init__(collection, drop_collection=True, stats_every=10000)
import_to_collection()

successors should implement this

_import_to_collection_before()

successors can call this or implement their own

_import_to_collection_after()

successors can call this or implement their own
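
The three methods above suggest a template-method layout: a successor implements `import_to_collection()` and may call the before/after hooks around its own loop. The sketch below illustrates that layout only. `SketchImport` and `ListImport` are hypothetical stand-ins that write to a plain list instead of a mongoDB collection, and the hook bodies and call order are assumptions, not the library's actual code.

```python
class SketchImport(object):
    """Hypothetical stand-in for Import; keeps rows in a list, not in mongoDB."""

    def __init__(self, stats_every=0):
        self.stats_every = stats_every
        self.rows = 0
        self.output = []  # plays the role of the output collection

    def _import_to_collection_before(self):
        self.rows = 0  # assumed: reset the counter before a run

    def _import_to_collection_after(self):
        return {'rows': self.rows}  # assumed: report what was written

    def import_to_collection(self):
        raise NotImplementedError("successors should implement this")


class ListImport(SketchImport):
    """Successor that imports documents from an in-memory list."""

    def __init__(self, docs, **kwargs):
        super(ListImport, self).__init__(**kwargs)
        self.docs = docs

    def import_to_collection(self):
        self._import_to_collection_before()
        for doc in self.docs:
            self.output.append(doc)
            self.rows += 1
        return self._import_to_collection_after()


importer = ListImport([{'a': 1}, {'a': 2}])
result = importer.import_to_collection()
print(result)
```
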

print_stats()

class mongoUtils.importsExports.ImportXls(workbook, sheet, db, coll_name=None, row_start=None, row_end=None, fields=True, ws_options={'negatives_to_0': False, 'dt_python': True, 'integers_only': False}, stats_every=10000, drop_collection=True)

Bases: mongoUtils.importsExports.Import

Saves an xls sheet to a collection.

Parameters:
  • workbook: path to a workbook or an xlrd workbook object

  • sheet: name of a worksheet in workbook or an int (sheet number in workbook)

  • db: a pymongo database object

  • coll_name: str output collection name or None to create name from sheet name (defaults to None)

  • row_start: int or None, starting row, or None to start from the first row (defaults to None)

  • row_end: int or None, ending row, or None to end at the last row (defaults to None)

  • fields:
    • a list with field names

    • or True (to treat first row as field names)

    • or None (to auto-create field names, e.g. [fld_1, fld_2, etc.])

    • or a function that:
      • takes one argument (a list of row values)

      • returns a dict (if this dict contains a key ‘_id’ this value will be used for _id)

      • >>> lambda x: {'coordinates': [x[0] , x[1]]}
        
  • ws_options: (optional) a dictionary specifying how to treat cell values
    • dt_python : bool convert dates to python datetime
    • integers_only : bool, round float values to int; helpful because all int values are represented as floats in sheets
    • negatives_to_0 : bool, treat all negative numbers as 0
  • drop_collection: (defaults to True)
    • True drops output collection on init before writing to it
    • False appends to output collection
  • stats_every: int, print import stats every stats_every rows, or 0 to disable stats (defaults to 10000)


Example:
>>> from pymongo import MongoClient
>>> from mongoUtils import _PATH_TO_DATA
>>> from mongoUtils.importsExports import ImportXls
>>> db = MongoClient().test
>>> res = ImportXls(_PATH_TO_DATA + "example_workbook.xlsx", 0, db)()
>>> res
{'rows': 367, 'db': u'test', 'collection': u'weather'}
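
The `fields` parameter described above accepts four forms: a list of names, True, None, or a function. The sketch below mimics how each form could turn one row of values into a document. `make_doc` and `to_point` are hypothetical helpers written for illustration; they do not reproduce the library's `row_to_doc`, and the sample row is made up.

```python
def make_doc(row, fields=None, header=None):
    """Build a document from a list of row values (hypothetical helper)."""
    if callable(fields):
        return fields(row)            # function: list of row values -> dict
    if fields is True:
        names = header                # names taken from the sheet's first row
    elif fields is None:
        # auto-created names: fld_1, fld_2, ...
        names = ['fld_%d' % i for i in range(1, len(row) + 1)]
    else:
        names = list(fields)          # explicit list of field names
    return dict(zip(names, row))


def to_point(row):
    """Example callable: third value becomes _id, first two the coordinates."""
    return {'_id': row[2], 'coordinates': [row[0], row[1]]}


row = [23.7, 37.9, 'athens']
doc_auto = make_doc(row)
doc_list = make_doc(row, fields=['lon', 'lat', 'name'])
doc_header = make_doc(row, fields=True, header=['x', 'y', 'label'])
doc_func = make_doc(row, fields=to_point)
print(doc_func)
```

With the real class, a callable such as `to_point` would be passed as the `fields` argument of `ImportXls`, per the signature above.
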
__init__(workbook, sheet, db, coll_name=None, row_start=None, row_end=None, fields=True, ws_options={'negatives_to_0': False, 'dt_python': True, 'integers_only': False}, stats_every=10000, drop_collection=True)
ws_options
ws_options_set(options_dict)
fix_name(name, cnt=0)
auto_field_names(fields)
row_to_doc(valueslist, _id=None)
ws_convert_cell(cl)
Parameters:
  • cl: an xlrd cell object
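
As a rough illustration of the numeric `ws_options` documented for ImportXls (`integers_only` and `negatives_to_0`), the sketch below applies them to a plain Python value rather than an xlrd cell. `convert_number` is a hypothetical helper; the order in which the two options are applied is an assumption, and date handling (`dt_python`) is left out because it depends on xlrd.

```python
def convert_number(value, integers_only=False, negatives_to_0=False):
    """Apply the two numeric ws_options to one cell value (hypothetical helper)."""
    if not isinstance(value, float):
        return value              # text and other types pass through untouched
    if negatives_to_0 and value < 0:
        value = 0.0               # assumed order: clamp first, then round
    if integers_only:
        value = int(round(value))
    return value


print(convert_number(3.0, integers_only=True))
print(convert_number(-4.2, negatives_to_0=True))
```
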
import_to_collection()