MongoDB Data Processing (Python)

Processing data from MongoDB in Python

Pavan Kulkarni

This post gives an insight into processing data from MongoDB in Python. I will be developing this project in IntelliJ. The full code is available on my GitHub Repo.

I will be using PyMongo, the official MongoDB driver for Python.

Pre-requisites

  1. Python 3.x
  2. IntelliJ
  3. Anaconda Environment

Set up IntelliJ for MongoDB

  1. Install pymongo from the terminal. In my case, the command below installs pymongo in the Anaconda environment. Check this post to install Python with Anaconda.

    Pavans-MacBook-Pro:~ pavanpkulkarni$  python -m pip install pymongo
    
  2. Open IntelliJ and import the project from my GitHub Repo

  3. Make sure you choose the conda environment as the Python SDK for IntelliJ by going to File --> Project Structure --> Platform Settings --> SDKs --> + --> Python SDK --> <choose conda>

  4. Now we are ready to get coding.
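Before writing any queries, it may help to confirm that the interpreter IntelliJ now uses can actually import PyMongo. The sketch below is my own addition, not part of the project: it prints the installed version and reports whether Collection.count_documents() is available, which PyMongo added in 3.7 (the older count() was removed in PyMongo 4.0). The helper name has_count_documents is hypothetical.

```python
def has_count_documents(version_tuple):
    """True if this PyMongo version provides Collection.count_documents() (3.7+)."""
    # Only major/minor matter; some releases append a string such as '.dev0'.
    return tuple(version_tuple[:2]) >= (3, 7)


try:
    import pymongo

    print("PyMongo", pymongo.version,
          "| count_documents() available:", has_count_documents(pymongo.version_tuple))
except ImportError:
    # Usually means IntelliJ is pointing at a different interpreter than the conda env.
    print("pymongo is not importable from this interpreter; check the SDK setting")
```

If the ImportError branch runs inside IntelliJ but pip reported a successful install in the terminal, the two are most likely using different Python environments.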

Dive into the code

Connect to MongoDB

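from pymongo import MongoClient

# Connect to the MongoDB server running locally on the default port
client = MongoClient('mongodb://127.0.0.1:27017')
# Select the database and the collection used in the rest of this post
db = client.super_hero_db
students = db.students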
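MongoClient connects lazily: its constructor does not block, so a server that is down only shows up as an error on the first query. A minimal sketch, assuming a local unauthenticated server like the one above: connect() forces a round trip with the ping command and gives up after a short server selection timeout instead of the 30-second default, and build_mongo_uri() shows how a username and password would be percent-escaped with quote_plus if your server required them. Both helper names are hypothetical.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="127.0.0.1", port=27017, user=None, password=None):
    """Build a mongodb:// URI; credentials are percent-escaped with quote_plus."""
    if user is None:
        return f"mongodb://{host}:{port}"
    return f"mongodb://{quote_plus(user)}:{quote_plus(password or '')}@{host}:{port}"


def connect(uri, timeout_ms=2000):
    """Return a MongoClient after confirming the server answers a ping."""
    # Imported here so build_mongo_uri() can be used without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
    try:
        # ping forces a real round trip and raises ServerSelectionTimeoutError
        # if no server is reachable within timeout_ms.
        client.admin.command("ping")
    except Exception:
        client.close()
        raise
    return client
```

Usage would look like connect(build_mongo_uri()).super_hero_db.students, which yields the same collection as the attribute-style access above.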

Retrieve one sample document from MongoDB

students.find_one()

Output:
	{'_id': ObjectId('5afcc577aebca2bc98a7135e'),
	 'courses_registered': [{'cid': 'CS003', 'sem': 'Spring_2011'},
	                        {'cid': 'CS006', 'sem': 'Summer_2011'},
	                        {'cid': 'CS009', 'sem': 'Fall_2011'}],
	 'id': 12,
	 'name': 'Black Panther',
	 'year_graduated': '2011'}
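find_one() also accepts a filter document and an optional projection. A small sketch, using a hypothetical helper name find_student, that fetches the student with a given id field and leaves out the generated _id; pass in the students collection from the connection step.

```python
def find_student(collection, student_id):
    """Return one student by its numeric 'id' field, without the _id field."""
    # First argument: the filter. Second argument: a projection that hides _id.
    # find_one() returns None when no document matches the filter.
    return collection.find_one({"id": student_id}, {"_id": 0})
```

For example, find_student(students, 12) should return the Black Panther document shown above minus its ObjectId.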

Count the number of documents in the collection

print("Number of documents  : ", students.count_documents({}))
Number of documents  :  13
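Collection.count() was deprecated in PyMongo 3.7 and removed in 4.0. Its replacements are count_documents(), which takes a filter and returns an exact count, and estimated_document_count(), which takes no filter and relies on collection metadata, so it is faster on large collections but may be approximate. A sketch comparing the two (document_counts is a hypothetical name):

```python
def document_counts(collection):
    """Return (exact, estimated) document counts for a collection."""
    exact = collection.count_documents({})             # exact; an empty filter matches everything
    estimated = collection.estimated_document_count()  # metadata-based; accepts no filter
    return exact, estimated
```

On a small, quiet collection like this one both numbers should agree.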

We can also count only the documents that match a filter:

students.count_documents({'id': 5})
1
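Filters can also reach into the embedded course documents using dot notation. A sketch, assuming the lowercase cid key seen in the sample document earlier; count_course_registrations is a hypothetical helper:

```python
def count_course_registrations(collection, course_id):
    """Count students with at least one courses_registered entry for course_id."""
    # "courses_registered.cid" matches a document if ANY element of the
    # courses_registered array has that cid value.
    return collection.count_documents({"courses_registered.cid": course_id})
```

Because field names are case-sensitive, documents that store the key as "CID" would not be counted by this filter.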

Get all documents from the collection

import pprint

# Get all docs from MongoDB
for student in students.find():
    pprint.pprint(student)
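When only a few fields are needed, a projection keeps the returned documents small, and the cursor can be sorted on the server. A sketch with the hypothetical helper student_names, assuming every document has the id and name fields shown earlier:

```python
def student_names(collection):
    """Return student names ordered by their numeric id."""
    # Projection: return only id and name, and suppress the default _id field.
    cursor = collection.find({}, {"_id": 0, "id": 1, "name": 1})
    # Direction 1 means ascending (the same value as pymongo.ASCENDING).
    return [doc["name"] for doc in cursor.sort("id", 1)]
```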

Get documents based on a filter

for student in students.find({'id':2}):
    pprint.pprint(student)
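Beyond exact matches like {'id': 2}, filter documents accept query operators. A sketch of two hypothetical helpers that build such filters, one with $in for a set of ids and one with $gte/$lte for an inclusive range:

```python
def ids_filter(ids):
    """Match documents whose 'id' is any of the given values."""
    return {"id": {"$in": list(ids)}}


def id_range_filter(low, high):
    """Match documents whose 'id' lies in the inclusive range [low, high]."""
    return {"id": {"$gte": low, "$lte": high}}
```

Either result can be passed straight to find(), for example students.find(ids_filter([2, 5])).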

Let's go ahead and insert a new document into MongoDB.

newStudent = {
    "id": 13,
    "name": "Thanos",
    "courses_registered": [
        {"CID": "CS003", "sem": "Spring_2011"},
        {"CID": "CS002", "sem": "Summer_2011"},
        {"CID": "CS001", "sem": "Fall_2011"}],
    "year_graduated": "2011"
}

students.insert_one(newStudent)

for student in students.find({'id':13}):
    pprint.pprint(student)


Output:

{'_id': ObjectId('5b030807f2729ee0e207059a'),
 'courses_registered': [{'CID': 'CS003', 'sem': 'Spring_2011'},
                        {'CID': 'CS002', 'sem': 'Summer_2011'},
                        {'CID': 'CS001', 'sem': 'Fall_2011'}],
 'id': 13,
 'name': 'Thanos',
 'year_graduated': '2011'}

As you can see above, an _id is generated automatically for the newly inserted document.
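Two details are worth noting here. insert_one() returns an InsertOneResult whose inserted_id holds that generated _id, and it also adds _id to the dict you passed in. In addition, the Thanos document uses "CID" in its course entries while the sample document retrieved earlier uses "cid"; since field names are case-sensitive, queries on courses_registered.cid would not match it. A sketch with a hypothetical helper normalize_course_keys that lowercases those keys on a copy before inserting:

```python
def normalize_course_keys(student):
    """Return a copy of student whose course entries use lowercase keys."""
    fixed = dict(student)  # shallow copy, so the caller's dict is left untouched
    if "courses_registered" in student:
        fixed["courses_registered"] = [
            {key.lower(): value for key, value in course.items()}
            for course in student["courses_registered"]
        ]
    return fixed
```

Usage would be result = students.insert_one(normalize_course_keys(newStudent)) followed by print(result.inserted_id); newStudent itself stays free of an _id because only the copy is inserted.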

Finally, let's see how to delete a document.

students.delete_one({'id': 13})
print("After deleting new document")
for student in students.find({'id':13}):
    pprint.pprint(student)

Output:

After deleting new document

After running delete_one(), the document with id 13 is removed from MongoDB, so the second find() prints nothing after the message.
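delete_one() returns a DeleteResult whose deleted_count says whether anything was removed, which avoids re-running find() just to check. A sketch with a hypothetical helper name; delete_many() takes the same kind of filter when every matching document should go.

```python
def delete_student(collection, student_id):
    """Delete at most one student with the given id; return how many were removed."""
    result = collection.delete_one({"id": student_id})
    # 0 if nothing matched the filter, otherwise 1
    return result.deleted_count
```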

The rest of the PyMongo API is documented in the PyMongo documentation listed under References.

References

  1. PyMongo documentation: https://api.mongodb.com/python/current/
  2. IntelliJ IDEA: https://www.jetbrains.com/idea/
  3. MongoDB manual: https://docs.mongodb.com/manual/