How to get distinct values from PyMongo with selected fields?

In MongoDB I have a dataset of university data. I need to get only the distinct name and sourceUrl pairs. The documents in my collection have these keys:

dict_keys(['_id', 'name','univ_id','sourceUrl'])

Using PyMongo I am able to get distinct/unique URLs by using:

data = data_col.find({"univ_id": "glaacuk"}).distinct('sourceUrl')

and I am able to get name and sourceUrl (but not distinct values) by using:

data = data_col.find({"univ_id": "glaacuk"}, {'sourceUrl': 1, 'name': 1, '_id': 0})

I have tried using data = data_col.find({"univ_id": "glaacuk"}, {'sourceUrl': 1, 'name': 1, '_id': 0}).distinct('sourceUrl') to get only distinct name and sourceUrl, but it didn't work.

How can I get only the distinct name and sourceUrl?

Thanks in advance.

Answer

To get distinct values on a single column you can use .distinct().
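For the single-field case from the question, a minimal sketch might look like the following. It assumes PyMongo's Collection.distinct(key, filter) and reuses the field names from the question; the helper name distinct_source_urls is hypothetical and only there for illustration.

```python
def distinct_source_urls(collection, univ_id):
    """Return the unique sourceUrl values for one university.

    `collection` is assumed to be a pymongo Collection (for example
    data_col from the question); the helper name is hypothetical.
    """
    # distinct() takes the field name plus an optional filter document,
    # so no separate find() call is needed.
    return collection.distinct('sourceUrl', {'univ_id': univ_id})
```

Called as distinct_source_urls(data_col, "glaacuk"), this should return a plain list of URL strings, which is why a projection on name has no effect here.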

To get distinct values on multiple columns use .aggregate() with a $group stage.

Example:

from pymongo import MongoClient

collection = MongoClient()['mydatabase']['mycollection']

collection.insert_many([{'name': "a", 'age': 23},
                        {'name': "a", 'age': 23},
                        {'name': "a", 'age': 24},
                        {'name': "b", 'age': 23},
                        {'name': "b", 'age': 23},
                        {'name': "b", 'age': 23}])

# Group on the (name, age) pair; each distinct pair becomes the _id of one result document.
for record in collection.aggregate([{'$group': {'_id': {'name': '$name', 'age': '$age'}}}]):
    print(record)

prints (the order of the $group results is not guaranteed):

{'_id': {'name': 'a', 'age': 24}}
{'_id': {'name': 'a', 'age': 23}}
{'_id': {'name': 'b', 'age': 23}}

(Use a $project stage to flatten the _id sub-document into top-level fields if needed.)
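Applied to the collection in the question, the same idea could be sketched as below. This assumes the field names from the question (univ_id, name, sourceUrl); the two helper names are hypothetical, and the $match stage is added here to reproduce the question's univ_id filter.

```python
def distinct_name_url_pipeline(univ_id):
    """Build an aggregation pipeline for distinct (name, sourceUrl) pairs."""
    return [
        # Keep only the documents for the requested university.
        {'$match': {'univ_id': univ_id}},
        # One output document per distinct (name, sourceUrl) combination.
        {'$group': {'_id': {'name': '$name', 'sourceUrl': '$sourceUrl'}}},
        # Lift the grouped values out of _id into top-level fields.
        {'$project': {'_id': 0,
                      'name': '$_id.name',
                      'sourceUrl': '$_id.sourceUrl'}},
    ]


def distinct_name_url(collection, univ_id):
    """Run the pipeline on a pymongo Collection and return a list of dicts."""
    return list(collection.aggregate(distinct_name_url_pipeline(univ_id)))
```

With the question's collection this would be called as distinct_name_url(data_col, "glaacuk"); each result should then contain only the name and sourceUrl keys. Putting $match first keeps the grouping limited to the documents that matter.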