Firestore: delete non-existent ancestor documents and their sub-collections

I want to clean up the data in my collection. The collection is named “teams”; each document in it has data and a sub-collection named “players”.

I deleted “teams” documents with a simple Firestore delete query, but as we know, deleting a parent/ancestor document does not delete its sub-collection (“players”). We have to fetch all the documents from the “players” sub-collection and delete them first, and only then delete the ancestor (“teams”) document, so that everything is removed from the collection.

It is not possible to fetch those orphaned documents from the “teams” collection, so how can I clean them out of the collection?

I want to delete all the documents that are shown in italics in the Firebase console.

PS: I have created a Firebase Cloud Function that deletes the sub-collection documents when the ancestor doc is deleted.

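// Assumed setup, normally found once at the top of index.js
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.deleteOrphanedTeamsDoc = functions.firestore
  .document('teams/{teamID}')
  .onDelete(async (snap, context) => {
    const teamID = context.params.teamID;
    console.log("Deleted teamID --->>> " + teamID);

    const store = admin.firestore();
    const teamsPlayer = await store.collection('teams').doc(teamID).collection('players').get();

    // Return one promise that resolves once every delete has finished.
    // forEach with an async callback would not wait for the deletes.
    return Promise.all(teamsPlayer.docs.map((val) => val.ref.delete()));
  });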

With the code above, the sub-collection is also deleted whenever I delete a teams doc from now on.

But what about all the orphaned docs that are already in my “teams” collection?

Update 1:

I tried Renaud Tarnec's code. Sorry, I am new to Cloud Functions, so I don't know much about them. I clicked the Run button but got an error:

6:46:13.625 pm
scheduledFunction
Function execution took 12608 ms, finished with status: 'error'
6:46:13.622 pm
scheduledFunction
at processTicksAndRejections (internal/process/task_queues.js:97:5)
6:46:13.622 pm
scheduledFunction
at runMicrotasks (<anonymous>)
6:46:13.622 pm
scheduledFunction
at /workspace/index.js:161:53
6:46:13.622 pm
scheduledFunction
ReferenceError: promises is not defined 
6:46:01.018 pm
scheduledFunction
Function execution started

I think the issue is here: ReferenceError: promises is not defined, at

const parentsSnapshotsArray = await Promise.all(promises);

Answer

But what about all the orphaned docs that are already in my “teams” collection?

As you mentioned, your Cloud Function will not be triggered for the teams documents that were already deleted.

What you could do to delete the orphan player docs is to run a scheduled Cloud Function every X minutes/hours.

The following Cloud Function uses a CollectionGroup query to get all the player docs and delete them. Note that you need to have a Firestore index for the query. Note also how we use Promise.all() to return a single Promise that resolves when all the asynchronous work is done; this is key to correctly managing the lifecycle of your Cloud Function.

exports.scheduledFunction = functions.pubsub.schedule('every 5 minutes').onRun(async (context) => {

    // The handler must be async because it uses await
    const playersRef = admin.firestore().collectionGroup('players');
    const playersSnap = await playersRef.get();

    const promises = [];
    playersSnap.forEach((doc) => {
        promises.push(doc.ref.delete());
    });
    return Promise.all(promises);

});

Now we need to add some extra business logic: the player docs should be deleted only if the parent team doc does not exist.

The following code should do the trick (untested):

exports.scheduledFunction = functions.pubsub.schedule('every 5 minutes').onRun(async (context) => {

    const playersRef = admin.firestore().collectionGroup('players');
    const playersSnap = await playersRef.get();

    const docParentIdsToDelete = [];

    const docParentIdsTreated = [];
    const promisesParentDocs = [];
    playersSnap.forEach((doc) => {
        const parentTeamRef = doc.ref.parent.parent;
        const parentTeamId = parentTeamRef.id;
        if (docParentIdsTreated.indexOf(parentTeamId) < 0) {
            // We need to check if the parent exists
            promisesParentDocs.push(parentTeamRef.get());
            docParentIdsTreated.push(parentTeamId);
        }
    });

    const parentsSnapshotsArray = await Promise.all(promisesParentDocs);

    parentsSnapshotsArray.forEach(snap => {
        if (!snap.exists) {
            // The parent team doc DOES NOT exist. It is shown in italic in the Firebase console.
            // => We need to delete the players child docs
            docParentIdsToDelete.push(snap.id);
        }
    });
    
    const promisesDeletion = [];
    playersSnap.forEach((doc) => {
        const parentTeamId = doc.ref.parent.parent.id;
        if (docParentIdsToDelete.indexOf(parentTeamId) > -1) {
            // We need to delete the player doc
            promisesDeletion.push(doc.ref.delete());
        }
    });
    
    return Promise.all(promisesDeletion);
    
});

Basically, we first get all the player docs. Then we loop over them to check whether each parent team doc exists (using an array of already-treated IDs so that each parent is read only once). If a parent does not exist, we push its ID to an array, which means its player child docs need to be deleted. Then we loop over the player docs again and delete the desired ones (again by pushing the deletion promises to an array which is passed to Promise.all()). There might be some room for optimizing the code and reducing the number of loops, but the logic is there (if I didn't make any mistake :-)).
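
As one possible way to reduce the number of loops, here is a minimal, untested sketch that groups the player refs by parent team in a single pass and reads each distinct parent only once. The helper name findOrphanPlayerRefs is hypothetical, and the sketch assumes that every “players” sub-collection sits directly under a teams doc and that it receives the docs array of the collection group snapshot.

```javascript
// Hypothetical helper (not part of any Firebase API): returns the refs of the
// player docs whose parent team doc does not exist.
// `playerDocs` is assumed to be `playersSnap.docs` from the collection group query.
async function findOrphanPlayerRefs(playerDocs) {
  // Single pass: group the player refs by the path of their parent team doc.
  const groupsByTeamPath = new Map();
  for (const doc of playerDocs) {
    const teamRef = doc.ref.parent.parent;
    if (!groupsByTeamPath.has(teamRef.path)) {
      groupsByTeamPath.set(teamRef.path, { teamRef, playerRefs: [] });
    }
    groupsByTeamPath.get(teamRef.path).playerRefs.push(doc.ref);
  }

  // Read each distinct parent team doc exactly once, in parallel.
  const groups = [...groupsByTeamPath.values()];
  const teamSnaps = await Promise.all(groups.map((group) => group.teamRef.get()));

  // Keep only the players whose parent team doc does not exist.
  return groups
    .filter((group, index) => !teamSnaps[index].exists)
    .flatMap((group) => group.playerRefs);
}

// Possible use inside the scheduled function (untested):
//   const playersSnap = await admin.firestore().collectionGroup('players').get();
//   const orphanRefs = await findOrphanPlayerRefs(playersSnap.docs);
//   return Promise.all(orphanRefs.map((ref) => ref.delete()));
```

Grouping by the parent's path with a Map replaces the two indexOf lookups on arrays, and the deletion step no longer needs a second loop over all the player docs.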