How to handle adding up to 100K entries to a Firestore database in a Node.js application

Here is my function where I am trying to save data extracted from an Excel file. I am using the XLSX npm package to read the Excel file.

const fs = require('fs');
const path = require('path');
const XLSX = require('xlsx');
const admin = require('firebase-admin');

function myFunction() {
    const excelFilePath = "/ExcelFile2.xlsx"
    if (fs.existsSync(path.join('uploads', excelFilePath))) {
        const workbook = XLSX.readFile(`./uploads${excelFilePath}`)
        const [firstSheetName] = workbook.SheetNames;
        const worksheet = workbook.Sheets[firstSheetName];

        const rows = XLSX.utils.sheet_to_json(worksheet, {
            raw: false, // Use raw values (true) or formatted strings (false)
            // header: 1, // Generate an array of arrays ("2D Array")
        });

        // res.send({rows})

        const serviceAccount = require('./*******-d75****7a06.json');

        admin.initializeApp({
            credential: admin.credential.cert(serviceAccount)
        });

        const db = admin.firestore()

        // One set() per row, all fired at once -- with ~100K rows this queues
        // an enormous number of pending writes in memory.
        rows.forEach((value) => {
            const docRef = db.collection('users').doc();
            docRef.set(value).then(() => {
                console.log("Written")
            })
            .catch((reason) => {
                console.log(reason)
            })
        })

        console.log(rows.length)
    }
}

Here is the error I am getting; the process also uses up all of my system memory:

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

Answer

It’s pretty normal in Firebase/Firestore land to have errors like this when trying to add too much data at once.

Firebase Functions tend to time out, and even if you configure them to run for the maximum of 9 minutes, a bulk import of this size will still hit that limit and you end up with partial data and/or errors.
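For reference, that 9-minute ceiling is configured roughly like this (a minimal sketch assuming the first-generation firebase-functions API; importUsers is a placeholder name of mine):

const functions = require('firebase-functions');

// Raise the timeout to the 540-second (9-minute) maximum.
// Even at that limit, a single invocation is not a reliable way
// to push ~100K documents into Firestore.
exports.importUsers = functions
    .runWith({ timeoutSeconds: 540, memory: '1GB' })
    .https.onRequest(async (req, res) => {
        // ...import logic would go here...
        res.send('started');
    });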

Here’s how I do things like this:

  1. Write a function that writes 500 entries at a time (using a batched write; see the sketch after this list)

  2. Use an entry identifier (let’s call it userId) so the function knows which user was last written to the database; call that value lastUserRecorded.

  3. After each iteration (batch write of 500 entries), have your function record the value of lastUserRecorded inside a temporary document in the database.

  4. When the function runs again, it should first read the value of lastUserRecorded from the db, then write a new batch of 500 users starting AFTER that value (it selects the next 500 rows from your Excel file, beginning right after lastUserRecorded).

  5. To avoid running into function timeout issues, I would schedule the function to run every minute (Cloud Scheduler trigger). This way, it’s very likely that the function will be able to handle the batch of 500 writes without timing out and recording partial data.
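Here is a rough sketch of how those steps fit together, assuming each row parsed from the Excel file carries a userId field; the names scheduledImport, importState/progress, and getRowsFromExcel are placeholders of mine, not anything from your project:

const admin = require('firebase-admin');
const functions = require('firebase-functions');
const XLSX = require('xlsx');

admin.initializeApp();
const db = admin.firestore();

const BATCH_SIZE = 500; // Firestore allows at most 500 operations per batched write

// Placeholder: parse the workbook into an array of row objects.
// In practice the file would live somewhere the function can reach
// (e.g. Cloud Storage); a local path keeps the sketch short.
function getRowsFromExcel() {
    const workbook = XLSX.readFile('./uploads/ExcelFile2.xlsx');
    const [firstSheetName] = workbook.SheetNames;
    return XLSX.utils.sheet_to_json(workbook.Sheets[firstSheetName], { raw: false });
}

// Runs once per minute via a Cloud Scheduler trigger.
exports.scheduledImport = functions.pubsub
    .schedule('every 1 minutes')
    .onRun(async () => {
        // 1. Read where the previous run stopped.
        const progressRef = db.collection('importState').doc('progress');
        const progressSnap = await progressRef.get();
        const lastUserRecorded = progressSnap.exists
            ? progressSnap.data().lastUserRecorded
            : null;

        // 2. Pick the next 500 rows after lastUserRecorded.
        const rows = getRowsFromExcel();
        const startIndex = lastUserRecorded === null
            ? 0
            : rows.findIndex((row) => row.userId === lastUserRecorded) + 1;
        const chunk = rows.slice(startIndex, startIndex + BATCH_SIZE);

        if (chunk.length === 0) {
            console.log('Import finished, nothing left to write.');
            return null;
        }

        // 3. Write the whole chunk as a single batched write.
        const batch = db.batch();
        chunk.forEach((row) => {
            batch.set(db.collection('users').doc(row.userId), row);
        });
        await batch.commit();

        // 4. Record the last user written so the next run knows where to resume.
        const lastWritten = chunk[chunk.length - 1].userId;
        await progressRef.set({ lastUserRecorded: lastWritten });
        console.log(`Wrote ${chunk.length} users, up to ${lastWritten}`);
        return null;
    });

Because each user is written under a deterministic document ID (doc(row.userId)), re-running a chunk after a failure simply overwrites the same documents instead of creating duplicates.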

If you do it this way, 100K entries at 500 per run is 200 runs; at one run per minute, that’s roughly 3 hours and 20 minutes to finish.
