Wait for Scrapy callback function

I am new to Scrapy and to Python in general.

Here is the code:

import scrapy
import json

class MOOCSpider(scrapy.Spider):
    name = 'mooc'
    start_urls = ['https://www.plurk.com/search?q=italy']
    custom_settings = {
        'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
    }
    global_id = 1458122036

    def parse(self, response):
        url = 'https://www.plurk.com/Search/search2'

        headers = {
            ...omitted...
        }

        for i in range(1, 10):
            formdata = {
                "after_id": str(self.global_id)
            }
            yield scrapy.FormRequest(url, callback=self.parse_api, formdata=formdata, headers=headers)


    def parse_api(self, response):
        data = json.loads(response.body)
        posts = data["plurks"]
        users = data["users"]

        for i, post in enumerate(posts):
            # remember the id of the last post for the next request
            if i == len(posts) - 1:
                self.global_id = post["plurk_id"]

            ...omitted code...

            yield {
                'Author': user_name,
                'Body': post['content'],
                'app': 'plurk'
            }



The problem I have is that Scrapy first makes all the requests in the for loop and only then executes the code in parse_api. What I would like is for Scrapy to do one iteration of the for loop, call the callback function, wait for it to return, and then do the next iteration.

This is because the id that I need for the next request is set in the global_id variable by the callback function.
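
To illustrate what I mean, here is a minimal sketch (the DemoSpider name and the example.com URLs are made up, only to show the scheduling order): in practice the loop finishes and all the requests are scheduled before any response arrives, so every request is built with the same initial counter value.

import scrapy

class DemoSpider(scrapy.Spider):
    # hypothetical spider, just to demonstrate the scheduling order
    name = 'demo'
    start_urls = ['https://example.com']
    counter = 0

    def parse(self, response):
        for i in range(3):
            # in practice all three requests are created and scheduled
            # before any response comes back, so each line logs counter=0
            self.logger.info('scheduling request %d, counter=%d', i, self.counter)
            yield scrapy.Request('https://example.com/?page=%d' % i,
                                 callback=self.parse_page, dont_filter=True)

    def parse_page(self, response):
        # the callbacks only run afterwards
        self.counter += 1
        self.logger.info('callback ran, counter=%d', self.counter)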

Answer

Answering my own question:

Now the parse method makes just one request and calls parse_api once. parse_api processes the response and sets the global_id variable. Once it has finished processing its own response, it makes another request, passing itself as the callback function. This guarantees that the global_id variable is properly set, since the new request is made only once parse_api has finished running.

request.cb_kwargs["loop_l"] is used to pass an additional argument to the callback function. Here it is a counter that controls the number of requests we want to make. When the counter reaches 200 we stop crawling.

import scrapy
import json

class MOOCSpider(scrapy.Spider):
    name = 'mooc'
    start_urls = ['https://www.plurk.com/search?q=']
    custom_settings = {
        'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
    }
    global_id = 1455890167

    url = 'https://www.plurk.com/Search/search2'

    headers = {
        ...OMITTED...
    }

    def parse(self, response):
        formdata = {
            "after_id": str(self.global_id)
        }
        request = scrapy.FormRequest(self.url, callback=self.parse_api, formdata=formdata, headers=self.headers)
        # the counter starts at 0 and travels with the request
        request.cb_kwargs["loop_l"] = 0
        yield request

    def parse_api(self, response, loop_l):
        loop_l += 1

        # stop crawling after 200 requests
        if loop_l == 200:
            return
        data = json.loads(response.body)

        ...omitted code...
        ... GET AND SET THE NEW global_id FROM THE RESPONSE ...

        # make another request with the new id
        formdata = {
            "after_id": str(self.global_id)
        }
        request = scrapy.FormRequest(self.url, callback=self.parse_api, formdata=formdata, headers=self.headers)
        request.cb_kwargs["loop_l"] = loop_l
        yield request
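
One note on compatibility: cb_kwargs was only added in Scrapy 1.7. On older versions the same counter can travel in request.meta instead; a minimal sketch of that variant (same url, headers and formdata as above, and in this variant parse_api takes only self and response):

        # Scrapy < 1.7 variant: carry the counter in request.meta
        request = scrapy.FormRequest(self.url, callback=self.parse_api,
                                     formdata=formdata, headers=self.headers,
                                     meta={'loop_l': loop_l})
        yield request
        # read it back in the callback as:
        # loop_l = response.meta['loop_l']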