.. _scp01:

========================
SCP01: Disallowed domain
========================

What it does
============

Finds URLs in :attr:`~scrapy.Spider.start_urls` whose netloc is not in
:attr:`~scrapy.Spider.allowed_domains`.

Why is this bad?
================

The default implementation of :meth:`~scrapy.Spider.start` sets
:attr:`~scrapy.Request.dont_filter` to ``True``. As a result, URLs from
:attr:`~scrapy.Spider.start_urls` are sent by default even if their domain is
not in :attr:`~scrapy.Spider.allowed_domains`.

However, any follow-up :class:`~scrapy.Request` yielded from a
:attr:`~scrapy.Request.callback` that points to that domain will be filtered
out, which is usually not what you want.

Example
=======

.. code-block:: python

    import scrapy


    class MySpider(scrapy.Spider):
        name = "myspider"
        allowed_domains = ["b.example"]
        start_urls = [
            "https://a.example/",
        ]

Use instead:

.. code-block:: python

    import scrapy


    class MySpider(scrapy.Spider):
        name = "myspider"
        allowed_domains = ["a.example"]
        start_urls = [
            "https://a.example/",
        ]
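
To make the rule concrete, the check described under "What it does" can be
approximated with the standard library alone. This is a minimal sketch, not
the linter's actual implementation: ``find_disallowed_start_urls`` is a
hypothetical helper, and treating subdomains of an allowed domain as allowed
is an assumption made here to resemble how Scrapy's offsite filtering is
generally understood to match hosts.

.. code-block:: python

    from urllib.parse import urlparse


    def find_disallowed_start_urls(start_urls, allowed_domains):
        """Return the start URLs whose host is not covered by allowed_domains.

        Hypothetical helper for illustration only. Assumption: a host is
        covered if it equals an allowed domain or is a subdomain of one.
        """
        allowed = [domain.lower() for domain in allowed_domains]
        disallowed = []
        for url in start_urls:
            # hostname drops any port or credentials and may be None.
            host = (urlparse(url).hostname or "").lower()
            if not any(host == d or host.endswith("." + d) for d in allowed):
                disallowed.append(url)
        return disallowed


    # The first example above is flagged; the corrected one is not.
    print(find_disallowed_start_urls(["https://a.example/"], ["b.example"]))
    print(find_disallowed_start_urls(["https://a.example/"], ["a.example"]))

Running this sketch prints ``['https://a.example/']`` for the first spider
configuration and ``[]`` for the second.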