How to crawl large sites with Screaming Frog SEO Spider

The more data you crawl, the more memory you need to store and process it. Screaming Frog SEO Spider uses a hybrid storage mechanism with configurable parameters, and it needs some tuning to crawl efficiently at a large scale.

By default, SEO Spider processes and stores data in RAM rather than on your PC's hard disk. This makes it fast and flexible, but it has drawbacks, especially when crawling at a large scale.

To avoid these problems during crawling, you need to adjust Screaming Frog SEO Spider's settings.

First, you need to change the data storage mode:

  1. Click the “Configuration” option → Go to “System”
  2. Next, select “Storage Mode” → set “Database Storage”.
Setting up Storage Mode in Screaming Frog
Installing Database Storage in Screaming Frog

Now data will be written to the hard disk instead of RAM; it is advisable to use an SSD for this. One of the main advantages of this mode is that all results are written straight to the database, so if a crash occurs, the data is not lost (in our experience, there were no crashes after switching to this mode).
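Since database storage writes crawl data to disk, it is worth confirming that the drive has enough free space before starting a large crawl. A minimal Python sketch (the path and the 50 GB default are illustrative assumptions, not Screaming Frog requirements):

```python
import shutil

def has_free_space(path=".", required_gb=50):
    """Check whether the drive holding `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3

# Example: check the current drive before launching a crawl.
print(has_free_space(".", 50))
```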

Second, you need to set the amount of RAM:

  1. Click the “Configuration” option → Go to “System”.
  2. Next, select “Memory Allocation”.
Setting up RAM in Screaming Frog

The developers recommend at least 4 GB of RAM for crawling 2 million pages and at least 8 GB for more than 5 million pages. However, for most users, 16-32 GB of RAM is sufficient.
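As a rough sanity check on those numbers, 4 GB spread across 2 million pages works out to about 2 KB of RAM per URL:

```python
# Back-of-the-envelope check of the 4 GB / 2 million pages guideline.
ram_gb = 4
pages = 2_000_000

bytes_per_page = ram_gb * 1024 ** 3 / pages
print(round(bytes_per_page))  # ≈ 2147 bytes per URL
```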

To scan a large number of website pages, you may need to disable some features:

  1. Click the “Configuration” option
  2. Go to “Spider”.
  3. Leave only internal links enabled if you do not plan to search for broken external links.
Set up a large number of pages to scan
  4. Next, disable image collection so that the program does not gather images from the site. To do this, click “Configuration” → go to “Exclude” and add the following exclusion patterns:
http.*\.jpg
http.*\.JPG
http.*\.jpeg
http.*\.JPEG
http.*\.png
http.*\.PNG
http.*\.gif
http.*\.pdf
http.*\.PDF
Disabling image collection in Screaming Frog
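You can sanity-check exclusion patterns like these before pasting them in: Screaming Frog matches excludes as regular expressions against the URL. A small Python sketch (the sample URLs are made up):

```python
import re

# The exclusion patterns from the step above.
patterns = [
    r"http.*\.jpg", r"http.*\.JPG", r"http.*\.jpeg", r"http.*\.JPEG",
    r"http.*\.png", r"http.*\.PNG", r"http.*\.gif",
    r"http.*\.pdf", r"http.*\.PDF",
]

def is_excluded(url):
    """Return True if any exclusion pattern matches the URL."""
    return any(re.match(p, url) for p in patterns)

print(is_excluded("https://example.com/logo.png"))   # True
print(is_excluded("https://example.com/page.html"))  # False
```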
  5. Next, configure the crawl speed: click “Configuration” → go to “Speed”. Setting 7 to 10 threads (10-30 URLs per second) is recommended for comfortable work when crawling large sites.
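At those speeds, the total crawl time for a large site is easy to estimate. For example, 2 million URLs at 20 URLs per second (the midpoint of the 10-30 range above) takes a bit under 28 hours:

```python
# Estimated crawl duration at a steady rate (illustrative numbers).
urls = 2_000_000
urls_per_second = 20  # midpoint of the 10-30 URLs/s range

hours = urls / urls_per_second / 3600
print(f"{hours:.1f} hours")  # 27.8 hours
```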
  6. Then click “Configuration” → “User-Agent” and replace the Screaming Frog SEO Spider user agent with Googlebot, to prevent competitors from blocking your crawling/parsing.
Replacing the Screaming Frog SEO Spider User-Agent with Googlebot
Installing the Googlebot User-Agent
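For reference, the commonly seen desktop Googlebot user-agent string looks like the one below. A tiny sketch of the kind of naive substring check a site might use to whitelist it (real Googlebot verification is done via reverse DNS, not the header):

```python
# The classic desktop Googlebot user-agent string.
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

def looks_like_googlebot(user_agent):
    """Naive substring check a site might use to allow Googlebot through."""
    return "Googlebot" in user_agent

print(looks_like_googlebot(GOOGLEBOT_UA))                      # True
print(looks_like_googlebot("Screaming Frog SEO Spider/19.0"))  # False
```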
  7. The last step is to save all the settings and make them the default, so that you don’t have to reconfigure them every time you crawl a new project. Click “Configuration” → go to “Profiles” → click “Save Current Configuration as Default”.
Saving default settings in Screaming Frog

After applying these settings, add the URL of the site you need, and you can crawl large websites without any problems.
