Screaming Frog and the Search Console URL Inspection API

Today is February 10th and I wanted to talk about Screaming Frog as well as the new Search Console URL Inspection API. 


Let’s jump right in: what is Screaming Frog, and what can you do with it?

Let’s start at the beginning: it’s free but limited, and you don’t run it online. The paid version is £149 per year, or about $200 US, roughly $16 a month if you want to break it down that way. Buy it, write it off, and call it a day; it will save you that money in the first hour you use it.

So what can it do?

Screaming Frog is an SEO spider: think Googlebot, but you get to save the data it captures.

Here’s a list of the top features for it:

  • Find broken links, errors, and redirects
  • Page title and meta data analysis
  • Meta robots and directives review
  • Audit hreflang
  • Discover exact duplicate pages
  • Generate XML sitemaps
  • Create site visualizations
  • Crawl up to 500 URLs in the free version, unlimited (up to your computer’s capacity) in the paid version

But the real power comes in when you use the paid version, and I’m going to just touch on a few of the features.

  • JavaScript rendering: Actually view the page as a fully rendered version. This includes a screenshot as well as the ability to review the rendered source code.
    • We’re using this to better understand how rendering affects crawlability, as well as what is and isn’t on the site depending on JS.
    • If you do any kind of client-side rendering, use React, or just aren’t seeing Google crawl your site the way you think it should, this is a great way to better understand why.
  • URL Inspection API Integration – I was going to save this for last, but it’s just too good not to put at the top of the list. There are a few integrations you get with Screaming Frog out of the box.
    • 1 – Google Analytics (Get actual data during crawls) – User and session metrics, goal conversions, and ecommerce transactions and revenue data for landing pages, so you can view your top performing pages when performing a technical or content audit. Even things like bounce rate by page.
    • 2 – Google Search Console – this includes the URL Inspection API integration. Get clicks, impressions, CTR, and position. Now you also get inspection data, including last crawl, coverage, mobile usability, indexing status, canonical info, and rich snippet info.
    • 3 – PageSpeed Insights – think Core Web Vitals and the associated speed-related data by page.
    • And non-Google companies like Majestic for backlinks, Ahrefs for things like backlinks and URL Rating, and Moz for data like Page Authority and backlink info from their Open Site Explorer metrics.
  • Site Visualizations – See the structure of your site in a visualized way, this can help you better understand your structure and how Google likely interprets it. Is your site too flat, or are key pages too far from your homepage?
  • Custom Extraction – One of the most amazing components of Screaming Frog is being able to pull content from your site. It doesn’t have to be visible, and it can even be code (HTML / CSS). An example: on our site’s search result pages, I use custom extraction to pull the number of results into a field. I cross-reference that against other fields, especially indexing fields, and see how the number of results impacts things like ranking, traffic, and index status.
  • Meta Tags – Titles, descriptions, etc., with character counts. This helps you spot duplicates and tags that are too short or too long.
  • You can even generate XML sitemaps. This is helpful if your system won’t let you generate a sitemap or you want to create a static one.
  • Find all pages that have broken links, both internal and external. Use this to make sure you aren’t linking to non-existent pages. It’s especially helpful for external links that may no longer exist and would make for a bad user experience.
  • 301s – Find out which pages redirect, and see whether internal links can point directly to the final destination instead.
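To make the custom extraction idea concrete, here’s a minimal Python sketch of the same concept: pulling a result count out of a page’s HTML with a regular expression. The HTML fragment and the pattern are invented for illustration; in Screaming Frog itself you’d express the rule as an XPath, CSS selector, or regex instead.

```python
import re

def extract_result_count(html):
    """Pull the number of results from a search-results page.

    Assumes the page prints something like "1,234 results" --
    this pattern is hypothetical and would need to match your
    own template, just as a custom extraction rule would.
    """
    match = re.search(r"([\d,]+)\s+results", html)
    if match:
        return int(match.group(1).replace(",", ""))
    return None

# Example page fragment (made up for illustration)
html = '<div class="summary">Showing 1,234 results for "red shoes"</div>'
print(extract_result_count(html))  # -> 1234
```

Once that number sits in its own field, you can cross-reference it against indexation or traffic data in a spreadsheet, just as described above.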

Overall, Screaming Frog does some amazing things, and it’s great for up to about 50k pages. I find that once you get past that, it isn’t as usable, as there’s almost too much data. You can go way beyond that and pull data into Excel, or even export just segments of data, but realize that Excel has a million-row limit, and pulling a million rows of data with Screaming Frog is typically beyond most desktop systems. That’s where you’ll want to look into hosted solutions that run for days on end and have more tolerance for issues that may arise.

Screaming Frog now integrates with the new Search Console URL Inspection API, which Google officially launched on January 31st, 2022. It allows you to see the indexation status of your URLs via an API. The integration arrived in the latest release in the last few weeks, and you’ll see a good share of tutorials out there.

If you don’t know what an API is, it’s a way to send a simple request to a server, which responds with an answer. Think of asking Google “What’s the weather in my city?” and getting back the current temperature. You send a request to a weather API with a city name, and the API might return the current weather, the high and low, tomorrow’s forecast, and more. The API may give you a lot or a little data depending on how you ask, but it’s really just a way to access a service’s database without actual database access.
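As a rough Python sketch of that request/response idea: you send a small structured question, and you get back structured data you can parse. The field names and values here are entirely invented for illustration.

```python
import json

# The "question" a client might send to a hypothetical weather API.
request = {"city": "Chicago"}

# The kind of JSON "answer" such an API might send back
# (values invented for illustration).
response_body = json.dumps({
    "city": "Chicago",
    "current_f": 28,
    "high_f": 33,
    "low_f": 21,
})

# The client parses the answer instead of querying a database directly.
data = json.loads(response_body)
print(f"{data['city']}: currently {data['current_f']}F")
```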

For this API, I was working on a PHP version, as that’s the only language I have real familiarity with, but I didn’t get too far before Screaming Frog announced their integration.
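If you’re curious what a direct call looks like, here’s a rough Python sketch based on Google’s documented endpoint (POST to `https://searchconsole.googleapis.com/v1/urlInspection/index:inspect`). OAuth authentication is omitted, and the sample response values are invented; treat this as a shape of the exchange, not a working client.

```python
import json

def build_inspection_request(site_url, page_url):
    """Build the JSON body for a URL Inspection API call."""
    return json.dumps({
        "siteUrl": site_url,        # the Search Console property
        "inspectionUrl": page_url,  # the page you want inspected
    })

payload = build_inspection_request(
    "https://example.com/", "https://example.com/some-page/"
)

# A trimmed example of the kind of response the API returns;
# the values here are invented for illustration.
sample_response = {
    "inspectionResult": {
        "indexStatusResult": {
            "coverageState": "Submitted and indexed",
            "lastCrawlTime": "2022-02-01T08:15:00Z",
        }
    }
}
status = sample_response["inspectionResult"]["indexStatusResult"]
print(status["coverageState"])  # -> Submitted and indexed
```

Screaming Frog’s integration handles all of this for you, which is exactly why I shelved my own script.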

You probably want to know – what am I doing with this new data, and what data do we get that we can use?

First, the data is used to understand indexation status in Google, errors that are coming up, and last crawl date.

And what am I doing with that data? Let’s start with indexation status. There are a few variations on the indexation summary: things like submitted and indexed, indexed but not submitted in sitemap, crawled but not indexed, discovered but not crawled, soft 404, error, and probably a few more I’m forgetting. By pulling this data for your pages, you can get a better understanding of how Google sees your site.
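A simple way to start with this data is to tally how many pages land in each bucket. Here’s a minimal Python sketch; the URLs and coverage labels are invented stand-ins for what a crawl export might contain.

```python
from collections import Counter

# Hypothetical (URL, coverage summary) rows, as they might come
# out of a crawl export. Both columns are made up for illustration.
rows = [
    ("https://example.com/", "Submitted and indexed"),
    ("https://example.com/a/", "Crawled - currently not indexed"),
    ("https://example.com/b/", "Discovered - currently not indexed"),
    ("https://example.com/c/", "Crawled - currently not indexed"),
]

# Tally how many pages fall into each indexation bucket.
counts = Counter(state for _, state in rows)
for state, n in counts.most_common():
    print(f"{n:>3}  {state}")
```

Once you can see the distribution, the buckets with the most pages tell you where to dig first.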

I’ve been focusing a lot on “discovered but not crawled,” since these pages are simply known to Google, but Google hasn’t even pulled the HTML yet. Getting these into the process is key; next comes the set that has been crawled but not indexed, where I’m working to identify any reasons why that might be.

For items like soft 404s or errors, I’m identifying what about those pages causes them: are they server-related, page-related, code, messaging, or something else? All of these give me clues to dig further and allow for some A/B testing to come up with large-scale solutions.

You can pull a sitemap from your website and run it through Screaming Frog to cross-reference indexation status and get an idea of what’s going on with all the pages on your site. Search Console has a 1,000-row limit on exports, while the API lets you pull up to 2,000 URLs each day. So you can get twice the data allowed in the interface, plus you can have multiple users pulling more data. In all, I’ve found the data seems to be a bit more granular than the coverage reports in the graphical interface, so I’m preferring to use the API with Excel to get coverage info.
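For a site larger than the daily quota, you end up splitting the sitemap’s URLs into daily batches. A minimal Python sketch of that scheduling, with an invented URL list:

```python
# Split a sitemap's URLs into daily batches that respect the
# 2,000-inspections-per-day quota mentioned above.
DAILY_QUOTA = 2000

def daily_batches(urls, quota=DAILY_QUOTA):
    """Yield lists of at most `quota` URLs, one list per day."""
    for start in range(0, len(urls), quota):
        yield urls[start:start + quota]

# 4,500 hypothetical URLs -> three days of inspections.
urls = [f"https://example.com/page-{i}/" for i in range(4500)]
batches = list(daily_batches(urls))
print(len(batches))      # -> 3 days of inspections
print(len(batches[-1]))  # -> 500 URLs on the last day
```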

In summary, this API allows you to better understand how Google is taking the pages of your site and indexing them, how long ago it visited, and any issues encountered.

Good luck, and we’ll see you real soon!