Wed, 23 Jun 2021

Python Script: SEO Content Analysis of Your Competitors

12 May 2021, 09:24 GMT+10

Analyzing your competitors' content can give you valuable insights for your own operations and goals. This basic Python script provides n-Gram data in seconds.

This Python script is a basic version of a competitor content analysis. The main idea is to get a quick overview of what the writing focus looks like. A lean approach is to fetch all URLs in the sitemap, split out the URL slugs, and run an n-Gram analysis on them. If you would like to know more about n-Gram analysis, please also have a look at our Free N-Gram Tool. You can apply it not just to URLs but also to keywords, titles, and so on.

As a result, you get a list of the n-Grams used in the URL slugs along with the number of pages that use each n-Gram. The analysis takes only a few seconds, even on large sitemaps, and runs in fewer than fifty lines of code.
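To make the slug idea concrete, here is a minimal sketch that uses only the Python standard library. It is not the author's script: the helper names `slug_words` and `count_ngrams`, the tiny stop-word list, and the example.com URLs are all assumptions made for illustration. It counts each n-Gram once per page, which matches the "number of pages" description above.

```python
from collections import Counter
from urllib.parse import urlparse

# Tiny illustrative stop-word list (an assumption, not the list advertools uses)
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def slug_words(url):
    """Return the words of the last non-empty path segment of a URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        return []
    return [w for w in segments[-1].lower().split("-") if w and w not in STOP_WORDS]


def count_ngrams(urls, n=1):
    """Count in how many slugs each n-gram occurs."""
    counts = Counter()
    for url in urls:
        words = slug_words(url)
        ngrams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(ngrams)  # a set, so each page counts once per n-gram
    return counts


# Hypothetical URLs, with and without a trailing slash
example_urls = [
    "https://example.com/blog/python-seo-script/",
    "https://example.com/blog/seo-content-analysis",
    "https://example.com/guides/python-seo-tips/",
]
unigrams = count_ngrams(example_urls, n=1)
bigrams = count_ngrams(example_urls, n=2)
print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

Filtering empty path segments handles URLs with and without a trailing slash in one step, so no separate branch is needed for the two cases.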

Additional approaches

If you would like deeper insights, I suggest continuing with these approaches:

  • Fetch the content of every URL in the sitemap
  • Create n-Grams found in headlines
  • Create n-Grams found in the content
  • Extract keywords with Textrank or Rake
  • Extract known entities relevant to your SEO business
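As a rough illustration of the headline approach in the list above, the following sketch extracts headline text from an HTML string and counts n-Grams in it. It assumes the page HTML has already been fetched. The `HeadlineParser` class, the `headline_ngrams` function, and the sample HTML are hypothetical names and data invented for this example, and only the standard library is used.

```python
from collections import Counter
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text inside <h1>, <h2> and <h3> tags."""

    HEADLINE_TAGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._current = None  # text parts of the headline being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADLINE_TAGS:
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADLINE_TAGS and self._current is not None:
            # Join the parts and normalize whitespace
            text = " ".join("".join(self._current).split())
            if text:
                self.headlines.append(text)
            self._current = None


def headline_ngrams(html, n=2):
    """Count n-grams across all headlines found in an HTML string."""
    parser = HeadlineParser()
    parser.feed(html)
    counts = Counter()
    for headline in parser.headlines:
        words = headline.lower().split()
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts


# Made-up page content for demonstration
sample_html = """
<html><body>
<h1>Python SEO Script</h1>
<p>Some body text that is ignored.</p>
<h2>Why a Python SEO <em>script</em> helps</h2>
</body></html>
"""
headline_counts = headline_ngrams(sample_html, n=2)
print(headline_counts.most_common(3))
```

Unlike the slug analysis, this sketch does not remove stop words. Whether to filter them is a judgment call that depends on how noisy the headlines are.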

But let's start simple and take a first look at the niche with this script. Based on your feedback, I may add more sophisticated approaches. Before you run the script, you just need to enter the sitemap URL you want to analyze. After running the script, you will find your results in sitemap_ngrams.csv. Open it in Excel or Google Sheets and have fun analyzing the data.

Here is the Python code:

# Sitemap Content Analyzer
# Author: Stefan Neefischer

import advertools as adv
import pandas as pd


def sitemap_ngram_analyzer(site):
    # Download the sitemap and keep only rows that have a URL
    sitemap = adv.sitemap_to_df(site)
    sitemap = sitemap.dropna(subset=["loc"]).reset_index(drop=True)

    # Some sitemaps list URLs with a trailing "/", some without.
    # With a trailing "/", the slug is the second-to-last path segment;
    # otherwise it is the last segment.
    urls = sitemap["loc"]
    has_slash = urls.str.endswith("/")
    slugs = urls[has_slash].str.split("/").str[-2].str.replace("-", " ")
    slugs2 = urls[~has_slash].str.split("/").str[-1].str.replace("-", " ")

    # Merge the two series
    slugs = list(slugs) + list(slugs2)

    # adv.word_frequency automatically removes stop words
    word_counts_onegram = adv.word_frequency(slugs)
    word_counts_twogram = adv.word_frequency(slugs, phrase_len=2)

    output_csv = (
        pd.concat([word_counts_onegram, word_counts_twogram], ignore_index=True)
        .rename({"abs_freq": "Count", "word": "Ngram"}, axis=1)
        .sort_values("Count", ascending=False)
    )

    # Save the n-gram counts to a CSV file
    output_csv.to_csv("sitemap_ngrams.csv", index=False)
    print("csv file saved")


# Provide the sitemap that should be analyzed
site = ""

# Run the analysis; the results will be saved to sitemap_ngrams.csv
sitemap_ngram_analyzer(site)
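If you would rather inspect the results in Python than in a spreadsheet, a small sketch like the one below could list the most frequent n-Grams. It assumes the CSV has the `Ngram` and `Count` columns produced by the rename step in the script. The helper name `top_ngrams` and its parameters are hypothetical and not part of the original script.

```python
import csv
from pathlib import Path


def top_ngrams(csv_path, min_words=1, limit=10):
    """Return (ngram, count) pairs with at least `min_words` words, highest count first."""
    rows = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            ngram = row["Ngram"]
            if len(ngram.split()) >= min_words:
                # Counts may be written as "3" or "3.0", so parse via float
                rows.append((ngram, int(float(row["Count"]))))
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:limit]


# Only runs if the script above has already produced its output file
if Path("sitemap_ngrams.csv").exists():
    for ngram, count in top_ngrams("sitemap_ngrams.csv", min_words=2):
        print(f"{count:>5}  {ngram}")
```

Setting `min_words=2` restricts the list to two-word n-Grams, which are often more telling than single words.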
