SEO Consultant shall help users analyze websites from a Search Engine Optimization (SEO) perspective. It uses Hadoop to process large datasets, dividing bulk data into smaller pieces and processing them with the MapReduce paradigm to extract the required information. SEO Consultant shall provide four main features: Broken Links Searcher, Backlinks Profile Analysis, Page Content Analysis with Suggestions for Improvement, and a Website Crawler. These are vital aspects of a website to analyze from an SEO point of view. After a complete analysis, SEO Consultant will generate a detailed report for the given URL.
Introduction SEO Consultant
SEO Consultant will be a Hadoop-based application. “Hadoop is an Apache open-source framework written in java that allows distributed processing of large datasets across clusters of computers using simple programming models”.
SEO Consultant will accept large datasets as input. It will use Hadoop MapReduce, a processing technique for distributed computing. The technique consists of two tasks, Map and Reduce. Map converts a dataset into another dataset in which individual elements are broken down into records (key/value pairs). The output of the Map task is then given as input to Reduce, which combines the records that share the same key into a smaller set of records.
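The Map, group-by-key and Reduce flow described above can be sketched in a few lines of plain Python. This is only an in-memory illustration, not Hadoop code: the function names (`run_job`, `host_mapper`, `sum_reducer`) and the sample job of counting pages per host are assumptions made for this example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def map_phase(records, mapper):
    """Apply the mapper to every input record and collect (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group values by key, which a MapReduce framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Combine all values that share a key into one output record."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Hypothetical job: count how many pages in the dataset belong to each host.
def host_mapper(url):
    yield (urlparse(url).netloc, 1)


def sum_reducer(key, values):
    return sum(values)


pages = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.org/",
]
page_counts = run_job(pages, host_mapper, sum_reducer)
```

On a real cluster the map and reduce functions would run in parallel on different machines, and the grouping step would be handled by the framework rather than by a local dictionary.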
SEO Consultant will provide four types of website analysis reports. Because the analysis is distributed across a cluster of commodity hardware using Hadoop MapReduce, it is expected to be faster than other tools available in the market. The following features will be provided by SEO Consultant:
- Backlinks Profile Analysis
Backlinks are considered very important in SEO (Search Engine Optimization). When your website has valuable backlinks, it gets priority for indexing by search engines. Backlinks are also important for page ranking.
This feature includes counting the backlinks available in the dataset of web pages. We shall use distributed link analysis techniques so that millions of pages can be analyzed in a reasonable time. In addition to counting, we also aim to analyze the quality of those backlinks using different techniques, determining whether they come from authority sites that provide positive SEO value or from low-impact sites.
The backlink profile analyzer will analyze all the links that point to the URLs available in the input dataset. It not only counts the total number of backlinks but also traces how many links come from each website and how many unique backlinks there are in the dataset. Furthermore, it fetches the anchor texts of all these backlinks.
The backlink profile analyzer will tell you about the quality of your links and how many backlinks your website has. If the quality of your links is poor, you can take immediate action.
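As a rough illustration of the counting part of the backlink profile, the following Python sketch groups link records by target domain and then summarizes each group. The input format (tuples of source URL, target URL and anchor text), the function name `backlink_profile` and the choice to count unique backlinks as unique referring domains are assumptions for this example. Link quality scoring is not covered here.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def backlink_profile(links):
    """Summarize backlinks per target domain.

    links: iterable of (source_url, target_url, anchor_text) tuples
    (an assumed input format, not a fixed dataset schema).
    """
    # "Map" step: key every external link by the domain it points to.
    grouped = defaultdict(list)
    for source, target, anchor in links:
        src_domain = urlparse(source).netloc
        tgt_domain = urlparse(target).netloc
        # Links within the same domain are internal links, not backlinks.
        if src_domain and tgt_domain and src_domain != tgt_domain:
            grouped[tgt_domain].append((src_domain, anchor.strip()))

    # "Reduce" step: combine all records that share a target domain.
    profiles = {}
    for tgt_domain, entries in grouped.items():
        per_domain = Counter(src for src, _ in entries)
        profiles[tgt_domain] = {
            "total_backlinks": len(entries),
            "unique_domains": len(per_domain),
            "links_per_domain": dict(per_domain),
            "anchor_texts": sorted({a for _, a in entries if a}),
        }
    return profiles


links = [
    ("http://blog.example.org/post1", "http://example.com/", "Example"),
    ("http://blog.example.org/post2", "http://example.com/about", "about us"),
    ("http://news.example.net/story", "http://example.com/", "Example"),
    ("http://example.com/home", "http://example.com/about", "About"),  # internal
]
profile = backlink_profile(links)["example.com"]
```

For the sample data, `profile` reports three backlinks from two referring domains, with the internal link ignored.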
- Broken Links Searcher
Broken links are links that point to an invalid web resource or do not point anywhere (for example, returning a 404 error). Broken links have a negative effect on a site's SEO. A search engine's spider will stop crawling at a broken link and consider it a dead end of the website, and the search engine will rate the website as having poor usability. So it is important to correct such links or remove them completely.
The Broken Links Searcher will open all links on the page at the given URL and continue following links up to a defined depth to search for broken links. It will identify all broken links on your website and list the broken URLs.
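The depth-limited search described above could look roughly like the breadth-first traversal below. It is a sketch under stated assumptions: `fetch` is a hypothetical caller-supplied function returning a status code and the links found on the page, and any status of 400 or above is treated as broken. A small in-memory site stands in for real HTTP requests.

```python
from collections import deque


def find_broken_links(start_url, fetch, max_depth):
    """Breadth-first search for broken links up to max_depth.

    fetch(url) is a hypothetical caller-supplied function that returns
    (status_code, list_of_links_on_that_page).
    Returns a list of (broken_url, page_that_linked_to_it) tuples.
    """
    broken = []
    seen = {start_url}
    queue = deque([(start_url, None, 0)])
    while queue:
        url, referrer, depth = queue.popleft()
        status, links = fetch(url)
        if status >= 400:  # assumption: any 4xx/5xx response counts as broken
            broken.append((url, referrer))
            continue
        if depth == max_depth:
            continue  # the page itself was checked, but its links are not followed
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, url, depth + 1))
    return broken


# In-memory stand-in for a website; unknown URLs answer with 404.
SITE = {
    "http://example.com/": (200, ["http://example.com/a", "http://example.com/missing"]),
    "http://example.com/a": (200, ["http://example.com/b", "http://example.com/"]),
    "http://example.com/b": (200, ["http://example.com/gone"]),
}


def fake_fetch(url):
    return SITE.get(url, (404, []))


shallow = find_broken_links("http://example.com/", fake_fetch, max_depth=2)
deep = find_broken_links("http://example.com/", fake_fetch, max_depth=3)
```

With a depth of 2 only `/missing` is reported, because `/gone` sits one level deeper. Raising the depth to 3 reports both, together with the page that links to each.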
- Page Content Analysis and Suggestions for Improvement
Content analysis includes computing different statistics on page contents and suggesting best practices to users. Some of these features include:
- Title tag content and its length in characters.
- Meta Description and Meta Keywords content and their length in characters.
- Headings: the count of each heading tag in tabular form and the content of each tag in sequence.
- Total images used on the page and the Alt text of all images, highlighting images with missing Alt attributes.
- HTML to Text Ratio: shows the text content size, the HTML content size and the resulting HTML to Text ratio.
- WWW Resolve: checks whether a redirect is in place to send traffic from your non-preferred domain to the preferred one, e.g. when a user navigates to a website, the user might type http://www.example.com or http://example.com.
- IP Canonicalization: checks whether the domain's IP address redirects to the original domain name.
- In-page links and their type, whether internal or external.
- Sitemap.xml, robots.txt, Favicon and Custom 404 Page: their existence will be checked, and anything missing will be reported.
- URL Rewrite: checks whether the website has SEO-friendly URLs or needs to turn on the rewrite module in .htaccess to get SEO-friendly URLs.
- Underscores in the URLs: checks whether underscores are used in the URLs of your website.
- Iframe and Embedded Objects: checks whether iframes or embedded objects are used in your website.
- Load time of the given website and suggestions to improve it.
- DOCTYPE version, Language and Encoding type: checks whether each is declared on the website and shows the result in the report.
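A few of the on-page checks listed above (title length, meta description length, heading counts, images without Alt text and the text to HTML ratio) can be sketched with Python's standard `html.parser` module. The class name `PageStats`, the function `analyze_page` and the report keys are illustrative assumptions. The ratio here is simply visible text characters divided by total HTML characters, which is one possible definition.

```python
from collections import Counter
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class PageStats(HTMLParser):
    """Collects simple on-page SEO statistics while parsing."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = None
        self.headings = Counter()
        self.image_count = 0
        self.images_missing_alt = []
        self.text_chars = 0
        self._in_title = False
        self._skip = 0  # nesting depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content") or ""
        elif tag in HEADING_TAGS:
            self.headings[tag] += 1
        elif tag == "img":
            self.image_count += 1
            if not (attrs.get("alt") or "").strip():
                self.images_missing_alt.append(attrs.get("src") or "")
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip:
            self.text_chars += len(data.strip())


def analyze_page(html):
    parser = PageStats()
    parser.feed(html)
    parser.close()
    title = parser.title.strip()
    return {
        "title": title,
        "title_length": len(title),
        "meta_description_length": len(parser.meta_description or ""),
        "headings": dict(parser.headings),
        "image_count": parser.image_count,
        "images_missing_alt": parser.images_missing_alt,
        "text_to_html_ratio": parser.text_chars / len(html) if html else 0.0,
    }


SAMPLE = """<html><head><title>Demo Page</title>
<meta name="description" content="A short demo."></head>
<body><h1>Welcome</h1><h2>News</h2><h2>Contact</h2>
<img src="logo.png" alt="Logo"><img src="banner.png">
<p>Hello world</p></body></html>"""
report = analyze_page(SAMPLE)
```

The remaining checks in the list (redirects, robots.txt, load time and so on) need network access and are outside the scope of this sketch.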
- Website Crawler
The Website Crawler will browse the WWW (World Wide Web). It will move from one page to another and insert these web pages into a local database, creating a copy of all the web pages to be analyzed later.
The web crawler will allow the user to find SEO-related information for sites that are not available in the dataset. In that case the crawler will download the contents of the site from the internet so that the same analysis performed on sites in the dataset can be applied to it. We aim to explore some open-source crawlers and choose one with good features and programmatic access.
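To illustrate the idea of copying crawled pages into a local database, here is a minimal Python sketch that uses the standard `sqlite3` module. It does not represent the open-source crawler that will eventually be chosen. The `crawl` function, the `pages` table layout and the `fetch` callback (returning the page HTML and its links, or `None` if the page cannot be downloaded) are all assumptions for this example.

```python
import sqlite3
from collections import deque


def crawl(start_url, fetch, db, max_pages):
    """Store up to max_pages pages reachable from start_url in a local database.

    fetch(url) is a hypothetical callback returning (html, links),
    or None when the page cannot be downloaded.
    Returns the number of pages stored.
    """
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, html TEXT)")
    queue = deque([start_url])
    seen = {start_url}
    stored = 0
    while queue and stored < max_pages:
        url = queue.popleft()
        result = fetch(url)
        if result is None:
            continue  # unreachable page: nothing to store
        html, links = result
        db.execute(
            "INSERT OR REPLACE INTO pages (url, html) VALUES (?, ?)", (url, html)
        )
        stored += 1
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    db.commit()
    return stored


# In-memory stand-in for the web; dict.get returns None for unknown URLs.
WEB = {
    "http://example.com/": ("<h1>Home</h1>", ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("<h1>A</h1>", ["http://example.com/"]),
    "http://example.com/b": ("<h1>B</h1>", ["http://example.com/c"]),
}

db = sqlite3.connect(":memory:")
stored = crawl("http://example.com/", WEB.get, db, max_pages=10)
urls = [row[0] for row in db.execute("SELECT url FROM pages ORDER BY url")]
```

The stored copies can then be fed to the same analysis steps as pages that come from the dataset.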
System Architecture of SEO Consultant
Motivation and Scope
SEO Consultant will tell you about the SEO quality of a website. It will show which aspects of a website need improvement to raise its SEO quality and rank it higher in search engines. Because SEO Consultant will be Hadoop-based, the work is distributed across a cluster, so results can be computed quickly even for large datasets.