SEO Crawling
In simple words, crawling is the scanning of websites by search engines’ software programs, known as web crawlers, bots or spiders.
Some well-known web crawlers are listed below:
· Google’s web crawler is “Googlebot”.
· Yahoo’s web crawler is “Slurp”.
· Bing’s web crawler is “Bingbot”.
· DuckDuckGo’s crawler is “DuckDuckBot”.
· Chinese search engine Baidu’s crawler is “Baiduspider”.
· Russian search engine Yandex’s crawler is “YandexBot”.
· Another Chinese search engine, Sogou.com, uses a crawler called “Sogou Spider”.
· France-based search engine Exalead’s crawler is “Exabot”.
· Facebook’s crawler is “Facebot”.
· Amazon Alexa’s crawler is “ia_archiver”.
There are hundreds of web crawlers out there.
Crawlers are software programs designed to scan webpages and gather information such as the page title, meta description, words, code, links, sitemaps, images, videos and audio. As they collect this information, crawlers follow the links on each page to discover more pages. They then bring all the information back to the search engine’s servers. The main purpose of crawlers is to discover everything on the internet and store it in a central place. Organizing and storing what crawlers find is called indexing.
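To make the crawl loop concrete, here is a minimal Python sketch of a crawler, assuming a tiny in-memory website (the SITE dictionary and its example.com URLs are hypothetical) in place of real HTTP requests. It parses each page for its title, meta description and links, follows the links breadth-first, and stores what it finds in one central dictionary. Real crawlers such as Googlebot are far more elaborate; this only illustrates the idea.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects the title, meta description and outgoing links of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl: fetch a page, record its details, then queue its links."""
    store = {}                 # the central place where crawled data is kept
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)      # None means the page could not be fetched
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        links = [urljoin(url, href) for href in parser.links]
        store[url] = {
            "title": parser.title.strip(),
            "description": parser.description,
            "links": links,
        }
        for link in links:     # follow links to discover new pages
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store


# Hypothetical two-page website standing in for real HTTP fetches.
SITE = {
    "https://example.com/": (
        '<html><head><title>Home</title>'
        '<meta name="description" content="Welcome"></head>'
        '<body><a href="/about">About</a></body></html>'
    ),
    "https://example.com/about": "<title>About us</title><a href='/'>Home</a>",
}

pages = crawl("https://example.com/", SITE.get)
for url, info in pages.items():
    print(url, "->", info["title"], info["links"])
```

Swapping SITE.get for a function that downloads pages over HTTP would turn this into a very basic live crawler, though a polite one should also respect robots.txt and avoid overloading the server.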
SEO Indexing
The information collected by crawlers is stored by search engines and indexed based on its content. Search engine algorithms define how the data is stored, retrieved and processed so it can be presented when needed. Like the index of a manual, the search index records what content each website contains. Being indexed is the first step to being discovered on search engines, and the index speeds up the process of finding relevant information.
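A common data structure behind search indexing is the inverted index, which maps each word to the pages that contain it. The Python sketch below is a toy version, assuming two hypothetical pages; real search engine indexes also store word positions, ranking signals and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of page URLs that contain it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep only pages that have this word too
    return results


# Hypothetical pages for illustration.
pages = {
    "/bakery": "Fresh bread and cakes baked daily",
    "/cafe": "Coffee, cakes and fresh pastries",
}
index = build_index(pages)
print(sorted(search(index, "fresh cakes")))  # ['/bakery', '/cafe']
print(sorted(search(index, "coffee")))       # ['/cafe']
```

Looking up a word becomes a single dictionary lookup instead of a scan of every stored page, which is why indexing speeds up finding relevant information.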
Before search engines can crawl and index webpages, they first have to find them.
“Remember, pages not crawled are pages not indexed and, finally, not ranked.”
Search engines have a crawl budget, which means only a certain number of a website’s pages can be crawled in a given day. The budget is assigned based on how big the website is and which URLs are worth crawling, judged by demand and updates. Pages can be recrawled based on their popularity, content updates and page category.
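Search engines do not publish how they spend crawl budget, so the Python sketch below is purely hypothetical: it scores URLs with invented popularity, freshness and staleness weights and crawls only as many top-scoring pages as a daily budget allows. It illustrates the idea of a budget, not any search engine’s actual formula.

```python
import heapq
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    popularity: float       # hypothetical 0..1 score, e.g. based on inbound links
    days_since_update: int  # days since the page content last changed
    days_since_crawl: int   # days since the crawler last visited the page


def crawl_priority(page):
    """Hypothetical score: popular, recently updated, long-unvisited pages score higher."""
    freshness = 1.0 / (1 + page.days_since_update)
    staleness = min(page.days_since_crawl / 30, 1.0)
    return 0.5 * page.popularity + 0.3 * freshness + 0.2 * staleness


def plan_daily_crawl(pages, budget):
    """Return the URLs of the `budget` highest-priority pages to crawl today."""
    chosen = heapq.nlargest(budget, pages, key=crawl_priority)
    return [page.url for page in chosen]


pages = [
    PageStats("/", popularity=0.9, days_since_update=1, days_since_crawl=2),
    PageStats("/blog/new-post", popularity=0.4, days_since_update=0, days_since_crawl=10),
    PageStats("/terms", popularity=0.05, days_since_update=400, days_since_crawl=5),
]
print(plan_daily_crawl(pages, budget=2))  # ['/', '/blog/new-post']
```

With a budget of two, the rarely updated, unpopular terms page waits for another day, which mirrors how low-demand URLs get crawled less often.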
It takes time for search engines to find your website. Even though search engines crawl the web to find websites on their own, it is always a good idea to submit your website for crawling. In Google, you can submit your website by adding its sitemap in Google Search Console. A sitemap is a file that lists the pages on your website, along with information about them. Bing offers a public URL submission tool and Bing Webmaster Tools for submitting a website for crawling. Yahoo is powered by Bing’s index, so if you submit your website to Bing, it is automatically indexed for Yahoo as well.
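Sitemaps usually follow the XML format defined by the sitemaps.org protocol. As a minimal sketch, assuming hypothetical example.com URLs and last-modified dates, this Python snippet builds one with the standard library; you would save the output as a file such as sitemap.xml on your website and submit its URL in Google Search Console or Bing Webmaster Tools.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Turn (url, last_modified) pairs into a sitemap XML document string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # full URL of the page
        ET.SubElement(url, "lastmod").text = lastmod  # date in YYYY-MM-DD form
    body = ET.tostring(urlset, encoding="unicode")  # escapes &, < and > for us
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages and dates, for illustration only.
sitemap_xml = build_sitemap([
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/contact", "2023-11-02"),
])
print(sitemap_xml)
```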
To get the most out of search engine crawling and indexing, let the bots or spiders know which webpages you do or do not want crawled. You can do this with a robots.txt file, which sits at the root of the domain (for example, https://www.example.com/robots.txt). Search engine crawlers try to fetch the robots.txt file before crawling a website to check for special instructions. The “Disallow” directive tells crawlers not to crawl a page or folder. Instructions such as “noindex”, which tells search engines not to index a page, and “nofollow”, which tells them not to follow the links on a page, belong in a robots meta tag on the page itself or in an X-Robots-Tag HTTP header; Google stopped honoring “noindex” in robots.txt in 2019. A well-maintained robots.txt file reduces the crawl time spent on unimportant pages, so crawlers can focus on your more relevant webpages and content.
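Python’s standard library includes a robots.txt parser, urllib.robotparser, which answers the same question a well-behaved crawler asks: may this user agent fetch this URL? The rules, paths and sitemap URL below are hypothetical examples, parsed from a string so no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for www.example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/

User-agent: Bingbot
Disallow: /search/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has no group of its own, so the "*" rules apply to it.
print(parser.can_fetch("Googlebot", "https://www.example.com/admin/login"))   # False
print(parser.can_fetch("Googlebot", "https://www.example.com/blog/"))         # True
# Bingbot matches its own group and follows only those rules.
print(parser.can_fetch("Bingbot", "https://www.example.com/search/?q=shoes"))  # False
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
```

Notice that Bingbot may still crawl /admin/ here: a crawler that matches a named group ignores the “*” rules. The parser also says nothing about indexing; a page that should stay out of search results needs a robots meta tag such as <meta name="robots" content="noindex"> in its HTML.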
Almost 72.6% of internet users, nearly 3.7 billion people, are expected to access the web solely via their smartphones by 2025. The rise in mobile visitors and Google’s mobile-first indexing are game changers. Desktop is heading toward the history museum.
Mobile-first indexing means search engines crawl and index the mobile version of a page first. Desktop websites that are not mobile responsive can suffer in the rankings, and the lack of a mobile-friendly experience hurts a website’s visibility.
The next move is expected to shift the weight toward AI-first indexing and machine learning. Google is slowly but steadily moving toward search results driven by AI and machine learning, for example in Google News.
It was predicted that 50% of all searches would be voice searches by 2020. Customers use voice search while driving, watching TV, cooking, walking, working and so on. With voice search, a search engine gets one shot to give the right answer, so it must already have crawled and indexed that answer. Schema markup and structured data give crawlers the information they need for a voice-first experience. A conversational content hub captures 80% of informational intent.
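Structured data is commonly added to a page as JSON-LD using the schema.org vocabulary. The Python sketch below generates an FAQPage block, a schema.org type meant for question-and-answer content; the bakery questions are hypothetical, and whether a search engine uses this markup for a voice answer is entirely up to that search engine.

```python
import json


def faq_jsonld(questions):
    """Build schema.org FAQPage structured data from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in questions
        ],
    }
    # Wrap the JSON-LD in the script tag that goes into the page's HTML.
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")


# Hypothetical FAQ content for illustration.
snippet = faq_jsonld([
    ("What time does the bakery open?", "We open at 7 a.m. every day."),
    ("Do you offer gluten-free bread?", "Yes, we bake gluten-free loaves daily."),
])
print(snippet)
```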
Why crawling and indexing are important for digital marketing
· As a digital marketer, you want to make sure that no broken links on your website are shown to customers.
· You can create a good linking structure so that both crawlers and customers can navigate the website easily and smoothly.
· Keeping your website technology up to date helps it meet the dynamic demands of the digital world.
· Adding new and relevant content encourages crawlers to crawl your website more often, which can result in a higher rank.
· Improving website speed reduces crawl time and lets crawlers fetch more pages. When pages load quickly, both crawlers and customers stay happy.
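Broken links can be spotted by requesting each link and looking at the HTTP status code. Here is a minimal Python sketch, assuming the list of links has already been collected (for example by a crawler); the fake status table is hypothetical and lets the example run without a network connection.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = Request(url, method="HEAD")  # HEAD asks for headers only, no body
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:  # 4xx and 5xx responses are raised as HTTPError
        return error.code
    except OSError:             # DNS failures, refused connections, timeouts
        return None


def find_broken_links(links, get_status=http_status):
    """Return (link, status) pairs for links that error out or cannot be reached."""
    broken = []
    for link in links:
        status = get_status(link)
        if status is None or status >= 400:
            broken.append((link, status))
    return broken


# Offline example: a hypothetical status table stands in for real requests.
fake_statuses = {
    "https://example.com/": 200,
    "https://example.com/moved": 301,
    "https://example.com/old-page": 404,
}
print(find_broken_links(fake_statuses, get_status=fake_statuses.get))
# [('https://example.com/old-page', 404)]
```

Some servers reject HEAD requests with a 405 status, so a more robust checker might retry those links with GET before reporting them as broken.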
Author: Armina Fareed, Digital Marketing Consultant for small businesses and nonprofits