Sitemaps
Sitemaps help Google understand which pages of your site should be crawled. Google and other search engines do not guarantee that every URL in your sitemap(s) will be indexed, but creating and submitting your sitemap(s) is highly recommended.
A sitemap is a list of the website’s most important pages that helps search engines find, crawl and index all of the site’s content. Each sitemap can contain a maximum of 50,000 URLs, so if your site is huge, with millions of pages that you want indexed, you will need to create as many sitemaps as needed and reference them from a sitemap index file (see the sketch after the example below). The most common and important type is the sitemap for your URLs, but you can also create sitemaps for images and for news.
Although sitemaps are not required, they will help your SEO, especially if the site is new or has only limited visibility. If not well implemented, however, they can also create chaos!
There are many plugins that can create sitemaps for you so you don’t have to build them manually, and most of them also update automatically. Yoast SEO is the most widely used one.
Beware, though, that you may not want to have all URLs indexed, and these should therefore not be included in your sitemaps. URLs that are set to noindex or blocked in the robots.txt file should not appear in the sitemaps.
Please note that the optional entries such as <lastmod> are not required in most cases and can be left out.
Most importantly, avoid any duplicate, noindex, redirected or otherwise faulty URLs. There is also no need to include URLs that you simply do not want or need to be indexed.
Once your sitemap is ready and you are confident it is error-free, it is strongly recommended that you submit it in your Google Search Console account by entering its URL in the Sitemaps section. Once this is done, you can check for errors, monitor indexation, investigate why some URLs are not indexed and correct any issues.
Example
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.mysite.com/en</loc>
    <lastmod>2019-03-03</lastmod>
  </url>
  <url>
    <loc>https://www.mysite.com/de</loc>
    <lastmod>2019-03-03</lastmod>
  </url>
  ...
</urlset>
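If you end up with more than one sitemap (for example because of the 50,000 URL limit), the individual files can be listed in a sitemap index file and you only submit the index. A minimal sketch, assuming two hypothetical files, sitemap-pages.xml and sitemap-images.xml, sitting at the root of mysite.com:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.mysite.com/sitemap-pages.xml</loc>
    <lastmod>2019-03-03</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.mysite.com/sitemap-images.xml</loc>
    <lastmod>2019-03-03</lastmod>
  </sitemap>
</sitemapindex>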
Robots.txt
The robots file (robots.txt), placed at the root of the domain, controls which parts of the site crawlers are allowed to access.
Examples
Block one folder
User-agent: *
Disallow: /folder/
Block one file
User-agent: *
Disallow: /file.html
Block all
User-agent: *
Disallow: /
Allow all
User-agent: *
Disallow:
Target a specific user agent (here Googlebot, Google’s crawler)
User-agent: Googlebot
Disallow: /folder/
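Reference your sitemap (the Sitemap directive is supported by Google and most major crawlers; the URL below is just a placeholder for your own sitemap location)
Sitemap: https://www.mysite.com/sitemap.xml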