Creating and Submitting Sitemaps
Monday, September 22nd, 2008
9. Creating & Submitting Sitemaps to Search Engines
Sitemaps play an important role when optimising your website for search engines, as they help the search engines index your website (especially deep-linked pages) and manage their crawl activities.
A sitemap is an HTML page, an XML file or a text file that lists all the pages available on your website, or at least the most important ones (for large websites).
HTML sitemaps are mostly used by visitors as a quick navigation point through the website, while the more advanced formats, XML (Google search engine) and plain text (Yahoo! search engine), are aimed at search engines.
XML sitemap
An XML sitemap can be created either manually or automatically using “Sitemap Generators”, which are widely available on the Internet.
A standard XML sitemap should look like this (the <lastmod>, <changefreq> and <priority> tags are optional):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
The Sitemap must:
- Begin with an opening <urlset> tag and end with a closing </urlset> tag.
- Include a <url> entry for each URL as a parent XML tag.
- Include a <loc> child entry for each <url> parent tag.
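If you would rather generate the file from a script than write it by hand, the structure above could be produced along these lines. This is only a minimal sketch in Python using the standard library's xml.etree.ElementTree module (Python 3.8 or later). The function name build_sitemap and the dictionary layout of each entry are assumptions made for illustration; they are not part of the Sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample sitemap above.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as bytes for a list of dicts (hypothetical layout).

    Each dict needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional and are written only when present.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is the one required child of every <url> entry.
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # xml_declaration=True adds the <?xml ...?> header line.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    sample = [{
        "loc": "http://www.example.com/",
        "lastmod": "2005-01-01",
        "changefreq": "monthly",
        "priority": 0.8,
    }]
    print(build_sitemap(sample).decode("utf-8"))
```

ElementTree escapes characters such as & in the text values for you, so the URLs can be passed in unescaped.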
XML tag definitions
The available XML tags are described below.
<urlset> – required
Encapsulates the file and references the current protocol standard.
<url> – required
Parent tag for each URL entry. The remaining tags are children of this tag.
<loc> – required
URL of the page. This URL must begin with the protocol (such as http) and end with a trailing slash, if your web server requires it. This value must be less than 2048 characters.
<lastmod> – optional
The date of last modification of the file. This date should be in W3C Datetime format. This format allows you to omit the time portion, if desired, and use YYYY-MM-DD.
<changefreq> – optional
How frequently the page is likely to change. This value provides general information to search engines and may not correlate exactly to how often they crawl the page.
Valid values are: always, hourly, daily, weekly, monthly, yearly, never
The value “always” should be used to describe documents that change each time they are accessed. The value “never” should be used to describe archived URLs.
Please note that the value of this tag is considered a hint and not a command. Even though search engine crawlers consider this information when making decisions, they may crawl pages marked “hourly” less frequently than that, and they may crawl pages marked “yearly” more frequently than that. It is also likely that crawlers will periodically crawl pages marked “never” so that they can handle unexpected changes to those pages.
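When sitemap entries are produced by a script, it can help to check the <lastmod> and <changefreq> values before writing them out. The following is a small sketch in Python under that assumption; the helper names is_valid_changefreq and format_lastmod are made up for this example.

```python
from datetime import date

# The seven values the protocol allows for <changefreq>.
VALID_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def is_valid_changefreq(value):
    """Return True if value is one of the allowed <changefreq> values."""
    return value in VALID_CHANGEFREQ


def format_lastmod(day):
    """Format a date as YYYY-MM-DD, the date-only W3C Datetime form."""
    return day.isoformat()


if __name__ == "__main__":
    print(is_valid_changefreq("monthly"))   # True
    print(is_valid_changefreq("biweekly"))  # False
    print(format_lastmod(date(2005, 1, 1)))  # 2005-01-01
```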
<priority> – optional
The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0. This value has no effect on your pages compared to pages on other sites, and only lets the search engines know which of your pages you deem most important so they can order the crawl of your pages in the way you would most like.
The default priority of a page is 0.5.
Please note that the priority you assign to a page has no influence on the position of your URLs in a search engine’s result pages. Search engines use this information when selecting between URLs on the same site, so you can use this tag to increase the likelihood that your more important pages are present in a search index.
Also, please note that assigning a high priority to all of the URLs on your site will not help you. Since the priority is relative, it is only used to select between URLs on your site; the priority of your pages will not be compared to the priority of pages on other sites.
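As a rough illustration of the range and default described above, a generator script might normalise priorities like this. It is a sketch only: the function name format_priority is hypothetical, and the choice to reject out-of-range values (rather than clamp them) is an assumption.

```python
DEFAULT_PRIORITY = 0.5  # the protocol's default when <priority> is omitted


def format_priority(value=None):
    """Return the priority as text with one decimal place.

    Falls back to the default of 0.5 and rejects values outside 0.0-1.0.
    """
    if value is None:
        value = DEFAULT_PRIORITY
    if not 0.0 <= value <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0")
    return f"{value:.1f}"


if __name__ == "__main__":
    print(format_priority(0.8))  # 0.8
    print(format_priority())     # 0.5
```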
Entity escaping
Your Sitemap file must be UTF-8 encoded (you can generally do this when you save the file). As with all XML files, any data values (including URLs) must use entity escape codes for the characters listed in the table below.
Character – Escape Code
Ampersand “&” – “&amp;”
Single Quote “'” – “&apos;”
Double Quote “"” – “&quot;”
Greater Than “>” – “&gt;”
Less Than “<” – “&lt;”
In addition, all URLs (including the URL of your Sitemap) must be URL-escaped and encoded for readability by the web server on which they are located.
More information on this can be found on the Google website.
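Both kinds of escaping can be done with Python's standard library. The sketch below is one possible approach, not the only one: the function names are invented for this example, and the set of characters left untouched by the URL-escaping step (the safe argument) is an assumption that you may need to adjust for your own URLs.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def xml_escape_value(text):
    """Apply the five entity escape codes to a data value.

    escape() handles &, < and > by default; the two quote
    characters are passed in explicitly.
    """
    return escape(text, {"'": "&apos;", '"': "&quot;"})


def escape_url_for_sitemap(url):
    """URL-escape a URL, then entity-escape it for use inside <loc>."""
    # Percent-encode anything outside the safe set (for example
    # non-ASCII letters), leaving the usual URL delimiters alone.
    url_escaped = quote(url, safe=":/?&=#%~+,;@")
    return xml_escape_value(url_escaped)


if __name__ == "__main__":
    print(escape_url_for_sitemap("http://www.example.com/ümlat.php&q=name"))
```

For example, escape_url_for_sitemap("http://www.example.com/ümlat.php&q=name") gives http://www.example.com/%C3%BCmlat.php&amp;q=name.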
Text sitemap
A text sitemap is simply a text file with a list of URLs, one on each line, and is supported by Yahoo! Site Explorer. Unlike XML sitemaps, it does not allow for metadata about each URL.
Here is some sample text sitemap code:
http://www.eireseo.ie/seo-tutorials/
http://www.eireseo.ie/seo-tutorials/2008/09/22/creating-and-submitting-sitemaps.html
http://www.eireseo.ie/seo-tutorials/2008/09/22/search-engine-optimization-white-paper.html
http://www.eireseo.ie/seo-tutorials/2008/09/22/page-content.html
http://www.eireseo.ie/seo-tutorials/2008/09/22/getting-the-page-title-right.html
http://www.eireseo.ie/seo-tutorials/2008/09/22/keywords-research.html
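Because the format is just one URL per line, a text sitemap is easy to produce from a list of URLs. Here is a minimal sketch in Python; the function name build_text_sitemap and the output file name are assumptions for this example.

```python
def build_text_sitemap(urls):
    """Return the text sitemap content: one URL per line, blanks skipped."""
    lines = [url.strip() for url in urls if url.strip()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    pages = [
        "http://www.eireseo.ie/seo-tutorials/",
        "http://www.eireseo.ie/seo-tutorials/2008/09/22/page-content.html",
    ]
    # "sitemap.txt" is a hypothetical file name; the file is saved as UTF-8.
    with open("sitemap.txt", "w", encoding="utf-8") as handle:
        handle.write(build_text_sitemap(pages))
```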
Referencing & Submitting your sitemap
Once you have verified that the sitemap is properly formatted, you can either reference it in your robots.txt file as below:
User-agent: *
Sitemap: http://yourdomain.com/sitemap.xml
which tells the crawler how to find your sitemap, or submit your sitemap to search engines directly through:
Google Webmaster Tools: http://www.google.com/webmasters (you must have a Google account)
Yahoo! Site Explorer: https://siteexplorer.search.yahoo.com (you must have a Yahoo! account)
Once you have an account with the above search engines, you will be presented with a link and a code to verify your website (either by adding a meta tag or by uploading a file to the server).
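Coming back to the robots.txt reference described earlier in this section, you can check that the Sitemap line is picked up the way you expect before relying on it. The sketch below uses Python's standard urllib.robotparser module (its site_maps() method needs Python 3.8 or later); the helper name sitemaps_from_robots is an assumption, and the robots.txt content is the sample from this article.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content, as shown earlier in this section.
ROBOTS_TXT = """\
User-agent: *
Sitemap: http://yourdomain.com/sitemap.xml
"""


def sitemaps_from_robots(robots_text):
    """Return the list of Sitemap URLs declared in robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemaps_from_robots(ROBOTS_TXT))
```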