2005-07-23

RSS Tutorial for Content Publishers and Webmasters

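This tutorial explains the features and benefits of RSS, a format used on the Web, and gives a brief technical overview of it. It also includes information about Atom, a format similar to RSS. It assumes that the reader is familiar with XML and other Web technologies. This tutorial is not exhaustive; see the More Information section for further details.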

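An Overview of RSS
Choosing Content for Feeds
Publishing Your Feed
Telling People About Your Feed
Format Versions and Modules
Which Format Should I Use?
Tips for Generating Good Feeds
Feed Tools
More Information
About This Document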

An Overview of RSS

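Think about the information on the Web that you access every day: news headlines, search results, "what's new" lists, job listings and so on. Much of this content can be thought of as lists; they probably aren't marked up with HTML li elements, but they are list-oriented in nature.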

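Many people need to keep track of a number of these lists, but once there are more than a handful, it becomes difficult. They have to visit each page, load it, remember what it looked like, and find where they left off in the list the last time they checked.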

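RSS is an XML-based format that allows the syndication of lists of hyperlinks, along with other information, or metadata, that helps viewers decide whether they want to follow each link.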

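Because computers can fetch and understand the information, people can track all of the lists they're interested in and have them personalized. In other words, RSS is a format intended for use by computers on people's behalf, rather than one presented directly to them like HTML.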

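To make this possible, a Web site makes a feed, or channel, available, just like any other file or resource. Once a feed is available, computers can fetch it regularly to get the most recent items in the list. Most often, people will do this with an aggregator, a program that manages a number of feeds and displays them in a single interface.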
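As a rough illustration of what an aggregator does on each poll, the Python sketch below parses an RSS 2.0 feed and reports only the items it hasn't seen before. It assumes the feed text has already been fetched (two fetches are simulated here with strings) and that each item's link identifies it; the function name new_items is hypothetical, and a real aggregator would also prefer guid or id elements where present.

```python
import xml.etree.ElementTree as ET


def new_items(feed_xml, seen_links):
    """Return (title, link) pairs for RSS 2.0 items whose link is not
    yet in seen_links, and add those links to seen_links."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append((item.findtext("title"), link))
    return fresh


# Two successive fetches of the same feed, simulated as strings.
first_fetch = """<rss version="2.0"><channel>
  <item><title>A</title><link>http://example.com/a</link></item>
</channel></rss>"""
second_fetch = """<rss version="2.0"><channel>
  <item><title>B</title><link>http://example.com/b</link></item>
  <item><title>A</title><link>http://example.com/a</link></item>
</channel></rss>"""

seen = set()
print(new_items(first_fetch, seen))   # [('A', 'http://example.com/a')]
print(new_items(second_fetch, seen))  # [('B', 'http://example.com/b')]
```

Keeping the set of seen links between polls is what lets the reader pick up where they left off, instead of scanning every page by hand.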

Feeds can also carry other list-oriented information, such as the content itself along with links (as with weblogs). However, this tutorial focuses on using RSS to syndicate links.

What's in a Feed?

A feed consists of a list of items, or entries, each identified by a link. Each item can also have any amount of other metadata associated with it.

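The most basic metadata for an entry is a title for the link and a description of it. In a news headline feed, for example, these would be the story's headline and its first paragraph or summary. A simple entry might look like this: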

<item>
  <title>Earth Invaded</title>
  <link>http://news.example.com/2004/12/17/invasion</link>
  <description>The earth was attacked by an invasion fleet
  from halfway across the galaxy; luckily, a fatal
  miscalculation of scale resulted in the entire armada
  being eaten by a small dog.</description>
</item>
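A program consuming the entry above only needs to pull out these three fields. This minimal Python sketch does so with the standard library's xml.etree.ElementTree; the helper item_metadata is a hypothetical name, and whitespace in the description is collapsed on the assumption that the line breaks are only formatting.

```python
import xml.etree.ElementTree as ET

ITEM = """<item>
  <title>Earth Invaded</title>
  <link>http://news.example.com/2004/12/17/invasion</link>
  <description>The earth was attacked by an invasion fleet
  from halfway across the galaxy; luckily, a fatal
  miscalculation of scale resulted in the entire armada
  being eaten by a small dog.</description>
</item>"""


def item_metadata(item_xml):
    """Extract the three basic metadata fields from an RSS <item>."""
    item = ET.fromstring(item_xml)
    return {
        "title": item.findtext("title"),
        "link": item.findtext("link"),
        # Collapse the line breaks and indentation inside the description.
        "description": " ".join((item.findtext("description") or "").split()),
    }


meta = item_metadata(ITEM)
print(meta["title"])  # Earth Invaded
print(meta["link"])   # http://news.example.com/2004/12/17/invasion
```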

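In addition, the feed itself can have metadata associated with it, so that it can be given a title (e.g., "Bob’s news headlines"), a description and other information such as the publisher and copyright details.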

To see what a complete feed looks like, see "Format Versions and Modules".

How Are Feeds Used?

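Aggregators are the most common consumers of feeds, and they come in several types. Web aggregators (sometimes called portals) display feeds on a Web page; My Yahoo! is a well-known example. Aggregators may also be integrated into e-mail clients or the user's desktop, or be provided as standalone, dedicated software.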

Aggregators offer a variety of features, such as displaying several related feeds together, hiding entries that have already been read, and categorizing feeds and entries.

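Feeds have other uses, too. Because they are machine-readable, search engines and other software can use them to track a site without having to figure out which parts of it are important and which are just navigation or advertising. You can also allow people to reuse your feed on their own Web sites, displaying your content where they need it.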

Why Should I Make a Feed Available?

Your viewers will thank you, and there will be more of them, because it allows them to see your site without going out of their way to visit.

While this seems bad at first glance, it actually improves your site’s visibility; by making it easier for your users to keep up with your site - allowing them to see it the way they want to - it’s more likely that they’ll know when something that interests them is available on your site.

For example, imagine that your company announces a new product or feature every month or two. Without a feed, your viewers have to remember to come to your site and see if they find anything new - if they have time. If you provide a feed for them, they can point their aggregator or other software at it, and it will give them a link and a description of developments at your site almost as soon as they happen.

News is similar; because there are so many sources of news on the Web, most of your viewers won’t come to your site every day. By providing a feed, you are in front of them constantly, improving the chances that they’ll click through to an article that catches their eye.

But Won't I Be Giving Away My Content?

No! You still retain copyright on your content (if you want to).

You can also control what information goes into the feed, for example the full content or just a teaser. Your content can still be protected by your current access control mechanisms, so that only the links and metadata are distributed. If you'd like, you can also protect the RSS feed itself with SSL encryption and HTTP username/password authentication.

In many ways, syndication is similar to the subscription newsletters that many sites offer to keep viewers up-to-date. The big difference is that they don’t have to supply an e-mail address, lowering the barrier of privacy concerns, while still giving you a direct channel to your viewers. Also, they get to see the content in the manner that’s most convenient to them, which means that you get more eyes looking at your content.

Choosing Content for Feeds

Any list-oriented information on your site that your viewers might be interested in tracking or reusing is a good candidate for a feed. This can encompass news headlines and press releases, job listings, conference calendars and rankings (like ‘top 10’ lists).

For example:

News & Announcements - headlines, notices and any list of announcements that are added to over time
Document listings - lists of added or changed pages, so that people don’t need to constantly check for different content
Bookmarks and other external links - while most people use RSS for sharing links from their own sites, it’s a natural fit for sharing lists of external links
Calendars - listings of past or upcoming events, deadlines or holidays
Mailing lists - to complement a Web-based archive of public or private e-mail lists
Search results - to let people track changing or new results to their searches
Databases - job listings, software releases, etc.

While it’s a good start to have a “master feed” for your site that lists recent news and events, don’t stop there. Generally, each area of your site that features a changing list of information should have a corresponding feed; this allows viewers to precisely target their interests.

For example, if your news site has pages for World news, national news, local news, business, sports, etc., there should be a feed for each of these sections.

If your site offers a personalized view of data (e.g., people can choose categories of information that will show up on their home page), offer this as a feed, so that the viewers’ Web pages match the content of their feeds.

A great example of this is the variety of feeds that Netflix provides; not only can you keep track of new releases, but also personalised recommendations and even a listing of the movies in your queue.

Another good example is Apple’s iTunes Music Store RSS feed generator; you can customize it based on your preferences, and the views it allows match those provided in the Music Store itself.

Finally, remember that feeds are just as - if not more - useful on an Intranet as they are on the Internet. Syndication can be a powerful tool for sharing and integrating information inside a company.

Publishing Your Feed

There are a number of ways to generate a feed from your content. First of all, explore your content management system - it might already have an option to generate an RSS feed.

If that option isn’t available, you have a number of choices:

Self-scraping - The easiest way to publish a feed from existing content. Scraping tools fetch your Web page and pull out the relevant parts for the feed, so that you don’t have to change your publishing system. Some use regular expressions or XPath expressions, while others require you to mark up your page with minimal hints (usually using <div> or <span> tags) that help it decide what should be put into the feed.
Feed integration - If your site is dynamically generated (using languages like Perl, Python or PHP), there may be an RSS library available for your language, so that you can integrate feed generation into your publishing process.
Starting with the feed - Alternatively, you can manage the list-oriented parts of your content in the RSS feed itself, and generate your Web pages (as well as other content, like e-mail lists) from the feed. This has the advantage of always having the correct information in the feed, and tools like XSLT make this option easy, especially if you’re starting from scratch.
Third party scraping - If none of these options work for you, some people on the Web will scrape your site for you and make the feed available. Be warned, however, that this is never as reliable or accurate as doing it yourself, because they don’t know the details of your content or your system. Also, using third parties introduces another point of failure in the delivery process; problems there (network, server or business) will cause your feed to be unavailable.
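To make the self-scraping option concrete, here is a minimal Python sketch using the standard library's html.parser. It assumes a hypothetical hint convention in which each feed-worthy link is wrapped in <span class="feed-item">; links outside such spans, like site navigation, are ignored. Real scraping tools use their own conventions, so treat the class name and the HintScraper class as illustrative only.

```python
from html.parser import HTMLParser


class HintScraper(HTMLParser):
    """Collect (title, link) pairs from <a> elements inside a
    hypothetical <span class="feed-item"> hint."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._depth = 0     # span nesting depth inside a hinted span
        self._href = None   # href of the link currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and (self._depth or attrs.get("class") == "feed-item"):
            # Once inside a hint, count nested spans so the right </span> ends it.
            self._depth += 1
        elif tag == "a" and self._depth and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append(("".join(self._text).strip(), self._href))
            self._href = None
        elif tag == "span" and self._depth:
            self._depth -= 1


page = """<html><body>
<p>Welcome! <a href="/about">About us</a></p>
<span class="feed-item"><a href="http://example.com/news/1">New product</a></span>
<span class="feed-item"><a href="http://example.com/news/2">Price change</a></span>
</body></html>"""

scraper = HintScraper()
scraper.feed(page)  # HTMLParser.feed() takes the page text
print(scraper.items)
```

The resulting (title, link) pairs could then be handed to whatever RSS library your site uses to write the feed.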

For more information about all of these options, see “Feed Tools” and “More Information”.

Telling People About Your Feed

An important step after publishing a feed is letting your viewers know that it exists; there are a lot of feeds available on the Web now, but it’s hard to find them, making it difficult for viewers to utilize them.

Pages that have an associated RSS feed should clearly indicate this to viewers by using a link containing text like ‘RSS feed’. For example,

    <a type="application/rss+xml" href="feed.rss">RSS feed for this page</a>

where ‘feed.rss’ is the URL for the feed. The ‘type’ attribute tells browsers, in a way they understand, that this is a link to an RSS feed.

Additionally, some programs look for a link in the <head> section of your HTML. To support this, include a <link> tag:

<head>
  <title>My Page</title>
  <link rel="alternate"
        type="application/rss+xml"
        href="feed.rss"
        title="RSS feed for My Page">
</head>

These links should be placed on the Web page that is most similar to the feed content; this enables people to find them as they browse.

Note that Atom feeds should use application/atom+xml rather than application/rss+xml in both styles of use.
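Software that looks for these <link> elements, often called feed autodiscovery, can be sketched in a few lines of Python with html.parser. This assumes the page has already been fetched; the class FeedLinkFinder is hypothetical, and it accepts both the RSS and Atom media types mentioned above.

```python
from html.parser import HTMLParser

# Media types for feed links, as used in the <link> examples above.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect (type, href, title) from <link rel="alternate"> feed links."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        media_type = (attrs.get("type") or "").lower()
        if "alternate" in rels and media_type in FEED_TYPES:
            self.feeds.append((media_type, attrs.get("href"), attrs.get("title")))


page = """<head>
  <title>My Page</title>
  <link rel="alternate" type="application/rss+xml"
        href="feed.rss" title="RSS feed for My Page">
  <link rel="alternate" type="application/atom+xml" href="feed.atom">
  <link rel="stylesheet" type="text/css" href="style.css">
</head>"""

finder = FeedLinkFinder()
finder.feed(page)
print(finder.feeds)
```

Relative href values such as feed.rss would still need to be resolved against the page's URL, for example with urllib.parse.urljoin.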

Finally, there are a number of guides and registries for RSS feeds that people can search and browse through, much like the Yahoo directory for Web sites. It’s a good idea to register your feed with them; see “More Information”.

Format Versions and Modules

There are a number of different versions of the RSS format in use today, but the main choices are RSS 1.0 and RSS 2.0. Each version has its benefits and drawbacks; RSS 2.0 is known for its simplicity, while RSS 1.0 is more extensible and fully specified. Both formats are XML-based and have the same basic structure.

There’s one more choice: Atom, an effort in the IETF (an Internet standards body) to come up with a well-documented, standard syndication format. Although it has a different name, it serves the same basic functions as RSS, and many people use the term “RSS” to refer to either RSS or Atom syndication.

This section presents a quick overview of each; for more information, see their specifications and supporting materials.

RSS 2.0

RSS 2.0 is championed by UserLand’s Dave Winer. In this version, RSS stands for “Really Simple Syndication,” and simplicity is its focus.

This branch of RSS is based on RSS 0.91, which was first documented at Netscape and later refined by UserLand.

Included in 2.0.1, the latest stable version of this branch, are channel metadata like link, title and description; image, which allows you to specify a thumbnail image to display with the feed; webMaster and managingEditor, to identify who’s responsible for the feed; and lastBuildDate, which shows when the feed was last updated.

Items have the standard link, title and description metadata, as well as other, more experimental facilities like enclosure, which allows attachments to be automatically downloaded (don’t expect these features to be supported by all aggregators, however). Finally, items can have a guid element that identifies the item uniquely; this allows some advanced functionality in some aggregators.

Here’s an example of a minimal RSS 2.0 feed:

<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Channel</title>
    <link>http://example.com/</link>
    <description>My example channel</description>
    <item>
      <title>News for September the Second</title>
      <link>http://example.com/2002/09/02</link>
      <description>other things happened today</description>
    </item>
    <item>
      <title>News for September the First</title>
      <link>http://example.com/2002/09/01</link>
    </item>
  </channel>
</rss>
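If you generate feeds from a script rather than by hand, an XML library avoids most well-formedness mistakes. The following Python sketch builds a minimal RSS 2.0 feed like the one above with xml.etree.ElementTree; the function build_rss2 and the shape of its items argument (a list of dicts) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_rss2(title, link, description, items):
    """Build a minimal RSS 2.0 document; items is a list of dicts with
    'title', 'link' and optionally 'description'."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for name, value in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, name).text = value
    for entry in items:
        item = ET.SubElement(channel, "item")
        for name in ("title", "link", "description"):
            if entry.get(name):
                ET.SubElement(item, name).text = entry[name]
    # ElementTree escapes &, < and > in text content automatically.
    return ET.tostring(rss, encoding="unicode")


feed = build_rss2(
    "Example Channel", "http://example.com/", "My example channel",
    [
        {"title": "News for September the Second",
         "link": "http://example.com/2002/09/02",
         "description": "other things happened today"},
        {"title": "News for September the First",
         "link": "http://example.com/2002/09/01"},
    ],
)
print(feed)
```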

In the RSS 2.0 roadmap, Winer states that this branch is, for all practical purposes, frozen, except for clarifications to the specification.

However, extensions to the format are allowed in separate modules, using XML Namespaces to avoid conflicts in their names. For example, if you had an ISBN module to track books, it might look like this:

<item xmlns:book="http://namespace.example.com/book/1.0">
  <title>Excession</title>
  <link>http://www.amazon.com/exec/obidos/tg/detail/-/0553575376</link>
  <book:isbn>0553575376</book:isbn>
</item>
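With Python's xml.etree.ElementTree, a module element like book:isbn is written by putting the namespace URI in the element name; register_namespace only controls which prefix appears in the output. The sketch below uses the hypothetical book namespace from the example and shows a consumer finding the element by URI rather than by prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI from the ISBN module example.
BOOK_NS = "http://namespace.example.com/book/1.0"
ET.register_namespace("book", BOOK_NS)

item = ET.Element("item")
ET.SubElement(item, "title").text = "Excession"
ET.SubElement(item, "link").text = "http://www.amazon.com/exec/obidos/tg/detail/-/0553575376"
# Clark notation {uri}local puts the element in the module's namespace.
ET.SubElement(item, "{%s}isbn" % BOOK_NS).text = "0553575376"

xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)

# A consumer that knows the module looks the element up by namespace URI,
# so the prefix chosen by the publisher does not matter.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("{%s}isbn" % BOOK_NS))  # 0553575376
```

Software that doesn't understand the module can skip the namespaced element and still read the ordinary title and link.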

Generally, though, you should look for available RSS Modules, rather than defining your own, unless you’re sure that what you need doesn’t exist.

RSS 1.0

RSS 1.0 stands for “RDF Site Summary.” This flavor of RSS incorporates RDF, a Web standard for metadata. Because RSS 1.0 uses RDF, any RDF processor can understand RSS without knowing anything about it in particular. This allows syndicated feeds to easily become part of the Semantic Web.

RSS 1.0 also uses XML Namespaces to allow extensions, in a manner similar to RSS 2.0.

RSS 1.0 feeds look very similar to RSS 2.0 feeds, with a few key differences:

The entire feed is wrapped in <rdf:RDF> … </rdf:RDF> elements (so that processors know that it’s RDF)
Each <item> has an rdf:about attribute that usually, but not always, matches the <link>; this assigns an identifier to each item
There’s an <items> element in the channel metadata that contains a list of items in the channel, so that RDF processors can keep track of the relationship between the items
Some metadata uses the rdf:resource attribute to carry links, instead of putting them inside the element.

RSS 1.0 is developed and maintained by an ad hoc group of interested people; see their Web site for more information about RSS 1.0 and RSS Modules. See below for an example of an RSS 1.0 feed.

Dublin Core Module

The most well-known example of an RSS 1.0 Module is the Dublin Core Module. The Dublin Core, developed by librarians and information scientists, is a standard set of common metadata that is useful for describing documents, among other things. The Dublin Core Module uses these metadata to attach information both to feeds (in the channel metadata) and to individual items.

This module includes useful elements like dc:date, for associating dates with items, dc:subject, which can be useful for categorizing items or feeds, and dc:rights, for dictating the intellectual property rights associated with an item or a feed.

Here’s an example of a minimal RSS 1.0 feed that uses the Dublin Core Module:

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://example.com/news.rss">
    <title>Example Channel</title>
    <link>http://example.com/</link>
    <description>My example channel</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.com/2002/09/01/"/>
        <rdf:li rdf:resource="http://example.com/2002/09/02/"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.com/2002/09/01/">
    <title>News for September the First</title>
    <link>http://example.com/2002/09/01/</link>
    <description>other things happened today</description>
    <dc:date>2002-09-01</dc:date>
  </item>
  <item rdf:about="http://example.com/2002/09/02/">
    <title>News for September the Second</title>
    <link>http://example.com/2002/09/02/</link>
    <dc:date>2002-09-02</dc:date>
  </item>
</rdf:RDF>

As you can see, RSS 1.0 is a bit more verbose than 2.0, mostly because it needs to be compatible with other versions of RSS while containing the markup that RDF processors need.
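Because most elements in an RSS 1.0 feed live in namespaces, a consumer has to match them by namespace URI. This Python sketch (standard-library xml.etree.ElementTree; the function name rss1_items is hypothetical) reads each item's rdf:about identifier, title and Dublin Core date from a trimmed copy of the example above.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]


def rss1_items(feed_xml):
    """Return (identifier, title, date) for each item in an RSS 1.0 feed."""
    root = ET.fromstring(feed_xml)
    result = []
    # In RSS 1.0, <item> elements are siblings of <channel>, not children.
    for item in root.findall("rss:item", NS):
        result.append((
            item.get(RDF_ABOUT),
            item.findtext("rss:title", namespaces=NS),
            item.findtext("dc:date", namespaces=NS),
        ))
    return result


sample = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://example.com/news.rss">
    <title>Example Channel</title>
  </channel>
  <item rdf:about="http://example.com/2002/09/01/">
    <title>News for September the First</title>
    <link>http://example.com/2002/09/01/</link>
    <dc:date>2002-09-01</dc:date>
  </item>
  <item rdf:about="http://example.com/2002/09/02/">
    <title>News for September the Second</title>
    <link>http://example.com/2002/09/02/</link>
    <dc:date>2002-09-02</dc:date>
  </item>
</rdf:RDF>"""

for row in rss1_items(sample):
    print(row)
```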

Atom

Both RSS 1.0 and 2.0 are informal specifications; that is, they aren’t published by a well-known standards body or industry consortium, but instead by a small group of people.

Some people are concerned by this, because such specifications can be changed at the whim of the people who control them. Standards bodies bring stability by limiting change and having well-established procedures for introducing it. To bring such stability to syndication, a group of people established an IETF Working Group to standardise a format called Atom.

Atom is functionally similar to both branches of RSS, and is also an XML-based format.

For example:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <link href="http://example.org/"/>
  <updated>2003-12-13T18:30:02Z</updated>
  <author>
    <name>John Doe</name>
  </author>
  <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
  <entry>
    <title>Atom-Powered Robots Run Amok</title>
    <link href="http://example.org/2003/12/13/atom03"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <summary>Some text.</summary>
  </entry>
</feed>

As you can see, Atom has a feed element that contains both the feed-level metadata and the entry elements (analogous to RSS items). Each entry can contain similar metadata, such as title, link, id (instead of RSS 1.0’s rdf:about or RSS 2.0’s guid) and a short textual summary (instead of RSS’ description).
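The same kind of extraction for Atom has to account for the link being carried in an href attribute rather than as element text. The Python sketch below (hypothetical function names, standard library only) reads id, title, link and updated from each entry of a trimmed copy of the example feed; it picks the first link whose rel is alternate or absent, since RFC 4287 treats a missing rel as alternate.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def alternate_href(entry):
    """Return the href of the entry's first alternate link, if any."""
    for link in entry.findall(ATOM + "link"):
        # RFC 4287: a link without rel is treated as rel="alternate".
        if link.get("rel", "alternate") == "alternate":
            return link.get("href")
    return None


def atom_entries(feed_xml):
    """Return (id, title, link, updated) for each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    return [
        (
            entry.findtext(ATOM + "id"),
            entry.findtext(ATOM + "title"),
            alternate_href(entry),
            entry.findtext(ATOM + "updated"),
        )
        for entry in root.findall(ATOM + "entry")
    ]


sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <link href="http://example.org/"/>
  <updated>2003-12-13T18:30:02Z</updated>
  <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
  <entry>
    <title>Atom-Powered Robots Run Amok</title>
    <link href="http://example.org/2003/12/13/atom03"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <summary>Some text.</summary>
  </entry>
</feed>"""

for row in atom_entries(sample):
    print(row)
```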

Generally, Atom isn’t as widely supported as RSS 1.0 or 2.0 right now, because it’s relatively new. However, it should catch up quickly, because of the broad base of vendors supporting the standardisation effort.

Which Format Should I Use?

One of the most confusing and unfortunate problems in syndication is the large number of formats in use. In addition to those listed above, there are many other formats (e.g., RSS 0.9, 0.91, 0.92) that are commonly encountered on the Web.

For better or worse, the decision isn’t as critical as you might think. Most aggregators and other software use syndication libraries which abstract out the particular format that a feed is in, so that they can consume any popular syndication feed.

As a result, which format to choose is a matter of personal taste. RSS 1.0 is very extensible, and useful if you want to integrate it into Semantic Web systems. RSS 2.0 is very simple and easy to author by hand. Atom is now an IETF Standard, bringing stability and a natural community to support its use.
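Libraries that abstract over formats generally begin by recognizing which format a document is in, and the root element is usually enough for that. The Python sketch below is a simplified, hypothetical version of that first step; real libraries also cope with older versions, broken XML and character encoding problems.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1_NS = "http://purl.org/rss/1.0/"
ATOM_NS = "http://www.w3.org/2005/Atom"


def feed_format(feed_xml):
    """Guess a feed's format from its root element (simplified)."""
    root = ET.fromstring(feed_xml)
    if root.tag == "rss":
        # RSS 0.91, 0.92 and 2.0 share the <rss version="..."> root.
        return "RSS " + root.get("version", "(no version)")
    if root.tag == "{%s}RDF" % RDF_NS:
        # RSS 0.90 also uses rdf:RDF, but with a different default namespace.
        if root.find("{%s}channel" % RSS1_NS) is not None:
            return "RSS 1.0"
        return "other RDF"
    if root.tag == "{%s}feed" % ATOM_NS:
        return "Atom"
    return "unknown"


print(feed_format('<rss version="2.0"><channel/></rss>'))          # RSS 2.0
print(feed_format('<feed xmlns="http://www.w3.org/2005/Atom"/>'))  # Atom
```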

Tips for Generating Good Feeds

RSS and Atom are easy to work with, but like any new format, you may encounter some problems in using them. This section attempts to address the most common issues that arise when generating a feed.

Distinct Entries - Make sure that aggregators can tell your entries apart, by using different identifiers in rdf:about (RSS 1.0), guid (RSS 2.0) and id (Atom). This will save a lot of headaches down the road.
Meaningful Metadata - Try to make the metadata useful on its own; for example, if you only include a short <title>, people may not know what the link is about. By the same token, if you shove an entire article into <description>, it’ll crowd people’s view of the feed, and they’re less likely to stay interested in what you have to say. Generally, you want to put enough into the feed to help someone decide whether they should follow the link.
Encoding HTML - Although it’s tempting, refrain from including HTML markup (like <a href="...">, <b> or <p>) in your RSS feed; because you don’t know how it will be presented, doing so can prevent your feed from being displayed correctly. If you need to include a tag in the text of the feed (e.g., the title of an entry is “Ode to <title>”), make sure you escape ampersands and angle brackets (so that it would be “Ode to &lt;title&gt;”).
XML Entities - Remember that, unlike HTML, XML predefines only five entities (&lt;, &gt;, &amp;, &quot; and &apos;); therefore, you won’t have &nbsp;, &copy; and other common HTML entities available. You can declare them in a DTD, or alternatively just use a character encoding (such as UTF-8) that makes the characters you need available directly.
Character Encoding - Some software generates feeds using Windows character sets, and sometimes mislabels them. The safest thing to do is to encode your feed as UTF-8 and check it by parsing it with an XML parser.
Communicating with Viewers - Don’t use entries in your feed to communicate to your users; for example, some feeds have been known to use the <description> to dictate copyright terms. Use the appropriate element or module.
Communicating with Machines - Likewise, use the appropriate HTTP status codes if your feed has relocated (usually, 301 Moved Permanently) or is no longer available (410 Gone or 404 Not Found).
Making your Feed Cache-Friendly - Successful feeds see a fair amount of traffic because clients poll them often to see if they’ve changed. To support the load, Web Caching can help; see the caching tutorial.
Validate - use the Feed Validator to catch any problems in your feed; it works with RSS and Atom. Also, don’t just run it once; make sure you regularly check your feed, so that you can catch transient errors.
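Several of these tips can be checked mechanically. The Python sketch below, which assumes Python 3.8 or later for the xml_declaration argument, escapes a title by hand with xml.sax.saxutils.escape, lets ElementTree escape it automatically while serializing the feed as UTF-8, gives the entry a distinct guid (the tag: URI is a made-up example), and finally re-parses the output as a basic well-formedness check. Parsing is not a substitute for the Feed Validator, which also checks the format's own rules.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

title = "Ode to <title> & other tags"

# When writing XML as plain text, escape &, < and > yourself.
print(escape(title))  # Ode to &lt;title&gt; &amp; other tags

# An XML library escapes text content for you while serializing.
rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
item = ET.SubElement(channel, "item")
ET.SubElement(item, "title").text = title
# A distinct, stable identifier for this entry (made-up tag: URI).
ET.SubElement(item, "guid", isPermaLink="false").text = "tag:example.com,2005:entry-42"

# Serialize as UTF-8 bytes with an explicit XML declaration (Python 3.8+).
data = ET.tostring(rss, encoding="UTF-8", xml_declaration=True)

# Basic check: the output must parse and round-trip the original text.
parsed = ET.fromstring(data)
assert parsed.findtext("channel/item/title") == title
print(data.decode("utf-8"))
```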

Feed Tools

This is an incomplete list of tools for creating feeds and checking them to make sure that you’ve done so correctly. Note that there are many more libraries that help with parsing feeds; these haven’t been included here because this tutorial focuses on Webmasters, not consumers of feeds.

xpath2rss - Tool for scraping Web sites using XPath expressions (a method of selecting parts of HTML and XML documents).
Site Summaries in XHTML - Online service (also available as an XSLT stylesheet) that uses hints in your HTML to generate a feed.
myRSS - An online, third-party automated scraping service. Doesn’t require any special markup.
RSS.py - Python library for generating and parsing RSS.
ROME - Java library for parsing and generating RSS and Atom feeds, as well as translating between formats.
XML::RSS - Perl module for generating and parsing RSS.
Online Validator - Check your RSS 1.0, 2.0 and Atom feeds.

More Information

Syndicated content - Good list of best practices for creating an RSS feed.
Syndic8 - A community effort to gather, validate and search feeds with lots of other information.
RSS Workshop - A well-regarded introduction to publishing RSS feeds, from the state of Utah Online Services division.
RSS Devcenter - O’Reilly’s Web portal for all things RSS.

About This Document

If you do mirror this document, please send e-mail to the address above, so that you can be informed of updates.

All trademarks within are property of their respective holders.

Although the author believes the contents to be accurate at the time of publication, no liability is assumed for them, their application or any consequences thereof. If any misrepresentations, errors or other need for clarification are found, please contact the author.

The latest revision of this document can always be obtained from http://www.mnot.net/rss/tutorial/

Version 0.91 — September 7, 2005