Javascript MEMO

How to set up robots.txt

How to write a robots.txt file

Whether to place a robots.txt file

Not placing "robots.txt" on the server

If you allow all crawlers (that is, there are no URLs you want to block), you do not need to put robots.txt on your server.

Placing "robots.txt" on the server

You can allow or disallow crawling for specific crawlers and specific URL paths.
The rest of this memo describes how to write "robots.txt".

Character encoding

Please save it in UTF-8.

Where to place your robots.txt file

Place it in the root directory of the domain (or subdomain).

Result	Example URL	Reason
○	http://www.test.com/robots.txt	There is no problem as it is in the root and the file name is correct.
×	http://www.test.com/dir/robots.txt	Incorrect because robots.txt is in a subdirectory, not in the root.
×	http://www.test.com/robot.txt	Incorrect because the file is not named "robots.txt" (the "s" is missing). Names such as testrobots.txt are also incorrect because they differ from the required name.
○	http://test.com/robots.txt	There is no problem as it is in the root and the file name is correct.
○	http://sub.test.com/robots.txt	There is no problem as it is at the root of the subdomain and the file name is correct.
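
As a rough illustration of the table above, the location of robots.txt can be derived from any page URL by keeping the scheme and host and replacing the path. The following is only a minimal Python sketch; the function name robots_txt_url is made up for this memo.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host of page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any subdomain);
    # drop the path, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.test.com/dir/index.html"))
print(robots_txt_url("http://sub.test.com/a/b?c=1"))
```

Note that a subdomain yields its own robots.txt URL, which matches the last row of the table.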

robots.txt fields

*The fields supported by Google are listed below.
A valid line consists of (1) to (3).
(1) Field
(2) Colon
(3) Value
Spaces are optional.
Whitespace characters at the beginning and end of a line are ignored.
To include a comment, precede the comment with the # character.
Note that everything after the # character is ignored.
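
The line format described above (field, colon, value, optional spaces, and # comments) could be parsed roughly as follows. This is a sketch only: parse_line is a hypothetical helper name, and lowercasing the field name is an assumption made here so that "User-Agent" and "user-agent" compare equal.

```python
from typing import Optional, Tuple


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Split one robots.txt line into (field, value); None if it holds no rule."""
    # Everything from the first '#' onward is a comment and is dropped.
    line = line.split("#", 1)[0].strip()
    if ":" not in line:
        return None  # blank line, comment-only line, or malformed line
    # Split at the first colon only, so URLs in the value stay intact.
    field, value = line.split(":", 1)
    # Assumption: field names are compared case-insensitively.
    return field.strip().lower(), value.strip()


print(parse_line("  Disallow : /test/   # private area"))
print(parse_line("# only a comment"))
```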

Field	Description
user-agent	Specifies which crawler the rules apply to. The value is case-insensitive.
allow	A URL path that may be crawled by the specified crawler. If no path is specified, the rule is ignored. The value is case-sensitive.
disallow	A URL path that must not be crawled by the specified crawler. If no path is specified, the rule is ignored. The value is case-sensitive.
sitemap	The full URL of the sitemap.
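
To show how the four fields fit together in one file, here is a small Python sketch that renders robots.txt text from a list of rule groups. The function name render_robots_txt, the data layout, and the sitemap URL used in the example are all assumptions made for illustration.

```python
from typing import List, Tuple

Rule = Tuple[str, str]           # (field, path), e.g. ("disallow", "/test/")
Group = Tuple[str, List[Rule]]   # (user-agent value, rules for that crawler)


def render_robots_txt(groups: List[Group], sitemaps: List[str]) -> str:
    """Render rule groups and sitemap URLs as robots.txt text."""
    lines = []
    for user_agent, rules in groups:
        lines.append(f"User-agent: {user_agent}")
        for field, path in rules:
            lines.append(f"{field.capitalize()}: {path}")
        lines.append("")  # blank line to separate groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines) + "\n"


text = render_robots_txt(
    [("*", [("disallow", "/test/"), ("allow", "/test/public/")])],
    ["http://www.test.com/sitemap.xml"],  # hypothetical sitemap URL
)
print(text)
```

When saving the result, the text can be written with an explicit encoding, for example open("robots.txt", "w", encoding="utf-8"), in line with the character encoding note above.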

URL Matching

Symbol	Description
/	Matches the root and all URLs below it.
*	Matches zero or more of any valid character.
$	Indicates the end of a URL.
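
The three symbols above can be modeled by translating a rule path into a regular expression. The sketch below is one possible way to do this in Python, not how any particular crawler implements it; pattern_to_regex and path_matches are names invented for this memo.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regular expression."""
    # A trailing '$' means the pattern must reach the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def path_matches(pattern: str, path: str) -> bool:
    """True if the URL path is covered by the pattern (prefix match by default)."""
    # re.match anchors only at the start, which gives prefix matching.
    return pattern_to_regex(pattern).match(path) is not None


print(path_matches("/", "/test/index.html"))
print(path_matches("/*.php$", "/test/index.php"))
print(path_matches("/*.php$", "/test/index.html"))
```

Without the trailing $, a pattern such as /*.php would also match /test/index.php?x=1, because only the beginning of the URL has to match.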

Samples

＜sample 1＞

User-Agent: *
Disallow: /

[Explanation]
Blocks all crawlers (*) from crawling any file.
Because the path is /, all directories and files are targeted.

＜sample 2＞

User-Agent: Googlebot
Disallow: /

[Explanation]
Blocks Google's web search crawler (Googlebot) from crawling all files.
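
The effect of sample 2 can be checked with the robots.txt parser in Python's standard library. This is a small sketch; "OtherBot" is a made-up crawler name used only to show that crawlers other than Googlebot are unaffected.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-Agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse the text directly, no network access

url = "http://www.test.com/index.html"
googlebot_allowed = parser.can_fetch("Googlebot", url)
otherbot_allowed = parser.can_fetch("OtherBot", url)  # hypothetical crawler

print(googlebot_allowed)  # False: the group applies to Googlebot
print(otherbot_allowed)   # True: no group applies to this crawler
```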

＜sample 3＞

User-Agent: *
Disallow:

[Explanation]
To allow crawling of all files, leave the value after "Disallow:" empty, as in this sample.

＜sample 4＞

User-Agent: *
Disallow: /test/

[Explanation]
Everything under the test directory, including its subdirectories, is blocked.
(Example)

URL	Blocked or crawled
test.com/test/index.html	The target is under /test/, so the Disallow rule applies and it is blocked.
test.com/dir/index.html	The target is not under /test/, so the Disallow rule does not apply and it is crawled.
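
The two rows of this table can be reproduced with Python's standard-library parser. A minimal sketch follows; "AnyBot" is an invented crawler name, since the rule is written for all crawlers.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-Agent: *", "Disallow: /test/"])

# The path starts with /test/, so the Disallow rule applies.
under_test = parser.can_fetch("AnyBot", "http://test.com/test/index.html")
# The path starts with /dir/, so no rule applies and crawling is allowed.
under_dir = parser.can_fetch("AnyBot", "http://test.com/dir/index.html")

print(under_test)  # False (blocked)
print(under_dir)   # True (crawled)
```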

＜sample 5＞

User-Agent: *
Disallow: /*.php$

[Explanation]
Files with the extension ".php" at the end of the URL will be blocked.

(Example)

URL	Blocked or crawled
test.com/test/index.html	The URL will be crawled because it does not have the ".php" extension at the end.
test.com/test/index.php	It will be blocked because the ".php" extension is at the end of the URL.
test.com/test/text/index.php	It will be blocked because the ".php" extension is at the end of the URL.