robots.txt – everything you need to know!

Robot reads txt file of WordPress

You have probably heard of this file before. But what exactly is behind the robots.txt file? How does it work and what effects does it have on your website?

We’ll tell you!

robots.txt – what is it?

A robots.txt file is a plain text file. It acts as a kind of instruction for bots and crawlers (e.g. Googlebot), stating which directories of a website may be read and which may not. For example, duplicate content can be excluded from crawling.

Without such a robots.txt file, the crawler or bot searches the entire website – potentially every single file. All of these can then end up in Google search, even if they should not appear there at all (e.g. the private admin area of the website). This can also hurt your search engine optimization, because subpages that are not optimized for search engines get crawled as well.

A crawler is a computer program or bot that automatically searches the World Wide Web for data, analyzing and indexing websites.

Example of a robots.txt file

robots.txt file of devowl.io

In the screenshot you can see the robots.txt file of the devowl.io website. In the following, we break down its structure line by line – the complete file is reproduced in plain text after the list.

  • User-agent: *: This specifies which bots and crawlers the following rules apply to. The asterisk (*) means that all bots/crawlers are addressed. Alternatively, User-agent: Googlebot could be entered here so that the rules only apply to Google's crawler.
  • Disallow: /wp-admin/: With Disallow you define which directories, pages and files may not be read by the bots/crawlers addressed in User-agent. In our case, these are all URLs that begin with devowl.io/wp-admin/ – i.e. the admin login. If there is simply / instead of /wp-admin/, no directory may be crawled at all. If you enter nothing after Disallow:, everything may be crawled.
  • Allow: /wp-admin/admin-ajax.php: Allow defines an exception to a Disallow rule: here, all bots may still read the file admin-ajax.php even though the /wp-admin/ directory is blocked. Again, the path must match the one displayed in the browser.
  • Sitemap: https://devowl.io/sitemaps.xml: In some cases – like ours – you will also find a Sitemap entry in the robots.txt file. It tells crawlers where exactly to find the sitemap of the website. You can copy the link and open it in a new tab. An XML sitemap lists all URLs of a website.
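
Put together, the file from the screenshot reads as follows:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://devowl.io/sitemaps.xml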

Regular expressions in robots.txt

Sometimes it is impractical to spell out full directory, page or file names – for example, if you want to exclude all .pdf files in all subdirectories from crawling. robots.txt does not support full regular expressions, but it does offer two wildcard characters that act as dynamic rules.

  • *: The * symbol – also known as a wildcard – stands for any sequence of characters within a path. For example, with Disallow: /uploads/product/*/*.jpg, all JPG images (second asterisk for the image name) that are in a subdirectory (first asterisk) of /uploads/product/ will no longer be crawled.
  • $: There is also the dollar sign. It marks the end of a URL and is used together with the wildcard symbol if you want to exclude URLs with a certain ending. For example, Disallow: /*.pdf$ says that all URLs ending in .pdf should not be crawled. Both patterns are combined in the example below.
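
A rule set that uses both patterns could look like this (the paths are made up for illustration; lines starting with # are comments):

User-agent: *
# Block all JPG images in any subdirectory of /uploads/product/
Disallow: /uploads/product/*/*.jpg
# Block all URLs that end in .pdf
Disallow: /*.pdf$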

Crawl-delay: Decide how fast the crawler works!

The Crawl-delay directive specifies the delay in seconds with which a bot should crawl your pages. Crawl-delay: 10 means that the bot waits 10 seconds between requests. Note, however, that not every crawler respects this directive – Googlebot, for example, ignores it.
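
A corresponding entry for Bing's crawler, which does support the directive, could look like this:

User-agent: bingbot
# Wait 10 seconds between requests
Crawl-delay: 10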

What a complex robots.txt file looks like

Theoretically, you can also expand your robots.txt file and include several entries. This could then look something like this:

User-agent: Googlebot
Disallow: /*.pdf$

User-agent: AdsBot-Google
Disallow: /

User-agent: *
Disallow: /wp-admin/

In this example, the following happens:

  • Googlebot: May crawl everything except .pdf files
  • AdsBot-Google: May not crawl the website at all
  • All other bots: May crawl the entire website, except the /wp-admin/ directory.

Note that a crawler only follows the group that matches its user agent most specifically. Googlebot, for example, applies only the rules in its own group and ignores those under User-agent: * – so in this example it may even crawl /wp-admin/.

Where can I find the robots.txt file of my website?

The robots.txt file is located in the root directory of your website. You can access the file by calling up the domain of the website in your browser with the suffix /robots.txt. Example: devowl.io/robots.txt

In the same way, you will find the robots.txt file in the root directory when you connect to your webspace via FTP.

If a web crawler or bot wants to search your website, the robots.txt file in the root directory is its first port of call. Only after it knows what it is allowed to crawl does it continue through the rest of the website.

How can I create a robots.txt file?

To create a robots.txt file without much effort, you can use one of several tools. One of these is the robots.txt generator from Ryte.

RYTE robots txt Generator

The robots.txt generator is very beginner-friendly. You can cobble together your robots.txt file in just a few steps. After you have set everything up, click on create and you can download the finished file. All you have to do is upload it to the root directory of your website, e.g. via FTP.

Another option is the robots.txt generator from Small SEO Tools.

robots.txt Generator Settings

This generator is similar to the one from Ryte. You can make various settings – such as defining rules for different bots – and then create and download the file.

Important: To ensure that your robots.txt file works without problems, you should pay attention to the following aspects:

  • Correct lower and upper case (paths in robots.txt are case-sensitive)
  • Save the file in the root directory of your website
  • Check the instructions (are the correct directories or pages disallowed/allowed?)

How can I upload the robots.txt?

If you don’t want to upload the robots.txt file using an FTP client (e.g. Filezilla), you have two other options: Either you contact your hosting provider (if you do not have direct access to the web server) or you use a WordPress plugin. One of these plugins is WP Robots Txt.

WP Robots TXT Plugin

The plugin is free of charge. After installation, you will find it on the left in the menu bar under Settings > Reading. If you scroll down here, you will find the Robots.txt Content entry. Here, you can edit the robots.txt file in WordPress. You can enter the code we generated in the previous step in the corresponding field. Don’t forget to save the changes afterwards.
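
For a typical WordPress site, the content you paste into that field could look like this (the rules match the WordPress defaults discussed above; replace the sitemap URL with your own):

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
# Replace with the sitemap URL of your own website
Sitemap: https://example.com/sitemap.xml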

But you can also add a robots.txt file using the Yoast SEO plugin.

Yoast SEO robots txt

After installation, go to Tools > File Editor in the menu on the left. There you can create a robots.txt file and edit its content directly – simply paste in the rules you generated earlier and save.

Tip: To test the changes to the robots.txt file, you can use Google’s robots.txt tester, for example.

robots.txt and SEO

What does the robots.txt file actually have to do with search engine optimization? Quite a lot. With the robots.txt file, you can control which pages of your website should be crawled by the bots. This not only saves Google time during crawling, but also helps ensure that mainly relevant pages of your website end up in Google's index – which can have a positive influence on your ranking. Keep in mind, however, that robots.txt only controls crawling: a blocked page can still be indexed if other websites link to it. To reliably keep a page out of the index, use a noindex directive instead.
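
As a concrete example, many WordPress sites exclude their internal search result pages from crawling, since these rarely add value in Google search (the /?s= path is WordPress's default search parameter – adjust the rules if your setup differs):

User-agent: *
# Block WordPress internal search result pages
Disallow: /?s=
Disallow: /search/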