It is important to clarify that robots.txt is not a way of enforcing that search engines stay out of your site. When a bot first arrives at the site, it looks for the robots.txt file. A robots.txt file located in a subdirectory isn't valid, as bots only check for this file in the root of the domain. Robots.txt usage can also block inbound link effectiveness. Because blocked URLs can still be discovered through links on other pages, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results.
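As a rough illustration of the "root of the domain" rule, the sketch below derives the robots.txt location that a bot would check for any given page URL. The function name `robots_txt_url` and the example.com addresses are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any subdomain and port);
    # discard the path, query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A site installed in a subdirectory is still governed by the root file.
print(robots_txt_url("https://example.com/joomla/index.php?id=3"))
# A subdomain has its own root, and therefore its own robots.txt.
print(robots_txt_url("https://blog.example.com/2020/post.html"))
```

Under this rule, a file at a path such as /joomla/robots.txt is never requested, which is why it has no effect.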
I tried clearing the cache in the application’s tmp/cache/ directory, restarting the application server, and clearing Varnish’s and my browser’s cache, but the content type of the file was still returned as:
A robots.txt file is a text file placed in the root of your website or blog that contains instructions for bots (spiders, search engine crawlers). One page, for example, is blocked by the robots.txt file, yet it still appears in position four in Google for the query ‘next blog’. A robots.txt file is not a firewall or a kind of password protection; putting one in place is like putting a “Please, do not enter” note on an unlocked door. Where to place my robots.txt file? The robots.txt file must reside in the root of the domain or subdomain and must be named robots.txt, even when the site itself is installed in a subdirectory (e.g. Joomla in a subdirectory).
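To make the "instructions for bots" concrete, here is a minimal sketch that feeds a small robots.txt to Python's standard `urllib.robotparser` module and asks whether a well-behaved crawler would fetch two URLs. The rules, the bot name `ExampleBot` and the example.com URLs are illustrative assumptions, not taken from any particular site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/
Disallow: /tmp/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant bot would skip the disallowed directory ...
print(parser.can_fetch("ExampleBot", "https://example.com/administrator/index.php"))
# ... but would be free to fetch anything else.
print(parser.can_fetch("ExampleBot", "https://example.com/index.php"))
```

The answer from `can_fetch` only describes what a cooperating bot would do. Nothing stops a client that ignores the file, which is the "unlocked door" point made above.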
While Google won't crawl or index the content of pages blocked by robots.txt, it may still index the URLs if it finds them on other pages on the web. Robots.txt is a text file webmasters create to instruct web robots (search engine robots) which pages on a website to crawl or not to crawl. A standard robots.txt is included in your Joomla root. If a bot does not find a robots.txt file, it will look for and gather information about all the files on your site. Check Google Search Console to see the current robots.txt that Google is using. Sometimes robots.txt is delivered conditionally based on the user agent, so this is the only way to see exactly what Google is seeing. After experimenting with it for a while, I came to the conclusion that, for some very odd reason, my Redmine application always returns the robots.txt file with the text/html content type. As you can see, adding an entry to the robots.txt file is not an effective way of keeping a page out of Google’s search results pages.
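The text/html problem described above can be checked from a script. The sketch below is one possible approach and assumes that robots.txt is expected to be served as text/plain. The helper names, the default user agent string and the idea of passing different user agents to compare responses are all illustrative assumptions. A result obtained this way is only what your own client receives, not necessarily what Google receives.

```python
import urllib.request


def media_type(content_type_header: str) -> str:
    """Strip parameters such as '; charset=utf-8' and normalise case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_served_as_plain_text(content_type_header: str) -> bool:
    """True when the header value names the text/plain media type."""
    return media_type(content_type_header) == "text/plain"


def fetch_content_type(url: str, user_agent: str = "ExampleBot/1.0") -> str:
    """Fetch url and return its raw Content-Type header (performs a network request)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type", "")


# Offline check against the two header values discussed in the text.
print(is_served_as_plain_text("text/html; charset=utf-8"))
print(is_served_as_plain_text("text/plain; charset=utf-8"))
```

Calling `fetch_content_type` with different `user_agent` values is one way to spot conditional delivery, although a server may also vary its response on factors other than the user agent.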