The desire for greater control over how search engines index and display websites is driving an effort by leading news organizations and other publishers to revise a 13-year-old technology for restricting access.
Currently, Google Inc., Yahoo Inc. and other top search companies voluntarily respect a website's wishes as declared in a text file known as “robots.txt,” which a search engine's indexing software, called a crawler, knows to look for on a site.
The formal rules allow a site to block indexing of individual web pages, specific directories or the entire site, though some search engines have added their own commands.
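As a rough illustration of the rules just described, the sketch below uses Python's standard `urllib.robotparser` to check a made-up robots.txt that blocks a directory, blocks a single page, and shuts one crawler out of the whole site. The file contents, the `news.example.com` host, and the crawler names `ExampleCrawler` and `BadBot` are all assumptions for the example, not taken from any real site. `Crawl-delay` stands in for the kind of non-standard command that only some search engines honor.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site (illustration only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /drafts/story.html
Crawl-delay: 10

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is made here.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` on the hypothetical site."""
    return parser.can_fetch(agent, "https://news.example.com" + path)


print(allowed("ExampleCrawler", "/index.html"))               # True: not blocked
print(allowed("ExampleCrawler", "/archive/2007/story.html"))  # False: directory blocked
print(allowed("ExampleCrawler", "/drafts/story.html"))        # False: single page blocked
print(allowed("BadBot", "/index.html"))                       # False: entire site blocked
print(parser.crawl_delay("ExampleCrawler"))                   # 10 (non-standard extension)
```

Nothing here is enforced: the check works only because the crawler chooses to consult the file before fetching, which is the voluntary arrangement described above.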
The new proposal, to be unveiled Thursday by a consortium of publishers at the global headquarters of The Associated Press, seeks to have those extra commands, and more, apply across the board. Sites, for instance, could try to limit how long search engines may retain copies in their indexes, or tell the crawler not to follow any of the links that appear within a web page.
Robots.txt doesn't cut it anymore for news sites