Baidu Blocks Major Search Engines
Baidu, China’s leading internet search company, has blocked Google and Bing from scraping content from its Baidu Baike platform. The Wikipedia-style service contains nearly 30 million entries, and the restriction limits the two search engines’ ability to gather its data for artificial intelligence (AI) projects.
Update in Robots.txt
On August 8, Baidu updated the robots.txt file for Baidu Baike, the file that tells search engine crawlers which web addresses they may access. The update blocks Googlebot and Bingbot from crawling and indexing content on Baidu Baike. Earlier that day, both crawlers still had partial access. The new restriction is part of Baidu’s broader strategy to protect its data assets.
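As a rough illustration of how such a rule works, the sketch below uses Python's standard urllib.robotparser module to evaluate a made-up robots.txt. The rules, the example.com URL, the "ExampleBot" crawler name, and the helper functions build_parser and access_report are all assumptions for demonstration; they are not Baidu's actual file or configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; this is NOT Baidu's actual robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def access_report(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each crawler name, whether the rules allow it to fetch url."""
    parser = build_parser(robots_txt)
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # Placeholder URL and crawler names chosen for the example.
    url = "https://example.com/item/sample-entry"
    report = access_report(SAMPLE_ROBOTS_TXT, ["Googlebot", "Bingbot", "ExampleBot"], url)
    for agent, allowed in report.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Under these sample rules, Googlebot and Bingbot are blocked from every path, while any other crawler is only kept out of /private/. Note that robots.txt is a voluntary convention: it tells well-behaved crawlers what to skip, but it does not technically prevent access.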
The move highlights Baidu’s growing efforts to control its data as demand for large datasets to train AI models increases. It follows Reddit’s decision to block search engines from indexing its content, with the exception of Google, with which it has a deal covering data scraping. Google and Microsoft, both significant players in AI, have been aggressively seeking data to enhance their generative AI systems.
Global Trends in Data Access
Since OpenAI launched ChatGPT in late 2022, competition for the data that fuels AI development has surged. Microsoft, for instance, has previously threatened to cut off rivals’ access to its data unless they stopped using it for AI. Meanwhile, the Chinese-language version of Wikipedia, with 1.43 million entries, remains accessible to search engine crawlers.
Current Status
Despite the new restrictions, some older cached content from Baidu Baike still appears in search results on Google and Bing. As AI developers negotiate for access to valuable content, data availability continues to evolve.
What’s Next?
As major AI developers like OpenAI and Microsoft strike deals for exclusive content, the battle for data remains intense. Baidu’s decision underscores the importance of data control in the rapidly growing field of AI.