What You Need to Know About the New Bing GPT Integration
2023-11-2 22:17:45 Author: securityboulevard.com

In the past few years, Microsoft has invested heavily in OpenAI, forging a relationship with the company behind the well-known generative language model ChatGPT. In our first ChatGPT post, we suspected that companies like OpenAI could be tempted to use Bing or Google search engine scraper bots to gather data to train their large language models (LLMs) like ChatGPT. Such an integration would make it much harder for businesses to opt out of data collection without hurting their online presence.

Earlier this year, Microsoft announced that AI would be integrated into its search engine, Bing, so that users could ask it questions directly from the search engine. This feature, called the new Bing, is available to Microsoft Edge users and uses GPT-4, the same model as ChatGPT.

You may be wondering how to prevent the new Bing from using your website data for training, or how to stop it from giving users answers that can only be found on your website, as this could negatively impact your business. We looked into how the Bing–GPT integration works and how businesses can opt out of having their data used by the new Bing.

How to Access the New Bing From the Edge Browser

If you perform a search on Bing—for example, “what is DataDome”—a ‘Chat’ section next to a blue icon will show up below the search bar.

A screenshot of the new Bing search UI which includes an AI chat tab.

If you click on it, it opens a new page with the new Bing interface, which is set up like a chat.

A screenshot of the new Bing-GPT chat UI

Our “what is DataDome” search query was automatically processed by GPT-4, and the new Bing provided a summary of what DataDome does: protecting businesses against online fraud and bad bots!

As a user, you can ask any question directly in the new Bing chat UI, and Bing will use GPT-4 to answer your questions—meaning you won’t need to visit the websites directly to get your answer. Note, however, that the new Bing still lists its sources in the “Learn more” section.

How Is Bing Gathering Data for Query Responses?

In the first popular version of ChatGPT, based on GPT-3, OpenAI was quite transparent about the source of the training data. It no longer provides this information for the latest versions of GPT: the GPT-4 technical report makes no mention of the training dataset.

As we predicted a few months ago, it’s highly likely that OpenAI is leveraging its relationship with Bing to use the data collected by Bingbot (the scraper Bing uses to index the web) as training data for its LLMs at scale.

We consider this highly likely because of our next finding: what happens when you ask the new Bing to retrieve information from a specific URL?

To conduct our test, we asked the new Bing to summarize the content of a page on the DataDome website. We also asked it to use the latest version of the page, to try to force it to make a request to our site.

A screenshot of a query made to the new Bing AI chat, asking it to summarize an article on DataDome's website.

Even though we asked Bing GPT to retrieve the latest version of the URL, we did not see any requests made to the URL, from any IP address or with any user-agent.

However, when we went over the previous 24 hours of our logs, we observed that Bingbot had made several requests to this page (among others on our website). This activity appears to be the standard Bingbot scraper analyzing every public page for display on the search engine.

A screenshot of DataDome website logs, showing activity from the Bingbot scraper before the query was made.

This strongly suggests that the new Bing is using the content already gathered by Bingbot, rather than performing HTTP requests in the moment to fetch the URLs provided in the chat UI.
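The log check described above can be sketched roughly as follows. This is a minimal illustration, not the tooling we used: it assumes access logs in the common "combined" format, and the page path, IP addresses, timestamps, and sample log lines are all made up for the example. It lists the requests to one page in the 24 hours before the chat query and in a short window after it.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: logs are in the "combined" access log format; adapt the pattern otherwise.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def hits_in_window(lines, path, start, end):
    """Return (timestamp, ip, ua_claims_bingbot) for requests to `path` with start <= time < end."""
    hits = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if not m or m.group("path") != path:
            continue
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if start <= ts < end:
            # The User-Agent header can be spoofed, so this is only a hint.
            hits.append((ts, m.group("ip"), "bingbot" in m.group("ua").lower()))
    return hits

# Hypothetical sample data (documentation IP ranges, invented timestamps and path).
BINGBOT_UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
SAMPLE_LOG = [
    f'203.0.113.7 - - [01/Nov/2023:09:12:44 +0000] "GET /blog/example-article/ HTTP/1.1" 200 5120 "-" "{BINGBOT_UA}"',
    '198.51.100.23 - - [01/Nov/2023:15:30:02 +0000] "GET /blog/example-article/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    f'203.0.113.7 - - [01/Nov/2023:16:01:10 +0000] "GET /pricing/ HTTP/1.1" 200 2048 "-" "{BINGBOT_UA}"',
]
PAGE = "/blog/example-article/"
QUERY_TIME = datetime(2023, 11, 1, 16, 0, tzinfo=timezone.utc)  # when the chat query was sent

before = hits_in_window(SAMPLE_LOG, PAGE, QUERY_TIME - timedelta(hours=24), QUERY_TIME)
after = hits_in_window(SAMPLE_LOG, PAGE, QUERY_TIME, QUERY_TIME + timedelta(minutes=30))

print("Requests in the 24h before the query:", len(before))
print("  of which claim to be Bingbot:", sum(1 for _, _, bot in before if bot))
print("Requests in the 30 min after the query:", len(after))
```

With this sample data, the page shows crawler activity before the query and nothing after it, which mirrors the pattern we observed. On real logs, a stricter check would also verify that the source IP really belongs to Bing instead of trusting the user-agent alone.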

In future testing, we could go further by delivering a special page only to Bingbot, then checking whether that content is what the new Bing uses when asked about the page in its chat UI.
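One way such a follow-up test might look is sketched below. It is only an illustration under stated assumptions: the page bodies, the `CANARY_TOKEN` marker, and the helper names are hypothetical, and Bingbot is identified naively from the User-Agent header. The idea is to serve a variant containing a unique token only to Bingbot-like clients; if the new Bing later repeats that token when asked about the page, the answer must have come from the crawled copy.

```python
import secrets

# Hypothetical marker: a unique token that should not appear anywhere else on the web.
CANARY_TOKEN = "canary-" + secrets.token_hex(8)

# Hypothetical page bodies for the test page.
NORMAL_BODY = "<p>Regular article text served to everyone else.</p>"
CANARY_BODY = f"<p>Regular article text. Reference code: {CANARY_TOKEN}.</p>"

def is_bingbot(user_agent):
    """Naive check on the User-Agent header only; the header can be spoofed."""
    return "bingbot" in (user_agent or "").lower()

def body_for(user_agent):
    """Pick the canary variant for Bingbot-like clients, the normal page otherwise."""
    return CANARY_BODY if is_bingbot(user_agent) else NORMAL_BODY

def app(environ, start_response):
    """Minimal WSGI app that serves the test page."""
    body = body_for(environ.get("HTTP_USER_AGENT")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

if __name__ == "__main__":
    # Local demo only: serve the test page on port 8000 with the stdlib WSGI server.
    from wsgiref.simple_server import make_server
    print("Canary token for this run:", CANARY_TOKEN)
    make_server("", 8000, app).serve_forever()
```

Because the token is regenerated on every start, a real test would need to store it once and keep it stable until the page has been crawled. Search engines generally treat crawler-specific content as cloaking, so a setup like this would only make sense for a short-lived, dedicated test page.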


Source: https://securityboulevard.com/2023/11/what-you-need-to-know-about-the-new-bing-gpt-integration/