Sihai network

Personal data of 1.2 billion people leaked and left exposed on the dark web

Troia said that in the People Data Labs (PDL) data he found a landline number he had set up with AT&T 10 years ago. He never used the number, but the information he provided at the time was retained there.

The server was found to contain nearly 3 billion PDL user records, covering nearly 1.2 billion unique people and 650 million unique email addresses.

Not only is the volume of data consistent with what PDL advertises, but the researchers were also able to cross-check the records against the information returned by the PDL API.

In addition, by comparing the database with the publicly available data of the two companies involved, PDL and Oxydata, the researchers found that the records originated, at least in part, from them. In their blog post, the researchers described the match with PDL specifically:

The data found on the open Elasticsearch server almost exactly matches the data returned by the People Data Labs API. The only difference is that the data returned by PDL also contains education history.

None of the data downloaded from the server contains education information. Everything else is exactly the same, including accounts with multiple email addresses and multiple phone numbers.
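The kind of comparison the researchers describe can be illustrated with a minimal sketch. The field names and sample records below are hypothetical and are not taken from the actual PDL API or the exposed server; the sketch only shows what "identical except for education history" means for two records.

```python
def matches_except(server_record, api_record, ignored=("education",)):
    """Return True if the two records are equal once the ignored
    fields are removed from the API record."""
    api_trimmed = {k: v for k, v in api_record.items() if k not in ignored}
    return server_record == api_trimmed


# Hypothetical records, for illustration only.
server_record = {
    "name": "Jane Example",
    "emails": ["jane@example.com", "j.example@example.org"],
    "phones": ["+1-555-0100", "+1-555-0101"],
}
api_record = {
    "name": "Jane Example",
    "emails": ["jane@example.com", "j.example@example.org"],
    "phones": ["+1-555-0100", "+1-555-0101"],
    "education": ["Example University"],
}

same_person = matches_except(server_record, api_record)
```

In this sketch a record with several email addresses and phone numbers still counts as a match, because only the education field is left out of the comparison.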

However, Sean Thorne, co-founder of PDL, denied that the company owned the server. He said that the server's owner may have used a data enrichment product provided by PDL, along with other data enrichment or licensing services.

Separately, 4 TB of user data, including 380 million profiles, was confirmed to come from Oxydata, but that company also responded that it did not own the server.

So far, the researchers are not sure who exposed the server to the Internet, but the leak affects the customers the two companies have in common and leaves them at risk of data abuse.

This is not the first such incident. Elasticsearch servers have been exposed to the public many times, putting the personal data of unsuspecting users and businesses at risk:

Earlier this year, personal information on more than 20 million Russian citizens was exposed on an Elasticsearch server.

In May this year, personal and payment card data belonging to millions of Canadians, including CVV codes, was exposed when an Elasticsearch database owned by Freedom Mobile was left open online.

In December, another database containing personal information on 82 million Americans was found exposed online.

Data leaks involving Elasticsearch servers occur frequently, and they attract the attention of many attackers, for whom the exposed data can serve as the starting point of an attack.

Jason Kent, a hacker at Cequence Security, commented: "We see a new and potentially dangerous correlation of data unlike anything in the past. An attacker with a rich data set can mount a highly targeted attack. Such an attack can expose password recovery information, financial data, communication patterns, social structures and more, which makes it a way to target people in senior positions."

FBI has yet to respond

The two researchers reported their findings to the FBI, and the Elasticsearch server was taken offline within hours. However, the FBI did not give a clear response after receiving the report.

Randy Koch, chief executive of ARM Insight, said that this massive data exposure will do great damage to the companies regarded as the owners of the data, and that it has allowed the information of more than a billion people to spread across the world.

The volume of personal data involved is so large, and identifying the owners of the data so complex, that the incident may call into question the effectiveness of our current privacy and breach notification laws.

If a company that controls data synthesizes the user information it collects, it can effectively prevent incidents like this one, because data synthesis can both mimic the real data and remove the characteristics that identify individual users.

Properly synthesized data cannot be reverse engineered by hackers, yet it retains the statistical value of the original data set, so it can still be used for analysis, marketing, customer segmentation, AI algorithm training and so on.
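The idea of data synthesis can be illustrated with a toy sketch. It assumes a small, hypothetical set of records with name, email, age and city fields; it is not ARM Insight's process or any production technique, and it makes no formal privacy guarantee. It drops the direct identifiers and generates new records that follow the aggregate statistics of the original.

```python
import random
import statistics


def synthesize(records, n, seed=0):
    """Return n synthetic records that mimic aggregate statistics
    of the input while omitting direct identifiers (name, email)."""
    rng = random.Random(seed)
    ages = [r["age"] for r in records]
    cities = [r["city"] for r in records]
    mean_age = statistics.mean(ages)
    sd_age = statistics.pstdev(ages)
    synthetic = []
    for _ in range(n):
        synthetic.append({
            # Age is drawn from a normal fit to the original ages.
            "age": max(0, round(rng.gauss(mean_age, sd_age))),
            # City is drawn in proportion to its original frequency.
            "city": rng.choice(cities),
        })
    return synthetic


# Hypothetical original records, for illustration only.
original = [
    {"name": "A. Example", "email": "a@example.com", "age": 34, "city": "Austin"},
    {"name": "B. Example", "email": "b@example.com", "age": 41, "city": "Boston"},
    {"name": "C. Example", "email": "c@example.com", "age": 29, "city": "Austin"},
    {"name": "D. Example", "email": "d@example.com", "age": 52, "city": "Denver"},
]

synthetic = synthesize(original, 1000)
```

Each field is sampled independently here, so relationships between fields (for example, between age and city) are lost. A real synthesis pipeline would need to preserve such relationships and to verify that no synthetic record can be traced back to a real person.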

However, concentrating data in one place undermines a company's reputation as a data steward, and it also carries privacy and compliance risks.