Why PySyft is the open-source hero technology the world needs

Data is the most valuable resource we have today when it comes to solving our greatest challenges. With the right data, and enough of it, there is no limit to the compelling use cases we can create. Imagine a world where we could stop financial crimes like money laundering, help reduce the number of deaths due to breast cancer, and more accurately track and balance the global import and export of goods between countries. Solving problems like these has the potential to save innumerable lives and revolutionise industries. However, creating the solutions to such problems requires massive amounts of data that is distributed across the globe.

This data exists, but it is not accessible from any central place. It sits in the databases of individual hospitals, small bank branches and manufacturing facilities, and in other siloed systems across the public and private sectors.

This brings up the concept of centralisation. Could we hypothetically centralise the world’s data, or centralise only the data relevant to a particular use case? The natural follow-up question is, of course, should we? There is some nuance to this debate, but the short answer is no. One massive database would be incredibly useful if it were only ever used for good, but it becomes a huge risk the moment it is accessible to bad actors.

So, if one massive database is too risky, how can we get the world’s data into the hands of data scientists and machine learning engineers to hasten the development of revolutionary solutions? The answer could lie in an open-source library called PySyft.

Our ability to develop models and answer difficult questions is limited because data is distributed across the globe, siloed, and made inaccessible by legal contracts and stringent partnership agreements. PySyft pushes for privacy-enhancing technologies (PETs) that allow data scientists to compute on information they do not own, on machines they do not fully control, without ever receiving a copy of the data. It removes the need to move potentially sensitive data to a remote server: data owners keep their data on their own machines, while data scientists can still derive value from it and build solutions. PySyft is developing the future of data sharing through federated data networks powered by PETs, allowing data scientists to leverage more data than ever.
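To make the idea of computing on data you do not own more concrete, here is a minimal sketch in plain Python. It does not use PySyft's actual API; the `DataOwner` class, its `run` method and the `min_rows` threshold are hypothetical names chosen for illustration. The point is only that the raw records stay with the owner, and the data scientist receives nothing but the result of a computation the owner has approved.

```python
from statistics import mean
from typing import Callable, Dict, List


class DataOwner:
    """Hypothetical stand-in for a data owner's machine (not a PySyft class)."""

    def __init__(self, records: List[float], min_rows: int = 5) -> None:
        self._records = list(records)  # private: never handed to callers
        self._min_rows = min_rows      # assumed policy: refuse tiny datasets
        # Allow-list of aggregate computations the owner agrees to run.
        self._approved: Dict[str, Callable[[List[float]], float]] = {
            "mean": mean,
            "count": lambda rows: float(len(rows)),
        }

    def run(self, name: str) -> float:
        """Run an approved computation locally and return only its result."""
        if name not in self._approved:
            raise PermissionError(f"computation {name!r} is not approved")
        if len(self._records) < self._min_rows:
            raise PermissionError("dataset too small to release an aggregate")
        return self._approved[name](self._records)


# The data scientist only ever sees the aggregate, never the records.
hospital = DataOwner([12.0, 15.0, 11.0, 14.0, 13.0])
average = hospital.run("mean")
```

A real deployment would add much more than this sketch shows, such as authentication, auditing and stronger privacy guarantees, but the division of roles is the same: the code travels to the data, not the other way round.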

To conceptualise how PySyft could deliver truly revolutionary results, let’s go back to our breast cancer use case. Currently, top-performing machine learning models for breast cancer detection use less than 0.1% of the world’s data. Worldwide, more than 750 million mammography images are taken over a decade. A data scientist who wanted to access even a fraction of a fraction of these images would have to sign partnership agreements, go through governance reviews, deploy secure data stores, manage access, and much more. In both time and money, this approach does not scale, and it does not give us enough data to work with.

However, with federated data networks, hospitals all over the globe could share their data in a safe and secure manner. Data scientists and developers could then securely compute on that data and develop models that vastly improve our understanding of the disease, its progression and its diagnosis, saving lives. Those using the data would have no physical access to the medical datasets and would not be able to store the data on their own machines. And instead of going through the process of securing five to ten partnership agreements, they could access a network of hundreds or thousands of hospitals.
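One common way to train a model across such a network is federated averaging, sketched below in plain Python. This is an illustration of the general technique under simplifying assumptions, not a description of how PySyft implements it: the model is a single weight `w` fitted to `y ≈ w * x`, and the function names (`local_update`, `federated_average`, `train`) and the learning rate are hypothetical. Each site trains on its own data and shares only the updated weight and its sample count.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (feature x, label y)


def local_update(w: float, data: List[Sample],
                 lr: float = 0.01, epochs: int = 20) -> float:
    """Run a few gradient-descent steps on y ~ w * x using only local data."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(updates: List[Tuple[float, int]]) -> float:
    """Combine site weights, weighting each by its number of samples."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def train(sites: List[List[Sample]], rounds: int = 50) -> float:
    """Coordinator loop: it sees weights and counts, never the raw samples."""
    w = 0.0
    for _ in range(rounds):
        updates = [(local_update(w, data), len(data)) for data in sites]
        w = federated_average(updates)
    return w


# Two hypothetical hospitals whose data follow y = 2x.
hospital_a = [(1.0, 2.0), (2.0, 4.0)]
hospital_b = [(3.0, 6.0)]
global_w = train([hospital_a, hospital_b])
```

Sharing model weights instead of records reduces exposure but does not by itself guarantee privacy, which is why federated learning is usually combined with other privacy-enhancing techniques.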

PySyft, in my opinion, is creating the future of data sharing through the use of federated data networks. The world has enough data to solve many important, unsolved problems. However, the stringent restrictions that stand in the way of centralising data are preventing advancement. We have the computational power and we have the data. PySyft could give us the necessary access.

External

This content is provided by an external author without editing by Finextra. It expresses the views and opinions of the author.
