Private Mouse and Keyboard Behaviour Dataset
Description: Mouse and keyboard datasets can include sensitive personal data (e.g. login credentials, banking information, or text messages). Differential privacy [1] allows data scientists to train behaviour models without accessing users' raw inputs.
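To make the idea concrete, here is a minimal sketch of the Laplace mechanism, one standard way to achieve ε-differential privacy described in [1]. It is applied to a counting query over keyboard records. The record schema (a per-user `backspaces` field) and the function names are hypothetical. The sketch assumes each user contributes exactly one record, so the count has sensitivity 1 and the noise scale is 1/ε.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release an epsilon-DP count via the Laplace mechanism.

    Assumes each user contributes exactly one record, so adding or removing
    one user changes the true count by at most 1 (sensitivity 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical per-user summaries of keyboard sessions.
records = [{"user": "u1", "backspaces": 3}, {"user": "u2", "backspaces": 12},
           {"user": "u3", "backspaces": 15}, {"user": "u4", "backspaces": 7}]
rng = random.Random(0)
noisy = private_count(records, lambda r: r["backspaces"] > 10, epsilon=0.5, rng=rng)
print(round(noisy, 2))  # true count is 2; the printed value depends on the seed
```

A smaller ε gives a larger noise scale and therefore stronger privacy. The analyst only ever sees the noisy count, never the individual records.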
Goal: Allow data scientists to train models on mouse and keyboard data that they cannot see directly, using differential privacy.
Steps:
1- Deploy a domain node using HAGrid [2].
2- Deploy a network node, using PySyft and PyGrid [3], that collects data from the different domain nodes and handles network requests.
3- Data owners can upload datasets to domain nodes. Once a dataset is uploaded, noise is added to it to satisfy differential privacy.
4- Data scientists can log into the network, get a privacy budget and run machine learning models.
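The privacy budget in step 4 can be illustrated with a hypothetical budget tracker based on basic sequential composition: the ε of every answered query is added up, and further queries are refused once the total is reached. This is a plain-Python sketch, not PySyft's own accounting mechanism. The class, the function, the clipping bounds and the typing-speed values are assumptions for illustration, and the number of records n is assumed to be public.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


class PrivacyBudget:
    """Hypothetical per-scientist tracker using basic sequential composition."""

    def __init__(self, total_epsilon: float):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return self.total - self.spent

    def charge(self, epsilon: float) -> None:
        """Deduct epsilon, or refuse the query if it would overspend the budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total:
            raise PermissionError("privacy budget exhausted")
        self.spent += epsilon


def private_mean(values, lower, upper, epsilon, budget, rng):
    """epsilon-DP mean of clipped values; assumes len(values) is public."""
    if not values or lower >= upper:
        raise ValueError("need non-empty values and lower < upper")
    budget.charge(epsilon)  # charge before any data-dependent output is produced
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    # Replacing one record changes the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


# Usage with hypothetical typing speeds (keystrokes per second).
budget = PrivacyBudget(total_epsilon=1.0)
rng = random.Random(42)
speeds = [4.1, 5.3, 3.8, 6.0]
print(private_mean(speeds, 0.0, 10.0, 0.5, budget, rng))
print(private_mean(speeds, 0.0, 10.0, 0.5, budget, rng))
print(budget.remaining)  # 0.0
try:
    private_mean(speeds, 0.0, 10.0, 0.5, budget, rng)
except PermissionError as err:
    print(err)  # privacy budget exhausted
```

Clipping bounds each user's influence on the mean, which is what allows the noise scale to be fixed in advance. Bounds that are too narrow bias the result, while bounds that are too wide add more noise.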
Supervisors: Mayar Elfares and Guanhua Zhang
Distribution: 20% Literature, 60% Implementation, 20% Analysis
Requirements: Good Python skills and good knowledge of operating systems and databases
Literature:
[1] Dwork, Cynthia and Aaron Roth. 2014. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science 9(3-4), pp. 211-407.
[2] HAGrid: https://pypi.org/project/hagrid/
[3] PySyft: https://github.com/OpenMined/PySyft