From the course: Protecting Data for Analysis and Machine Learning

Preprocessing

- You should avoid using sensitive data, such as personally identifiable information (PII), to perform your analysis or train your machine learning models, because doing so puts your company at risk of data exposure. As we mentioned earlier, minimizing the collection of PII is a best practice, but there will almost always be times when you either need to collect the data for your project or you've already been given data from your client that contains PII. In this case, one way to help protect that data, and the individuals attached to it, is to use preprocessing techniques such as data anonymization. We'll go through the details of different types of data anonymization later in this course. But for now, let's talk about why this step in the data lifecycle is so important. We already covered the threats and risks related to data breaches, but data analysis and machine learning in particular can inadvertently become a threat to the privacy of your data if…
