My job as a data scientist is to analyze data in order to solve problems. And, as the world digitizes—not only with mobile devices, but also with sensors and other IoT (Internet of Things) technologies such as wearables—the amount of data available grows exponentially.
According to International Data Corporation, the global datasphere will be 163 zettabytes by 2025, up from 16.1 zettabytes in 2016, and the average connected person will interact with connected devices nearly 4,800 times per day.
With so much data being generated and collected, data privacy is becoming increasingly important. Data Privacy Day, established by the Council of Europe in 2007 as Data Protection Day and observed on January 28 each year, is a good time to reflect on what this means.
The Importance of Data
Data is typically gathered in one of three ways:
• Observations: Scientists, analysts, and even marketers observe and record customer behavior.
• Inferences: Data can be inferred from a user's search history, purchases, or social media activity.
• Volunteered: People volunteer data to organizations via surveys and forms.
Once data has been compiled, it can be used to solve problems and provide answers. Because data science relies on data that is relevant to the problem or question being addressed, personally identifiable information (PII) is not always required. The most important thing is that the data is representative of the problem you are attempting to solve. To avoid errors or biases in artificial intelligence and machine learning environments, a data scientist must be able to recognize when to exclude data.
The Importance of Data Privacy
Typically, data is stored on local servers or in the cloud. It is a company's ethical and legal responsibility to protect its customers' privacy. Many times, that responsibility will fall under the purview of the data engineer or database administrator.
One method of ensuring data privacy is to anonymize data: remove or encrypt the direct identifiers that make up the PII, such as a person's full name, address, email, personal identification number, physical description, or biometric information, and make sure individuals cannot be reidentified from what remains.
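To make the idea concrete, here is a minimal Python sketch of stripping direct identifiers from a single record. It swaps in a keyed hash (HMAC-SHA256) rather than encryption, so the same person maps to the same token without the email being stored. The field names, the `pseudonymize` helper, and the sample record are all hypothetical. Strictly speaking this is pseudonymization, not full anonymization: the remaining fields could still allow reidentification if they are specific enough.

```python
import hashlib
import hmac

# Hypothetical field names; a real dataset will have its own schema.
DIRECT_IDENTIFIERS = {"full_name", "address", "email", "national_id"}


def pseudonymize(record: dict, secret_key: bytes, id_field: str = "email") -> dict:
    """Return a copy of record with direct identifiers removed.

    A keyed hash of one identifier is kept as a stable token so that rows
    belonging to the same person can still be linked for analysis.
    """
    identifier = record[id_field].lower().encode("utf-8")
    token = hmac.new(secret_key, identifier, hashlib.sha256).hexdigest()
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["person_token"] = token
    return cleaned


# Made-up example record.
record = {
    "full_name": "Ada Example",
    "email": "ada@example.com",
    "address": "1 Sample Street",
    "national_id": "000-00-0000",
    "age_band": "30-39",
    "steps_per_day": 8421,
}

safe = pseudonymize(record, secret_key=b"keep-this-key-out-of-the-dataset")
print(safe)
```

The secret key has to be stored separately from the data. Anyone holding both could recompute the tokens from a list of known emails.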
According to McKinsey, effective regulation of data anonymization is an opportunity because it reduces risks to individuals and organizations while increasing data availability for analysis.
Data protection laws vary by country, but common practices include: having a data loss prevention and data discovery strategy; frequent backups; built-in safeguards such as replication, firewalls, encryption, authorization, and authentication; and erasure and recovery strategies.
The General Data Protection Regulation (GDPR) of the European Union is arguably the most comprehensive. The EU Charter of Fundamental Rights states that EU citizens have the right to personal data protection, and under the GDPR, 1,031 fines totaling €1.581 billion were issued in the fiscal year ending March 2022.
These fines were not levied against cyber criminals but against well-known corporations, for breaches of the rules such as insufficient legal basis for data processing; non-compliance with general data processing principles; and insufficient technical and organizational measures to ensure information security.
Protecting your online data
There are numerous steps you can take to secure your data.
• Use strong passwords that are at least 11 characters long and contain a mix of upper- and lower-case letters, symbols, and numbers. It would take a cyber criminal at least 400 years to crack a password that met these conditions, and longer passwords make it even more difficult.
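As an illustration, the password advice above can be written as a small Python check. This is only a sketch: the function name `meets_password_advice` is made up, and "symbols" is assumed to mean the ASCII punctuation characters in `string.punctuation`.

```python
import string

MIN_LENGTH = 11  # threshold taken from the advice above


def meets_password_advice(password: str) -> bool:
    """Check the length and the presence of each character class."""
    return (
        len(password) >= MIN_LENGTH
        and any(c.islower() for c in password)
        and any(c.isupper() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)
    )


for candidate in ["password", "Tr1cky-Horse", "Tr1cky-Horse-Battery"]:
    print(candidate, meets_password_advice(candidate))
```

A check like this only looks at length and character mix. It cannot tell whether a password is reused across sites or appears in a list of leaked passwords.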
Cisco's National Cybersecurity Officer in Singapore, Josh McCloud, has some excellent cybersecurity advice available online. Alternatively, you could delve deeper into the subject by enrolling in Cisco Networking Academy's free Introduction to Cybersecurity course, which is designed to make cybersecurity awareness accessible to all.
If, like me, you are curious about the world around you and enjoy problem solving, all of the data being collected represents a huge opportunity to improve communities and organizations all over the world.
The science of data
Introduction to Data Science is a Cisco Networking Academy primer course that a team of learning scientists and I created to help anyone get their feet wet in the data science field. On our mobile-first Skills for All learning platform, you can learn about data science at a high level in an intuitive and interactive manner for free.
In the entertainment industry, data science is responsible for classification algorithms that help viewers find videos they enjoy. The algorithms serve up recommendations based on each viewer's profile, including what videos they've watched and what other customers with similar tastes have watched.
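The "other customers with similar tastes" idea can be sketched in a few lines of Python. This is a deliberately simple similarity-based approach (often called collaborative filtering), not the algorithm any particular streaming service uses. The viewer names, video titles, and function names are invented for illustration.

```python
from collections import Counter

# Hypothetical watch histories: viewer -> set of video titles.
watch_history = {
    "ana": {"Space Docs", "Ocean Life", "Robot Wars"},
    "ben": {"Space Docs", "Ocean Life", "Deep Sea"},
    "cho": {"Cooking 101", "Baking Basics"},
}


def jaccard(a: set, b: set) -> float:
    """Overlap between two sets, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(viewer: str, history: dict, top_n: int = 3) -> list:
    """Rank unseen videos by the similarity of the viewers who watched them."""
    seen = history[viewer]
    scores = Counter()
    for other, videos in history.items():
        if other == viewer:
            continue
        similarity = jaccard(seen, videos)
        for video in videos - seen:
            scores[video] += similarity
    # Drop videos that were only watched by viewers with no overlap at all.
    return [video for video, score in scores.most_common(top_n) if score > 0]


print(recommend("ana", watch_history))  # ['Deep Sea']
```

Here "ana" and "ben" share two of four distinct titles, so the one video "ben" has watched that "ana" has not is recommended. "cho" shares nothing with "ana" and contributes nothing.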
Your smartphone's fitness app, or fitness tracker, collects data that is fed into an application that can provide you with valuable health information. These apps must build a model of your movements to identify what constitutes taking a step and the distance you cover with each one in order to calculate how many steps you take per day or the distance you walk. Some fitness trackers even employ self-learning artificial intelligence (AI) software capable of recognizing and adapting to a wide range of movements and learning new fitness activities based on repetitive, cyclical patterns.
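A very rough Python sketch of what "identifying a step" can mean: count each time the overall acceleration rises above a threshold, then multiply by a stride length. The threshold, the stride length, and the sample readings are all assumptions chosen for illustration. Real trackers use far more robust, often learned, models.

```python
import math


def count_steps(samples, threshold=11.0):
    """Count upward crossings of the threshold in acceleration magnitude.

    samples: iterable of (x, y, z) accelerometer readings in m/s^2.
    threshold: hypothetical cut-off slightly above gravity (about 9.8 m/s^2).
    """
    steps = 0
    above = False
    for x, y, z in samples:
        magnitude = math.sqrt(x * x + y * y + z * z)
        if magnitude > threshold and not above:
            steps += 1  # rising edge: a new peak has started
        above = magnitude > threshold
    return steps


def estimate_distance(steps, stride_m=0.75):
    """Distance in metres, assuming a fixed (hypothetical) stride length."""
    return steps * stride_m


# Made-up readings: two peaks above the threshold, the second lasting two samples.
readings = [(0, 0, 9.8), (0, 0, 12.5), (0, 0, 9.0), (0, 0, 12.1), (0, 0, 12.3), (0, 0, 9.7)]
steps = count_steps(readings)
print(steps, estimate_distance(steps))  # 2 1.5
```

Tracking whether the signal is already above the threshold prevents one long peak from being counted as several steps.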
In agriculture, farmers use cellphones to send images of plant diseases to researchers. Image recognition systems use these images to diagnose diseases, and regression algorithms then combine the diagnoses with environmental data to predict future outbreaks.
In medicine, researchers created a machine learning model that uses probability to classify histopathology images of breast cancer. This method could eventually detect cancer subtypes and distinguish benign from malignant tissue.
Data science is a powerful tool for good, and these are only a few examples of how it can be used. On the surface, the cost of data privacy may appear to be an impediment to the potential advances brought about by data science. Data privacy, on the other hand, grants data scientists social license to use that tool responsibly. Everyone benefits.