1. Compare and contrast data through various stages in the past and present.
Or: Explain the evolution of the data ecosystem.
Ans:
Initial Stage: In the initial stages of data science, the data used for academic purposes or business needs was small, structured, and static. This kind of data fit easily into rows and columns and was displayed via spreadsheets, which made it very easy for statisticians to handle. Traditional tools such as descriptive statistics, predictive modelling, and classification were used on this type of data.
Second Stage: Data continued to evolve from small, structured, and basic to large, unstructured, and in motion. This change in data behaviour created the need for skills to handle sensor-based data and IoT data, along with machine learning techniques and concepts such as support vector machines.
Third Stage: The kind of data we see today is massive, integrated, and dynamic. In this stage, we deal with one system of data interacting with another system of data. To handle such behaviour, one must learn concepts of machine learning and deep neural networks.

2. What is Data?
Data is a collection of raw, unorganised facts and details such as text, observations, figures, symbols, and descriptions of things. Data does not carry any specific purpose and has no significance by itself. Data is measured in bits and bytes, which are the basic units of information in the context of computer storage and processing.

3. List some examples of data.
o The number of visitors to a website in one month.
o Inventory levels in a warehouse on a specific date.
o Individual satisfaction scores on a customer service survey.
o The price of a competitor's product.

4. What is Information?
Information is processed, organised, and structured data. It provides context for data and enables decision making.
For example, a single customer's sale at a restaurant is data; this becomes information when the business is able to identify the most popular or least popular dish.

5. List some examples of Information.
o Analysing and listing the changes made to a website that have led to an increase or decrease in monthly site visitors.
o Identifying supply chain issues based on trends in warehouse inventory levels over time.
o Finding areas for improvement in customer service based on a collection of survey responses.
o Determining whether a competitor is charging more or less for a similar product.

6. What is unlabelled data?
Any data that does not have labels specifying its characteristics, identity, classification, or properties can be considered unlabelled data.
Example: photos, videos, or text that have no category or classification assigned to them.

7. What is labelled data?
Any data that has a characteristic, category, or attribute assigned to it can be referred to as labelled data.
Example: the height of a human, the price of a product.

8. What is data labelling?
Data labelling is the process of identifying raw data (such as text, PDFs, files, and images) and classifying it by adding one or more labels so that machine learning models can learn from it. Labelling helps a machine learning model identify the attributes of the data in order to analyse it and make predictions.

9. State the difference between structured and unstructured data.
Structured data is data that fits neatly into data tables and includes discrete data types such as numbers, short text, and dates.
o Example: Excel files
Unstructured data does not fit neatly into a data table because of its size or nature.
o Example: audio and video files

10. What is retrospective and prospective data?
Retrospective data looks back into the past. In retrospective studies, individuals are sampled and information is collected about their past.
Prospective data looks forward into the future.
In prospective studies, individuals are followed over time and data about them is collected as their characteristics or circumstances change.
Example: A retrospective cohort study is a study in which researchers compare the historical data of a group of people who have a particular disease with that of a group of people who do not have the disease. A prospective cohort study, on the other hand, is a type of study in which no one in the sample has the disease being measured when the study commences.

11. What is the importance of Ethics in Data Science?
Or: Why do data scientists need to follow ethics?
Data scientists have access to a vast pool of data in their analyses, so it is essential for them to adhere to ethical guidelines when handling data. Protective mechanisms and policies that discourage the mishandling and unethical use of data should be made part of best practices. Some of the negative scenarios that may arise if ethical guidelines are disregarded include:
a) A few people can do an immense amount of harm: Many organizations have become vulnerable to data breaches. Hackers worldwide are on the lookout to break through a reputed organization's firewalls and steal important data from its servers. The stolen data is then sold for a hefty sum.
Eg: To date, Yahoo holds the title for the largest data breach in the history of the internet. In 2016, the company disclosed that it had been the victim of multiple data breaches over the years, starting in 2013. The breaches had exposed the email addresses, names, and dates of birth of around three billion people who had used Yahoo.
Another example of a data breach is the Marriott (Starwood) hotel chain. In 2018, Marriott's data team confirmed that around 383 million guest accounts had been compromised in 2016. The breach exposed the names, addresses, contact numbers, and passport information of the guests whose accounts were hacked.
b) Lack of consent: One of the leading social networking sites conducted an experiment in which, without consent, it deliberately fed users highly extreme and particularly provocative points of view in their newsfeeds, trying to elicit a reaction and then see whether that affected what the users posted back. The newsfeed was curated with the deliberate intention of seeing whether it ultimately changed the way a user interacted with the rest of the network. Was consent taken from the users before performing this experiment? The answer is "No".
Read more here: https://www.analyticsvidhya.com/blog/2022/02/ethics-in-data-science-and-proper-privacy-and-usage-of-data/

12. What is a data governance framework?
A data governance framework can be defined as a collection of practices and processes that ensure the authorized management of data in an organization. The primary purpose of implementing data governance in any organization is to achieve better control over its data assets, including the methods, technologies, and behaviours around the proper management of data. It also deals with the security, privacy, integrity, and management of data flowing in and out of the organization.
Some of the benefits of implementing a data governance framework are:
o Procedures around regulation and compliance activities become exact.
o There is greater transparency around data-related activities.
o The value of the organization's data increases.
o Issues around current data are resolved more effectively.
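As a concrete illustration of the data-versus-information distinction from the restaurant example above, the following Python sketch (dish names are invented for illustration) aggregates individual sales records (data) into the most and least popular dishes (information):

```python
from collections import Counter

# Raw data: one entry per individual customer sale (hypothetical dish names)
sales = ["pasta", "pizza", "pasta", "salad", "pizza", "pasta", "soup"]

# Processing and organising the raw data turns it into information
counts = Counter(sales)
ranked = counts.most_common()          # dishes ordered by sales count, descending
most_popular = ranked[0][0]            # dish with the highest count
least_popular = ranked[-1][0]          # dish with the lowest count

print(most_popular, counts[most_popular])   # -> pasta 3
```

The individual sale records by themselves carry no significance; the ranking derived from them is what enables a decision (e.g. which dish to promote or drop).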
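The labelled/unlabelled distinction and the labelling process described in questions 6 to 8 can be sketched in Python. The review texts and the keyword rule below are invented for illustration; in practice a human annotator or an annotation tool would assign the labels.

```python
# Unlabelled data: raw text with no category or classification attached
unlabelled = ["great service", "food was cold", "loved the dessert"]

def label(text: str) -> str:
    """Assign a label to one raw text item.

    A trivial keyword rule stands in for a human annotator here.
    """
    negative_words = {"cold", "bad", "slow"}
    return "negative" if any(w in text.split() for w in negative_words) else "positive"

# Data labelling: attach a label to each item, producing labelled data
# that a supervised machine learning model could learn from.
labelled = [(text, label(text)) for text in unlabelled]
# labelled is now a list of (example, label) pairs
```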