DMBI

dmbi answers

Uploaded by

adippatil456
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
13 views8 pages

DMBI

dmbi answers

Uploaded by

adippatil456
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
Knowledge discovery in databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. The goal is to distinguish between unprocessed data and knowledge that may not be obvious but is valuable. The overall process of finding and interpreting patterns from data involves the repeated application of the following steps :
1. Data Cleaning
2. Data Integration
3. Data Selection
4. Data Transformation : Data are transformed or consolidated into forms appropriate for mining, so that suitable representations for the data are found.
5. Data Mining : An essential process where intelligent methods are applied to extract data patterns; this includes deciding which model and parameters may be appropriate.
6. Pattern Evaluation : To identify the truly interesting patterns representing knowledge, based on interestingness measures.
7. Knowledge Presentation : Visualization and knowledge representation techniques are used to present the mined knowledge to users. Visualizations can be in the form of graphs, charts or tables.

Bin 4 : 45, 45, 71, 72, 73, 75

Smoothing the data by bin means : We take the average of each bin and replace every value by the mean value of the corresponding bin.
Bin 1 : 13.83, 13.83, 13.83, 13.83, 13.83, 13.83
Bin 2 : 20.16, 20.16, 20.16, 20.16, 20.16, 20.16
Bin 3 : 30.67, 30.67, 30.67, 30.67, 30.67, 30.67
Bin 4 : 63.50, 63.50, 63.50, 63.50, 63.50, 63.50

Smoothing the data by bin medians : We replace each value in the bin by the corresponding bin's median value. Each bin contains 6 data values, so we take the average of the two middle values as the median.
Bin 1 : 14, 14, 14, 14, 14, 14
Bin 2 : 20, 20, 20, 20, 20, 20
Bin 3 : 27, 27, 27, 27, 27, 27
Bin 4 : 71.50, 71.50, 71.50, 71.50, 71.50, 71.50

Fig. 2.6.1 : Outlier analysis by clustering

Q. 2.6.3 : Develop a model to predict the salary of college graduates using regression methods.

Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, lacking in certain behaviours or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues; it prepares raw data for further processing. Data preprocessing is used in database-driven applications such as customer relationship management, and in rule-based applications (like neural networks). In Machine Learning (ML) processes, data preprocessing is critical to encode the dataset in a form that can be interpreted and parsed by the algorithm. During preprocessing, data goes through a series of steps :
(1) Data Cleaning
(2) Data Integration
(3) Data Transformation
(4) Data Reduction
(5) Data Discretization
(6) Data Sampling

(1) Data Cleaning : Data is cleansed through processes such as filling in missing values or deleting rows with missing data, smoothing the noisy data, or resolving inconsistencies in the data.

(2) Data Integration / (3) Data Transformation : … so that the data is in one place and all the dependencies are logical.

(4) Data Reduction : When the volume of data is huge, databases can become slower, costly to access, and challenging to store properly. Data reduction aims to present a reduced representation of the data in a data warehouse. There are various methods to reduce data : for example, once a subset of relevant attributes is chosen for its significance, anything below a given significance level is discarded. Encoding mechanisms can also be used to reduce the size of the data. If all of the original data can be recovered after the reduction, it is labelled lossless; if some data is lost, it is called a lossy reduction. Aggregation can also be used, for example to condense countless transactions into a single weekly or monthly value, significantly reducing the number of data objects.
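As a rough illustration of steps (1) and (4) above, the following minimal pandas sketch fills in missing values and then condenses individual transactions into monthly totals; the DataFrame, the "date" and "amount" columns, and the sample values are invented for the example.

    import pandas as pd

    # Made-up transaction data with missing amounts (illustration only).
    transactions = pd.DataFrame({
        "date": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-03",
                                "2021-02-17", "2021-02-25"]),
        "amount": [120.0, None, 75.0, 80.0, None],
    })

    # (1) Data cleaning : fill missing amounts with the column mean
    # (alternatively, rows with missing data could simply be dropped).
    transactions["amount"] = transactions["amount"].fillna(transactions["amount"].mean())

    # (4) Data reduction by aggregation : condense individual transactions
    # into a single monthly total, reducing the number of data objects.
    monthly = transactions.groupby(transactions["date"].dt.to_period("M"))["amount"].sum()
    print(monthly)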
(5) Data Discretization : Data can be discretized to replace raw values with interval labels. This step reduces the number of values of a continuous attribute by dividing the range of the attribute into intervals.

(6) Data Sampling : Sometimes, due to time, storage or memory constraints, a dataset is too big or too complex to be worked with. Sampling can be used to select and work with just a subset of the dataset, provided that the subset has approximately the same properties as the original one.
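A small sketch of steps (5) and (6), assuming a numeric "age" attribute and interval edges invented for the example : pd.cut replaces raw values with interval labels, and sample() draws a random subset of the rows.

    import pandas as pd

    df = pd.DataFrame({"age": [13, 15, 22, 25, 31, 35, 45, 52, 63, 70]})

    # (5) Discretization : replace raw ages with interval labels by dividing
    # the range of the attribute into bins.
    df["age_group"] = pd.cut(df["age"], bins=[0, 18, 35, 60, 100],
                             labels=["child", "young", "middle-aged", "senior"])

    # (6) Sampling : work with a random 50% subset of the rows; the subset
    # should keep approximately the same properties as the full dataset.
    subset = df.sample(frac=0.5, random_state=42)
    print(df["age_group"].value_counts())
    print(subset)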

Star schema vs. snowflake schema :
(i) In a star schema, hierarchies for the dimensions are stored in the dimensional table itself; in a snowflake schema, they are split into separate dimension tables.
(ii) When the dimensional table contains a smaller number of rows, we can go for the star schema; when it contains a large number of rows with redundant information and storage space is a concern, we can choose the snowflake schema to save space.
(iii) The star schema works best in any data warehouse / data mart; the snowflake schema is better for a small data warehouse / data mart.

6.4 Factless Fact Table
In a data warehouse, a factless fact table is a fact table that does not have any measures stored in it; it contains only keys from different dimension tables. There are two types of factless fact tables :
(i) Event-capturing factless fact table – records the occurrence of an event.
(ii) Coverage table – describes a condition.
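To make the idea concrete, here is a small hypothetical event-capturing factless fact table in pandas : it stores only foreign keys to the date, student and course dimensions and no measures, yet it can still answer questions such as "how many attendances were recorded per course" simply by counting rows.

    import pandas as pd

    # Hypothetical factless fact table : dimension keys only, no measures.
    attendance_fact = pd.DataFrame({
        "date_key":    [20210901, 20210901, 20210902, 20210902],
        "student_key": [101, 102, 101, 103],
        "course_key":  [7, 7, 9, 7],
    })

    # Even without measures, events can be analysed by counting rows,
    # e.g. the number of attendance events per course.
    print(attendance_fact.groupby("course_key").size())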