18IT045_Practical_Work_DS

Binal Kagathara
Nov 18, 2021


Perform the following tasks on the given dataset.

Dataset: https://www.kaggle.com/gpreda/coronavirus-2019ncov

Task-1:
Dataset description using the Orange tool.
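The kind of per-column summary a data description shows (column types, missing values, distinct values) can also be sketched in Python with pandas. This is a minimal sketch, not the author's Orange workflow: describe_dataset is a hypothetical helper, and the sample rows and column names are made up for illustration.

```python
import pandas as pd


def describe_dataset(df):
    """Hypothetical helper: one row per column with its type, missing and distinct counts."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # NaN is not counted as a distinct value
    })


if __name__ == "__main__":
    # Made-up rows; the column names are assumptions, not the Kaggle schema.
    sample = pd.DataFrame({
        "Country": ["India", "Italy", None, "India"],
        "Confirmed": [10, 200, 35, None],
    })
    print(describe_dataset(sample))
    print(sample.describe())  # count, mean, std, min, quartiles, max of numeric columns
```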
What needs to be done to improve the classification accuracy on the given dataset? Achieve the highest classification accuracy possible by applying the following methods:
→Pre-processing
o Encoding
o Normalization
o Missing value handling
o Feature Selection
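As a rough Python counterpart to the four pre-processing methods listed above, a scikit-learn pipeline can chain them in one object. A minimal sketch, assuming scikit-learn and pandas are installed; build_preprocessor is a hypothetical helper, and the median/most-frequent imputation, MinMaxScaler and SelectKBest with the ANOVA F-test (f_classif) are illustrative choices, not the settings used in Orange.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder


def build_preprocessor(numeric_cols, categorical_cols, k=5):
    """Hypothetical helper: impute, normalize, encode, then keep the k best features."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # missing value handling
        ("scale", MinMaxScaler()),                     # normalization to [0, 1]
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),  # one-hot encoding
    ])
    columns = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    # Feature selection: keep the k columns with the highest ANOVA F-score.
    return Pipeline([("columns", columns), ("select", SelectKBest(f_classif, k=k))])


if __name__ == "__main__":
    # Made-up rows; the column names are assumptions.
    X = pd.DataFrame({
        "Latitude": [1.0, 2.0, None, 4.0, 5.0, 6.0],
        "Deaths": [0.0, 1.0, 2.0, 3.0, None, 5.0],
        "Country": ["A", "B", "A", "B", "A", "B"],
    })
    y = [0, 0, 0, 1, 1, 1]
    prep = build_preprocessor(["Latitude", "Deaths"], ["Country"], k=2)
    print(prep.fit_transform(X, y))
```

Keeping every step inside one pipeline means the imputation, scaling and feature selection are learned from the training data only when the pipeline is cross-validated.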

Compare the accuracy with and without the pre-processing steps. Perform the classification and visualize the accuracy before and after pre-processing in Orange/Python.
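One way to make the before/after comparison concrete in Python is to cross-validate the same classifier on raw and on pre-processed features. A minimal sketch, assuming scikit-learn; the k-nearest-neighbours classifier, the min-max scaling step and the synthetic data from the hypothetical make_demo_data helper are illustrative stand-ins, not the author's Orange models or the COVID-19 data.

```python
import random

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


def compare_accuracy(X, y, cv=5):
    """Mean cross-validated accuracy without and with a normalization step."""
    raw = KNeighborsClassifier(n_neighbors=3)
    preprocessed = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=3))
    before = cross_val_score(raw, X, y, cv=cv, scoring="accuracy").mean()
    # The scaler sits inside the pipeline, so it is fitted on each training fold only.
    after = cross_val_score(preprocessed, X, y, cv=cv, scoring="accuracy").mean()
    return before, after


def make_demo_data(n=100, seed=0):
    """Hypothetical synthetic data: one informative feature on a 0-1 scale and
    one pure-noise feature on a 0-10000 scale that dominates raw distances."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = label * 0.7 + rng.uniform(0.0, 0.3)
        noise = rng.uniform(0.0, 10000.0)
        X.append([informative, noise])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    before, after = compare_accuracy(X, y)
    print(f"accuracy before pre-processing: {before:.2f}")
    print(f"accuracy after pre-processing:  {after:.2f}")
```

On this synthetic data the raw classifier should stay near chance because the noise feature dominates the distance, while the scaled version separates the classes; the size of any gain on the real dataset depends on its features.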

Task-2:
Generate a dashboard of the pre-processed dataset from Task-1.
Extract as many data insights as possible by plotting a bar chart, box plot, pie chart and stack plot with Power BI dashboard visualizations.

The following answers must be submitted in a single PDF file:
1. Provide a screenshot of the data description and explain it briefly.
2. Provide screenshot(s) of the data pre-processing steps, showing their significance.
3. Provide a screenshot showing the accuracy before and after pre-processing.
4. Provide a screenshot of the Power BI dashboard with a description.

Load the data into the Orange tool and set the number of confirmed cases as the target variable.
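Outside Orange, loading the data and choosing the target can be sketched with pandas. Because classification accuracy needs a categorical target, one option (an assumption here, not necessarily what was done in Orange) is to bin the confirmed-case counts into equal-frequency classes. split_target, the column name "Confirmed", the CSV file name and the low/medium/high labels are all hypothetical.

```python
import pandas as pd


def split_target(df, target="Confirmed", labels=("low", "medium", "high")):
    """Hypothetical helper: separate the features from a binned target column."""
    X = df.drop(columns=[target])
    # rank(method="first") breaks ties, so qcut does not fail on duplicate
    # bin edges when many rows share the same count (for example 0).
    y = pd.qcut(df[target].rank(method="first"), q=len(labels), labels=list(labels))
    return X, y


if __name__ == "__main__":
    # df = pd.read_csv("covid_19_data.csv")  # hypothetical file name
    df = pd.DataFrame({"Country": list("abcdef"), "Confirmed": [0, 0, 0, 5, 50, 500]})
    X, y = split_target(df)
    print(y.tolist())  # ['low', 'low', 'medium', 'medium', 'high', 'high']
```

Note that rank-based tie-breaking can place equal counts in different classes (the third 0 above lands in "medium"); fixed thresholds with pd.cut are an alternative when that matters.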

The workflow of the task is as follows:

Pre-process the data with missing-value imputation and discretization.
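The imputation and discretization step can be approximated in pandas. A minimal sketch: mean imputation for numeric columns, most-frequent imputation for the rest, and equal-width binning with pd.cut are assumptions chosen for illustration; Orange's impute and discretize options may use different methods and bin counts. impute_and_discretize is a hypothetical helper.

```python
import pandas as pd


def impute_and_discretize(df, n_bins=4):
    """Hypothetical helper: fill missing values, then bin numeric columns."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())  # mean imputation
            out[col] = pd.cut(out[col], bins=n_bins)      # equal-width intervals
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])  # most frequent value
    return out


if __name__ == "__main__":
    df = pd.DataFrame({
        "Deaths": [0.0, None, 10.0, 20.0, 30.0],
        "Country": ["A", None, "A", "B", "A"],
    })
    print(impute_and_discretize(df, n_bins=3))
```

Imputing before binning matters here: pd.cut leaves NaN values unbinned, so any remaining gaps would survive into the discretized columns.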

The accuracy before and after pre-processing is shown here:

Save the pre-processed data to an Excel file.

Load the saved data into Power BI and visualize it using a bar chart, a pie chart and a donut chart.

