
Data Analyst Projects

Predictive Modeling for Park Visitation using Real World Weather Data

  • Cleaned the data and ran exploratory data analysis (EDA) to characterize the dataset, then ranked candidate predictors using mutual information regression, SHAP values, and decision tree regression to identify the most informative features.

  • Developed and tuned a Random Forest model on the 10 most influential features, achieving an R-squared of 0.8328 (83.28%) when forecasting park visitation from weather conditions.
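The feature ranking and Random Forest workflow described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it assumes scikit-learn, uses synthetic data in place of the real weather dataset, and ranks features with mutual information only (SHAP and decision tree importances are left out). The names `top_k`, `model`, and `r2` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the weather data: 15 candidate features,
# of which only the first two actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 15))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Rank features by mutual information with the target (training data only)
# and keep the indices of the 10 highest-scoring ones.
mi_scores = mutual_info_regression(X_train, y_train, random_state=0)
top_k = np.argsort(mi_scores)[::-1][:10]

# Fit a Random Forest on the selected features and score it on held-out data.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train[:, top_k], y_train)
r2 = r2_score(y_test, model.predict(X_test[:, top_k]))
print(f"Held-out R-squared: {r2:.4f}")
```

Ranking on the training split only keeps the held-out score free of selection leakage.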

Multiclass Classification using a Multilayer Perceptron on the MNIST Dataset

  • Implemented an MLP in Python with batch normalization, an SGD optimizer, and L2 regularization, reaching 97.33% accuracy. The best configuration used two hidden layers of 128 neurons each, ReLU activation, a learning rate of 0.09, a regularization parameter of 0.1, and 35 epochs.
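To make the reported configuration concrete, here is a hedged sketch using scikit-learn's `MLPClassifier`. Several things are assumptions rather than the original implementation: the library choice, the use of scikit-learn's small bundled 8x8 digits dataset instead of MNIST, and plain SGD with momentum disabled. `MLPClassifier` has no batch normalization, so that part of the project is not reproduced here.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Small 8x8 digits dataset bundled with scikit-learn (a stand-in for MNIST).
digits = load_digits()
X = digits.data / 16.0  # pixel values range from 0 to 16; scale to [0, 1]
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Hyperparameters taken from the description above.
mlp = MLPClassifier(
    hidden_layer_sizes=(128, 128),  # two hidden layers, 128 neurons each
    activation="relu",
    solver="sgd",
    learning_rate_init=0.09,
    alpha=0.1,       # L2 regularization strength
    momentum=0.0,    # assumption: plain SGD without momentum
    max_iter=35,     # for the SGD solver this is the number of epochs
    random_state=0,
)
mlp.fit(X_train, y_train)
accuracy = mlp.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.4f}")
```

Because the dataset and the missing batch normalization differ from the project, the accuracy printed here is not comparable to the 97.33% figure.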

AtliQ Hardware Sales Insights

  • Led AtliQ Hardware’s sales analysis, uncovering Rs.985M in revenue, a 2.5% profit margin over four years, and a 52.8% revenue contribution from Delhi NCR.

  • Enhanced data integrity with MySQL and Tableau, streamlining ETL processes and standardizing currency conversion.

  • Conducted detailed customer analysis, identifying Electricalsara Stores as a top contributor with Rs.413M in revenue, to support data-driven decisions and operational efficiency improvements.

Text Classification using Logistic Regression and Naive Bayes (NLP)

  • Implemented Naive Bayes and K-Fold Cross-Validation from scratch to classify the IMDb Reviews and 20 Newsgroups datasets. On IMDb, Logistic Regression performed best with an accuracy of 86.95%; on 20 Newsgroups, Naive Bayes outperformed Logistic Regression with an accuracy of 69.03%.
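As an illustration of what a from-scratch version of these two pieces might look like, the sketch below implements a multinomial Naive Bayes classifier with Laplace smoothing and a simple contiguous K-fold split in pure Python. It is not the project's code: the class and function names, the whitespace tokenizer, and the six-document toy corpus are all hypothetical stand-ins for the IMDb and 20 Newsgroups pipelines.

```python
import math
from collections import Counter, defaultdict


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


class MultinomialNaiveBayes:
    """Bag-of-words Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(count / n_docs)  # log prior
            for word in doc.lower().split():
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_validate(docs, labels, k):
    """Return the mean accuracy over k folds."""
    accuracies = []
    for fold in k_fold_indices(len(docs), k):
        held_out = set(fold)
        train = [i for i in range(len(docs)) if i not in held_out]
        model = MultinomialNaiveBayes().fit(
            [docs[i] for i in train], [labels[i] for i in train]
        )
        correct = sum(model.predict(docs[i]) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k


# Toy corpus (hypothetical), interleaved so every fold contains both classes.
docs = [
    "great movie loved it",
    "terrible movie hated it",
    "loved the acting great fun",
    "hated the plot terrible bore",
    "great fun loved every minute",
    "terrible bore hated every minute",
]
labels = ["pos", "neg", "pos", "neg", "pos", "neg"]
mean_accuracy = cross_validate(docs, labels, k=3)
print(f"Mean 3-fold accuracy: {mean_accuracy:.2f}")
```

Contiguous folds are the simplest choice; on a real dataset sorted by label, shuffling or stratifying the indices before splitting would be needed.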
