Jivitesh Poojary

Philadelphia, Pennsylvania, United States
7K followers 500+ connections

About

As a Lead ML Engineer with 11 years of industry experience, I have a foundational…

Experience

  • Comcast

    Philadelphia, Pennsylvania, United States

  • -

    Greater Denver Area

  • -

    Charlotte, North Carolina Area

  • -

    Indianapolis, Indiana Area

  • -

    Bloomington, Indiana Area

  • -

    Mumbai Area, India

  • -

    Mumbai

  • -

    Mumbai Area, India

  • -

    Mumbai Area, India

Education

  • Indiana University Bloomington

    -

    Founder - IU Data Science Club
    Developer - CyberInfrastructure for Network Science
    Researcher - Cognitive Development Lab
    Mentor - R Users group

  • -

    Editor - Nirmaan (College Magazine)
    Technical team - Technovanza (National technical festival)
    Member - Computer Society of India - VJTI Chapter

  • -

Licenses & Certifications

Volunteer Experience

  • Student Consultant - Data Science Consulting Club

    Indiana University Bloomington

    - Present 8 years 7 months

    Science and Technology

    - Student consultant at the Indiana University Data Science Consulting Club, where real-world data science problems are tackled for clients from research and industry.

  • R Bootcamp - Teaching Assistant

    Indiana University Bloomington

    - Present 7 years 11 months

    Science and Technology

    - Helped fellow graduate students understand the basics of the R programming language and evaluated the DataCamp tutorials to provide feedback to instructors.

  • IU Science Fest - Event Co-ordinator

    Indiana University Bloomington

    - Present 8 years 6 months

    Education

    - Assisted Prof. Tor Lattimore and Prof. Qin Zhang in organizing the event.
    - Helped undergraduate and local high school students learn about AlphaGo and other AI-enabled computer games.

  • Teacher

    Teach For India

    - 1 year 3 months

    Education

    - Taught Maths and Science to students preparing for competitive exams.
    - Managed a group of 50 students on excursions to museums and science centers.

Publications

  • Building Customized Text Mining Tools via Shiny Framework: The Future of Data Visualization

    Conference: The 28th Modern Artificial Intelligence and Cognitive Science Conference, At Fort Wayne, Indiana

    With the increasing volume of data, there is a growing need for dynamic data visualization to help reveal instant changes in data patterns. There exist many commercial visualization tools, but traditional scholars are often disengaged from the tool development process; thus, the choice of functionalities is contingent upon tool developers whose choice may not fit the end-users. This collaboration, however, has a potential in bridging the gap between traditional scholars, who are more interested in sense-making of the text than in the tools, and the data scientists, who are more interested in the tools than in the substance, but must still contextualise the outcomes. Until recently, this collaborative process was hindered by the complexity of customisation procedures and technological hurdles imposed on users with new installations. With the advent of reactive web frameworks, such as Shiny, the user-driven customisation becomes not only feasible, but also essential to advance scientific research. In this paper, we demonstrate a collaborative effort between learned scholars and tool developers, allowing for a computational and humanistic fusion.


Courses

  • Advanced Database systems

    IT0304

  • Applied Machine Learning

    CSCI - B659

  • Applied Machine Learning for Signal Processing

    ENGR - E599

  • Applied Machine Learning in Cognitive Science

    PSY - P657

  • Applied Mathematics

    MA0002

  • Applying Machine Learning Techniques in Computational Linguistics

    CSCI - B659

  • Automata Theory

    IT0303

  • Categorical Data Analysis

    STAT - S637

  • Cloud Computing

    IT0405

  • Data Mining

    CSCI - B565

  • Data Structures and Algorithms

    IT0204

  • Database Application

    IT0208

  • Deep Learning Systems

    ENGR - E533

  • Discrete Structures

    IT0201

  • Distributed System

    IT0403

  • Exploratory Data Analysis

    STAT - S670

  • Information Visualizations

    ILS - Z637

  • Introduction to Data Mining

    IT0404

  • Parallel Computing

    IT0401

  • Research Project

    IT3401

  • Search

    ILS - Z534

  • Statistical Analysis for Effective Decision-Making

    SPEA - V506

  • Statistical Learning Theory

    STAT - S782

Projects

  • Winner - 2017 Indiana Medicaid Challenge

    Our team 'Random Variables' won the Indiana Medicaid Data Challenge. Our solution focused on identifying regions in the State of Indiana where mental health care facilities could be improved.

    We were awarded a certificate, a cash prize of $1000, and an opportunity to present our analysis at the Midwest IT conference. The presentation was given to decision makers from the challenge organizers: Regenstrief Institute, Inc., KSM Consulting, Indiana HIMSS Chapter, Indiana Family and Social Services Administration and Indiana MPH.

    https://public.tableau.com/profile/jivitesh.poojary1464#!/vizhome/INMedicaidChallenge-MentalHealth/ProjectOverview?publish=yes

    https://prezi.com/view/w8lmPrFwuUAa4oclYSyI/

  • CNS-Shiny-Tools

    - Present

    We aim to build a repository of Shiny apps where the user can create quick visualizations by uploading data. Some of the visualization types include:
    - Sankey flow diagram
    - Stream graph

  • DrivenData - Warm Up: Predict Blood Donations

    - The dataset is from a mobile blood donation vehicle in Taiwan. The Blood Transfusion Service Center drives to different universities and collects blood as part of a blood drive. We want to predict whether or not a donor will give blood the next time the vehicle comes to campus.
    - Using exploratory data analysis, some unnecessary attributes were removed. This step was performed in R.
    - The class probabilities were predicted using the predict_proba method of the decision tree and random forest classifiers in the scikit-learn package.
    - Data is courtesy of Yeh, I-Cheng via the UCI Machine Learning Repository.

  • Kaggle - March Machine Learning Mania 2017

    -

    - The objective of the competition was to forecast the outcomes of all possible match-ups in the 2017 tournament.
    - Primarily used five of the eight available files. The files were joined and a new variable was created to capture the difference between the seed ranks of the two teams for a given year.
    - Tried two techniques: a logistic regression model and an Elo benchmark.
    - Reached the top 25% of participants.
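
    An Elo benchmark of the kind mentioned above can be sketched as follows. This is a minimal illustration, not the competition entry: the starting rating of 1500, the K-factor of 20, and all function names are assumptions chosen for the example.

```python
def elo_expected(rating_a, rating_b):
    """Probability that team A beats team B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, a_won, k=20.0):
    """Return the updated (rating_a, rating_b) after one game.

    k is the K-factor (assumed value; it controls how fast ratings move).
    """
    expected_a = elo_expected(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_a)
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


def run_elo(games, start=1500.0, k=20.0):
    """Replay (winner, loser) pairs in order and return the final ratings."""
    ratings = {}
    for winner, loser in games:
        ra = ratings.get(winner, start)
        rb = ratings.get(loser, start)
        ratings[winner], ratings[loser] = elo_update(ra, rb, True, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical team names and results, for illustration only.
    season = [("TeamA", "TeamB"), ("TeamA", "TeamC"), ("TeamB", "TeamC")]
    final = run_elo(season)
    # Forecast for a possible tournament match-up.
    print(elo_expected(final["TeamA"], final["TeamC"]))
```

    The forecast for any match-up is then simply elo_expected applied to the two teams' end-of-season ratings, which makes it a convenient baseline to compare against a fitted model.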

  • Kaggle - Road Accidents Data Great Britain 1979-2015

    -

    The data primarily captures road accidents in the UK during 2015 and has 70 features/columns and about 250K rows. The data was fetched from the Open Data Platform UK and is shared under the Open Government Licence. For more details, refer to Open Data UK.
    Link - https://www.kaggle.com/akshay4/road-accidents-incidence

    - Our objective was to explore the data and predict the severity of the road accidents.
    - Because this is real-world data, a significant proportion of time was spent cleaning it.
    - Because the response variable (accident severity) is categorical, logistic regression was the model of choice.

  • Places and Spaces

    -

    The Places & Spaces: Mapping Science exhibit introduces science mapping techniques to the general public and to experts across disciplines for educational, scientific, and practical purposes. Website - http://scimaps.org/

    - We analysed the Apache command logs provided by the client. Data cleaning was performed in Python to remove unwanted characters in the URLs and to remove URLs that appeared to be accessed by a crawler or a bot.
    - The data was then aggregated in XML, JSON and CSV formats to integrate with the base visualizations.
    - The Sankey flow diagram was implemented in R Shiny using networkD3, the word cloud was implemented in Gephi, and the bot activity tracking and tree map were implemented in Tableau.
    - The visualizations helped the clients gain insights into improving their website to increase user traffic.
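
    The cleaning and aggregation steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project code: it assumes the logs follow the Apache Combined Log Format, and the bot keywords, the set of characters kept in a URL, and all function names are hypothetical choices.

```python
import json
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed layout: Apache Combined Log Format (the client's real logs may differ).
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Hypothetical keyword list for spotting crawlers by user agent.
BOT_HINTS = ("bot", "crawler", "spider", "slurp")


def clean_url(url):
    """Drop the query string and any character outside a conservative URL set."""
    path = urlsplit(url).path
    path = re.sub(r"[^A-Za-z0-9/._~-]", "", path)
    return path.rstrip("/") or "/"


def parse_line(line):
    """Return a cleaned record, or None for unparseable lines and bot traffic."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    agent = (match.group("agent") or "").lower()
    if any(hint in agent for hint in BOT_HINTS):
        return None
    return {
        "host": match.group("host"),
        "url": clean_url(match.group("url")),
        "status": int(match.group("status")),
    }


def aggregate(lines):
    """Count hits per cleaned URL and serialise the counts as JSON."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record["url"]] += 1
    return json.dumps(dict(counts), sort_keys=True)
```

    Filtering on the user agent only catches crawlers that identify themselves; request-rate heuristics would be a natural next step but are outside this sketch.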

  • Pattern in U.S. monthly unemployment rate

    -

    - The data, obtained from the Federal Reserve Bank of St. Louis’ FRED data repository, contains the unadjusted U.S. monthly unemployment rate from January 1948 to January 2017.
    - We analysed the data by breaking the time series down into components (trend, seasonal and oscillation) and investigating what might predict the unemployment rate.
    - After studying different possible causal parameters, we concluded that the percentage change in Real Gross Domestic Product (RGDP) was inversely related to the unemployment rate, with changes in RGDP preceding changes in unemployment by up to 1.5 years on average.
    - The data can be obtained here - https://fred.stlouisfed.org/
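
    One simple way to look for a lead-lag relationship like the one described above is a lagged correlation. The sketch below is illustrative only and is not the original analysis: it assumes two equal-length series held in plain Python lists, and the function names and toy numbers are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(leading, target, max_lag):
    """Correlate leading[t] with target[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        xs = leading[: len(leading) - lag]
        ys = target[lag:]
        result[lag] = pearson(xs, ys)
    return result


if __name__ == "__main__":
    # Toy series: the target is the negated leading series shifted by 2 steps.
    growth = [1, 3, 2, 5, 4, 6, 3, 7, 5, 8]
    unemployment = [0, 0] + [-g for g in growth[:-2]]
    correlations = lagged_correlation(growth, unemployment, max_lag=4)
    best_lag = min(correlations, key=correlations.get)  # most negative
    print(best_lag, correlations[best_lag])
```

    With monthly data, a lag of 18 steps would correspond to the 1.5 years mentioned above; a strongly negative correlation at a positive lag is consistent with an inverse, leading relationship, though it does not establish causation.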

  • Hurricanes and himmicanes

    -

    - Used R (ggplot2) to visually explore data on hurricanes in the US, to study whether there was a meaningful difference between the distribution of damage caused by hurricanes with female names and that caused by hurricanes with male names.
    - We observed that the data had a bias in the naming of hurricanes, and concluded that any observed difference between the damage caused by feminine-named and masculine-named hurricanes was due to a few outliers.
    - A 2014 paper published in PNAS was titled “Female hurricanes are deadlier than male hurricanes.” The paper can be found here - http://www.pnas.org/content/111/24/8782.full

  • Kaggle - Credit Card Fraud Detection

    -

    - The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
    - The primary objective of the project was to reduce false negatives while maintaining accuracy.
    - Our approach was bootstrapped random undersampling of the majority class followed by bagging of these individual subsamples.
    - We applied Random Forest Classifier, AdaBoost and XGBoost iteratively to obtain the best parameters for each of the models.
    - The implementation was done using the scikit-learn machine learning toolkit in Python.
    - The best result we obtained was zero false negatives with 60% accuracy for the Random Forest Classifier.
    - The accuracy was compared with a brute-force application of the Random Forest algorithm using Weka.
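
    The undersampling-plus-bagging idea above can be sketched as follows. This is a minimal illustration and not the project code: it only shows how balanced bags might be drawn and how per-bag predictions might be combined, leaving model training out. The bag size (equal to the number of frauds), the tie-breaking rule, and the function names are assumptions.

```python
import random


def balanced_bags(labels, n_bags, seed=0):
    """Build index lists for balanced training bags.

    Each bag keeps every minority (fraud, label 1) row and adds a bootstrap
    sample (drawn with replacement) of majority rows of the same size.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    bags = []
    for _ in range(n_bags):
        sampled_majority = [rng.choice(majority) for _ in minority]
        bags.append(minority + sampled_majority)
    return bags


def bagged_vote(predictions):
    """Combine per-bag 0/1 predictions for one transaction by majority vote.

    Ties are resolved towards fraud (an assumed choice that favours
    fewer false negatives over fewer false positives).
    """
    return 1 if 2 * sum(predictions) >= len(predictions) else 0


if __name__ == "__main__":
    # Toy labels: 5 frauds among 100 transactions.
    labels = [1] * 5 + [0] * 95
    for bag in balanced_bags(labels, n_bags=3):
        print(len(bag), sum(labels[i] for i in bag))
```

    One classifier would be trained per bag and bagged_vote applied to their outputs for each transaction. Resolving ties towards fraud trades accuracy for recall, which matches the stated priority of reducing false negatives.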

  • Kaggle - House Prices: Advanced Regression Techniques

    -

    - The objective of the project was to accurately predict house prices in Ames, Iowa as part of the Kaggle competition “House Prices: Advanced Regression Techniques”. The data was provided by Dean De Cock from Truman State University.
    - Feature selection was performed in R using the Boruta technique.
    - Several approaches (Linear Regression, Logistic Regression, Lasso, Ridge, AdaBoost and XGBoost) were tried to obtain the best results. This implementation was performed in Python using the scikit-learn machine learning libraries.
    - We obtained the best results using the XGBoost algorithm, which gave us a rank in the top 20% of the competition.

  • Kaggle - Outbrain Click Prediction

    -


    - The objective of the project was to predict a space-delimited list of ads associated with a display block, ordered by decreasing likelihood of being clicked.
    - Applied maximum likelihood estimation and probability adjustment using Adaptivesmoothing techniques, used Follow the Regression Leader (FTRL – Proximal) algorithm for predicting the probability of ads being clicked.
    - Implementation was done in python on IU High Performance Computing servers.
    - The evaluation metric for the project was Mean Average Precision (MAP) with 12 as the number of predicted nodes
    - We obtained the best results using FTRL-Proximal giving us top 15% rank in the competition.
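
    The ranked output and the evaluation metric described above can be sketched as follows. This is a minimal illustration, not the competition code: the function names and the toy ad identifiers are assumptions, and the metric follows the usual definition of mean average precision at a cutoff k.

```python
def rank_ads(click_probabilities):
    """Return ad ids as a space-delimited string, most likely click first."""
    ordered = sorted(click_probabilities, key=click_probabilities.get, reverse=True)
    return " ".join(ordered)


def average_precision_at_k(actual, predicted, k=12):
    """Average precision at k for one display block.

    actual is the set of clicked ad ids; predicted is the ranked list.
    """
    if not actual:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for rank, ad in enumerate(predicted[:k], start=1):
        if ad in actual and ad not in seen:
            hits += 1
            score += hits / rank  # precision at this rank
            seen.add(ad)
    return score / min(len(actual), k)


def mean_average_precision(actuals, predictions, k=12):
    """Mean of average_precision_at_k over all display blocks."""
    scores = [average_precision_at_k(a, p, k) for a, p in zip(actuals, predictions)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical predicted click probabilities for one display block.
    probs = {"ad1": 0.1, "ad2": 0.7, "ad3": 0.2}
    ranking = rank_ads(probs)
    print(ranking)
    print(average_precision_at_k({"ad3"}, ranking.split()))
```

    When each display block has exactly one clicked ad, the average precision reduces to the reciprocal of the rank at which that ad appears, so pushing the clicked ad even one position higher improves the score.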

  • Decision making using SAS

    -


    - Real Estate dataset: Analyzed the dataset to understand the impact of certain attributes on the prediction of property prices.
    - General Social Survey dataset: Performed multivariate regression analysis and computed correlations and variance inflation factors to understand the health index.
    - Baseball dataset: Executed bivariate regression analysis to understand the attendance at a game in relation to win rate, salary, the type of league, etc.
    - Economic survey dataset: Performed ANOVA on a few attributes to understand the variation in the data.

    Other creators
    • Qiwen Zhu
  • Deloitte - Contract Lifecycle Management (CLM)

    -

    The project was for a leading organization in electronic signature technology and Digital Transaction Management services for facilitating electronic exchanges of contracts and signed documents, where we helped them improve their quote-to-contract processes using Apttus CPQ-CLM systems.

    The solution enabled the client's sales, revenue and legal teams to improve their productivity significantly by seamlessly connecting the sales and contract processes into a single, efficient revenue-creation system. The implementation was done on the Force.com platform by leveraging tools like Apex, Visualforce, test classes, batch processes, data conversion and reports & dashboards.

  • SAP - Salesforce: Data warehousing and data conversion

    -

    Was part of a specialist group for data conversion for one of the biggest semiconductor vendors in the world, and worked closely with different technology teams to deliver a robust sales and pricing system.

  • Deloitte - Configure, Price and Quote (CPQ)

    -

    The project was for a leading American cloud computing company, where we helped them improve their sales processes using a CPQ system.

    Created a CPQ system for the client using Apttus CPQ packages on the Force.com platform, resulting in significant improvements in the quoting and pricing processes. The system allowed the client to speed up deal cycles, close more deals, automate sales across channels, analyse real-time sales performance using interactive dashboards and gain visibility into changes that could impact revenue targets.

  • Offline English Character Recognition

    -

    - The project was about extracting and recognizing handwritten characters or machine text from scanned images, which could then be used in a wide variety of applications.
    - Character recognition was done with neural networks using two techniques: identity matrices and ASCII codes.
    - We made use of different types of neural networks, namely a feed-forward back-propagation network, an Elman back-propagation network and a fitting network. The simulation was done using MATLAB.

  • IBM - The Great Mind Challenge 2012 - Paperless Hospital Service

    -

    - We developed a website for a hospital so that patients need not do any paperwork while being admitted and treated. The application was designed to handle the information of thousands of patients and support efficient healthcare services.
    - We were primarily engaged in server-side scripting using JSP (JavaServer Pages). The interface was built using IBM tools: WASCE as the application server and DB2 as the database server.


Honors & Awards

  • Most Time Saver Award - Comcast Advertising Hackathon

    Comcast Advertising

  • CIO Recognition Award

    Dish Network, CIO

  • Winner - Indiana Medicaid Data Challenge

    HIMSS Indiana Chapter

  • Citadel Chicago Data Open - Finalist

    Citadel | Citadel Securities

  • Certificate of Appreciation

    Government of Maharashtra

    For outstanding work in the Chief Minister’s Fellowship Program, 2015

  • Outstanding Award

    Deloitte

    For driving business-critical data engineering and conversion processes, which helped the team quickly resolve roadblocks.

Test Scores

  • GRE

    Score: 320

  • CMAT

    Score: 99.98 %

    All India Rank - 11

  • CAT

    Score: 99.62 %

  • XAT

    Score: 99.72 %

Languages

  • English

    Native or bilingual proficiency

  • Hindi

    Native or bilingual proficiency

  • Spanish

    Limited working proficiency

  • French

    Limited working proficiency

Organizations

  • IEEE Computer Society

    Active Member

    - Present
  • American Statistical Association

    Active Member

    - Present
  • IU Data Science Club

    Co-Founder

    -

    - Founded the club with a group of graduate students in the Data Science Program at Indiana University Bloomington.
    - Served as the Director of Professional Development from March 2017 to February 2018.
    - Currently mentoring the next batch of club leadership.
