Abhijith Asok

Bothell, Washington, United States
10K followers 500+ connections


Microsoft

Harvard University

Personal Website

About

Data scientist with experience across the corporate, social and research arms of data…

Activity

In this piece, myself and Debashis Bandyopadhyay discuss how the traditionally US dominated global public health research landscape is evolving due…

Liked by Abhijith Asok
Concepts I would master if I were interviewing for an Applied Scientist position in 2025: 1. 𝐑𝐀𝐆 Most popular enterprise use case of Generative…

Liked by Abhijith Asok
Ever ran into numerical errors while doing matrix computations? Recently, I was using Mahalanobis distance, to determine how similar a sentence is…

Liked by Abhijith Asok

Experience

Microsoft

Greater Seattle Area
-

Salem, Tamil Nadu, India
-

Greater Boston Area
-

New York City
-
-

Remote
-

New Delhi Area, India
-

Pune, Maharashtra, India
-

Pune, Maharashtra, India
-

Bengaluru Area, India
-

Goa, India
-

Goa
-

Harvard University, Boston, Massachusetts
-

Hyderabad Area, India
-

Dubai, UAE
-

London, England
-

Bengaluru Area, India
-

BITS Pilani K K Birla Goa Campus
-
-

Goa
-

Goa
-

Thiruvananthapuram Area, India

Education

Harvard University

2017 - 2019
2010 - 2015

Activities and Societies: Department of Journalism and Media Affairs, Mime Club, SPREE
2013 - 2013

Activities and Societies: I was a founding team member of 'The Graduate Consultancy' (now GradMinds) during this time in London, a global collaboration of four of us from India, the United States, the United Kingdom and Azerbaijan.
1997 - 2010

Activities and Societies: Singing, Dancing, Writing

Licenses & Certifications

Convolutional Neural Networks

Coursera

Issued Jan 2018

Credential ID M8X2B3HQTMV2

See credential
Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization

Coursera

Issued Jan 2018

Credential ID V27Z33VCMC4P

See credential
Structuring Machine Learning Projects

Coursera

Issued Jan 2018

Credential ID HQ4UJCZRSH9W

See credential
Neural Networks and Deep Learning

Coursera

Issued Dec 2017

Credential ID BGY8M5PGPRPV

See credential
Kaggle R Tutorial on Machine Learning

DataCamp

Issued Jun 2015

Credential ID 3e8ece6dc71ab2140dd25dc5743524dfbd101938

See credential
Lean Six Sigma Green Belt

KPMG

Issued Jan 2015
Kaggle Python Tutorial on Machine Learning

DataCamp

Credential ID 699ca7a30169c1ee14217d4d8495f6a53788f6a5

See credential

Volunteer Experience

Head Volunteer - Data Analysis

Safecity

Aug 2015 - Aug 2017 2 years 1 month

Civil Rights and Social Action

Publications

FAVOR: functional annotation of variants online resource and annotator for variation across the human genome

Nucleic Acids Research November 9, 2022

FAVOR provides a comprehensive multi-faceted variant functional annotation online portal that summarizes and visualizes findings of all possible nine billion single nucleotide variants (SNVs) across the genome. It allows for rapid variant-, gene- and region-level queries of variant functional annotations.

See publication
Women’s strategies addressing sexual harassment and assault on public buses: an analysis of crowdsourced data

Crime Prevention and Community Safety: An International Journal (Springer) September 7, 2017
This paper uses crowdsourced data on women’s self-reports of harassment and assault on public buses in India. The data provide a basis to identify the strategies that women use to respond to and manage this everyday threat. The study examines 137 accounts of assault collected by a crowdsourced platform in which women detail, keeping silent (n = 27), fleeing (n = 38), or resisting (n = 72) such an assault. Findings show that confronting incidents in the moment by “making a scene” and “engaging the crowd” works well in the closed, shared-space setting of a crowded public bus. The study concludes by asserting crowdmapping as a multi-faceted tool: it can allow women to be aware of potentially dangerous locales, empowers them to report incidents to help keep others safe, and provides a source of data to advise on best practices for navigating street harassment and assault in public buses.

Other authors
See publication
Generalized Approach to Linear Data Transformation

Proceedings of the IEEE International Conference on Data Science and Engineering August 23, 2016

This paper presents a generalized approach for the simple linear data transformation, Y=bX, through an integration of multidimensional coordinate geometry, vector space theory and polygonal geometry. The scaling is performed by adding an additional ‘Dummy Dimension’ to the n-dimensional data, which helps plot two dimensional component-wise straight lines on pairs of dimensions. The end result is a set of scaled extensions of observations in any of the 2^n spatial divisions, where n is the total number of applicable dimensions/dataset variables, created by shifting the hyperplane in n dimensions along the ‘Dummy Axis’. The derived scaling factor was found to be dependent on the coordinates of the common point of origin for diverging straight lines and the plane of extension, chosen on and perpendicular to the ‘Dummy Axis’, respectively. This result indicates the geometrical interpretation of a linear data transformation and hence, opportunities for a more informed choice of the factor ‘b’, based on a better choice of these coordinate values.

See publication
Public transport — another hotspot for sexual harassment

YourStory August 9, 2016

According to the analysis carried out by Safecity to identify the reports pertaining to public transportation spaces, an alarming one-fifth of all the data collected are incidents that happen in a public transportation space of some kind. Although this data contained reports from multiple countries, barring a negligible percentage, the data entirely contains incidents collected from various parts of India. The team went ahead to split up the incidents on the basis of common modes of transport and some general terms related to transport using keyword separations.

See publication

Patents

Composite Risk Score for Cloud Software Deployments

Filed May 22, 2024 18/671736
Intelligent Table Suggestion and Conversion for Text

Filed November 11, 2021 17/524646
Computing System for Determining Quality of Virtual Machine Telemetry Data

Filed October 7, 2020 17/064685

Courses

Algebra

-
Applied Linear Algebra and Big Data

APMTH 120
Applied Machine Learning

BST 263
Basics of Statistical Inference

BST 222
Computer Programming

-
Computing for Big Data

BST 262
Data Science 2 (Neural Networks and Deep Learning)

BST 261
Data Structures and Algorithms

CS 124
Epidemiology Methods

-
Graphs and Networks

-
Introduction to Data Science

BST 260
Introduction to Social and Biological Networks

BST 267
Operations Research

-
Optimisation

-
Probability and Statistics

-
Real Analysis

-
Reproducible Data Science

BST 270

Projects

Predicting ICD-9 codes from doctor's discharge notes

Apr 2018 - May 2018

The project took in doctors' discharge notes in text form and tried to predict the ICD-9 diagnosis codes from them. Word embeddings were created using the fastText model, and the modelling was done using a Recurrent Neural Network architecture composed primarily of GRU units. Only the top 5 ICD-9 codes and their text documents were used because of time and resource constraints, but the approach could be extended to more codes without much additional effort.

See project
Predicting GPS location from Wi-Fi data

Jan 2018 - May 2018

The project aimed to look at the Wi-Fi data collected from smartphones (Wi-Fi ID, signal strength, accuracy etc.) and use those as predictors of an individual's GPS location. In the realm of digital phenotyping, data from smartphones can be used to predict a person's mental state in advance, to enable faster response systems to anxiety attacks, depression etc. However, since grabbing the GPS data is more challenging than other kinds of data, there are usually far fewer data points for GPS coordinates than for the other kinds of data. Therefore, we created a basic model using artificial neural networks that took in Wi-Fi data and output GPS coordinates. The project was put on hold after initial modelling as we await more data.
Predicting virality of Mashable articles

Oct 2017 - Dec 2017

Media companies like Mashable produce tens of thousands of articles per year, all with varying degrees of virality. The virality of the content produced is key to a media company’s profitability. An accurate model that could predict parameters that increase the virality of an article, specifically, the number of social shares it receives, would be extremely valuable.

We started with a base dataset containing meta-data of nearly 40,000 unique Mashable blog articles over the past 5 years. The meta-data includes 61 different attributes ranging from metrics like word counts to sentiment analysis. This dataset is hosted on the Machine Learning Repository from the Center for Machine Learning and Intelligent Systems at the University of California Irvine.

In addition to the meta-data, we were also interested in the actual article data, so we used a webscraper (using ‘rvest’) to collect the actual article titles, date published, author and article content. We joined these values into the base dataset.

We first attempted the problem using advanced regression but soon shifted to tree-based models, with the final model being an extreme gradient boosting model.

While we expected our task of predicting the number of shares for an article to be challenging, we now realize it is even more challenging than initially thought. For example, two articles discuss very popular tech products, Xbox and iPhone, and have similar values for key metrics identified in the Variable Importance table above. However, they have drastically different numbers of shares, 900 vs 197,000.

That said, we learned that it is possible to create a good model that utilizes a number of predictors to determine the number of shares a given article receives.

The final boosted model had an MAE of 2724.32. In other words, the model's predicted share counts for Mashable articles differ from the actual counts by just over 2,700 shares on average, in either direction.
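
As a rough illustration of what the MAE figure measures, the sketch below computes a mean absolute error by hand. The share counts are hypothetical values chosen for illustration, not data from the project, and the helper name `mean_absolute_error` is our own. It shows that the MAE is an average of absolute differences, so individual predictions can still miss by far more than the MAE.

```python
def mean_absolute_error(actual, predicted):
    """Return the average of |actual - predicted| over all observations."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical share counts for four articles (illustrative only).
actual_shares = [900, 1500, 3200, 197000]
predicted_shares = [1100, 1400, 2900, 5000]

mae = mean_absolute_error(actual_shares, predicted_shares)

# The largest single miss, for comparison with the average.
max_error = max(abs(a - p) for a, p in zip(actual_shares, predicted_shares))

print(f"MAE: {mae:.2f} shares, largest single error: {max_error} shares")
```

In this toy example one heavily shared article dominates the errors, which is the same effect described for the Xbox and iPhone articles.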

See project

Honors & Awards

Karuna Majumdar Fellowship

Dr. Hasi Majumdar Venkatachalam MD

Oct 2017

The Fellowship fund was created by Dr. Hasi Majumdar Venkatachalam MD as a monetary assistance to students from India.
Winner, Sales Forecasting Challenge, ZS India

ZS Associates

Mar 2016

I was the solo winner across ZS India, in a team challenge of ~50 registered teams, to build models to forecast the sales for 183 different pharmaceutical products. The winning solution used an enhanced variant of ARIMA.
Winner, Insurance Premium Prediction Challenge, ZS Pune

ZS Associates

Dec 2015

I was the solo winner at the Pune office for a national company-wide team prediction challenge to predict Medical Insurance Premiums, based on a wide variety of variables of multiple data types. The winning solution utilized generalized non-linear modelling, coupled with smoothing splines.
Finalist

Atlantic Council / StartupDosti

Apr 2014

Finalist at the Indo-Pak Business Plan competition. The finale was held April 24-30, 2014, in Thailand.
Finalist

Global Cooperative Challenge / SENStation

Feb 2014

Finalist at the GCC Business Plan Competition, organised by SENStation.
Best Intern

MusicPerk

Feb 2013

Test Scores

GRE

Score: 329

Jul 2016

Quant : 170 / 170 (97th percentile)
Verbal : 159 / 170 (82nd percentile)
AWA : 4.5 / 6 (82nd percentile)
TOEFL

Score: 117

Jul 2016

Reading : 30 / 30
Listening : 30 / 30
Speaking : 27 / 30
Writing : 30 / 30

Languages

Malayalam

Native or bilingual proficiency
English

Full professional proficiency
Hindi

Professional working proficiency
Tamil

Elementary proficiency

Recommendations received

5 people have recommended Abhijith


More activity by Abhijith

As many of you know, athletics has been a part of my life since I was a boy. I'm passionate about living my faith and fitness brand and I often get…

Liked by Abhijith Asok
Last month, I wrapped up an incredible chapter in Saarbrücken, Germany, at the INM-Leibniz Institute for New Materials, where I had the opportunity…

Liked by Abhijith Asok
