Abhijith Asok

Bothell, Washington, United States
10K followers 500+ connections

About

Data scientist with experience across the corporate, social and research arms of data…

Experience

  • Microsoft

    Greater Seattle Area

  • -

    Salem, Tamil Nadu, India

  • -

    Greater Boston Area

  • -

    New York City

  • -

  • -

    Remote

  • -

    New Delhi Area, India

  • -

    Pune, Maharashtra, India

  • -

    Pune, Maharashtra, India

  • -

    Bengaluru Area, India

  • -

    Goa, India

  • -

    Goa

  • -

    Harvard University, Boston, Massachusetts

  • -

    Hyderabad Area, India

  • -

    Dubai, UAE

  • -

    London, England

  • -

    Bengaluru Area, India

  • -

    BITS Pilani K K Birla Goa Campus

  • -

  • -

    Goa

  • -

    Goa

  • -

    Thiruvananthapuram Area, India

Education

  • Harvard University
  • -

    Activities and Societies: Department of Journalism and Media Affairs, Mime Club, SPREE

  • -

    Activities and Societies: During this time, I was a founding team member of 'The Graduate Consultancy' (now GradMinds), started in London as a global collaboration among four of us from India, the United States, the United Kingdom and Azerbaijan.

  • -

    Activities and Societies: Singing, Dancing, Writing

Volunteer Experience

  • Head Volunteer - Data Analysis

    Safecity

    - 2 years 1 month

    Civil Rights and Social Action

Publications

  • FAVOR: functional annotation of variants online resource and annotator for variation across the human genome

    Nucleic Acids Research

    FAVOR provides a comprehensive multi-faceted variant functional annotation online portal that summarizes and visualizes findings of all possible nine billion single nucleotide variants (SNVs) across the genome. It allows for rapid variant-, gene- and region-level queries of variant functional annotations.

  • Women’s strategies addressing sexual harassment and assault on public buses: an analysis of crowdsourced data

    Crime Prevention and Community Safety: An International Journal (Springer)

    This paper uses crowdsourced data on women’s self-reports of harassment and assault on public buses in India. The data provide a basis to identify the strategies that women use to respond to and manage this everyday threat. The study examines 137 accounts of assault collected by a crowdsourced platform in which women describe keeping silent (n = 27), fleeing (n = 38), or resisting (n = 72) such an assault. Findings show that confronting incidents in the moment by “making a scene” and “engaging the crowd” works well in the closed, shared-space setting of a crowded public bus. The study concludes by presenting crowdmapping as a multi-faceted tool: it can make women aware of potentially dangerous locales, empower them to report incidents to help keep others safe, and provide a source of data to advise on best practices for navigating street harassment and assault in public buses.

    Other authors
    • Elsa DSilva
    • Suzanne Goodney Lea
  • Generalized Approach to Linear Data Transformation

    Proceedings of the IEEE International Conference on Data Science and Engineering

    This paper presents a generalized approach to the simple linear data transformation Y = bX, through an integration of multidimensional coordinate geometry, vector space theory and polygonal geometry. The scaling is performed by adding an extra ‘Dummy Dimension’ to the n-dimensional data, which makes it possible to plot two-dimensional, component-wise straight lines on pairs of dimensions. The end result is a set of scaled extensions of observations in any of the 2^n spatial divisions (where n is the total number of applicable dimensions, i.e. dataset variables), created by shifting the hyperplane in n dimensions along the ‘Dummy Axis’. The derived scaling factor was found to depend on the coordinates of the common point of origin of the diverging straight lines, chosen on the ‘Dummy Axis’, and of the plane of extension, chosen perpendicular to it. This result gives a geometrical interpretation of a linear data transformation and hence opportunities for a more informed choice of the factor ‘b’, based on a better choice of these coordinate values.
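The abstract does not spell out the derivation, so the sketch below is only one hypothetical way to picture a scaling factor that depends on a point on the dummy axis and a plane perpendicular to it: central projection. Each observation x is lifted to (x, 0); a line from an assumed common origin O = (0, ..., 0, -p) on the dummy axis passes through it and is intersected with the assumed plane where the dummy coordinate equals q. The function names and the parameters `p` and `q` are illustrative assumptions, not the paper's notation, and the resulting b = (p + q) / p is a property of this sketch, not a formula taken from the paper.

```python
def scale_by_central_projection(x: list[float], p: float, q: float) -> list[float]:
    """Hypothetical sketch of Y = bX using an extra dummy dimension.

    x is lifted to (x, 0). A line is drawn from the assumed common origin
    O = (0, ..., 0, -p) on the dummy axis through the lifted point and
    intersected with the assumed plane "dummy coordinate = q", which is
    perpendicular to the dummy axis. Dropping the dummy coordinate of the
    intersection gives b * x with b = (p + q) / p.
    """
    if p <= 0:
        raise ValueError("origin must lie strictly below the data plane (p > 0)")
    lifted = list(x) + [0.0]
    origin = [0.0] * len(x) + [-p]
    # Parametric line: origin + t * (lifted - origin).
    # Solve for t where the dummy coordinate equals q: -p + t * p = q.
    t = (q - origin[-1]) / (lifted[-1] - origin[-1])
    hit = [o + t * (pt - o) for o, pt in zip(origin, lifted)]
    return hit[:-1]  # drop the dummy coordinate


def implied_b(p: float, q: float) -> float:
    """Scaling factor implied by the chosen origin (p) and plane (q)."""
    return (p + q) / p


print(scale_by_central_projection([1.0, -2.0], p=2.0, q=4.0))  # [3.0, -6.0]
print(implied_b(2.0, 4.0))  # 3.0
```

Moving either the origin or the plane along the dummy axis changes b, which mirrors the abstract's observation that the factor depends on both choices. With q > -p the factor is positive, so every scaled point stays in the same orthant as the original observation.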

  • Public transport — another hotspot for sexual harassment

    YourStory

    According to Safecity's analysis of the reports pertaining to public transportation spaces, an alarming one-fifth of all the data collected consists of incidents that happened in a public transportation space of some kind. Although the data contained reports from multiple countries, all but a negligible percentage of the incidents were collected from various parts of India. The team then split the incidents by common modes of transport, along with some general transport-related terms, using keyword separation.
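The keyword separation described above can be sketched as follows. This is a minimal illustration, not Safecity's pipeline: the keyword lists in `MODE_KEYWORDS` and the helper names `match_modes` and `count_by_mode` are hypothetical, since the actual terms the team used are not given. Each report is lower-cased and checked for whole-word matches (allowing a simple plural suffix), and a report that mentions several modes is counted under each of them.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the terms Safecity actually used are not given.
MODE_KEYWORDS: dict[str, list[str]] = {
    "bus": ["bus"],
    "train": ["train", "railway"],
    "metro": ["metro"],
    "auto/taxi": ["auto", "rickshaw", "taxi", "cab"],
    "general transport": ["transport", "commute", "passenger"],
}


def match_modes(report: str) -> set[str]:
    """Return every transport category whose keywords appear as whole words."""
    text = report.lower()
    matched = set()
    for mode, words in MODE_KEYWORDS.items():
        for word in words:
            # \b anchors whole words; (es|s)? allows simple plurals like "buses".
            if re.search(rf"\b{re.escape(word)}(es|s)?\b", text):
                matched.add(mode)
                break
    return matched


def count_by_mode(reports: list[str]) -> Counter:
    """Count reports per category; one report may fall in several categories."""
    counts: Counter = Counter()
    for report in reports:
        counts.update(match_modes(report))
    return counts


# Made-up example reports, for illustration only.
reports = [
    "A man groped me on the crowded bus.",
    "Stared at and followed on the metro.",
    "The cab driver made lewd comments.",
    "Catcalled while walking in the park.",
]
print(count_by_mode(reports))
```

Simple keyword matching is easy to audit but misses misspellings and synonyms, and words like "auto" can be ambiguous, so the keyword lists would need tuning against real reports.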

Patents

  • Composite Risk Score for Cloud Software Deployments

    Filed 18/671736

  • Intelligent Table Suggestion and Conversion for Text

    Filed 17/524646

  • Computing System for Determining Quality of Virtual Machine Telemetry Data

    Filed 17/064685

Courses

  • Algebra

    -

  • Applied Linear Algebra and Big Data

    APMTH 120

  • Applied Machine Learning

    BST 263

  • Basics of Statistical Inference

    BST 222

  • Computer Programming

    -

  • Computing for Big Data

    BST 262

  • Data Science 2 (Neural Networks and Deep Learning)

    BST 261

  • Data Structures and Algorithms

    CS 124

  • Epidemiology Methods

    -

  • Graphs and Networks

    -

  • Introduction to Data Science

    BST 260

  • Introduction to Social and Biological Networks

    BST 267

  • Operations Research

    -

  • Optimisation

    -

  • Probability and Statistics

    -

  • Real Analysis

    -

  • Reproducible Data Science

    BST 270

Projects

  • Predicting ICD-9 codes from doctor's discharge notes

    -

    The project took doctors' discharge notes in text form and tried to predict ICD-9 diagnosis codes from them. Word embeddings were created using the fastText model, and the modelling was done using a Recurrent Neural Network architecture composed primarily of GRU units. Because of time and resource constraints, only the top 5 ICD-9 codes and their text documents were used, but the approach could be extended to more codes without much additional effort.

  • Predicting GPS location from Wi-Fi data

    -

    The project aimed to use Wi-Fi data collected from smartphones (Wi-Fi ID, signal strength, accuracy, etc.) as predictors of an individual's GPS location. In the realm of digital phenotyping, smartphone data can be used to predict a person's mental state in advance, enabling faster response systems for anxiety attacks, depression, etc. However, because GPS data is harder to collect than other kinds of data, there are usually far fewer GPS data points than data points of other kinds. We therefore built a basic artificial neural network model that took in Wi-Fi data and output GPS coordinates. The project was put on hold after initial modelling while we await more data.

  • Predicting virality of Mashable articles

    -

    Media companies like Mashable produce tens of thousands of articles per year, all with varying degrees of virality. The virality of the content produced is key to a media company’s profitability. An accurate model that could predict parameters that increase the virality of an article, specifically, the number of social shares it receives, would be extremely valuable.

    We started with a base dataset containing metadata for nearly 40,000 unique Mashable blog articles from the past 5 years. The metadata includes 61 different attributes, ranging from simple metrics like word counts to sentiment analysis features. This dataset is hosted on the Machine Learning Repository of the Center for Machine Learning and Intelligent Systems at the University of California, Irvine.

    In addition to the metadata, we were interested in the article data itself, so we used a web scraper (built with ‘rvest’) to collect the article titles, publication dates, authors and article content, and joined these values into the base dataset.

    We first approached the problem with advanced regression techniques but soon shifted to tree-based models; the final model was an extreme gradient boosting model.

    While we expected predicting the number of shares for an article to be challenging, we now realize it is even more challenging than initially thought. For example, two articles discussing very popular tech products, the Xbox and the iPhone, have similar values for the key metrics identified in the Variable Importance table above, yet they received drastically different numbers of shares: 900 vs. 197,000.

    That said, we learned that it is possible to build a good model that uses a number of predictors to estimate the number of shares a given article receives.

    The final boosted model had a mean absolute error (MAE) of 2,724.32. In other words, on average, the model's prediction of the number of shares for a Mashable article was off by just over 2,700 shares in either direction.
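Because MAE is an average, it describes the typical size of an error rather than the worst case: individual predictions can miss by far more than the MAE. The sketch below computes MAE in plain Python alongside the maximum absolute error for contrast; the function names are illustrative, and the share counts are hypothetical values chosen to show the difference, not results from the project.

```python
def mean_absolute_error(actual: list[float], predicted: list[float]) -> float:
    """Average of |actual - predicted| over all observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def max_absolute_error(actual: list[float], predicted: list[float]) -> float:
    """Largest single |actual - predicted|, shown for contrast with MAE."""
    return max(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical share counts (not project data).
actual_shares = [1200, 5400, 800]
predicted_shares = [1500, 2400, 800]

print(mean_absolute_error(actual_shares, predicted_shares))  # 1100.0
print(max_absolute_error(actual_shares, predicted_shares))   # 3000
```

Here one article is missed by 3,000 shares even though the MAE is only 1,100, which is why an MAE figure should not be read as an upper bound on the error.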

Honors & Awards

  • Karuna Majumdar Fellowship

    Dr. Hasi Majumdar Venkatachalam MD

    The Fellowship fund was created by Dr. Hasi Majumdar Venkatachalam MD to provide monetary assistance to students from India.

  • Winner, Sales Forecasting Challenge, ZS India

    ZS Associates

    I was the solo winner across ZS India in a team challenge with ~50 registered teams to build models forecasting sales for 183 different pharmaceutical products. The winning solution used an enhanced variant of ARIMA.

  • Winner, Insurance Premium Prediction Challenge, ZS Pune

    ZS Associates

    I was the solo winner at the Pune office in a national, company-wide team challenge to predict medical insurance premiums based on a wide variety of variables of multiple data types. The winning solution used generalized non-linear modelling coupled with smoothing splines.

  • Finalist

    Atlantic Council / StartupDosti

    Finalist at the Indo-Pak Business Plan competition. The finale was held from April 24 to 30, 2014, in Thailand.

  • Finalist

    Global Cooperative Challenge / SENStation

    Finalist at the GCC Business Plan Competition, organised by SENStation.

  • Best Intern

    MusicPerk

Test Scores

  • GRE

    Score: 329

    Quant : 170 / 170 (97th percentile)
    Verbal : 159 / 170 (82nd percentile)
    AWA : 4.5 / 6 (82nd percentile)

  • TOEFL

    Score: 117

    Reading : 30 / 30
    Listening : 30 / 30
    Speaking : 27 / 30
    Writing : 30 / 30

Languages

  • Malayalam

    Native or bilingual proficiency

  • English

    Full professional proficiency

  • Hindi

    Professional working proficiency

  • Tamil

    Elementary proficiency
