Samuel Cantrell
Data Scientist
[email protected]
(470) 634-6443
11 Years IT Experience
8 Years Data Science
Tuning/Optimization, Integrity, Fourier Analysis, Personal Branding, Software Development, Professional, Risk, Disability Awareness, Consulting, Give Clear Feedback, Bash Scripting, Stress Management, Market Share, Perseverant, Data Management, Negotiation, Team Lead/Manager, Delegation, Microsoft Transact-SQL (T-SQL), Talent Management, SAP, Public Speaking, Apache Cassandra, Accept Feedback, Automation, Acuity, PyTorch, Optimistic, Apache Pig, Goal-Setting, Web Production, Non-Verbal Communication, Open Source, Focus, Boosted Trees, Diplomacy, Data Formats, Coaching, Image Processing, Open-Minded, Amazon Web Services (AWS), Presentation Skills, SQLite, Discipline, Statistics, Idea Exchange, Training Program Evaluation, Decision-Making, HDFS (Hadoop Distributed File System), Observation, Systems Administration/Management, Agility, Cost Control, Highly Organized, Risk Management, Active Listening,
Artificial Intelligence, Diversity Awareness, Query Analysis, Artistic Sense, Seaborn, Empathy, Business Problem Solving, Supervising, Writing Skills, Selflessness, Refactoring, Managing Difficult Conversations, Mathematics, Influential, User Acceptance Testing, Versatility, Reporting Skills, Design, Systems Maintenance, Taking Criticism, Stochastic Analysis, Verbal Communication, Cloudera, Deal with Difficult Situations, SQL (Structured Query Language), Written Communication, Team Foundation Server (TFS), Remote Team Management, Publications, Trainable, MySQL, Interviewing, Statistical Programming Languages, Clarity, Data Science, Generosity, DNS (Domain Name System), Sensitivity, Naive Bayes, Tolerance of Change and Uncertainty, Electrocardiogram, Persuasion, Production Control, Reframing, Matplotlib, Troubleshooting, RPC (Remote Procedure Call), Streamlining, Microsoft Visual Basic for Applications (VBA), Imagination,
Credit Reports, Collaborative, Apache CouchDB, Dependability, Analysis Skills, Insight, Business Practices, Mediation, Market Research, Cooperation, Microsoft Visio, Intercultural Competence, Training/Teaching, Deal-Making, Performance Testing, Conflict or Dispute Resolution, Business Strategy, Resilient, Prophet, Memory, XGBoost, Sales Skills, Tuition Fees, Facilitating, Decision Support, Attentive, Statistical Analysis System (SAS), Calm, Offshoring, Introspection, Machine Tool, Friendliness, Human-Cyborg Relations, Inspiring, Big Data, Project Management, NetFlow, Lateral Thinking, PostgreSQL, Task Planning, Predictive Modeling, Authenticity, Algorithms, Results-Oriented, Business Intelligence, Emotional Intelligence, Scrum Project Management and Software Development, Interpersonal Relationship Skills, Security Attacks, Business Ethics, Subversion, Curiosity, Spectrum Analyzers, Storytelling, Data Entry, Punctual,
Software Engineering, Persistence, Flask, Allocating Resources, Turing Test, Design Sense, C Programming Language, Mind Mapping, Microsoft Windows Operating System, Questioning, Automotive Industry, Humility, Gradient Boosting Machines, Motivated, CentOS, Tolerance, Operational Audit, Cultural Intelligence, Presentation/Verbal Skills, Commitment, Mercurial, Office Politics Management, Statistical Modeling, Critical Observation, Leadership, Visual Communication, System Test, Inspiration, Query Optimization, Personality Conflicts Management, Business Intelligence Software, Divergent Thinking, Data Mining, Analysis, MongoDB, Responsible, Atlassian JIRA, Experimenting, Data Profiling, Brainstorming, Unit Test, People Management, Best Practices, Humor, Optical Character Recognition (OCR), Positive Reinforcement, Software Architecture Design, Coordination, Technical Presentation, Confidence,
tf-idf (term frequency–inverse document frequency), Initiative, Apache Hive, Emotion Management, Sales, Competitiveness, Record Keeping, Time Awareness, GitHub, Self-Awareness, Performance Modeling, Team-Building, KNN, Virtual Team Management, Performance Analysis, Personal Time Management, JavaScript, Planning, Time Management, Mentoring, Markov Chains, Coping, Azure, Networking, Microsoft Azure, Strategic Planning, Input/Output, Social Skills, Clustering, Recall, NCR Teradata, Reliable, Database Report Tools, Meeting Management, Database Administration, Innovation, Business Model, Task Tracking, IBM Product Family, Sense of Urgency, Oracle Applications, Scheduling, Robotics, Logical Reasoning, Performance Metrics, Encouraging, Linux Operating System, Patience, Random Forest, Work-Life Balance, Data Collection, Independence, Programming Tools, Prioritization, SQL Databases, Organization, Data Modeling, Respectfulness,
Requirements Management, AWS, Furniture, SageMaker, Computer Hacking, Network Monitoring, Telephone Skills, Data Lake, Project Design, Financial Reporting, Multivariate Analysis, Software Installation, Microsoft SharePoint, CSS (Cascading Style Sheets), Numba, Data Quality, Financial Management, Eclipse IDE, Academic Advice, Unix Shell Programming, Illustrating Ability, Campaigns, Insurance, AllFusion Erwin, Business Analysis, Risk Analysis, Pandas, Logistic Regression, Unix Operating Systems, Identify Issues, VMware, Regular Expressions, Amazon Elastic Compute Cloud (EC2), Microsoft SQL Server, Home Automation, Java, Production Machining, Workers' Compensation, Customer Research, Quantitative Analysis, RPA, Machine Learning, C++ Programming Language, Node.js, Style Sheets, Develop Methodologies, Physics, MapReduce, Microsoft Visual Studio, Test Strategy, Manufacturing, Content Development, Linear Algebra,
Reinforcement Learning, Project/Program Management, Ecosystems, Neuron, Pivot Tables, IEEE (Institute of Electrical and Electronics Engineers), Industrial Research, Problem Solving Skills, Usability Engineering, Logistics, Desktop PC, Neural Networks, Scientific Research, Network Performance/Analysis, Banking Services, Multi-Layer Perceptron, Artificial Neural Networks, Data Processing, MATLAB, TensorFlow, Internet Security, Computer Programming, Web Application Framework, Testing, Multitasking, Cardiac Monitoring, Face Recognition, Quality Control, Customer Support/Service, Waterfall Model of Software Development, Generative Adversarial Networks, Mining Methods, User Interface/Experience (UI/UX), Process Improvement, Secondary School, Oracle Database, Customer/Consumer Behavior, Organizational Skills, Electrical Engineering, Ridge Regression, Datasheets, IBM SPSS Statistical Package, Strategic Analysis, Computer Vision,
Windows PowerShell, User Interface Design, Graphical User Interface (GUI), Oracle PL/SQL, Neurotrauma (Traumatic Brain Injury), Data Analysis, Stakeholder Presentation, Workflow Analysis, Signal Processing, Transformation Tools, Amazon Simple Storage Service (S3), Concrete, Object-Oriented Programming (OOP), Relational Databases (RDBMS), Metrics, Lifetime Value (LTV), Statistical Algorithms, Hortonworks, Agile Programming Methodologies, Radio Frequency, Database Technology, Interpersonal Skills, Econometrics, Detail-Oriented, Budgeting, Application Programming Interface (API), Market Segmentation, Mail Services, Web Scraping, Credit Risk, Kernel Programming, Data Warehouse, R Programming Language, Principal Component Analysis (PCA), Forecasting, Cross-Functional, Bootstrap, Internet Application, Time Series Analysis, Operating Systems, Advanced Data Visualization, Git, Marketing Strategy, Customer Surveys,
Investment Services, IDE (Integrated Development Environment), LaTeX Typesetting, Modeling Languages, Bayesian Networks, Validation Testing, Process Modeling, Customer Conversion, Data Visualization, Outbound Marketing, Microsoft Office, GPU Processing, Scala Programming Language, Legal, Databricks, Use Cases, Hidden Markov Modeling, Objective-C, Health Insurance, Pattern Matching, AI, Management Strategy, Plotly, Microsoft Product Family, ggplot, Data Structures, DynamoDB, Software Development Lifecycle (SDLC), Computational Linguistics, Data Migration, Scripting (Scripting Languages), Customer Churn, Business Process Management, NumPy, Spark, Recruiting/Staffing/Hiring, United States Department of Justice (DOJ), Text Mining, Artificial Intelligence (AI), NoSQL, IP (Internet Protocol), Database Extract, Transform, and Load (ETL), Microsoft Windows Azure, Product Pricing, 3-Laws Compliance,
Cardiovascular Disease, Investment Management, Social Media, Prototyping, Online Marketing, Oracle, Natural Language Processing (NLP), Programming Languages, Kerberos, Support Vector Machines, Microsoft Word, Microsoft Excel, K-Means Clustering, JSON, Human Resources, Cloud Computing, Reporting Dashboards, HTML (HyperText Markup Language), Memory Hardware, Marketing, Keras, Quality Assurance, Apache, Variance Analysis, Microsoft PowerPoint, Cloud Storage, Needs Assessment, Geometry, Python Programming/Scripting Language, Wireless Software, Expense Analysis, Demographics, Computer Security, Analysis Software, Communication Skills, Retail, Audiovisual, Oracle Development Tools, Cost Estimates, Systems Analysis, Computer Skills, Team Player, Gesture Recognition, Surveillance, Mobile Applications, SQL Server Reporting Services (SSRS), Functional Testing, Advertising, SQL Server Integration Services (SSIS),
Machine Vision, Simulation, CUDA, Hyperparameter Tuning
Professional Summary
Data Scientist with experience in data science consulting, providing analytics and custom
development for specific business use cases. Combines skills in mathematics and
analytics with hands-on development of machine learning algorithms, deep learning, and
data modeling to derive innovative solutions that enhance the performance, productivity, and
quality of deliverables in any industry.
DATA SCIENCE
Professional Experience
Data Scientist
Enhance IT, Atlanta, GA, October 2019 – Present
Data Scientist
Associated Press, Austin, TX, April 2017 – August 2019
Associated Press is an online news outlet that handles a large amount of incoming information from various
other sources, including social media feeds. However, much of the information gathered is not suitable for
writing articles, either because it is not news or because it is incorrect. To solve this issue
and reduce the time and staff needed to sort the information manually, an NLP-based
filter was constructed and trained on tweets. This filter sorted the feeds into Spam, Reviews, and News
categories and also into good, bad, and neutral categories. Finally, once news was identified, it was
further classified as real or fake, so that the final output contained only real news and the overall
sentiment of that news.
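The multi-stage filter described above can be sketched as a simple cascade. This is a minimal sketch, not the production system: the three stage classifiers are hypothetical callables (`categorize`, `sentiment`, `veracity`) standing in for the trained models, and only the routing logic is shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins for the trained models: each maps cleaned text to a label.
Classifier = Callable[[str], str]


@dataclass
class FilterResult:
    category: str            # "spam", "review", or "news"
    sentiment: str           # "good", "bad", or "neutral"
    is_real: Optional[bool]  # set only when the item was categorized as news


def run_filter(text: str, categorize: Classifier, sentiment: Classifier,
               veracity: Classifier) -> FilterResult:
    """Apply the stages in order; the real/fake check runs only on news."""
    category = categorize(text)
    mood = sentiment(text)
    is_real = veracity(text) == "real" if category == "news" else None
    return FilterResult(category, mood, is_real)


def filter_feed(texts: List[str], categorize: Classifier, sentiment: Classifier,
                veracity: Classifier) -> List[Tuple[str, str]]:
    """Keep only items judged to be real news, paired with their sentiment."""
    kept = []
    for text in texts:
        result = run_filter(text, categorize, sentiment, veracity)
        if result.is_real:
            kept.append((text, result.sentiment))
    return kept


if __name__ == "__main__":
    # Toy keyword rules used only to exercise the routing logic.
    categorize = lambda t: "spam" if "buy now" in t.lower() else "news"
    sentiment = lambda t: "bad" if "crash" in t.lower() else "neutral"
    veracity = lambda t: "fake" if "aliens" in t.lower() else "real"
    feed = ["Buy now! 50% off", "Markets crash after rate hike", "Aliens elected mayor"]
    print(filter_feed(feed, categorize, sentiment, veracity))
    # prints [('Markets crash after rate hike', 'bad')]
```

Keeping each stage as a plain callable means any of the models could be swapped out independently without changing the routing.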
Labeled 16,000 articles and 1,500 tweets to build a complete dataset for the model.
Cleaned text to standardize input to the model and ensure consistent results.
Built functions to automatically remove symbols, hyperlinks, and emojis from incoming text and to
spell-check it.
Built exception handling for edge cases in which incorrect or unusable data is fed to the
model in production.
Performed stemming and lemmatization of text to remove superfluous components and make the
resulting corpus as small as possible while containing all important information.
Built a bag-of-words vocabulary of around 30,000 unique words from scratch, using the NLTK
and TensorFlow packages for text processing and tokenization.
Also tested concept-space embedding via ELMo, which gave results similar to bag of words
at a significantly higher computational cost.
Constructed an NLP-based filter using embedding and LSTM layers in TensorFlow and Keras.
Sorting was then performed by training and testing an artificial neural network.
Produced classifications of whether given text was news or fit into other categories of potential
interest, such as spam.
Ran sentiment analysis to determine whether text was overall positive, negative, or
neutral.
Achieved an average classification accuracy of 93.4%, well above the project's target
accuracy of 85%.
Deployed the solution as a Flask app on a cloud-based service (AWS), to which future user
applications connected via an API.
Further tested and compared this solution with AWS SageMaker and Comprehend, which
achieved slightly higher accuracy (94.7%) than the model built from scratch.
Handed the finalized model over to Android, iOS, and web developers to create a user
front end.
The client ultimately chose to implement the model built from scratch because of the costs
associated with SageMaker.
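The text-cleaning, exception-handling, and bag-of-words steps listed above can be illustrated with a minimal standard-library sketch. It assumes plain `re` and `collections` in place of the NLTK and TensorFlow tooling actually used, omits spell checking, stemming, and lemmatization, and its function names and `max_size` parameter (mirroring the ~30,000-word vocabulary) are illustrative only.

```python
import re
from collections import Counter
from typing import Dict, Iterable

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Anything that is not a letter, digit, or whitespace (punctuation, symbols,
# emoji), plus underscores, which \w would otherwise keep.
SYMBOL_RE = re.compile(r"[^\w\s]|_")


def clean_text(text: str) -> str:
    """Lowercase text, drop hyperlinks, symbols, and emoji, and collapse whitespace."""
    if not isinstance(text, str):
        # Edge case: malformed records from upstream feeds are rejected explicitly.
        raise TypeError(f"expected str, got {type(text).__name__}")
    text = URL_RE.sub(" ", text.lower())
    text = SYMBOL_RE.sub(" ", text)
    return " ".join(text.split())


def build_vocabulary(docs: Iterable[str], max_size: int = 30_000) -> Dict[str, int]:
    """Assign ids 1..max_size to the most frequent tokens; id 0 means unknown."""
    counts = Counter(tok for doc in docs for tok in clean_text(doc).split())
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common(max_size))}


def bag_of_words(doc: str, vocab: Dict[str, int]) -> Dict[int, int]:
    """Sparse bag-of-words vector as {token id: count}; unknown words map to 0."""
    return dict(Counter(vocab.get(tok, 0) for tok in clean_text(doc).split()))


if __name__ == "__main__":
    vocab = build_vocabulary(["The cat sat.", "The dog ran!"])
    print(clean_text("Breaking!!! See https://example.com/x 🚀 NOW"))  # breaking see now
    print(bag_of_words("the the bird", vocab))                        # {1: 2, 0: 1}
```

The symbol filter is deliberately blunt: it also splits contractions (e.g. "don't" becomes "don t"), which a production tokenizer would likely handle more carefully.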
Data Scientist
Micron Semiconductors, Austin, TX, February 2014 – April 2017
Calculated properties of crystal structures via quantum mechanical first principles using the
Vienna ab initio Simulation Package (VASP).
Used a Linux high-performance computing cluster to compile and run calculations.
Automated VASP jobs using Unix terminal commands and Bash scripting.
Worked with the IT personnel managing and maintaining the cluster to compile executables and
optimize computational tasks.
Performed post-processing data analysis of completed self-consistent calculations with Python and
MATLAB.
Generated plots of material properties such as band structure and density of states using
Matplotlib and MATLAB.
Created instructive visualizations using molecular modeling software such as VESTA.
Determined electronic and optical properties for a variety of materials and delivered them to
the client for further experimentation and fabrication.
Analyzed overall stoichiometric trends using machine learning tools such as KNN, logistic
regression, and decision tree classifiers.
Wrote comprehensive reports and documentation in LaTeX and presented them
to stakeholders.
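Comparing the three classifier families named above can be sketched with scikit-learn cross-validation. This is a minimal sketch under stated assumptions: the actual features and labels used for the stoichiometric analysis are not described, so the example uses hypothetical synthetic data (atomic fractions of three made-up elements, labeled by whether element A exceeds 40%), and the hyperparameters are illustrative defaults.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, folds=5, seed=0):
    """Mean k-fold cross-validated accuracy for KNN, logistic regression, and a decision tree."""
    models = {
        # Distance- and gradient-based models get standardized features.
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        # Trees are insensitive to feature scaling, so no scaler is needed.
        "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=seed),
    }
    return {name: float(cross_val_score(model, X, y, cv=folds).mean())
            for name, model in models.items()}


if __name__ == "__main__":
    # Hypothetical stand-in data: each row holds the atomic fractions of three
    # made-up elements A, B, C (rows sum to 1); label 1 means "A-rich" (A > 0.4).
    rng = np.random.default_rng(0)
    X = rng.dirichlet([1.0, 1.0, 1.0], size=300)
    y = (X[:, 0] > 0.4).astype(int)
    for name, score in compare_classifiers(X, y).items():
        print(f"{name}: {score:.3f}")
```

Wrapping the scaler inside each pipeline keeps it fitted only on the training folds, so cross-validation scores are not inflated by leakage from the held-out fold.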
Data Scientist
BNYM, Somerset, NJ (Remote), June 2012 – January 2014
Bank of New York Mellon is a large financial institution with several branches located across the eastern
seaboard of the US. The overall goal of this project was to minimize the revenue lost to the time and
resources spent locating and retrieving older handwritten documents by digitizing them into an easily
accessible and searchable database. Since raw images and scans take up a large amount of memory, several
models based on Bayesian and decision-tree classifiers were constructed and compared for digitizing
characters and values from the scanned images. The best performance was obtained with a random forest
model, which was then deployed on an AWS EC2 instance and made available for the client to query through
an API.
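The model comparison described above can be sketched with scikit-learn. The client's scanned documents are not available, so this minimal sketch assumes scikit-learn's bundled 8x8 handwritten-digits dataset purely as a stand-in; its accuracies say nothing about the original project. It compares Gaussian and Bernoulli naive Bayes (Bayesian classifiers) against a plain random forest, and the binarization threshold and tree count are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB


def compare_digit_models(seed=0):
    """Held-out accuracy of two naive Bayes variants and a random forest."""
    # 1,797 8x8 grayscale digit images, flattened to 64 pixel intensities in 0-16.
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    models = {
        "gaussian_nb": GaussianNB(),
        # BernoulliNB thresholds each pixel at 8.0: ink present vs. absent.
        "bernoulli_nb": BernoulliNB(binarize=8.0),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    return {name: float(model.fit(X_train, y_train).score(X_test, y_test))
            for name, model in models.items()}


if __name__ == "__main__":
    for name, acc in compare_digit_models().items():
        print(f"{name}: {acc:.3f}")
```

A stratified split keeps all ten digit classes in the same proportions in the training and test sets, so the three models are scored on a comparable test set.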
Worked as part of a team, with each member handling dedicated tasks that contributed to the overall
product.
Used R and Python to retrieve and clean data prior to implementation and model training.
Trained on multiple character datasets to make the model as invariant to handwriting style
as possible.
Constructed several models to compare performance and computational cost, then selected the
one that best met the needs of the client.
Tested multiple types of Bayesian classifiers, including those based on Bernoulli and Gaussian
distributions.
Achieved digit-recognition accuracy on par with human performance using gradient-boosted
trees within a random-forest ensemble.
Used Tesseract v2 from the command line, along with OCRFeeder, for image segmentation and
classification.
Analyzed performance of various models based on speed and accuracy in a simulated production
environment.
Assisted in deploying the overall model to an AWS EC2 instance and in setting up APIs for
future access and use.
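Measuring speed and accuracy in a simulated production environment, as described above, can be sketched with a small, framework-agnostic harness. This is a minimal sketch, not the original test setup: `benchmark` and its `batch_size` and `repeats` parameters are hypothetical, and it assumes the model exposes a deterministic batch `predict` function (for a fitted scikit-learn classifier, `model.predict` can be passed directly).

```python
import time
from typing import Callable, Dict, Sequence


def benchmark(predict: Callable[[Sequence], Sequence], X: Sequence, y: Sequence,
              batch_size: int = 32, repeats: int = 3) -> Dict[str, float]:
    """Score accuracy and per-item latency when the model is fed fixed-size
    batches, roughly as a production service might receive requests.

    Latency is taken from the fastest of `repeats` full passes to reduce noise
    from other processes; `predict` is assumed to be deterministic.
    """
    if len(X) == 0 or len(X) != len(y) or repeats < 1:
        raise ValueError("X and y must be non-empty and equal length; repeats >= 1")
    best_seconds = float("inf")
    for _ in range(repeats):
        preds = []
        start = time.perf_counter()
        for i in range(0, len(X), batch_size):
            preds.extend(predict(X[i:i + batch_size]))
        best_seconds = min(best_seconds, time.perf_counter() - start)
    correct = sum(int(p == t) for p, t in zip(preds, y))
    return {
        "accuracy": correct / len(y),
        "ms_per_item": 1000.0 * best_seconds / len(X),
    }


if __name__ == "__main__":
    # Toy predictor standing in for a trained model: label = parity of the input.
    parity = lambda batch: [x % 2 for x in batch]
    data = list(range(1000))
    print(benchmark(parity, data, [x % 2 for x in data]))
```

Reporting latency per item rather than per batch makes models served with different batch sizes directly comparable.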
Education
Master of Science in Physics
Texas State University
San Marcos, TX