Statistics with Rust: 50+ Statistical Techniques Put into Action
About this ebook
Are you an experienced statistician or data professional looking for a powerful, efficient, and versatile programming language to turbocharge your data analysis and machine learning projects? Look no further! "Statistics with Rust" is your comprehensive resource to unlock Rust's true potential in modern statistical methods.
Statistics with Rust - Keiko Nakamura
Statistics with Rust
50+ Statistical Techniques Put into Action
Keiko Nakamura
Copyright © 2023 by GitforGits.
All rights reserved. This book is protected under copyright laws and no part of it may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without the prior written permission of the publisher. Any unauthorized reproduction, distribution, or transmission of this work may result in civil and criminal penalties and will be dealt with in the appropriate jurisdiction in India, in accordance with the applicable copyright laws.
Published by: GitforGits
Publisher: Sonal Dhandre
www.gitforgits.com
Printed in India
First Printing: April 2023
Cover Design by: Kitten Publishing
For permission to use material from this book, please contact GitforGits at [email protected].
Content
Preface
Chapter 1: Introduction to Rust for Statisticians
Why Rust for Data Analysis and Statistics?
Comparing Rust and Python for Statistics
Performance
Memory Safety and Resource Management
Concurrency
Interoperability
Ecosystem Growth and Future Prospects
Readability and Maintainability
Scalability
Cross-platform and Deployment
Learning Curve
Setting up Rust Environment
Download rustup-init
Run rustup-init
Configure PATH Environment Variable
Verify the Installation
Essential Rust Libraries for Statistics
ndarray
statrs
statis
plotly
Setting up Statistical Project
Create a New Rust Project
Add Library Dependencies
Build and Run the Project
Import the Libraries in Rust Code
Summary
Chapter 2: Data Handling and Preprocessing
Data Handling and Preprocessing
Process of Data Handling and Preprocessing
Exploring CSV crate
Dataset Loading with CSV crate
Parsing the Data
Data Structures in Rust
Arrays
Vectors
Tuples
Structs
HashMaps
Calculating Mean
Calculating Median
Common Data Cleaning and Preprocessing Techniques
Handling Missing Values
Data Type Conversion
Scaling/Normalizing Data
Encoding Categorical Variables
Feature Engineering
Performing Data Cleaning and Preprocessing
Summary
Chapter 3: Descriptive Statistics in Rust
Introduction to Descriptive Statistics
Measures of Central Tendency
Calculate Measures of Central Tendency
Measures of Dispersion
Calculate Measures of Dispersion
Exploratory Data Analysis (EDA)
Implementing EDA
Summary
Chapter 4: Probability Distributions and Random Variables
Discrete Probability Distribution
Uniform Distribution
Bernoulli Distribution
Binomial Distribution
Poisson Distribution
Geometric Distribution
Continuous Probability Distribution
Uniform Distribution
Normal (Gaussian) Distribution
Exponential Distribution
Beta Distribution
Gamma Distribution
Generating Random Variables
Sampling from Distributions
Sample Program for Sampling from Distributions
Estimating Distribution Parameters
Method of Moments (MoM)
Maximum Likelihood Estimation (MLE)
Bayesian Estimation
Least Squares
Summary
Chapter 5: Inferential Statistics
Fundamentals of Inferential Statistics
Hypothesis Testing
Confidence Intervals
Performing Hypothesis Testing
Two-sample T-test
Chi-square Test for Independence
Calculating Confidence Interval
For Mean
For the Proportion
Parametric Tests
Paired T-test
One-way ANOVA
Non-parametric Tests
Wilcoxon Rank-sum Test (Mann-Whitney U Test)
Implementing Wilcoxon Rank-sum Test
Kruskal-Wallis Test
Implementing Kruskal-Wallis Test
Summary
Chapter 6: Regression Analysis
Introduction to Regression Analysis
Overview
Applications of Regression Analysis
Types of Regression Analysis
Simple Linear Regression
Understanding Equation
Applying Simple Regression with Rust
Multiple Linear Regression
Understanding Equation
Applying Multiple Linear Regression
Polynomial Regression
Understanding Equation
Applying Polynomial Regression
Ridge and Lasso Regression
Understanding Equation
Applying Ridge and Lasso Regression
Logistic Regression
Understanding Equation
Applying Logistic Regression
Summary
Chapter 7: Bayesian Statistics
Introduction to Bayesian Statistics
Bayes Theorem
Advantages of Bayesian Statistics
Bayesian Inference
Putting Bayesian Inference into Action
Procedure to Perform Bayesian Inference
Practical Illustration of Bayesian Inference
Bayesian Model Comparison
Bayesian Hierarchical Modeling
Advanced Markov Chain Monte Carlo Method
Simple Implementation of HMC Method
Model Comparison and Selection
Model Comparison using DIC
Model Comparison using WAIC
Summary
Chapter 8: Multivariate Statistical Methods
Multivariate Statistical Methods
Introduction
Overview of Multivariate Techniques
Principal Component Analysis (PCA)
Procedure of PCA
Sample Program to Implement PCA
Canonical Correlation Analysis (CCA)
Procedure to Perform CCA
Sample Program to Implement CCA
Linear Discriminant Analysis (LDA)
Procedure to Perform LDA Algorithm
Sample Program to Implement LDA
Independent Component Analysis (ICA)
Overview of ICA Algorithm
Sample Program to Implement ICA
Multidimensional Scaling (MDS)
Types of Multidimensional Scaling
Sample Program to Implement Classical MDS
Summary
Chapter 9: Nonlinear Models and Machine Learning
Nonlinear Models
Decision Trees
Overview
Building Decision Tree
Support Vector Machines (SVM)
Overview
Building SVM Model
Neural Networks
Fundamentals of Neural Networks
Building Neural Network Model
Ensemble Methods
Overview
Building Bagging Ensemble of Decision Tree
Summary
Chapter 10: Model Evaluation and Validation
Model Evaluation and Validation
Introduction
Train-test Split Technique
Exploring Train-test Split
Implementing Train-test Split
Cross-validation Technique
Understanding Cross-validation
Implementing K-fold Cross-validation
Hyperparameter Tuning
Overview
Perform Hyperparameter Tuning using Grid Search
Model Selection Techniques: AIC and BIC
Akaike Information Criterion (AIC)
Bayesian Information Criterion (BIC)
Implement AIC and BIC
Resampling Methods
Bootstrapping
Permutation Tests
Perform Bootstrapping and Permutation Test
Implementing Bootstrapping
Implementing Permutation Test
Summary
Chapter 11: Text and Natural Language Processing
Overview of Natural Language Processing (NLP)
Key Processes of NLP
Text Preprocessing and Tokenization
Key Preprocessing Techniques
Common Tokenization Approaches
Implementing Text Preprocessing and Tokenization
Sample Program to Perform Preprocessing and Tokenization
Stopword Removal Process
Sample Program to Perform Stopword Removal
Stemming and Lemmatization
Perform Stemming
Information Retrieval with TF-IDF
TF-IDF Components
Implementation of TF-IDF
Word Embeddings and Word2Vec
Summary
Index
Epilogue
Preface
Are you an experienced statistician or data professional looking for a powerful, efficient, and versatile programming language to turbocharge your data analysis and machine learning projects? Look no further! Statistics with Rust is your comprehensive resource to unlock Rust's true potential in modern statistical methods.
This book is tailored specifically for statisticians and data professionals who are already familiar with the fundamentals of statistics and want to leverage the speed and reliability of Rust in their projects. Across 11 in-depth chapters, you will see where Rust can outperform Python in data analysis and machine learning workloads and learn to implement popular statistical methods using Rust's unique features and libraries.
Statistics with Rust begins by introducing you to Rust's programming environment and essential libraries for data professionals. You'll then dive into data handling, preprocessing, and visualization techniques that form the backbone of any statistical analysis. As you progress through the book, you'll explore descriptive and inferential statistics, probability distributions, regression analysis, time series analysis, Bayesian statistics, multivariate statistical methods, and nonlinear models. Additionally, the book covers essential machine-learning techniques, model evaluation and validation, natural language processing, and advanced techniques in emerging topics.
In this book you will:
Discover Rust's unique advantages for statistical analysis and machine learning projects.
Learn to efficiently handle, preprocess, and visualize data using Rust libraries.
Implement descriptive and inferential statistics with Rust for powerful data insights.
Master probability distributions and random variables in Rust for robust simulations.
Perform advanced regression analysis with Rust's capabilities.
Explore Bayesian statistics and Markov Chain Monte Carlo methods in Rust.
Uncover multivariate techniques, including PCA and Factor Analysis, using Rust libraries.
Implement cutting-edge machine learning algorithms and model evaluation techniques in Rust.
Delve into text analysis and natural language processing with Rust.
To ensure you get the most out of this book, each chapter includes hands-on examples and exercises to reinforce your understanding of the concepts presented. You'll also learn to optimize your Rust code and select the best tools and libraries for each task, maximizing your productivity and efficiency.
GitforGits
Prerequisites
Statistics with Rust is your indispensable guide to harnessing the power of Rust for modern statistical analysis and machine learning. Whether you are a seasoned data professional or a Rust enthusiast looking to expand your knowledge, this book provides the tools and insights to elevate your projects.
Code Usage
Are you in need of some helpful code examples to assist you in your programming and documentation? Look no further! Our book offers a wealth of supplemental material, including code examples and exercises.
This book is here to help you get your job done, and you have our permission to use the example code in your programs and documentation. However, if you are reproducing a significant portion of the code, we do require you to contact us for permission.
Using several chunks of code from this book in your program, or answering a question by citing our book and quoting example code, does not require permission. If you choose to give credit, an attribution typically includes the title, author, publisher, and ISBN. For example: Statistics with Rust by Keiko Nakamura.
If you are unsure whether your intended use of the code examples falls under fair use or the permissions outlined above, please do not hesitate to reach out to us at [email protected].
We are happy to assist and clarify any concerns.
Acknowledgement
I owe a tremendous debt of gratitude to GitforGits, for their unflagging enthusiasm and wise counsel throughout the entire process of writing this book. Their knowledge and careful editing helped make sure the piece was useful for people of all reading levels and comprehension skills. In addition, I'd like to thank everyone involved in the publishing process for their efforts in making this book a reality. Their efforts, from copyediting to advertising, made the project what it is today.
Finally, I'd like to express my gratitude to everyone who has shown me unconditional love and encouragement throughout my life. Their support was crucial to the completion of this book. I appreciate your help with this endeavour and your continued interest in my career.
Chapter 1: Introduction to Rust for Statisticians
Why Rust for Data Analysis and Statistics?
In recent years, the Rust programming language has attracted considerable attention from developers for its safety, speed, and concurrency capabilities. Originating as a systems programming language, Rust has grown in popularity and has been adopted across various domains, including web development, embedded systems, and even data analysis. With its focus on performance and safety, Rust is a formidable choice for data analysis and statistical computing, providing unique advantages over traditional languages such as Python, R, and Julia.
This book aims to guide you through the world of statistics and data analysis using Rust, offering a comprehensive understanding of Rust's potential in these fields. By the end of this journey, you will be equipped with the knowledge and practical skills to leverage Rust's power for your data analysis projects.
Rust's high-performance capabilities are one of its most appealing features. As a compiled language without a garbage collector, Rust offers performance that is generally on par with C and C++. This is particularly important for data analysis and statistics, where large datasets and complex computations are common. With Rust, you can execute data processing tasks and run algorithms with lower latency than interpreted languages typically allow, enabling faster and more efficient analysis.
Memory safety is a critical aspect of any programming language, especially when dealing with large datasets or complex data structures. Rust's ownership system and strong type system enforce memory safety in safe code: data races, null pointer dereferences, and use-after-free errors are rejected at compile time, while out-of-bounds accesses are caught by bounds checks instead of silently corrupting memory. As a result, your data analysis programs are more robust and less prone to crashes, without the need for a garbage collector that might impact performance.
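To make the ownership and safety points concrete, here is a minimal sketch using only the standard library. The function name `mean` and the sample values are illustrative assumptions, not code from a later chapter. It shows an `Option` return value standing in for a null result, checked indexing with `get`, and a move of ownership.

```rust
/// Arithmetic mean of a slice. Returns `None` for empty input
/// instead of a null pointer or a silent NaN.
fn mean(data: &[f64]) -> Option<f64> {
    if data.is_empty() {
        None
    } else {
        Some(data.iter().sum::<f64>() / data.len() as f64)
    }
}

fn main() {
    let samples = vec![2.0, 4.0, 6.0];

    // Borrowing: `mean` only reads `samples`, so it stays usable afterwards.
    println!("mean = {:?}", mean(&samples));

    // Checked access yields an Option rather than reading past the end.
    println!("samples.get(10) = {:?}", samples.get(10));

    // Ownership moves to `moved`; using `samples` after this line
    // would be rejected by the compiler.
    let moved = samples;
    println!("len = {}", moved.len());
}
```

The caller has to handle the `None` case explicitly, which is how an empty dataset surfaces as a visible condition rather than a crash at run time.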
Modern hardware often features multiple cores or processors, and exploiting this parallelism is essential for high-performance computing. Rust's concurrency support builds on its ownership and borrowing system, so the compiler rejects code that shares data unsafely between threads. By leveraging these features, you can distribute data processing tasks across multiple cores and significantly reduce the time required for complex calculations; scaling out to multiple machines additionally requires networking or distributed-computing libraries.
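As a rough illustration of spreading a computation across cores, the sketch below sums a slice in chunks using scoped threads from the standard library (`std::thread::scope`, available since Rust 1.63). The function name `parallel_sum` and the chunking strategy are assumptions made for this example; a real project might instead use a data-parallelism crate.

```rust
use std::thread;

/// Sums `data` by splitting it into at most `n_threads` chunks,
/// each handled by its own scoped thread.
fn parallel_sum(data: &[f64], n_threads: usize) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    let n_threads = n_threads.max(1);
    // Ceiling division so every element lands in some chunk.
    let chunk_len = (data.len() + n_threads - 1) / n_threads;

    thread::scope(|s| {
        // Each worker borrows its chunk; the scope guarantees all
        // workers finish before `data` can go out of scope.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<f64>()))
            .collect();

        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<f64>()
    })
}

fn main() {
    let data: Vec<f64> = (1..=1_000).map(|x| x as f64).collect();
    let total = parallel_sum(&data, 4);
    println!("sum = {total}, mean = {}", total / data.len() as f64);
}
```

Because the threads only hold shared borrows of the input, the compiler can verify that no worker mutates data another worker is reading.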
Rust's C-compatible FFI (Foreign Function Interface) enables integration with existing C libraries, and with C++ libraries that expose a C-compatible interface. This means you can use established high-performance libraries for data analysis, such as BLAS (Basic Linear Algebra Subprograms), LAPACK (Linear Algebra PACKage), or FFTW (Fastest Fourier Transform in the West), alongside Rust's native libraries. Moreover, Rust's WebAssembly support allows you to run your data analysis code in the browser, opening up new possibilities for interactive data visualization and analysis tools.
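The FFI mechanism can be sketched with a function from the C standard library. This example uses `abs` as a stand-in, since linking BLAS or LAPACK needs extra build configuration. The wrapper name `c_abs` is an assumption for illustration, and the `unsafe extern` spelling assumes Rust 1.82 or newer (older toolchains on the 2021 edition or earlier write plain `extern "C"`).

```rust
use std::ffi::c_int;

// Declare a function provided by the C standard library, which Rust's
// standard library already links against on common platforms.
unsafe extern "C" {
    fn abs(input: c_int) -> c_int;
}

/// Safe wrapper around C's `abs`. The C function is undefined for the
/// minimum integer value, so that input is rejected up front.
fn c_abs(x: c_int) -> Option<c_int> {
    if x == c_int::MIN {
        None
    } else {
        // SAFETY: the only precondition of `abs` is checked above.
        Some(unsafe { abs(x) })
    }
}

fn main() {
    println!("c_abs(-3) = {:?}", c_abs(-3));
}
```

Wrapping the foreign call in a small safe function keeps the `unsafe` block in one place, a common pattern when binding larger numerical libraries.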
Although Rust is a relatively young language, its ecosystem has grown rapidly, with an ever-increasing number of libraries and tools catering to data analysis and statistics. Libraries such as ndarray, statrs, and plotly offer robust support for data manipulation, statistical computation, and visualization. Additionally, the Rust community is highly active and committed to developing new libraries and improving existing ones, ensuring that the Rust ecosystem will continue to expand and evolve.
Rust's syntax is clear, concise, and expressive, making it easier for you to write and read your code. This improves the maintainability of your data analysis programs, allowing you to