0% found this document useful (0 votes)
2K views27 pages

Anaconda The Worlds Most Popular Data Science Platform

Uploaded by

Jesus Tellez
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2K views27 pages

Anaconda The Worlds Most Popular Data Science Platform

Uploaded by

Jesus Tellez
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 27

The World’s Most Popular

Data Science Platform


What’s Inside

3......... Introduction

7......... Package and Environment Management

12....... Benefits of Anaconda

15....... Anaconda Engages the Community

20...... Who Should Use Anaconda?

22...... Anaconda Embedded

23....... Real-World Anaconda Use Cases

25....... Industry Statistics

Anaconda: The World’s Most Popular Data Science Platform 2


Gone are the days when the word “Python” merely referred to a
large, nonvenomous snake. The term now signifies a widely-used,
multi-purpose programming language. In fact, Python has become
the most popular programming language, and for good reason.
Python is a very accessible language that facilitates a wide variety of
programming-driven tasks. As such, there are hundreds of
thousands of Python packages available that provide modules and
tools for common functionality. This sheer volume of packages*
makes Python applicable for a wide variety of use cases. Because of
its accessibility and broad reach, Python is widely used by non-
programmers and students in addition to programming experts, and
it makes a great teaching language for machine learning (ML) and
artificial intelligence (AI).

Packages are individual


units of installable code.

Anaconda: The World’s Most Popular Data Science Platform 3


The popularity of Python for data science continues to grow. How Often Do Respondents Use Python?
In Anaconda’s 2021 State of Data Science report, 63% of
respondents said they use Python frequently or always, making
it the most popular language of those included in the survey.
34%
Alongside Python’s rise in popularity, data scientists are turning Always
away from more traditional, proprietary languages like MATLAB
and SAS and toward the open-source ecosystem. The velocity of
29%
innovation powered by the community cannot be matched or Frequently
outpaced by any single technology vendor, and more and more
organizations are adopting open-source software for enterprise
use. According to the above-referenced report, 65% of survey
respondents said their employers are encouraging them to 4%
contribute to open-source projects. Never

2021 saw approximately 5 billion package downloads from the


Anaconda repository—and more than 25 million users choose
Anaconda to power their data science and AI workflows, making
11%
Rarely
Anaconda the world’s most popular data science platform and
the foundation of modern machine learning. Anaconda has 22%
pioneered the use of Python for data science, championed its Sometimes
vibrant community, and continues to steward open-source
projects that make tomorrow’s innovation possible. Our
enterprise-grade solutions enable corporate, research, and Source: State of Data Science 2021
academic institutions to harness the power of open source n=3,104
for competitive advantage, groundbreaking research,
and a better world.

Anaconda: The World’s Most Popular Data Science Platform 4


Open-source packages have been the biggest
enabler for data science we’ve seen in recent
years. Being able to offer a set of trusted tools
from Anaconda will empower our customers
through every stage of the data science
journey on Microsoft Azure.
-Mark Russinovich, Chief Technology Officer and Technical Fellow, Microsoft Azure
Anaconda’s mission is multifaceted:

Create a platform where Drive widespread adoption Steward open-source Create a place where
data science users, open- of modern data analysis by innovation by fostering people do meaningful
source contributors, AI creating an excellent user communities and work that embodies
researchers, and technology experience and helping championing open our core values.
providers can discover, customers integrate tools standards for data
share, and sell their and techniques throughout and computation.
innovations. their organizations.

This guide serves to show why Anaconda is the


go-to choice for open-source data science. It covers package
and environment management, benefits of Anaconda,
Anaconda’s broad reach in terms of user groups, and more.

Anaconda: The World’s Most Popular Data Science Platform 6


Package and Environment Management
Python is heavily used for numerical computing and data
analysis, and its packages often require complex build steps.
Python is used across different operating systems and hardware,
which introduces another layer of complexity, and its built-in
packaging system does not properly support these complexities.
Anaconda is an established
Most projects in the Python open-source software (OSS)
ecosystem depend on many other projects to function, and this
leader in Python package
forms a chain of complexities called the “dependency graph.” management, with a track
Open-source projects are managed independently and release
updates asynchronously. Thus, a package manager is needed to record of responsible
ensure package and platform interoperability. Without one, the
chain of dependencies can become unmanageable; packages
behavior. The entire Python
may depend on incompatible versions of other packages, and ecosystem trusts Anaconda.
obtaining different versions of those other packages may break
other dependencies and push the problem downstream. In other
words, chaos ensues. -Jharrod LaFon, Vice President,
Cloud Development, OpenEye Scientific
Anaconda was built to cut through—and prevent—the chaos.
Let’s take a closer look at the components that make this
possible.

Anaconda: The World’s Most Popular Data Science Platform 7


Conda is Anaconda’s package, environment, and dependency
manager. It is cross-platform (Windows, macOS, Linux [x86/
AARCH64/PPC64LE/s390x]), cross-language (supports Python,
R, C/C++, Rust, Go, and more), and ensures package
compatibility and environment correctness. Conda packages are
pre-compiled for multiple platforms using a Conda build recipe
and made available in repositories where users can use Conda to
install them. Because Conda is language-agnostic, it can easily
manage packages with binary dependencies. Conda makes it
easy to install many of the most commonly used numerical
computing packages.

Conda is particularly helpful when it comes to reproducibility


and deployment of applications into multiple environments.
Models and applications written on one computer can easily be
ported to another, regardless of platform. This type of
interoperability continues to steward our core values of
widespread adoption, sharing, and collaboration. There are over
25 million active Conda users, with a 34% increase in active
Conda users from 2020 to 2021.

Learn more about Conda here.

Anaconda: The World’s Most Popular Data Science Platform 8


Anaconda Navigator, Conda’s graphical user interface (GUI),
makes it easy to launch and integrate applications with Conda’s
package and environment management system.

When installed as part of the Anaconda Distribution, Navigator


comes preloaded with a curated set of more than 300 data
science and machine learning packages, and serves as a desktop
application that easily installs additional packages from
the Conda ecosystem.

Learn more about and download Anaconda Navigator here.

Anaconda: The World’s Most Popular Data Science Platform 9


Miniconda is an installer that contains only Python, Conda, and
Conda’s dependencies. It is the minimal way to bootstrap Conda
onto a system. Miniconda is popular amongst users who know
exactly what packages they want. It is often used alongside
Docker deployments, and for Continuous Integration (CI)/
Continuous Deployment (CD). Miniconda can be embedded
inside other products and environments.

Learn more about and download Miniconda here.

Anaconda: The World’s Most Popular Data Science Platform 10


Anaconda-Hosted Repositories:
anaconda.org: repo.anaconda.com:
Community-led hosting of published conda packages. Default location where Conda looks for updates and
Includes open-source repositories and channels such as packages. Only Anaconda, Inc. can publish to this repo.
conda-forge, PyTorch, and commercial partners like NVIDIA
and Intel.

repo.anaconda.cloud:
Anaconda’s premium repository that can only be accessed with a token. Compiled packages in this repository prioritize
cross-package and platform interoperability and stability, and embed additional security features into our package metadata.
Only Anaconda, Inc. can publish to this repo. Packages in repo.anaconda.cloud are:

Secure: Compatible: Uniform: Reproducible: Supported:


packages are built packages are built in the user experience exact package when compatibility
and maintained on a a consistent manner for managing versions can be issues do arise,
private, high-security with dependency packages is the same recorded (when customers can receive
network by Anaconda information so they across operating needed) and used to support directly
employees. can work together systems (Windows, recreate environments from Anaconda.
reliably. They are Mac, and Linux) and seamlessly across
rigorously tested to languages (Python, R, platforms.
ensure functionality in C, C++, etc).
a known environment.

Anaconda: The World’s Most Popular Data Science Platform 11


Benefits of Anaconda
As previously mentioned, Conda is Anaconda’s package
and environment manager. Conda itself is a huge benefit
to Anaconda users as it natively solves for complex
dependencies. While Conda is mainly used for Python
and R, it can also support C++, Java, Rust, and others.
Conda can work on any major operating system without
requiring administrator privileges. Ultimately, Conda is
designed to handle the expansive and specific needs of
data scientists and others doing numerical computing.

Anaconda: The World’s Most Popular Data Science Platform 12


Beyond its beneficial functionalities, there are experiential pluses to using Anaconda:

Anaconda makes it easy for beginners to get started Anaconda’s premium package repository is curated
1 with data science. An Anaconda Distribution install 5 and built from source on our secure network.
comes with a desktop GUI that is preloaded with the Packages are verified upon installation to ensure that
most popular data science and machine learning they are tamper-free. With a secured supply chain for
packages. As such, Anaconda is widely used by open-source software, you can spend less time
universities and bootcamps to teach Python, and managing risk and more time on innovation.
learners become acquainted with it at the start of
their data science careers. Anaconda generates a Software Bill of Materials
6 (SBOM) for customers in accordance with evolving
Developers and data scientists can collaborate more security standards and best practices around the use
2 seamlessly and quickly by using Anaconda’s packaging of open source in sensitive environments. SBOMs are
and software environment management tools. important because they provide visibility into the
components of your software, facilitating awareness
Users working alone also benefit from Anaconda’s of potential risk factors and quicker reaction times
3 packaging technology. They can quickly and safely
use different versions of software packages, thereby
should an issue arise.

allowing them to try new innovation features without Anaconda is part of an active community of over 25
jeopardizing the stability of their existing software
environments and models.
7 million users. We recently created Anaconda Nucleus,
an interactive site where Python students, practitioners,
and experts can work, learn, and share with each other.
Anaconda identifies the security vulnerabilities (CVEs)
4 that packages and their dependencies are exposed to.
We curate and enhance the accuracy of CVE data so
that you can block unsafe packages with precision and
make well-informed decisions.

Anaconda: The World’s Most Popular Data Science Platform 13


Anaconda is enabling organizations to
harness the power of open-source
Python innovation through ease of use
and security-focused functionality.
-Christian Kleinerman, SVP, Product, Snowflake
Anaconda Engages
the Community
Community is at the heart of Anaconda, and there
are multiple ways in which we foster a spirit
of continued learning and collaboration.

Anaconda: The World’s Most Popular Data Science Platform 15


NUCLEUS ™

One of Anaconda’s most unique offerings is Nucleus, its


education and community engagement platform. The platform
features a wealth of data science content ranging from articles to
webinars to videos and more. It’s meant to be a space for users to
discover, share, and sell their innovations. The platform is intended
to grow and evolve, becoming more interactive over time.

One of the most popular features of Nucleus is the Anaconda


Community, a collection of forums open to Anaconda users and
data practitioners of all experience levels. The forums are for
asking questions, learning from experts, finding events, and—most
importantly—connecting with other community members.

Visit Anaconda Nucleus here.

Anaconda: The World’s Most Popular Data Science Platform 16


Dollars spent with Anaconda are directly invested back into the
open-source community. The Anaconda Dividend Program
$44,735
formalizes Anaconda’s commitment to direct a portion of its
revenue and resources to help advance projects and innovation
Anaconda donated $44,735 to
in data science. The goal of the program is to support diverse NumFOCUS to support open-
projects that are under-resourced and need additional exposure. source project development during
Anaconda launched the Dividend Program in partnership with the Dividend Program’s first year.
NumFOCUS, a U.S.-based nonprofit organization that provides
crucial administrative services and operational support for nearly Anaconda is honored to be a
44 open-source scientific computing projects.
Silver Sponsor of NumFOCUS
program initiatives.

Anaconda: The World’s Most Popular Data Science Platform 17


Other Community Efforts
Anaconda is an active supporter of the OSS community,
frequently contributing to events like PyData Global and
PackagingCon. In August 2021 Anaconda sponsored
the Pyston project to accelerate Python performance.
In fact, Anaconda has sponsored many OSS projects over
the years, including Numba, Bokeh, Dask, Intake, fsspec,
fastparquet, pandas, JupyterLab, and HoloViz.

Anaconda provides free storage, networking, infrastructure,


and support to large community channels like conda-forge
and bioconda. Additionally, Anaconda launched a fee
exemption program for non-profit research institutions.

Anaconda: The World’s Most Popular Data Science Platform 18


We rely on Anaconda to manage the
Python packages for our research. It is fast
and flexible, allowing us, for example, to
easily install the latest PyTorch version and
its CUDA dependencies. This enables us
to train machine learning models
efficiently on GPUs.
—Team Member, Potsdam Institute for Climate Impact Research (PIK)
Who Should Use Anaconda?
The short answer to this question is “just about anyone.” Anaconda’s ease of use makes it an
attractive option for data scientists of all abilities—particularly those who wish to build and test
models together, and those who value business impact and data-driven evidence. IT teams can
also benefit from using Anaconda, leveraging its security and governance features to manage
their infrastructure with confidence and reduce their organization’s exposure to vulnerabilities.
And of course Data Engineers, Business Analysts, Software Developers, and Academics can use
Anaconda to streamline their workflows and deliver value—no matter their field—though this is
by no means an exhaustive list of titles.

Anaconda’s current list of customers runs the gamut. Clients include automotive manufacturers,
energy companies, airlines, banks, and more. Employees at 99% of the top Fortune 100
Companies use Anaconda, as do employees at 82% of the top Fortune 500 companies.

Anaconda: The World’s Most Popular Data Science Platform 20


How Users Describe Anaconda
In a recent survey, users described Anaconda as:

A "must have" tool for Essential, massive


every Python user time-saver

Powerful, De facto standard for data science


convenient, environment management
user-friendly

Simple, integrated, best


Anaconda makes data package manager ever
science real
Anaconda Embedded—
Our Partner Network
Companies can partner with Anaconda to build and distribute a
seamless customer experience by using Anaconda behind the
scenes to power their products. Embedded partners receive access
to Anaconda’s experts and developers, experience guaranteed SLAs
and up-time, contribute to the open-source network, and gain
access to Anaconda’s thriving user community.

Anaconda: The World’s Most Popular Data Science Platform 22


Neural Networks
With Anaconda’s platform, you can build and deploy deep learning models
that use neural networks. Anaconda easily integrates with tools like
TensorFlow and Keras so you can build and train neural network models,
including convolutional neural networks (CNNs) and generative adversarial
Real-World networks (GANs).

Anaconda Machine Learning


Use Cases Scale your machine learning pipeline computations horizontally and vertically
on GPUs. Easily store and process data beyond the RAM of a single machine
Anaconda’s reach clearly extends
and reduce model training time by as much as 100x. Parallelize algorithms and
to many different industries, speed up iteration cycles during the development phase.
from healthcare to finance to
manufacturing and many more.
Read on for details about how Predictive Analytics
Anaconda functions within this
multitude of fields. In the past, only companies with big budgets could afford the proprietary
software needed to leverage predictive analytics for enterprise decision-making.
With Anaconda and open-source data science, more businesses have started
taking a proactive approach to addressing problems. Whether it involves
predicting customer churn, consumer demand levels, stock prices, maintenance
needs, or outage probabilities, Anaconda can help operate proactively.

Anaconda: The World’s Most Popular Data Science Platform 23


Data Visualization
The Python ecosystem of data visualization tools is vast. With Anaconda, your data science team can find the right visualization
tool for any data set, from manufacturing output to seismic activity. They will have the power to build and deploy beautiful
dashboards and get them into the hands of decision makers quickly with our one-click deployment technology.

Bias Mitigation
With Anaconda, you can leverage Python’s ecosystem of burgeoning open-source tools for mitigating bias in models and in
data sets, such as FairLearn and AIF360. Model explainability is essential for running an ethical AI program. Python tools you
can use with Anaconda include LIME and InterpretML. These tools help you explain the decisions of black box models as well
as create “glassbox” models that are developed to be explainable from the start.

Manufacturing and Automotive Manufacturing


Anaconda OSS packages provide tools that allow manufacturers to analyze, process, and control data so that they can
optimize their production, detect anomalies, find inefficiencies, and predict when equipment will fail or require maintenance.
Anaconda’s platform provides easy access to analysis environments and app deployments so that people throughout the
manufacturing organization can work directly with data without having to deal with systems administration or IT hurdles. The
combination of Anaconda’s packages and Anaconda’s platform empowers everyone within the organization to work directly
with the data they have to answer the questions that they face.

Financial Services
Anaconda provides a versatile set of OSS tools with many use cases in financial services, including quant/AI/ML/DS research,
portfolio management, derivatives pricing, risk modeling, capital planning, regulatory compliance, AML (Anti-Money
Laundering), customer analytics, fair lending, targeted marketing, fraud prevention, virtual assistants, and trading strategies.
The general pattern is to mix and match best of breed tools to create a unique competitive advantage. Anaconda takes care of
enterprise considerations so that you can focus on innovating with a diverse and active ecosystem of tools.

Anaconda: The World’s Most Popular Data Science Platform 24


Industry Statistics

100% 90%
of the top 10 of the top 10
Fortune-ranked Fortune-ranked
finance companies retail companies

100%
of Ivy League schools
teach using Anaconda
in their curriculum.

100% 80%
of the top 10 of the top 10
Fortune-ranked Fortune-ranked
technology companies energy companies

25
When it comes to programming languages, there are multiple options available.
Likewise, when it comes to package and environment managers, there are multiple
options available. All of that said, Python has climbed to the top of the programming
language heap, and shows no signs of declining. Furthermore, the advantages
offered by the open-source community make it the optimal choice over more
traditional, proprietary options.

Of course, Python and the open-source ecosystem are not without their flaws:
unpredictable speed of development and the potential for an overall lack of
cohesion, amongst others. Anaconda is the answer to many of these challenges.
Anaconda’s tools offer business users security, governance, and stability when it
comes to package and environment management. What’s more, Anaconda is
committed to the betterment of the community it exists within, providing
educational resources and a platform for engagement in Nucleus.

Anaconda is the world’s most popular data science platform. Our diverse user base
views Anaconda as the de facto solution to their data science needs. Anaconda’s
popularity has led to its widespread adoption within numerous companies and
industries. For more information about Anaconda, visit https://www.anaconda.com/.

Anaconda: The World’s Most Popular Data Science Platform 26


© 2022 Anaconda, Inc.

You might also like