Apache Mahout

Apache Mahout is an Apache Software Foundation project that provides machine learning algorithms and tools focused on linear algebra. It includes libraries for common math operations and provides implementations of algorithms like clustering, classification, and collaborative filtering. Originally using Hadoop, Mahout now primarily uses Apache Spark and provides a backend-agnostic programming environment and Scala DSL.

Uploaded by

levin696

Apache Mahout

Apache Mahout is a project of the Apache Software Foundation to produce free implementations of
distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra.
In the past, many of the implementations used the Apache Hadoop platform; today the project is
primarily focused on Apache Spark.[3][4] Mahout also provides Java/Scala libraries for common math
operations (focused on linear algebra and statistics) and primitive Java collections. Mahout is a
work in progress; a number of algorithms have been implemented.[5]

Developer(s): Apache Software Foundation
Initial release: 7 April 2009[1]
Stable release: 14.1 / 7 October 2020[2]
Repository: Mahout Repository (https://gitbox.apache.org/repos/asf/mahout.git)
Written in: Java, Scala
Operating system: Cross-platform
Type: Machine Learning
License: Apache License 2.0
Website: mahout.apache.org (https://mahout.apache.org)

Features

Samsara

Apache Mahout-Samsara refers to a Scala domain-specific language (DSL) that allows users to write
R-like syntax as opposed to traditional Scala-like syntax. This allows users to express algorithms
concisely and clearly.

val G = B %*% B.t - C - C.t + (ksi dot ksi) * (s_q cross s_q)

Backend Agnostic

Apache Mahout's code abstracts the domain-specific language from the engine where the code is run.
While active development is done with the Apache Spark engine, users are free to implement any
engine they choose; H2O and Apache Flink have been implemented in the past, and examples exist in
the code base.
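
The separation between DSL and engine described above can be illustrated with a minimal sketch. It is
not Mahout's API: every name in it (DslSketch, Expr, Engine, InMemoryEngine, Lit and the other case
classes) is hypothetical and chosen only for illustration. The operators build an expression tree
without doing any arithmetic, and an engine later decides how to evaluate that tree. Only the first
three terms of the expression shown earlier are reproduced; the dot and cross terms are left out.

```scala
// Hypothetical sketch; none of these names come from Mahout's API.
object DslSketch {
  type Mat = Vector[Vector[Double]]

  // Logical expression tree: building it performs no arithmetic.
  sealed trait Expr {
    def %*%(that: Expr): Expr = Times(this, that) // R-like matrix product
    def t: Expr = Transpose(this)                 // transpose
    def +(that: Expr): Expr = Plus(this, that)
    def -(that: Expr): Expr = Minus(this, that)
  }
  final case class Lit(m: Mat) extends Expr
  final case class Times(a: Expr, b: Expr) extends Expr
  final case class Transpose(a: Expr) extends Expr
  final case class Plus(a: Expr, b: Expr) extends Expr
  final case class Minus(a: Expr, b: Expr) extends Expr

  // An engine decides how (and where) an expression is evaluated.
  trait Engine {
    def eval(e: Expr): Mat
  }

  // Single-machine, in-memory engine. A distributed engine would
  // implement the same trait against its own data structures.
  object InMemoryEngine extends Engine {
    // Element-wise combination of two matrices of equal shape.
    private def zipWith(a: Mat, b: Mat)(f: (Double, Double) => Double): Mat =
      a.zip(b).map { case (ra, rb) => ra.zip(rb).map { case (x, y) => f(x, y) } }

    def eval(e: Expr): Mat = e match {
      case Lit(m)       => m
      case Transpose(a) => eval(a).transpose
      case Plus(a, b)   => zipWith(eval(a), eval(b))(_ + _)
      case Minus(a, b)  => zipWith(eval(a), eval(b))(_ - _)
      case Times(a, b)  =>
        val cols = eval(b).transpose // columns of b, stored as rows
        eval(a).map(row => cols.map(col => row.zip(col).map { case (x, y) => x * y }.sum))
    }
  }

  val B: Expr = Lit(Vector(Vector(1.0, 2.0), Vector(3.0, 4.0)))
  val C: Expr = Lit(Vector(Vector(0.0, 1.0), Vector(2.0, 0.0)))

  // Same shape as the first three terms of the expression in the text.
  val plan: Expr = B %*% B.t - C - C.t
  val G: Mat = InMemoryEngine.eval(plan)
}
```

Because `plan` is only a description of the computation, the same value could be handed to a
different Engine implementation without changing the line that defines it. Scala gives `%*%` the
precedence of `*`, so the product binds tighter than the subtractions, as it would in R.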

GPU/CPU accelerators

Numerical computation on the JVM is notoriously slow. To improve speed, "native solvers" were added
that move in-core (and, by extension, distributed) BLAS operations out of the JVM, offloading them to
off-heap or GPU memory for processing by multiple CPUs and/or CPU cores, or by GPUs when built
against the ViennaCL library.[6] See "Extending Mahout Samsara to GPU Clusters"
(https://on-demand.gputechconf.com/gtc/2017/video/s7572-extending-mahout-samsara-linear-algebra-dsl-to-support-gpu-clusters.mp4).
ViennaCL is a highly optimized C++ library with BLAS operations implemented in OpenMP and OpenCL. As
of release 14.1, the OpenMP build is considered stable, while the OpenCL build is still in its
experimental proof-of-concept (POC) phase.

Recommenders

Apache Mahout features implementations of Alternating Least Squares, Co-Occurrence, and Correlated
Co-Occurrence. The last is a recommender algorithm unique to Mahout that extends co-occurrence to
multiple dimensions of data.
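
As a rough illustration of the plain co-occurrence idea only, the sketch below counts how often two
items appear together in users' histories and scores unseen items by those counts. It is not
Mahout's implementation: the object name, function names, and sample data are all hypothetical, and
neither Alternating Least Squares nor the correlated, multi-dimensional variant is shown.

```scala
// Hypothetical sketch of plain co-occurrence scoring; not Mahout's implementation.
object CooccurrenceSketch {
  // For each ordered item pair (a, b) with a != b, count the users who interacted with both.
  def cooccurrence(histories: Map[String, Set[String]]): Map[(String, String), Int] =
    histories.values.toSeq
      .flatMap(items => for (a <- items.toSeq; b <- items.toSeq if a != b) yield (a, b))
      .groupBy(identity)
      .map { case (pair, hits) => pair -> hits.size }

  // Score each unseen item by summing its co-occurrence counts with the user's own items.
  def recommend(user: String, histories: Map[String, Set[String]]): Seq[(String, Int)] = {
    val seen = histories.getOrElse(user, Set.empty[String])
    cooccurrence(histories).toSeq
      .collect { case ((a, b), n) if seen(a) && !seen(b) => (b, n) }
      .groupBy(_._1)
      .map { case (item, scored) => item -> scored.map(_._2).sum }
      .toSeq
      .sortBy { case (item, score) => (-score, item) } // highest score first, ties by name
  }

  // Made-up interaction data: user -> items the user interacted with.
  val histories: Map[String, Set[String]] = Map(
    "u1" -> Set("a", "b"),
    "u2" -> Set("a", "b", "c"),
    "u3" -> Set("b", "c", "d"),
    "u4" -> Set("a")
  )

  val forU4: Seq[(String, Int)] = recommend("u4", histories)
}
```

With this sample data, user u4 has only item a, so the candidates are the items that co-occur with
a: item b (with two users) ranks ahead of item c (with one).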

History

Transition from MapReduce to Apache Spark

While Mahout's core algorithms for clustering, classification and batch-based collaborative filtering
were implemented on top of Apache Hadoop using the MapReduce paradigm, the project did not restrict
contributions to Hadoop-based implementations. Contributions that ran on a single node or on a
non-Hadoop cluster were also welcomed. For example, the 'Taste' collaborative-filtering recommender
component of Mahout was originally a separate project and can run stand-alone without Hadoop.

Starting with release 0.10.0, the project shifted its focus to building a backend-independent
programming environment, code-named "Samsara".[7][8][9] The environment consists of an algebraic
backend-independent optimizer and an algebraic Scala DSL unifying in-memory and distributed algebraic
operators. Supported algebraic platforms are Apache Spark, H2O, and Apache Flink. Support for
MapReduce algorithms started being gradually phased out in 2014.[10]

Release History

Version   Release Date   Notes
0.1       2009-04-07
0.2       2009-11-18
0.3       2010-03-17
0.4       2010-10-31
0.5       2011-05-27
0.6       2012-02-06
0.7       2012-05-16
0.8       2013-07-25
0.9       2014-02-01
0.10.0    2015-04-11     Samsara DSL
0.10.1    2015-05-31
0.10.2    2015-08-06
0.11.0    2015-08-07
0.11.1    2015-11-06
0.11.2    2016-03-11
0.12.0    2016-04-11     Added Apache Flink engine
0.12.1    2016-05-19
0.12.2    2016-06-13
0.13.0    2017-04-17
0.14.0    2019-03-07     Source only (no binaries)
14.1      2020-10-07

Developers

Apache Mahout is developed by a community. The project is managed by a group called the "Project
Management Committee" (PMC). The current PMC members are Andrew Musselman, Andrew Palumbo, Drew
Farris, Isabel Drost-Fromm, Jake Mannix, Pat Ferrel, Paritosh Ranjan, Trevor Grant, Robin Anil,
Sebastian Schelter, and Stevo Slavić.[11]

References
1. "Apache Mahout: First release 0.1 released" (http://mail-archives.apache.org/mod_mbox/www-announce/200904.mbox/%[email protected]g%3E).
2. "Apache Mahout: Scalable machine learning and data mining" (https://mahout.apache.org/). Retrieved 6 March 2019.
3. "Introducing Apache Mahout" (http://www.ibm.com/developerworks/java/library/j-mahout/). ibm.com. 2011. Retrieved 13 September 2011.
4. "InfoQ: Apache Mahout: Highly Scalable Machine Learning Algorithms" (http://www.infoq.com/news/2009/04/mahout). infoq.com. 2011. Retrieved 13 September 2011.
5. "Algorithms - Apache Mahout - Apache Software Foundation" (https://web.archive.org/web/20131222013730/http://mahout.apache.org/users/basics/algorithms.html). cwiki.apache.org. 2011. Archived from the original (http://mahout.apache.org/users/basics/algorithms.html) on 22 December 2013. Retrieved 13 September 2011.
6. "ViennaCL" (https://viennacl.sourceforge.net/).
7. "Mahout-Samsara's In-Core Linear Algebra DSL Reference" (https://web.archive.org/web/20160802233841/https://mahout.apache.org/users/environment/in-core-reference.html). Archived from the original (http://mahout.apache.org/users/environment/in-core-reference.html) on 2 August 2016. Retrieved 29 February 2016.
8. "Mahout-Samsara's Distributed Linear Algebra DSL Reference" (https://web.archive.org/web/20160802233829/https://mahout.apache.org/users/environment/out-of-core-reference.html). Archived from the original (http://mahout.apache.org/users/environment/out-of-core-reference.html) on 2 August 2016. Retrieved 29 February 2016.
9. "Mahout 0.10.x: first Mahout release as a programming environment" (https://web.archive.org/web/20161009224405/http://www.weatheringthroughtechdays.com/2015/04/mahout-010x-first-mahout-release-as.html). www.weatheringthroughtechdays.com. Archived from the original (http://www.weatheringthroughtechdays.com/2015/04/mahout-010x-first-mahout-release-as.html) on 9 October 2016. Retrieved 29 February 2016.
10. "MAHOUT-1510 ("Good-bye MapReduce")" (https://issues.apache.org/jira/browse/MAHOUT-1510).
11. "Apache Committee Information" (https://projects.apache.org/committee.html?mahout).

External links
Official website (https://mahout.apache.org/)

Retrieved from "https://en.wikipedia.org/w/index.php?title=Apache_Mahout&oldid=1146562685"
