Open Source For You — December 2017


Insight Developers

By: Palak Shah
The author is an associate consultant at Capgemini and
loves to explore new technologies. She is an avid reader
and writer of technology-related articles. She can be
reached at [email protected]

Machine learning tools
Many open source tools and frameworks are available for
implementing machine learning on a system. One can choose
any of them, based on personal preference for a specific
language or environment.
Shogun: Shogun is one of the oldest machine learning
libraries available. It provides a wide range of efficient
machine learning methods. It supports many languages such
as Python, Octave, R, Java/Scala, Lua, C#, Ruby, etc, and
platforms such as Linux/UNIX, macOS and Windows. It is
easy to use, and is quite fast at compilation and execution.
Weka: Weka is data mining software that offers a
collection of machine learning algorithms for mining data.
These algorithms can be applied directly to a data set or
called from your own Java code.
Weka is a collection of tools for:
• Regression
• Clustering
• Association rules
• Data pre-processing
• Classification
• Visualisation
Apache Mahout: Apache Mahout is a free and open source
project. It provides an environment for quickly creating
scalable machine learning algorithms in fields such as
collaborative filtering, clustering and classification.
It also includes Java libraries and Java collections for
various kinds of mathematical operations.
TensorFlow: TensorFlow performs numerical computations
using data flow graphs, in which nodes represent
operations and edges carry the data between them. It
optimises these computations well. It has APIs for Python
and C++, is highly flexible and portable, and offers
bindings for several other languages.
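
To make the idea of a data flow graph concrete, here is a
minimal sketch in plain Python. It is not the TensorFlow
API; the Node class and the helper functions const,
placeholder, add, mul and evaluate are hypothetical names
made up for this illustration. The point it tries to show
is that the graph is described first and only computed
when data is fed into it.

```python
# Toy data flow graph in plain Python (NOT the TensorFlow API).
# Nodes describe operations; nothing is computed until evaluate() is called.


class Node:
    """One vertex of the graph: an operation plus the nodes that feed it."""

    def __init__(self, op, inputs=(), value=None, name=None):
        self.op = op          # "const", "input", "add" or "mul"
        self.inputs = inputs  # upstream nodes whose outputs flow into this one
        self.value = value    # only used by "const" nodes
        self.name = name      # only used by "input" nodes


def const(value):
    return Node("const", value=value)


def placeholder(name):
    return Node("input", name=name)


def add(a, b):
    return Node("add", inputs=(a, b))


def mul(a, b):
    return Node("mul", inputs=(a, b))


def evaluate(node, feed):
    """Walk the graph from `node` back to its inputs and compute its value."""
    if node.op == "const":
        return node.value
    if node.op == "input":
        return feed[node.name]
    left, right = (evaluate(n, feed) for n in node.inputs)
    if node.op == "add":
        return left + right
    if node.op == "mul":
        return left * right
    raise ValueError("unknown op: " + node.op)


# Build the graph for y = 3 * x + 1 once...
x = placeholder("x")
y = add(mul(const(3.0), x), const(1.0))

# ...then run it with different data flowing through it.
print(evaluate(y, {"x": 2.0}))   # 7.0
print(evaluate(y, {"x": -1.0}))  # -2.0
```

Separating the description of a computation from its
execution is what gives a framework the chance to optimise
the graph or to run it on different hardware.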
CUDA-Convnet: CUDA-Convnet is a machine learning
library widely used for neural network applications. It
has been developed in C++ and can also be used by those
who prefer Python over C++. The trained neural nets
produced by this library can be saved as Python-pickled
objects, and those objects can later be loaded from Python.
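
The pickling mechanism mentioned above can be sketched
with Python's standard pickle module. The trained_net
dictionary below is a hypothetical stand-in, assumed only
for illustration; it does not reproduce the actual layout
of a CUDA-Convnet checkpoint.

```python
import pickle

# Hypothetical stand-in for a trained network: layer names mapped to weights.
# The real layout of a cuda-convnet checkpoint is not reproduced here.
trained_net = {
    "conv1": {"weights": [[0.1, -0.2], [0.4, 0.3]], "biases": [0.0, 0.1]},
    "fc2": {"weights": [[0.5], [-0.7]], "biases": [0.2]},
}

# Serialise the object to bytes, as a training run might do when saving.
blob = pickle.dumps(trained_net)

# Later, possibly in another Python program, restore it and inspect it.
restored = pickle.loads(blob)
for layer_name in sorted(restored):
    layer = restored[layer_name]
    print(layer_name, "has", len(layer["weights"]), "weight rows")
```

To work with files instead of in-memory bytes, pickle.dump
and pickle.load take an open binary file object. Pickled
files should be loaded only from trusted sources, since
unpickling can execute arbitrary code.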
H2O: This is an open source machine learning and deep
learning framework. It is written in Java and offers
interfaces for Python and R, and its powerful graphical
interface makes it easy to control training. H2O’s
algorithms are mainly used for business processes like
fraud or trend predictions.


Languages that support machine learning
The languages given below support the implementation of
machine learning:
• MATLAB
• R
• Python
• Java


But for a non-programmer, Weka is highly
recommended when working with machine learning
algorithms.

Advantages and challenges
The advantages of machine learning are:
• Machine learning helps a system make decisions based
on the training data provided, even in a dynamic or
undetermined state.
• It can handle multi-dimensional, multi-variety data, and
can extract implicit relationships within large data sets
in a dynamic, complex and chaotic environment.
• It saves a lot of time, since different aspects of an
algorithm can be tweaked, added or dropped to better
structure the data.
• It supports continuous quality improvement for any
large or complex process.
• Multiple iterations are carried out to deliver the
highest level of accuracy in the final model.
• Machine learning allows easy application and
comfortable adjustment of parameters to improve
classification performance.
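
The last two points, iterating and adjusting parameters to
improve classification performance, can be illustrated
with a deliberately small sketch. The data set, the
single-threshold classifier and the function names
accuracy and tune_threshold are all hypothetical, chosen
only to keep the example short.

```python
# Hypothetical toy data: (feature value, label) pairs, label 1 = positive.
samples = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 1),
           (2.5, 0), (3.0, 1), (3.5, 1), (4.0, 1)]


def accuracy(threshold, data):
    """Fraction of samples classified correctly by 'predict 1 if x >= threshold'."""
    correct = sum(1 for x, label in data
                  if (1 if x >= threshold else 0) == label)
    return correct / len(data)


def tune_threshold(data, candidates):
    """Try each candidate parameter value and keep the most accurate one."""
    best_threshold, best_accuracy = None, -1.0
    for threshold in candidates:
        acc = accuracy(threshold, data)
        if acc > best_accuracy:  # ties keep the earlier candidate
            best_threshold, best_accuracy = threshold, acc
    return best_threshold, best_accuracy


candidates = [1.0, 2.0, 3.0, 4.0]
best_threshold, best_accuracy = tune_threshold(samples, candidates)
print(best_threshold, best_accuracy)  # 2.0 0.875
```

For brevity the sketch scores each candidate on the same
data it was tuned on. In practice the accuracy would
normally be measured on held-out data, so that the chosen
parameter is not simply fitted to the training samples.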
The challenges of machine learning are as follows:
• A common challenge is the collection of relevant data.
Once the data is available, it has to be pre-processed
according to the requirements of the specific algorithm
used, and this has a serious effect on the final results.
• Many machine learning techniques find it difficult
to optimise non-differentiable, discontinuous loss
functions. Discontinuous loss functions are important in
cases such as sparse representations. Non-differentiable
loss functions are therefore often approximated by smooth
loss functions without much loss in sparsity.
• Machine learning algorithms are not guaranteed to
work in every possible case. Some awareness of the
problem and some experience are required to choose the
right machine learning algorithm.
• Collecting such large amounts of data can sometimes
be an unmanageable and unwieldy task.
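
The point about approximating non-differentiable loss
functions with smooth ones can be sketched as follows. The
absolute-error loss is used here as the non-differentiable
example, and sqrt(r^2 + eps^2) - eps is one common smooth
surrogate for it. Both choices, and the value of eps, are
assumptions made for illustration rather than something
the article prescribes.

```python
import math


def abs_loss(r):
    """Absolute-error loss |r|: not differentiable at r = 0."""
    return abs(r)


def smooth_abs_loss(r, eps=1e-3):
    """Smooth surrogate sqrt(r^2 + eps^2) - eps; differentiable everywhere."""
    return math.sqrt(r * r + eps * eps) - eps


def smooth_abs_grad(r, eps=1e-3):
    """Derivative of the surrogate: r / sqrt(r^2 + eps^2)."""
    return r / math.sqrt(r * r + eps * eps)


# Compare the two losses, and show the surrogate's gradient, at a few residuals.
for r in (-2.0, -0.01, 0.0, 0.01, 2.0):
    print(r, abs_loss(r), round(smooth_abs_loss(r), 6),
          round(smooth_abs_grad(r), 6))
```

The surrogate never differs from |r| by more than eps, and
its gradient passes smoothly through zero at r = 0, which
is what makes it usable with gradient-based optimisers.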
