Learn how to code an (almost) one-line Python function that manually calculates the cosine similarity or correlation matrices used in many data science algorithms, using the broadcasting feature of the NumPy library.

Photo by mostafa rezaee on Unsplash

Do you think we can say that a professional MotoGP rider and the kid in the picture have the same passion for motorsports, even if they will never meet and differ in every other aspect of their lives? If you think so, then you have grasped the idea of cosine similarity and correlation.

Now suppose you work for a pay-TV channel and you have…
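To make the idea in this teaser concrete, here is a minimal sketch of a cosine similarity matrix and a correlation matrix built with NumPy broadcasting. It is not necessarily the function from the full article: the names `cosine_similarity_matrix` and `correlation_matrix` and the sample data are assumptions made for illustration, and the code assumes that no row is all zeros (or constant, for the correlation).

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X (n_rows x n_features).

    Assumes no row of X has zero length, otherwise the division fails.
    """
    # keepdims=True gives shape (n_rows, 1), so the division broadcasts row by row
    unit_rows = X / np.linalg.norm(X, axis=1, keepdims=True)
    return unit_rows @ unit_rows.T

def correlation_matrix(X):
    """Pearson correlation between rows: cosine similarity after centering each row."""
    return cosine_similarity_matrix(X - X.mean(axis=1, keepdims=True))

# Hypothetical data: row 1 is row 0 scaled by 2, row 2 is row 0 reversed
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [3.0, 2.0, 1.0]])

S = cosine_similarity_matrix(X)
R = correlation_matrix(X)
```

Rows 0 and 1 point in the same direction, so their cosine similarity is 1 even though their magnitudes differ, which is the "same passion, different lives" idea. `R` can be checked against `np.corrcoef(X)`.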

My previous story was about the math intuition behind PCA. Today my objective is to apply those concepts to a simple use case, and in particular to a part that is often overlooked: the interpretation of the principal components.

(In-game screenshot by the author)

As you probably know, PCA is a sophisticated tool for dimensionality reduction. In short, PCA lets a dataset with a high number of variables be represented, hopefully, by a small number of new variables that are (special) linear combinations of the originals. …
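As a rough illustration of "new variables that are linear combinations of the originals", the sketch below runs PCA by hand with NumPy's SVD. The data is synthetic and every variable name (`loadings`, `scores`, `explained`) is an assumption for this example, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations of 3 variables; the second is almost
# a multiple of the first, the third is independent noise
a = rng.normal(size=200)
X = np.column_stack([a,
                     2 * a + 0.1 * rng.normal(size=200),
                     rng.normal(size=200)])

# PCA works on centered data
Xc = X - X.mean(axis=0)

# SVD of the centered data: the rows of Vt are the principal directions
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Each column of `loadings` holds the weights of one principal component
# on the original variables; this is what one reads to interpret a component
loadings = Vt.T

# The new variables: linear combinations of the original (centered) ones
scores = Xc @ loadings

# Share of the total variance carried by each component
explained = s**2 / np.sum(s**2)
```

Reading the first column of `loadings` shows which original variables drive the first component; here it should be dominated by the first two variables, since they move together.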

Step-by-step intuition, mathematical principles and Python code snippets behind one of the most important algorithms in unsupervised learning

This animation shows how the covariance matrix of the projected points (A…E) gets diagonalized when the rotating direction reaches two special directions (the solutions to the eigendecomposition equation below). To verify it, slow down the video or pause it when the direction overlaps the orange dotted line (first principal component) or the pink dotted line (second principal component). You will also notice that the diagonal elements of the matrix reach their maximum values (max variance). (Video created by the author using GeoGebra 6)
This formula is called the eigendecomposition equation. Initial work on it started at the end of the 1800s, but only in recent times has it become of practical use in big data analysis, thanks to advances in (personal) computing power.
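The formula image is not reproduced in this text, so the sketch below assumes the usual form of the equation, C v = λ v, where C is the covariance matrix. It checks numerically what the animation describes: projecting the points onto the eigenvector directions makes the covariance matrix diagonal. The five points are made up for illustration and are not the A…E of the video.

```python
import numpy as np

# Hypothetical 2D points standing in for A..E
points = np.array([[2.0, 1.0],
                   [3.0, 4.0],
                   [5.0, 4.5],
                   [6.0, 7.0],
                   [8.0, 7.5]])

centered = points - points.mean(axis=0)
C = np.cov(centered, rowvar=False)          # 2 x 2 covariance matrix

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(C)

# Reorder so the first column is the direction of maximum variance
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the points onto the two special directions
projected = centered @ eigvecs

# Covariance of the projected points: off-diagonal terms vanish,
# and the diagonal holds the eigenvalues (the variances along each direction)
C_proj = np.cov(projected, rowvar=False)
```

Each column `v` of `eigvecs` satisfies `C @ v == eigval * v` up to rounding, and `C_proj` comes out as `diag(eigvals)`.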

Hi everybody, my name is Andrea Grianti, from Milan, Italy. I wrote…

If the title looks puzzling, let me say that I believe that learning Python (or R) will take you to Machine Learning, but learning Linear Algebra will take you everywhere.

So going from using software libraries to creating your own stuff means moving with agility from numbers to vectors (and matrices), and often translating “summation” expressions into equivalent one-shot vector/matrix operations.

Unfortunately, moving between these two worlds has some traps for beginners. They are worth refreshing, because they often come up again at a later stage in more complex situations.
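A small sketch of what translating summations into vector/matrix operations can look like in NumPy. The arrays are made up, and the shape trap shown at the end is my own example of the kind of pitfall a beginner can hit, not necessarily one the article covers.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
M = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0],
              [4.0, 0.0, 5.0]])

# Single summation: sum_i x_i * y_i  ->  dot product
dot_loop = sum(x[i] * y[i] for i in range(len(x)))
dot_vec = x @ y

# Double summation: sum_i sum_j x_i * M_ij * y_j  ->  x^T M y
quad_loop = sum(x[i] * M[i, j] * y[j] for i in range(3) for j in range(3))
quad_vec = x @ M @ y

# A shape trap: a 1-D array is not a column vector.
# x * y multiplies element by element and gives shape (3,),
# while a (3, 1) column times a (3,) array broadcasts to a (3, 3) outer product.
elementwise = x * y
outer = x.reshape(-1, 1) * y
```

The loop and vector versions give the same numbers; the vector form is shorter and lets NumPy do the iteration in compiled code.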

Photo by Tiaan van Zyl on Unsplash

When we begin learning Linear Algebra (LA) the first chapters…

A step-by-step explanation of how the Euclidean Distance Matrix (EDM) is represented in linear algebra and how to code it as a Python function in just one line.

Pythagoras : Euclid = Triangle : Geometry (drawing by Andrea Grianti)

Hi everybody, in this post I want to share my experience of figuring out how a rather intuitive concept like the Euclidean Distance Matrix (EDM) can become a challenge if you decide to improve your programming skills (Python, in my case) by crossing the chasm from classical “for…loops” code to the beauty of a single line of code based on linear algebra concepts.

Why? Because if you can solve a problem…
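For readers who want to see the two styles side by side, here is a minimal sketch of an EDM computed first with explicit loops and then in a single expression. The one-liner relies on the standard identity ||a - b||² = ||a||² + ||b||² - 2 a·b and is not necessarily the line the article arrives at; the function names and sample points are assumptions.

```python
import numpy as np

def edm_loops(X):
    """Euclidean distance matrix with classical nested for-loops."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = np.sqrt(np.sum((X[i] - X[j]) ** 2))
    return D

def edm_one_line(X):
    """Same matrix in one expression, using squared norms and the Gram matrix X X^T.

    np.maximum(..., 0) guards against tiny negative values caused by rounding.
    """
    return np.sqrt(np.maximum((X**2).sum(1)[:, None] + (X**2).sum(1)[None, :] - 2 * X @ X.T, 0))

# Hypothetical points on a line through the origin (3-4-5 triangles)
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

D_loops = edm_loops(X)
D_fast = edm_one_line(X)
```

With these points the distances are 5 between neighbours and 10 between the first and the last point, and both functions agree.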

My visualization of the concept of clustering

Preliminary note: This post is the result of personal study on the subject. I studied computer science years ago at the Politecnico di Milano, but data, especially business intelligence, became my profession. Data science has been the next step, even if my background is more in programming and IT. I realise that there’s a lot to learn in this field, so I tried to write with a beginner/student approach. Any suggestion for improvement is appreciated.

The objective of this article is to explain the results of the K-Means algorithm that you get when you run it on your data using…
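The teaser is cut off before naming the tool used, so the sketch below does not assume any particular library. It is a plain NumPy version of the standard K-Means iteration (Lloyd's algorithm), written only to show the three results one usually has to interpret: the cluster labels, the centroids, and the inertia (the sum of squared distances to the assigned centroid). The function name `kmeans`, its parameters and the sample data are all hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-Means sketch: returns (labels, centroids, inertia)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    for _ in range(n_iter):
        # Squared distance of every point to every centroid, shape (n, k)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)

        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Final assignment and inertia with the final centroids
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return labels, centroids, inertia

# Hypothetical data: two well-separated groups of three points each
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

labels, centroids, inertia = kmeans(X, k=2)
```

The label numbers themselves are arbitrary (which group is called 0 depends on the random start); what matters is which points share a label and how tight each cluster is around its centroid.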

Andrea Grianti

IT Senior Manager and Consultant. Data Warehouse and Business Intelligence expertise in design and build. Freelance.
