Chatbot: Cosine Similarity, Python


Cosine similarity measures how similar two documents are.

It is widely used in natural language processing, for example in chatbots and web page search.

 

I. Formula for Cosine Similarity

 

Cosine Similarity(d1, d2) = Dot product(d1, d2) / (||d1|| * ||d2||)
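
As a minimal sketch, the formula can be written in plain Python as follows. The function name `cosine_similarity_plain` is made up for illustration and does not come from any library; raising an error for a zero vector is one possible choice, since the formula is undefined when a norm is 0.

```python
import math


def cosine_similarity_plain(d1, d2):
    """Return dot(d1, d2) / (||d1|| * ||d2||) for two equal-length sequences."""
    if len(d1) != len(d2):
        raise ValueError("vectors must have the same length")

    # Dot product: sum of element-wise products.
    dot = sum(a * b for a, b in zip(d1, d2))

    # Euclidean norm (length) of each vector.
    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(b * b for b in d2))

    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")

    return dot / (norm1 * norm2)
```

Vectors pointing in the same direction give 1, and orthogonal vectors give 0.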

An explanation is available on the following page.

 

Tf-Idf and Cosine similarity

 

II. Worked Example

 

The example here is taken from the following document.

 

Cosine similarity in data mining

 

Suppose there are two documents, and the text of each has been converted into the following vectors.

 

d1 = (0, 3, 0, 0, 2, 0, 0, 2, 0, 5)

d2 = (1, 2, 0, 0, 1, 1, 0, 1, 0, 3)

 

First, compute the dot product.

 

Dot product(d1, d2) = 0*1 + 3*2 + 0*0 + 0*0 + 2*1 + 0*1 + 0*0 + 2*1 + 0*0 + 5*3 = 25

 

Next, compute the norm (the length) of each vector.

 

||d1|| = Square root(0*0 + 3*3 + 0*0 + 0*0 + 2*2 + 0*0 + 0*0 + 2*2 + 0*0 + 5*5) = 6.481

||d2|| = Square root(1*1 + 2*2 + 0*0 + 0*0 + 1*1 + 1*1 + 0*0 + 1*1 + 0*0 + 3*3) = 4.12

 

Finally, compute the cosine similarity.

 

cos(d1, d2) = 25/(6.481*4.12) = 0.94
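
The hand calculation can be checked with a short NumPy sketch. It assumes NumPy is installed; the variable names are chosen here for illustration.

```python
import numpy as np

# The two document vectors from the example.
d1 = np.array([0, 3, 0, 0, 2, 0, 0, 2, 0, 5])
d2 = np.array([1, 2, 0, 0, 1, 1, 0, 1, 0, 3])

dot = np.dot(d1, d2)          # dot product: 25
norm_d1 = np.linalg.norm(d1)  # sqrt(42), about 6.481
norm_d2 = np.linalg.norm(d2)  # sqrt(17), about 4.123

cos = dot / (norm_d1 * norm_d2)
print(round(float(cos), 2))   # 0.94
```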

 

III. Calculating Online

 

You can run the calculation on the following page.

 

COSINE SIMILARITY examples, formula and calculations

 

 

IV. Calculating in Python

 

We use the sklearn library (scikit-learn).
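
A minimal sketch of how this might look, assuming scikit-learn and NumPy are installed. It is not necessarily the code the author used. It reuses the vectors d1 and d2 from section II and calls `cosine_similarity` from `sklearn.metrics.pairwise`, which expects 2-D arrays with one row per document.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Each document is one row, so the vectors are wrapped in an extra pair of brackets.
d1 = np.array([[0, 3, 0, 0, 2, 0, 0, 2, 0, 5]])
d2 = np.array([[1, 2, 0, 0, 1, 1, 0, 1, 0, 3]])

# The result is a matrix of pairwise similarities; here its shape is (1, 1).
similarity = cosine_similarity(d1, d2)
print(round(float(similarity[0, 0]), 2))  # 0.94
```

Passing a single matrix with several rows, for example `cosine_similarity(X)`, returns the similarity between every pair of rows, which is convenient when comparing one query against many documents.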

 

 

About shibatau

I was born and raised in Kyoto. I studied Western philosophy at university and specialized in analytic philosophy, especially Ludwig Wittgenstein, in graduate school. I'm interested in new technology, especially machine learning; I have been learning R for two years and began learning Python last summer. Listening to Paramore, Sia, Amazarashi and Miyuki Nakajima. Favorite movies I've recently seen: "FREEHELD". Favorite actors and actresses: Anthony Hopkins, Denzel Washington, Ellen Page, Meryl Streep, Mia Wasikowska and Robert DeNiro. Favorite books: Fyodor Mikhailovich Dostoyevsky, "The Karamazov Brothers", Shinran, "Lamentations of Divergences". Favorite phrase: Salvation by Faith. Twitter: @shibatau
