Human motion similarity is practiced in many fields, including action recognition, anomaly detection, and human performance evaluation. While many computer vision tasks have benefited from deep learning, measuring motion similarity has attracted less attention, particularly due to the lack of large datasets. To address this problem, we introduce two datasets: a synthetic motion dataset for model training and a dataset containing human annotations of real-world video clip pairs for motion similarity evaluation. Furthermore, in order to compute the motion similarity from these datasets, we propose a deep learning model that produces motion embeddings suitable for measuring the similarity between different motions of each human body part. The network is trained with the proposed motion variation loss to robustly distinguish even subtly different motions. The proposed approach outperforms the other baselines considered in terms of correlations between motion similarity predictions and human annotations while being suitable for real-time action analysis. Both datasets and codes are released to the public.