Deep Bayes: Adaptive skip-gram
Introduction
These are my notes on an improvement to the skip-gram model. Please give attribution when reposting.
Ref: Deep Bayes slides
Skip-gram model
Distributional hypothesis: similar words appear in similar contexts.
Gradient update:
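The slide's equations did not survive into these notes, so below is the standard softmax skip-gram objective and its gradient step, reconstructed from the original word2vec formulation rather than copied from the slides (v_w is the input embedding of the center word, u_c the output embedding of a context word, V the vocabulary):

```latex
% Softmax skip-gram: predict context word c from center word w
p(c \mid w) = \frac{\exp(u_c^\top v_w)}{\sum_{c' \in V} \exp(u_{c'}^\top v_w)}

% Gradient of the log-likelihood for one (w, c) pair,
% and the SGD ascent step with learning rate \eta:
\nabla_{v_w} \log p(c \mid w) = u_c - \sum_{c' \in V} p(c' \mid w)\, u_{c'},
\qquad
v_w \leftarrow v_w + \eta\, \nabla_{v_w} \log p(c \mid w)
```

In practice the normalizing sum over V is avoided with negative sampling or hierarchical softmax; the sparse structure of the resulting updates is what the summary below refers to.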
Summary
Learns high-quality, semantically rich embeddings
Sparse gradients: each training pair touches only a few embedding rows (see the sketch below)
Very efficient parallel training, since sparse updates rarely collide across threads
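To make the "sparse gradients" point concrete, here is a minimal NumPy sketch of one negative-sampling SGD step. The names (W_in, W_out, sgd_step) and sizes are mine for illustration, not from the slides; the point is that each (center, context) pair updates only one input row and a handful of output rows:

```python
import numpy as np

rng = np.random.default_rng(0)

V, D = 10_000, 100                   # vocabulary size, embedding dimension
W_in = rng.normal(0, 0.1, (V, D))    # input (center-word) embeddings
W_out = rng.normal(0, 0.1, (V, D))   # output (context-word) embeddings

def sgd_step(w, c, negatives, lr=0.025):
    """One negative-sampling update for a (center, context) pair.

    Only row w of W_in and rows {c} + negatives of W_out are touched,
    which is why skip-gram gradients are sparse.
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    v = W_in[w]
    grad_v = np.zeros(D)
    for idx, label in [(c, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[idx]
        g = sigmoid(u @ v) - label   # d(loss)/d(score)
        grad_v += g * u
        W_out[idx] -= lr * g * v     # sparse update: one output row
    W_in[w] -= lr * grad_v           # sparse update: one input row

# Example: center word 42, context word 7, five random negatives
sgd_step(42, 7, rng.integers(0, V, size=5))
```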
Problem
For polysemous words only one meaning is captured per vector.
The remaining meanings get uncontrollably mixed into that same vector (e.g. "bank" as river bank vs. financial institution).
Solution: latent-variable model
Latent-variable skip-gram
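The model equation is missing from these notes; a plausible reconstruction, following the AdaGram paper (Bartunov et al., "Breaking Sticks and Ambiguities with Adaptive Skip-gram") that this part of Deep Bayes is based on:

```latex
% Each word w gets a latent sense index z; every sense has its own
% input embedding in_{wz}, while the output embeddings out_c are shared.
p(c \mid w) = \sum_{z} p(z \mid w, \beta)\; p(c \mid w, z, \theta),
\qquad
p(c \mid w, z, \theta)
  = \frac{\exp(\mathrm{in}_{wz}^\top \mathrm{out}_c)}
         {\sum_{c'} \exp(\mathrm{in}_{wz}^\top \mathrm{out}_{c'})}
```

Marginalizing over z lets a single word own several embeddings, one per sense, instead of averaging all senses into one vector.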
Training via variational EM
Observed variables: the (center word, context word) pairs from the corpus
Hidden variables: the sense assignment z of each word occurrence, plus the stick-breaking proportions β of the Dirichlet Process prior
Parameters: the per-sense input embeddings and the shared output embeddings, θ
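The EM updates themselves are not in these notes; schematically, this is standard variational EM with a factorized approximation q over the hidden variables (my notation, matching the variables listed above):

```latex
% Evidence lower bound (ELBO) maximized by variational EM:
\log p(Y \mid X, \theta)
  \;\ge\;
  \mathcal{L}(q, \theta)
  = \mathbb{E}_{q(Z,\beta)}\!\bigl[\log p(Y, Z, \beta \mid X, \theta)\bigr]
  - \mathbb{E}_{q(Z,\beta)}\!\bigl[\log q(Z, \beta)\bigr]

% E-step: update q(Z, \beta) to tighten the bound for fixed \theta.
% M-step: a gradient step on \theta for fixed q.
% Stochastic variational inference = noisy E/M steps on minibatches,
% which is what keeps training efficient at corpus scale.
```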
Chinese Restaurant Process
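The Chinese Restaurant Process is the marginal view of the Dirichlet Process: the number of senses per word can grow as more data arrives, instead of being fixed in advance. The summary below notes that training actually works with the stick-breaking construction of the DP; here is a minimal sketch of truncated stick-breaking (alpha and the truncation level T are illustrative choices, not values from the slides):

```python
import numpy as np

def stick_breaking(alpha: float, T: int, rng) -> np.ndarray:
    """Truncated stick-breaking sample of sense probabilities.

    beta_k ~ Beta(1, alpha); sense k gets the fraction beta_k of
    whatever stick length is still left. Small alpha -> a few senses
    dominate; large alpha -> mass spreads over many senses.
    """
    beta = rng.beta(1.0, alpha, size=T)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - beta[:-1])))
    return beta * remaining  # pi_k = beta_k * prod_{j<k} (1 - beta_j)

rng = np.random.default_rng(0)
pi = stick_breaking(alpha=1.0, T=10, rng=rng)
print(pi.round(3), pi.sum())  # sums to slightly under 1 due to truncation
```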
Summary
This post has recorded how a nonparametric prior fixes skip-gram's one-meaning-per-word limitation and the resulting lack of expressive power: the Dirichlet Process is modeled via the stick-breaking process, training uses stochastic variational inference, and the efficiency is still acceptable.