What would the output of skip-gram model look like?

Question:

To my understanding, the output of the skip-gram model must be compared against many training labels (the number depending on the window size).

My question is: Does the final output of the skip-gram model look like the description in this picture?
[figure: skip-gram]
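To make my understanding concrete, here is a minimal sketch (plain NumPy; the vocabulary size, dimensions, and word indices are toy values I made up) of how I picture the forward pass: the model produces one softmax distribution over the vocabulary, and that same output vector is compared against each context label in the window.

```python
import numpy as np

np.random.seed(0)
V, N = 5, 3                           # toy vocabulary size, embedding size
W_in = np.random.randn(V, N) * 0.01   # "projection layer" / input vectors
W_out = np.random.randn(N, V) * 0.01  # output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

center = 2          # index of the center word
context = [1, 3]    # indices of the context words in the window

h = W_in[center]          # hidden layer: just a row lookup
y = softmax(h @ W_out)    # ONE probability distribution over the vocabulary

# The single output vector is scored against every context label:
loss = -sum(np.log(y[c]) for c in context)
print("output:", y, "loss:", loss)
```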

P.S. The most similar question I could find: [1] What does the multiple outputs in skip-gram mean?

Asked By: theabc50111


Answers:

It’s hard to say what "should" happen in degenerate/toy/artificial cases, especially given how much randomness is used in the actual initialization/training.

Both the model’s internal weights and the ‘projection layer’ (aka ‘input vectors’, or just ‘word vectors’) are changed by backpropagation. So what the internal weights should be can’t be answered without also considering the initialization of, and updates to, the projection weights. And nothing is meaningful with only two training examples, as opposed to "many, many more examples than could be approximated by the model’s state".
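As a rough illustration (a minimal NumPy sketch with arbitrary toy sizes, not word2vec’s actual code), one SGD step couples the two weight sets: the gradient reaching the projection row passes through the output weights, so neither can be analyzed in isolation.

```python
import numpy as np

np.random.seed(1)
V, N, lr = 5, 3, 0.1
W_in = np.random.randn(V, N) * 0.01   # projection layer / input vectors
W_out = np.random.randn(N, V) * 0.01  # internal (output) weights

center, context = 2, [1, 3]

h = W_in[center].copy()                 # forward: projection-row lookup
z = h @ W_out
y = np.exp(z - z.max()); y /= y.sum()   # softmax over the vocabulary

# Backward: gradient of the summed cross-entropy over all context labels.
grad_z = len(context) * y
for c in context:
    grad_z[c] -= 1.0

grad_h = W_out @ grad_z                 # gradient flows back THROUGH W_out
W_out -= lr * np.outer(h, grad_z)       # the output weights are updated...
W_in[center] -= lr * grad_h             # ...and so is the projection row
```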

If you think you’ve constructed a tiny case that’s informative when run, I’d suggest trying it against actual implementations to see what happens.
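For instance, a tiny run against gensim’s implementation (assuming gensim >= 4.0; the two-sentence corpus below is invented purely for illustration) would look something like this:

```python
from gensim.models import Word2Vec

sentences = [["the", "quick", "brown", "fox"],
             ["the", "lazy", "dog"]]

model = Word2Vec(sentences,
                 vector_size=10,  # embedding dimension
                 window=2,
                 min_count=1,
                 sg=1,            # sg=1 selects skip-gram (default is CBOW)
                 workers=1,       # single worker for run-to-run stability
                 seed=42)

print(model.wv["fox"])                # the learned input/projection vector
print(model.wv.most_similar("fox"))  # neighbors in this tiny, noisy space
```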

But beware: tiny models & training sets are likely to be weird, or to allow multiple, overfit, idiosyncratic end-states, in ways that don’t reveal much about how the algorithm behaves when used in its intended fashion: on large, varied amounts of training data.

Answered By: gojomo

I made a skip-gram model demo in this GitHub repo. Hopefully it can help people understand how the skip-gram model works.

[figure: skipgram]

Answered By: Kenny Lee