NLP - WE - Skip-Gram: Implementing Word Embeddings with a Skip-Gram Model in TF and Visualizing Them, with a Full Run Log



Output

Code design approach

Full log of the code run

3081 originated -> 12 as
3081 originated -> 5234 anarchism
12 as -> 3081 originated
12 as -> 6 a
6 a -> 12 as
6 a -> 195 term
195 term -> 2 of
195 term -> 6 a
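
The eight lines above are (center word -> context word) training pairs. A minimal sketch of how such pairs can be produced is given below. It assumes a window of one word on each side and emits every neighbour in a fixed left-to-right order, whereas the original script appears to sample them (the order in the log differs). The function name `skipgram_pairs` and its parameter `skip_window` are hypothetical; the word IDs are the ones printed in the log.

```python
# Hypothetical helper: build (center, context) pairs for a skip-gram model.
# Only positions with a full window on both sides are used as centers.
def skipgram_pairs(token_ids, skip_window=1):
    pairs = []
    for i in range(skip_window, len(token_ids) - skip_window):
        center = token_ids[i]
        for offset in range(-skip_window, skip_window + 1):
            if offset != 0:
                pairs.append((center, token_ids[i + offset]))
    return pairs


# IDs and words exactly as they appear in the log above.
id_to_word = {5234: "anarchism", 3081: "originated", 12: "as",
              6: "a", 195: "term", 2: "of"}
ids = [5234, 3081, 12, 6, 195, 2]

pairs = skipgram_pairs(ids, skip_window=1)
for center, context in pairs:
    print(center, id_to_word[center], "->", context, id_to_word[context])
```

As a set, the pairs this prints are the same eight pairs shown in the log; only the order within each center word may differ.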

Initialized
Average loss at step  0 :  299.51260376
Nearest to was: guelleh, joaquim, phylum, distributive, rotating, rebounds, stance, admirer,
Nearest to which: isomerism, kim, reminiscent, sos, wishful, bounce, lengthening, scranton,
Nearest to into: antigens, longtime, lambs, spake, royals, tius, macrophages, canova,
Nearest to these: teutoburg, fly, pranksters, october, atemi, slayton, ffff, achievable,
Nearest to and: contests, compromising, gunpowder, stripping, tuple, coyne, undocumented, attains,
Nearest to has: stimuli, widens, discos, obelisk, interrupting, crts, showa, fraenkel,
Nearest to time: doesn, deformities, hardback, brod, biathletes, artistic, graphene, doubtful,
Nearest to are: lining, grappelli, nsfnet, machado, genie, semicolon, sink, nongovernmental,
Nearest to with: wilbur, participation, nuisance, repetitive, chases, intersect, reformer, trout,
Nearest to their: disregard, ant, jerk, dire, computerized, madeleine, unrwa, herbrand,
Nearest to there: reset, gms, takeover, wizardry, stressed, cohn, diagram, intrigued,
Nearest to b: abbahu, mariano, ellipsis, unleashing, ideally, message, changeable, devil,
Nearest to will: grad, banna, lockwood, occurrence, sprague, dispersing, wai, dictated,
Nearest to many: nontrinitarian, tolvaj, ifn, supers, offspring, targ, minos, sacrificing,
Nearest to first: blas, satirizing, xenosaga, juice, fora, woes, hancock, wark,
Nearest to be: dethroned, women, lamia, taxonomic, irate, amu, woodrow, agnesi,
Average loss at step  2000 :  113.632992456
Average loss at step  4000 :  53.2114667625
Average loss at step  6000 :  32.964022455
Average loss at step  8000 :  23.6010678512
Average loss at step  10000 :  17.7899276738
Nearest to was: is, stance, in, has, fertile, had, engine, archive,
Nearest to which: boroughs, pleas, impressed, one, grouped, reminiscent, kim, and,
Nearest to into: in, for, oklahoma, would, otto, was, antigens, might,
Nearest to these: october, atemi, algebraically, fly, our, illustrations, edmonton, centers,
Nearest to and: in, of, or, as, the, with, zero, UNK,
Nearest to has: is, protons, was, captain, heavy, have, trivia, sadler,
Nearest to time: doesn, artistic, molinari, real, particulars, designation, privately, reproduction,
Nearest to are: is, were, and, for, sink, does, sigma, electric,
Nearest to with: in, and, for, of, as, significant, repetitive, pleas,
Nearest to their: the, ant, expected, disregard, overview, up, southeast, gothic,
Nearest to there: above, now, sanction, bckgr, stock, diagram, gods, professional,
Nearest to b: message, deaf, ford, UNK, devil, suspense, typical, career,
Nearest to will: occurrence, agave, fricatives, additional, wherein, aikido, yeast, heir,
Nearest to many: ambients, function, polemical, evidence, decades, capitalization, seward, gb,
Nearest to first: archie, juice, overview, hancock, vast, services, poll, active,
Nearest to be: have, crime, woodrow, women, need, lied, books, intense,
Average loss at step  12000 :  14.0033162159
Average loss at step  14000 :  11.7605496879
Average loss at step  16000 :  9.81052753329
Average loss at step  18000 :  8.56377221251
Average loss at step  20000 :  8.00662653184
Nearest to was: is, has, had, by, were, agouti, operatorname, are,
Nearest to which: that, and, this, acacia, also, agouti, UNK, boroughs,
Nearest to into: for, in, on, by, antigens, operatorname, and, amazonas,
Nearest to these: the, october, his, atemi, our, gollancz, algebraically, truetype,
Nearest to and: in, or, agouti, operatorname, of, apatosaurus, for, circ,
Nearest to has: was, is, had, have, agouti, were, by, are,
Nearest to time: doesn, artistic, privately, molinari, real, particulars, designation, verbal,
Nearest to are: were, is, was, and, but, for, operatorname, dimension,
Nearest to with: in, for, and, as, by, of, from, when,
Nearest to their: the, his, ant, gaul, ulyanov, circ, agouti, its,
Nearest to there: now, above, it, truetype, avant, this, takeover, sanction,
Nearest to b: d, deaf, and, message, american, derive, ford, builder,
Nearest to will: occurrence, fricatives, operatorname, grad, agouti, agave, seasonally, heir,
Nearest to many: some, ambients, polemical, offspring, these, epistles, function, decades,
Nearest to first: vast, hancock, juice, poll, and, active, integer, archie,
Nearest to be: have, was, is, not, cimmerian, lied, by, as,
Average loss at step  22000 :  7.10389018786
Average loss at step  24000 :  6.86780297577
Average loss at step  26000 :  6.71077035868
Average loss at step  28000 :  6.38663281119
Average loss at step  30000 :  5.8886589371
Nearest to was: is, had, has, were, agouti, by, became, operatorname,
Nearest to which: that, this, also, and, it, agouti, acacia, arin,
Nearest to into: by, for, from, on, in, amazonas, to, operatorname,
Nearest to these: many, october, atemi, the, his, our, some, algebraically,
Nearest to and: or, in, operatorname, agouti, apatosaurus, circ, one, for,
Nearest to has: was, had, is, have, agouti, by, were, trois,
Nearest to time: doesn, privately, artistic, molinari, particulars, hardback, verbal, hydrogen,
Nearest to are: were, is, was, but, have, be, dimension, operatorname,
Nearest to with: in, for, and, by, from, as, when, six,
Nearest to their: the, his, its, ant, ulyanov, gaul, circ, agouti,
Nearest to there: it, now, above, this, they, which, truetype, he,
Nearest to b: d, deaf, message, and, derive, american, builder, one,
Nearest to will: can, would, to, could, fricatives, seasonally, occurrence, heir,
Nearest to many: some, these, the, polemical, ambients, minos, offspring, spatially,
Nearest to first: vast, hancock, integer, enver, archie, active, present, poll,
Nearest to be: have, was, is, by, not, were, are, as,
Average loss at step  32000 :  5.94304618776
Average loss at step  34000 :  5.68958163345
Average loss at step  36000 :  5.80174588549
Average loss at step  38000 :  5.49352391732
Average loss at step  40000 :  5.27859743583
Nearest to was: is, has, had, were, by, became, agouti, be,
Nearest to which: that, this, also, it, and, one, agouti, who,
Nearest to into: from, on, by, in, amazonas, cue, zero, for,
Nearest to these: many, some, such, atemi, october, their, our, all,
Nearest to and: or, apatosaurus, in, agouti, operatorname, but, zero, dasyprocta,
Nearest to has: had, was, have, is, were, agouti, in, handwriting,
Nearest to time: privately, doesn, artistic, particulars, ciprofloxacin, verbal, inflected, truetype,
Nearest to are: were, is, have, was, but, senex, operatorname, sophocles,
Nearest to with: in, and, for, when, from, between, by, as,
Nearest to their: his, its, the, ulyanov, circ, these, gaul, agouti,
Nearest to there: now, it, they, which, this, he, above, also,
Nearest to b: d, deaf, message, UNK, builder, and, derive, seven,
Nearest to will: can, would, could, to, fricatives, seasonally, modules, agouti,
Nearest to many: some, these, polemical, ambients, those, several, minos, spatially,
Nearest to first: vast, rang, catania, hancock, active, present, second, republic,
Nearest to be: have, was, were, by, is, bali, not, been,
Average loss at step  42000 :  5.37903095126
Average loss at step  44000 :  5.25252643383
Average loss at step  46000 :  5.25466829693
Average loss at step  48000 :  5.22855424786
Average loss at step  50000 :  4.98294925487
Nearest to was: is, were, had, has, became, by, agouti, be,
Nearest to which: this, that, also, it, and, arin, dasyprocta, acacia,
Nearest to into: from, on, cue, by, in, operatorname, amazonas, to,
Nearest to these: some, many, such, atemi, all, the, their, braking,
Nearest to and: or, but, agouti, operatorname, apatosaurus, dasyprocta, four, six,
Nearest to has: had, was, have, is, agouti, were, in, vampire,
Nearest to time: privately, artistic, doesn, inflected, verbal, ciprofloxacin, truetype, agouti,
Nearest to are: were, is, have, was, be, sophocles, seven, semicolon,
Nearest to with: in, and, when, for, by, from, six, eight,
Nearest to their: his, its, the, yoannis, ulyanov, her, these, circ,
Nearest to there: it, now, they, which, this, also, he, above,
Nearest to b: d, message, deaf, UNK, builder, derive, circ, one,
Nearest to will: can, would, could, to, yoannis, seasonally, fricatives, and,
Nearest to many: some, these, several, those, polemical, ambients, two, spatially,
Nearest to first: second, rang, vast, catania, enver, bey, present, hancock,
Nearest to be: have, was, were, is, bali, by, been, are,
Average loss at step  52000 :  5.04070786262
Average loss at step  54000 :  5.16535256505
Average loss at step  56000 :  5.05901861382
Average loss at step  58000 :  5.06915320337
Average loss at step  60000 :  4.95321713698
Nearest to was: is, had, were, has, became, by, agouti, be,
Nearest to which: this, that, also, it, arin, acacia, dasyprocta, but,
Nearest to into: from, on, cue, eight, by, over, under, in,
Nearest to these: some, many, such, all, their, atemi, other, braking,
Nearest to and: or, but, apatosaurus, agouti, operatorname, circ, in, albury,
Nearest to has: had, have, was, is, agouti, in, vampire, eight,
Nearest to time: privately, michelob, doesn, artistic, ciprofloxacin, inflected, agouti, verbal,
Nearest to are: were, is, have, be, was, semicolon, pulau, six,
Nearest to with: in, and, between, when, for, by, six, repetitive,
Nearest to their: his, its, the, her, yoannis, ulyanov, michelob, these,
Nearest to there: it, they, now, this, which, he, also, above,
Nearest to b: d, message, derive, deaf, builder, walter, typical, ursus,
Nearest to will: can, would, could, yoannis, may, to, must, seasonally,
Nearest to many: some, these, several, those, ambients, polemical, three, two,
Nearest to first: second, rang, bey, catania, last, vast, present, enver,
Nearest to be: have, been, were, was, bali, pulau, by, are,
Average loss at step  62000 :  5.00975782585
Average loss at step  64000 :  4.84033363408
Average loss at step  66000 :  4.60538381827
Average loss at step  68000 :  4.96697095311
Average loss at step  70000 :  4.89936179054
Nearest to was: is, were, had, has, became, by, agouti, be,
Nearest to which: that, this, also, it, microcebus, arin, but, who,
Nearest to into: from, on, cue, through, over, under, eight, amazonas,
Nearest to these: some, many, such, all, their, other, which, they,
Nearest to and: or, but, agouti, mitral, microcebus, operatorname, apatosaurus, circ,
Nearest to has: had, have, was, is, agouti, eight, reparations, michelob,
Nearest to time: privately, michelob, artistic, ciprofloxacin, doesn, cebus, verbal, agouti,
Nearest to are: were, is, have, be, was, semicolon, but, upanija,
Nearest to with: when, between, in, and, repetitive, thaler, by, for,
Nearest to their: its, his, the, her, thaler, yoannis, michelob, some,
Nearest to there: they, it, now, which, he, this, still, usually,
Nearest to b: d, seven, UNK, message, derive, ursus, builder, circ,
Nearest to will: can, would, could, may, to, must, yoannis, thaler,
Nearest to many: some, these, several, various, those, ambients, polemical, thaler,
Nearest to first: second, last, bey, rang, catania, present, under, vast,
Nearest to be: been, have, were, was, bali, by, pulau, is,
Average loss at step  72000 :  4.76643716341
Average loss at step  74000 :  4.80032113028
Average loss at step  76000 :  4.72049832308
Average loss at step  78000 :  4.81389507228
Average loss at step  80000 :  4.81387408137
Nearest to was: is, had, were, has, became, by, agouti, be,
Nearest to which: this, that, also, it, microcebus, arin, but, dasyprocta,
Nearest to into: from, through, on, cue, under, to, over, amazonas,
Nearest to these: some, many, such, all, which, their, they, other,
Nearest to and: or, but, microcebus, operatorname, mitral, agouti, upanija, apatosaurus,
Nearest to has: had, have, was, is, agouti, in, reparations, vampire,
Nearest to time: privately, michelob, doesn, escuela, artistic, ciprofloxacin, cebus, emulation,
Nearest to are: were, is, have, be, was, pulau, semicolon, upanija,
Nearest to with: in, between, when, repetitive, thaler, and, make, by,
Nearest to their: its, his, the, her, thaler, yoannis, some, michelob,
Nearest to there: it, they, now, which, he, this, still, usually,
Nearest to b: d, UNK, message, seven, circ, ursus, agouti, derive,
Nearest to will: can, would, could, may, must, czes, to, yoannis,
Nearest to many: some, these, several, various, those, all, ambients, polemical,
Nearest to first: second, last, bey, catania, rang, during, present, under,
Nearest to be: been, have, were, by, was, bali, pulau, is,
Average loss at step  82000 :  4.76806989324
Average loss at step  84000 :  4.76034037805
Average loss at step  86000 :  4.78169750309
Average loss at step  88000 :  4.75147537124
Average loss at step  90000 :  4.72619627428
Nearest to was: is, had, has, were, became, by, been, agouti,
Nearest to which: that, this, also, microcebus, it, but, arin, who,
Nearest to into: from, through, on, under, cue, by, amazonas, over,
Nearest to these: some, many, such, all, which, several, they, their,
Nearest to and: or, but, microcebus, mitral, apatosaurus, agouti, operatorname, while,
Nearest to has: had, have, was, is, reparations, agouti, uses, unable,
Nearest to time: privately, michelob, doesn, escuela, days, artistic, ciprofloxacin, condorcet,
Nearest to are: were, have, is, be, pulau, those, include, upanija,
Nearest to with: between, in, when, repetitive, thaler, and, for, six,
Nearest to their: its, his, the, her, yoannis, thaler, our, michelob,
Nearest to there: they, it, now, he, which, this, usually, still,
Nearest to b: d, UNK, message, six, derive, seven, ursus, builder,
Nearest to will: can, would, could, may, must, czes, to, yoannis,
Nearest to many: some, several, these, various, those, all, groups, ambients,
Nearest to first: second, last, bey, jati, under, rang, catania, present,
Nearest to be: been, have, were, are, by, is, bali, pulau,
Average loss at step  92000 :  4.66262474084
Average loss at step  94000 :  4.70650331116
Average loss at step  96000 :  4.68686405849
Average loss at step  98000 :  4.59285849941
Average loss at step  100000 :  4.71168959773
Nearest to was: is, had, has, were, became, been, by, be,
Nearest to which: that, this, also, but, microcebus, it, arin, what,
Nearest to into: from, through, under, cue, on, amazonas, over, by,
Nearest to these: many, some, such, several, all, their, they, both,
Nearest to and: or, but, microcebus, operatorname, mitral, apatosaurus, agouti, circ,
Nearest to has: had, have, was, is, reparations, agouti, unable, vampire,
Nearest to time: michelob, escuela, privately, doesn, agouti, thaler, artistic, cebus,
Nearest to are: were, is, have, larch, be, include, semicolon, these,
Nearest to with: between, in, when, repetitive, by, thaler, for, mitral,
Nearest to their: its, his, the, her, thaler, our, yoannis, some,
Nearest to there: they, it, now, he, which, this, usually, still,
Nearest to b: d, derive, UNK, message, seven, deaf, circ, six,
Nearest to will: can, would, could, may, must, czes, to, should,
Nearest to many: some, several, these, various, those, all, groups, such,
Nearest to first: second, last, bey, rang, under, jati, third, catania,
Nearest to be: been, have, is, were, by, was, bali, pulau,
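
The log does not say how the "Nearest to" lists are computed. A common approach, assumed here, is to L2-normalize the embedding matrix and rank words by cosine similarity to the query word. The sketch below uses NumPy only; the function `nearest_words` and the tiny two-dimensional toy vectors are hypothetical and are not taken from the trained model.

```python
import numpy as np


# Hypothetical helper: rank vocabulary words by cosine similarity to `query`.
def nearest_words(embeddings, vocab, query, top_k=8):
    # Normalize each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    query_index = vocab.index(query)
    similarities = normalized @ normalized[query_index]
    order = np.argsort(-similarities)  # most similar first
    # Skip the query word itself, then keep the top_k neighbours.
    return [vocab[i] for i in order if i != query_index][:top_k]


# Toy data for illustration only (made-up vectors, not model output).
toy_vocab = ["was", "is", "had", "term"]
toy_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.8, 0.3],
    [0.0, 1.0],
])

print(nearest_words(toy_embeddings, toy_vocab, "was", top_k=2))
```

With `top_k=8` this mirrors the eight neighbours listed per word in the log. Under this assumption, the neighbours are essentially random at step 0 and become grammatically related words (for example "was" with "is", "had", "were") as the average loss falls.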