Using the D2V vectorization container
Importing the modules
[1]:
from EduNLP.I2V import I2V, D2V, get_pretrained_i2v
d:\MySoftwares\Anaconda\envs\data\lib\site-packages\gensim\similarities\__init__.py:15: UserWarning: The gensim.similarities.levenshtein submodule is disabled, because the optional Levenshtein package <https://pypi.org/project/python-Levenshtein/> is unavailable. Install Levenhstein (e.g. `pip install python-Levenshtein`) to suppress this warning.
warnings.warn(msg)
[2]:
# Two sample items (question texts with embedded LaTeX), kept in their original Chinese
items = [
r"题目一:如图几何图形.此图由三个半圆构成,三个半圆的直径分别为直角三角形$ABC$的斜边$BC$, 直角边$AB$, $AC$.$\bigtriangleup ABC$的三边所围成的区域记为$I$,黑色部分记为$II$, 其余部分记为$III$.在整个图形中随机取一点,此点取自$I,II,III$的概率分别记为$p_1,p_2,p_3$,则$\SIFChoice$$\FigureID{1}$",
r"题目二: 如图来自古希腊数学家希波克拉底所研究的几何图形.此图由三个半圆构成,三个半圆的直径分别为直角三角形$ABC$的斜边$BC$, 直角边$AB$, $AC$.$\bigtriangleup ABC$的三边所围成的区域记为$I$,黑色部分记为$II$, 其余部分记为$III$.在整个图形中随机取一点,此点取自$I,II,III$的概率分别记为$p_1,p_2,p_3$,则$\SIFChoice$$\FigureID{1}$"
]
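Vectorization
Using a pretrained model published by EduNLP
D2V does not implement token-level vectorization; it can only produce a representation of each item (question).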
[4]:
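save_dir = "../test_model/d2v"
# Fetch the pretrained model into save_dir (skipped if the file already exists) and load it
i2v = get_pretrained_i2v("d2v_test_256", model_dir=save_dir)
# infer_vector returns a pair; for D2V only the first element (the item vectors) is used
item_vector, _ = i2v.infer_vector(items)
# Alternatively, request the item vectors directly
item_vector = i2v.infer_item_vector(items)
print(len(item_vector), len(item_vector[0]))  # number of items, vector dimension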
EduNLP, INFO Use pretrained t2v model d2v_test_256
downloader, INFO http://base.ustc.edu.cn/data/model_zoo/modelhub/doc2vec_pub/1/d2v_test_256.zip is saved as ..\test_model\d2v\d2v_test_256.zip
downloader, INFO file existed, skipped
2 256
Using a local model
[5]:
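pretrained_path = "../test_model/d2v/d2v_test_256/d2v_test_256.bin"
# Build the container from a model file already on disk
i2v = D2V("pure_text", "d2v", pretrained_path)
# Calling the container directly returns a pair; only the item vectors are used here
item_vector, _ = i2v(items)
# Equivalent explicit call
item_vector, _ = i2v.infer_vector(items)
# Alternatively, request the item vectors directly
item_vector = i2v.infer_item_vector(items)
print(len(item_vector), len(item_vector[0]))  # number of items, vector dimension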
2 256
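The two sample items differ by only a few words, so one natural follow-up is to check how close their vectors are. The sketch below is not part of EduNLP: `cosine_similarity` is a hypothetical helper written in plain Python, and `demo_item_vector` holds made-up 4-dimensional values standing in for the 256-dimensional `item_vector` produced above. It assumes each item vector can be treated as a sequence of floats.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Placeholder vectors (assumption): made-up values standing in for
# item_vector[0] and item_vector[1] from the cells above.
demo_item_vector = [
    [0.10, -0.20, 0.30, 0.40],
    [0.12, -0.18, 0.29, 0.41],
]

similarity = cosine_similarity(demo_item_vector[0], demo_item_vector[1])
print(f"cosine similarity: {similarity:.4f}")
```

With real output, `cosine_similarity(item_vector[0], item_vector[1])` would replace the placeholder call. A value near 1 would suggest the model places the two near-duplicate questions close together, though the actual figure depends on the pretrained model.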