Using the W2V Vectorization Container
Importing the modules
[1]:
from EduNLP.I2V import I2V, W2V, get_pretrained_i2v
[2]:
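# Two sample items in SIF format (Chinese math problems, kept in the original language as model input)
items = [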
r"题目一:如图几何图形.此图由三个半圆构成,三个半圆的直径分别为直角三角形$ABC$的斜边$BC$, 直角边$AB$, $AC$.$\bigtriangleup ABC$的三边所围成的区域记为$I$,黑色部分记为$II$, 其余部分记为$III$.在整个图形中随机取一点,此点取自$I,II,III$的概率分别记为$p_1,p_2,p_3$,则$\SIFChoice$$\FigureID{1}$",
r"题目二: 如图来自古希腊数学家希波克拉底所研究的几何图形.此图由三个半圆构成,三个半圆的直径分别为直角三角形$ABC$的斜边$BC$, 直角边$AB$, $AC$.$\bigtriangleup ABC$的三边所围成的区域记为$I$,黑色部分记为$II$, 其余部分记为$III$.在整个图形中随机取一点,此点取自$I,II,III$的概率分别记为$p_1,p_2,p_3$,则$\SIFChoice$$\FigureID{1}$"
]
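The items above are written in EduNLP's SIF (Standard Item Format) markup, where formulas and special markers such as `$\SIFChoice$` and `$\FigureID{1}$` are wrapped in `$...$`. To make that layout visible, here is a minimal sketch that splits an item into plain-text and `$`-delimited segments. `split_sif_segments` is a hypothetical helper for illustration only. It is not EduNLP's tokenizer and does not reproduce what the `pure_text` tokenizer actually does.

```python
import re
from typing import List, Tuple

# Hypothetical helper, for illustration only (not EduNLP's tokenizer):
# split a SIF-style item into plain text and $...$-delimited segments.
def split_sif_segments(item: str) -> List[Tuple[str, str]]:
    segments = []
    # A capturing group makes re.split keep the $...$ pieces in its result
    for piece in re.split(r"(\$[^$]*\$)", item):
        if not piece:
            continue  # re.split yields empty strings around adjacent or edge matches
        kind = "math" if re.fullmatch(r"\$[^$]*\$", piece) else "text"
        segments.append((kind, piece))
    return segments

example = r"直角三角形$ABC$的斜边$BC$,则$\SIFChoice$$\FigureID{1}$"
for kind, piece in split_sif_segments(example):
    print(kind, piece)
```

This sketch labels every `$...$` segment as `math`, even SIF markers that are not formulas. A real tokenizer would need to tell the two apart.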
Vectorization
Using a public pretrained model from EduNLP
[3]:
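# Local directory for the pretrained model (the download is skipped if the file already exists)
save_dir = "../test_model/w2v"
# Load the public pretrained "test_w2v" model
i2v = get_pretrained_i2v("test_w2v", model_dir=save_dir)
# Get item-level and token-level vectors in one call
item_vector, token_vector = i2v.infer_vector(items)
# or compute them separately
item_vector = i2v.infer_item_vector(items)
token_vector = i2v.infer_token_vector(items)
# item_vector: number of items, vector size
print(len(item_vector), len(item_vector[0]))
# token_vector: number of items, tokens in the first item, vector size
print(len(token_vector), len(token_vector[0]), len(token_vector[0][0]))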
EduNLP, INFO Use pretrained t2v model test_w2v
downloader, INFO http://base.ustc.edu.cn/data/model_zoo/EduNLP/w2v/w2v_test_256.zip is saved as ..\test_model\data\w2v\w2v_test_256.zip
downloader, INFO file existed, skipped
2 256
2 56 256
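In the output above, `item_vector` holds one 256-dimensional vector per item (2 × 256). `token_vector` holds one 256-dimensional vector per token of each item, and the first item has 56 tokens. A common way to derive a single item vector from word vectors is to average the item's token vectors (mean pooling). Whether EduNLP's `W2V` aggregates exactly this way is an assumption here. The sketch below only illustrates mean pooling in plain Python, using toy 3-dimensional vectors and a hypothetical `mean_pool` helper. The two toy items have different token counts.

```python
from typing import List

# Mean pooling: average a list of equal-length token vectors into one vector.
# Toy 3-dimensional vectors stand in for the 256-dimensional ones above.
def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    if not token_vectors:
        raise ValueError("an item needs at least one token vector")
    dim = len(token_vectors[0])
    # Sum each dimension across tokens, then divide by the token count
    return [sum(vec[d] for vec in token_vectors) / len(token_vectors) for d in range(dim)]

# token_vector-like structure: one list of token vectors per item
toy_token_vector = [
    [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]],                   # item 1: 2 tokens
    [[0.0, 1.0, 1.0], [2.0, 1.0, 1.0], [1.0, 1.0, 1.0]],  # item 2: 3 tokens
]
toy_item_vector = [mean_pool(tokens) for tokens in toy_token_vector]
print(toy_item_vector)  # [[2.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
```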
Using a local model
[4]:
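# Path to a local word2vec model file (.kv, gensim KeyedVectors format)
pretrained_path = "../test_model/w2v/w2v_test_256/w2v_test_256.kv"
# Build the container: "pure_text" tokenizer, "w2v" text-to-vector model loaded from pretrained_path
i2v = W2V("pure_text", "w2v", pretrained_path)
# Call the container directly
item_vector, token_vector = i2v(items)
# or use infer_vector
item_vector, token_vector = i2v.infer_vector(items)
# or compute item and token vectors separately
item_vector = i2v.infer_item_vector(items)
token_vector = i2v.infer_token_vector(items)
# Same shapes as with the pretrained model above
print(len(item_vector), len(item_vector[0]))
print(len(token_vector), len(token_vector[0]), len(token_vector[0][0]))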
2 256
2 56 256
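Item vectors are typically compared with cosine similarity, for example to check that the two nearly identical items above end up close to each other. The sketch below uses a hypothetical plain-Python `cosine_similarity` helper, which is not part of EduNLP. It works on any equal-length sequence of numbers, so with the model above you would pass `item_vector[0]` and `item_vector[1]`. The toy vectors here only demonstrate the two extremes.

```python
import math
from typing import Sequence

# Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero vectors.
def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Toy vectors; with the model above, use item_vector[0] and item_vector[1]
print(cosine_similarity([1.0, 2.0, 2.0], [2.0, 4.0, 4.0]))  # parallel vectors -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # orthogonal vectors -> 0.0
```

Cosine similarity ignores vector length and compares only direction. This makes it a reasonable default for averaged word vectors, whose magnitudes depend on the tokens involved.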