EduNLP.I2V

class EduNLP.I2V.i2v.I2V(tokenizer, t2v, *args, tokenizer_kwargs: Optional[dict] = None, pretrained_t2v=False, model_dir='/home/docs/.EduNLP/model', **kwargs)[source]

This class is only an interface, so you should not use it directly. To get vectors from items, use a concrete model such as D2V or W2V.

Parameters
  • tokenizer (str) – the name of the tokenizer, e.g. bert, pure_text, …

  • t2v (str) – the name of token2vector model

  • args – the parameters passed to t2v

  • tokenizer_kwargs (dict) – the parameters passed to tokenizer

  • pretrained_t2v (bool) –

    • True: use pretrained t2v model

    • False: use your own t2v model

  • model_dir (str) – local directory for saving pretrained models downloaded online; only takes effect when pretrained_t2v=True

  • kwargs – the parameters passed to t2v

Examples

>>> item = {"如图来自古希腊数学家希波克拉底所研究的几何图形.此图由三个半圆构成,三个半圆的直径分别为直角三角形$ABC$的斜边$BC$,     ... 直角边$AB$, $AC$.$\bigtriangleup ABC$的三边所围成的区域记为$I$,黑色部分记为$II$, 其余部分记为$III$.在整个图形中随机取一点,    ... 此点取自$I,II,III$的概率分别记为$p_1,p_2,p_3$,则$\SIFChoice$$\FigureID{1}$"}
>>> model_dir = "examples/test_model/d2v"
>>> url, model_name, *args = get_pretrained_model_info('d2v_test_256')
>>> (); path = get_data(url, model_dir); () 
(...)
>>> path = path_append(path, os.path.basename(path) + '.bin', to_str=True)
>>> i2v = D2V("pure_text", "d2v", filepath=path, pretrained_t2v=False)
>>> i2v(item)
([array([ ...dtype=float32)], None)
Returns

i2v model

Return type

I2V

tokenize(items, *args, key=<function I2V.<lambda>>, **kwargs) list[source]
infer_vector(items, key=<function I2V.<lambda>>, **kwargs) tuple[source]
infer_item_vector(tokens, *args, **kwargs) ...[source]
infer_token_vector(tokens, *args, **kwargs) ...[source]
save(config_path)[source]
classmethod load(config_path, *args, **kwargs)[source]
classmethod from_pretrained(name, model_dir='/home/docs/.EduNLP/model', *args, **kwargs)[source]
property vector_size
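
The constructor arguments and method names above suggest a two-stage flow: a tokenizer turns each item into tokens, and a t2v model turns tokens into vectors. The block below is a minimal, self-contained sketch of that flow, not EduNLP's implementation. `ToyI2V`, `whitespace_tokenizer`, and `length_t2v` are hypothetical stand-ins invented for illustration, and the 2-dimensional "vectors" are made up.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def whitespace_tokenizer(text: str) -> List[str]:
    """Hypothetical tokenizer: split the text on whitespace."""
    return text.split()


def length_t2v(tokens: Sequence[str]) -> Tuple[Vector, List[Vector]]:
    """Hypothetical t2v: a made-up 2-d vector per token, averaged per item."""
    token_vectors = [[float(len(t)), float(t.isdigit())] for t in tokens]
    if not token_vectors:
        return [0.0, 0.0], []
    n = len(token_vectors)
    item_vector = [sum(v[i] for v in token_vectors) / n for i in range(2)]
    return item_vector, token_vectors


class ToyI2V:
    """Toy stand-in that mirrors the tokenize -> infer_vector flow."""

    def __init__(self, tokenizer, t2v):
        self.tokenizer = tokenizer
        self.t2v = t2v

    def tokenize(self, items, key=lambda x: x):
        # `key` extracts the text from each item before tokenizing.
        return [self.tokenizer(key(item)) for item in items]

    def infer_vector(self, items, key=lambda x: x):
        item_vectors, token_vectors = [], []
        for tokens in self.tokenize(items, key=key):
            item_vec, tok_vecs = self.t2v(tokens)
            item_vectors.append(item_vec)
            token_vectors.append(tok_vecs)
        return item_vectors, token_vectors

    # Calling the object is the same as calling infer_vector.
    __call__ = infer_vector


toy = ToyI2V(whitespace_tokenizer, length_t2v)
item_vectors, token_vectors = toy(["solve 2 x"])
```

As in the documented examples, the call returns a pair: one item vector per input item, plus the per-token vectors. A model that produces no token vectors could return None in the second position, as the D2V example output does.
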
class EduNLP.I2V.i2v.D2V(tokenizer, t2v, *args, tokenizer_kwargs: Optional[dict] = None, pretrained_t2v=False, model_dir='/home/docs/.EduNLP/model', **kwargs)[source]

This model converts an item to a vector directly.

Bases

I2V

Parameters
  • tokenizer (str) – the tokenizer name

  • t2v (str) – the name of token2vector model

  • args – the parameters passed to t2v

  • tokenizer_kwargs (dict) – the parameters passed to tokenizer

  • pretrained_t2v (bool) –

    • True: use pretrained t2v model

    • False: use your own t2v model

  • kwargs – the parameters passed to t2v

Examples

>>> item = {"如图来自古希腊数学家希波克拉底所研究的几何图形.此图由三个半圆构成,三个半圆的直径分别为直角三角形$ABC$的斜边$BC$,     ... 直角边$AB$, $AC$.$\bigtriangleup ABC$的三边所围成的区域记为$I$,黑色部分记为$II$, 其余部分记为$III$.在整个图形中随机取一点,    ... 此点取自$I,II,III$的概率分别记为$p_1,p_2,p_3$,则$\SIFChoice$$\FigureID{1}$"}
>>> model_dir = "examples/test_model/d2v"
>>> url, model_name, *args = get_pretrained_model_info('d2v_test_256')
>>> (); path = get_data(url, model_dir); () 
(...)
>>> path = path_append(path, os.path.basename(path) + '.bin', to_str=True)
>>> i2v = D2V("pure_text","d2v",filepath=path, pretrained_t2v = False)
>>> i2v(item)
([array([ ...dtype=float32)], None)
Returns

i2v model

Return type

I2V

infer_vector(items, tokenize=True, key=<function D2V.<lambda>>, *args, **kwargs) tuple[source]

Converts items to vectors. The model must be loaded before calling this function.

Parameters
  • items (str) – the text of the question

  • tokenize (bool) – if True, tokenize the item before inferring its vector

  • key (function) – determines how to get the text of each item

  • args – the parameters passed to t2v

  • kwargs – the parameters passed to t2v

Returns

vector

Return type

list
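
The `key` parameter is described above as determining how to get the text of each item. A minimal sketch of that idea follows, assuming items may be either plain strings or dicts that hold the text under some field. The `extract_texts` helper and the `stem` field name are hypothetical and chosen only for illustration; they are not taken from the EduNLP source.

```python
def extract_texts(items, key=lambda x: x):
    """Apply `key` to each item to obtain the raw text to be tokenized."""
    return [key(item) for item in items]


plain_items = ["1 + 1 = ?", "Solve x + 2 = 5"]
dict_items = [
    {"stem": "1 + 1 = ?", "id": 1},
    {"stem": "Solve x + 2 = 5", "id": 2},
]

# The default key is the identity, which suits items that are already text.
texts_from_plain = extract_texts(plain_items)

# A custom key picks the text field out of structured items.
texts_from_dicts = extract_texts(dict_items, key=lambda item: item["stem"])
```
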

classmethod from_pretrained(name, model_dir='/home/docs/.EduNLP/model', *args, **kwargs)[source]
class EduNLP.I2V.i2v.W2V(tokenizer, t2v, *args, tokenizer_kwargs: Optional[dict] = None, pretrained_t2v=False, model_dir='/home/docs/.EduNLP/model', **kwargs)[source]

This model converts tokens to vectors.

Bases

I2V

Parameters
  • tokenizer (str) – the tokenizer name

  • t2v (str) – the name of token2vector model

  • args – the parameters passed to t2v

  • tokenizer_kwargs (dict) – the parameters passed to tokenizer

  • pretrained_t2v (bool) –

    • True: use pretrained t2v model

    • False: use your own t2v model

  • kwargs – the parameters passed to t2v

Examples

>>> (); i2v = get_pretrained_i2v("w2v_test_256", "examples/test_model/w2v"); () 
(...)
>>> item_vector, token_vector = i2v(["有学者认为:‘学习’,必须适应实际"]) 
>>> item_vector 
[array([...], dtype=float32)]
Returns

i2v model

Return type

W2V

infer_vector(items, tokenize=True, key=<function W2V.<lambda>>, *args, **kwargs) tuple[source]

Converts items to vectors. The model must be loaded before calling this function.

Parameters
  • items (str) – the text of the question

  • tokenize (bool) – if True, tokenize the item before inferring its vector

  • key (function) – determines how to get the text of each item

  • args – the parameters passed to t2v

  • kwargs – the parameters passed to t2v

Returns

vector

Return type

list
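
The W2V example above unpacks the result into an item vector and token vectors. One common way to derive an item vector from token vectors is to average them. The sketch below shows that approach with a made-up embedding table; whether W2V pools by mean is an assumption here, not something this page states. `EMBEDDINGS`, `UNKNOWN`, and `embed` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

# Made-up 3-dimensional embeddings; a real model would load these from disk.
EMBEDDINGS: Dict[str, List[float]] = {
    "triangle": [1.0, 0.0, 2.0],
    "area": [3.0, 4.0, 0.0],
}
# Fallback for out-of-vocabulary tokens (an assumption of this sketch).
UNKNOWN: List[float] = [0.0, 0.0, 0.0]


def embed(tokens: Sequence[str]) -> Tuple[List[float], List[List[float]]]:
    """Look up each token, then mean-pool the token vectors into an item vector."""
    token_vectors = [EMBEDDINGS.get(token, UNKNOWN) for token in tokens]
    if not token_vectors:
        return list(UNKNOWN), []
    dim = len(UNKNOWN)
    count = len(token_vectors)
    item_vector = [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]
    return item_vector, token_vectors


item_vector, token_vectors = embed(["triangle", "area"])
```
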

classmethod from_pretrained(name, model_dir='/home/docs/.EduNLP/model', *args, **kwargs)[source]
class EduNLP.I2V.i2v.Elmo(tokenizer, t2v, *args, tokenizer_kwargs: Optional[dict] = None, pretrained_t2v=False, model_dir='/home/docs/.EduNLP/model', **kwargs)[source]

This model converts items and tokens to vectors with Elmo.

Bases

I2V

Parameters
  • tokenizer (str) – the tokenizer name

  • t2v (str) – the name of token2vector model

  • args – the parameters passed to t2v

  • tokenizer_kwargs (dict) – the parameters passed to tokenizer

  • pretrained_t2v (bool) –

    • True: use pretrained t2v model

    • False: use your own t2v model

  • kwargs – the parameters passed to t2v

Returns

i2v model

Return type

Elmo

infer_vector(items: ~typing.Tuple[~typing.List[str], ~typing.List[dict], str, dict], *args, key=<function Elmo.<lambda>>, **kwargs) tuple[source]

Converts items to vectors. The model must be loaded before calling this function.

Parameters
  • items (str or dict or list) – a question item, or a list of question items

  • return_tensors (str) – tensor type used in tokenizer

  • args – the parameters passed to t2v

  • kwargs – the parameters passed to t2v

Returns

vector

Return type

list
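
The `items` argument is documented as accepting a single str or dict as well as a list. A minimal sketch of how such input could be normalized into a batch before tokenization follows. The `as_item_list` helper is hypothetical, and whether EduNLP normalizes input this way is an assumption.

```python
def as_item_list(items):
    """Wrap a single str or dict item in a list; copy any other sequence into a list."""
    if isinstance(items, (str, dict)):
        return [items]
    return list(items)


single_text = as_item_list("Find the area of the triangle.")
single_dict = as_item_list({"stem": "Find the area of the triangle."})
batch = as_item_list(["Question one", "Question two"])
```

Note that a str must be checked before treating the input as a sequence; otherwise `list("abc")` would split the text into characters.
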

classmethod from_pretrained(name, model_dir='/home/docs/.EduNLP/model', *args, **kwargs)[source]
class EduNLP.I2V.i2v.Bert(tokenizer, t2v, *args, tokenizer_kwargs: Optional[dict] = None, pretrained_t2v=False, model_dir='/home/docs/.EduNLP/model', **kwargs)[source]

This model converts items and tokens to vectors with Bert.

Bases

I2V

Parameters
  • tokenizer (str) – the tokenizer name

  • t2v (str) – the name of token2vector model

  • args – the parameters passed to t2v

  • tokenizer_kwargs (dict) – the parameters passed to tokenizer

  • pretrained_t2v (bool) –

    • True: use pretrained t2v model

    • False: use your own t2v model

  • kwargs – the parameters passed to t2v

Returns

i2v model

Return type

Bert

infer_vector(items: ~typing.Tuple[~typing.List[str], ~typing.List[dict], str, dict], *args, key=<function Bert.<lambda>>, return_tensors='pt', **kwargs) tuple[source]

Converts items to vectors. The model must be loaded before calling this function.

Parameters
  • items (str or dict or list) – a question item, or a list of question items

  • return_tensors (str) – tensor type used in tokenizer

  • args – the parameters passed to t2v

  • kwargs – the parameters passed to t2v

Returns

vector

Return type

list

classmethod from_pretrained(name, model_dir='/home/docs/.EduNLP/model', *args, **kwargs)[source]
class EduNLP.I2V.i2v.DisenQ(tokenizer, t2v, *args, tokenizer_kwargs: Optional[dict] = None, pretrained_t2v=False, model_dir='/home/docs/.EduNLP/model', **kwargs)[source]

This model converts items and tokens to vectors with DisenQ.

Bases

I2V

Parameters
  • tokenizer (str) – the tokenizer name

  • t2v (str) – the name of token2vector model

  • args – the parameters passed to t2v

  • tokenizer_kwargs (dict) – the parameters passed to tokenizer

  • pretrained_t2v (bool) –

    • True: use pretrained t2v model

    • False: use your own t2v model

  • kwargs – the parameters passed to t2v

Returns

i2v model

Return type

DisenQ

infer_vector(items: ~typing.Tuple[~typing.List[str], ~typing.List[dict], str, dict], *args, key=<function DisenQ.<lambda>>, vector_type=None, **kwargs) tuple[source]

Converts items to vectors. The model must be loaded before calling this function.

Parameters
  • items (str or dict or list) – a question item, or a list of question items

  • key (function) – determines how to get the text of each item

  • args – the parameters passed to t2v

  • kwargs – the parameters passed to t2v

Returns

vector

Return type

list

classmethod from_pretrained(name, model_dir='/home/docs/.EduNLP/model', **kwargs)[source]
class EduNLP.I2V.i2v.QuesNet(tokenizer, t2v, *args, tokenizer_kwargs: Optional[dict] = None, pretrained_t2v=False, model_dir='/home/docs/.EduNLP/model', **kwargs)[source]

This model converts items and tokens to vectors with QuesNet.

Bases

I2V

infer_vector(items: ~typing.Tuple[~typing.List[str], ~typing.List[dict], str, dict], *args, key=<function QuesNet.<lambda>>, meta=['know_name'], **kwargs)[source]

Converts items to vectors. The model must be loaded before calling this function.

Parameters
  • items (str or dict or list) – a question item, or a list of question items

  • tokenize (bool, optional) – True: tokenize the item

  • key (function, optional) – determines how to get the text of each item, by default lambda x: x

  • meta (list, optional) – meta information, by default ['know_name']

  • args – the parameters passed to t2v

  • kwargs – the parameters passed to t2v

Returns

  • token embeddings

  • question embedding

classmethod from_pretrained(name, model_dir='/home/docs/.EduNLP/model', *args, **kwargs)[source]
EduNLP.I2V.i2v.get_pretrained_i2v(name, model_dir='/home/docs/.EduNLP/model')[source]

A convenient way to convert items to vectors easily: it returns a pretrained i2v model selected by name.

Parameters
  • name (str) – the name of the item2vector model, e.g. d2v_math_300, w2v_math_300, elmo_math_2048, bert_math_768, bert_taledu_768, disenq_math_256, quesnet_math_512

  • model_dir (str) – the path of the model; default: MODEL_DIR = '~/.EduNLP/model'

Returns

i2v model

Return type

I2V

Examples

>>> item = {"如图来自古希腊数学家希波克拉底所研究的几何图形.此图由三个半圆构成,三个半圆的直径分别为直角三角形$ABC$的斜边$BC$,     ... 直角边$AB$, $AC$.$\bigtriangleup ABC$的三边所围成的区域记为$I$,黑色部分记为$II$, 其余部分记为$III$.在整个图形中随机取一点,    ... 此点取自$I,II,III$的概率分别记为$p_1,p_2,p_3$,则$\SIFChoice$$\FigureID{1}$"}
>>> (); i2v = get_pretrained_i2v("d2v_test_256", "examples/test_model/d2v"); () 
(...)
>>> print(i2v(item)) 
([array([ ...dtype=float32)], None)
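
The pretrained model names listed above (for example d2v_math_300 and bert_taledu_768) appear to follow a `<model>_<corpus>_<size>` pattern. The sketch below splits a name along that pattern. The pattern itself is inferred from the listed names rather than documented, the trailing number is only assumed to be the vector size, and `ModelName` and `parse_model_name` are hypothetical helpers that are not part of EduNLP.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    model: str   # e.g. "d2v", "bert"
    corpus: str  # e.g. "math", "taledu"
    dim: int     # trailing number, assumed to be the vector size


def parse_model_name(name: str) -> ModelName:
    """Split a name such as 'd2v_math_300' into its three assumed parts."""
    parts = name.split("_")
    if len(parts) < 3 or not parts[-1].isdigit():
        raise ValueError(f"unexpected model name: {name!r}")
    return ModelName(parts[0], "_".join(parts[1:-1]), int(parts[-1]))


parsed = [
    parse_model_name(name)
    for name in ("d2v_math_300", "bert_taledu_768", "quesnet_math_512")
]
```

A helper like this can be handy for checking that a requested name matches the model class you intend to use before triggering a download.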