Pipeline¶

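A pipeline consists of two parts: document preprocessing and task inference. Use pipeline() to build one. This example shows how to:

* preprocess a document only
* apply task inference to a document directly
* preprocess a document first, then apply task inference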

[4]:

from EduNLP.Pipeline import pipeline

Preprocessing only¶

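The preprocessing pipeline provides a set of components for SIF processing and component segmentation, and it also accepts custom components. The components are called in order within the pipeline.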

[2]:

item = "如图所示，则三角形ABC的面积是_。"

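Several commonly used pipes are provided, and you can build them by name when initializing the pipeline. (Note: pipes created during initialization cannot be given arguments.)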

[3]:

processor = pipeline(preprocess=['is_sif', 'to_sif', 'is_sif', 'seg_describe'])

You can also modify a pipeline by inserting pipes. Pipes added this way can take arguments, for example:

[4]:

processor.add_pipe(name='seg', symbol='fm', before='seg_describe')

This inserts a seg pipe before the seg_describe component in the pipeline and passes it the argument symbol='fm'.
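The insert-before behaviour can be illustrated with a small stand-alone sketch. ToyPipeline and everything inside it are hypothetical names written for this illustration; the code does not call EduNLP and makes no claim about how EduNLP implements add_pipe internally.

```python
# Toy illustration only: ToyPipeline is a hypothetical class,
# not EduNLP's implementation.
from typing import Any, Callable, List, Optional, Tuple


class ToyPipeline:
    """An ordered list of named components, applied left to right."""

    def __init__(self) -> None:
        self._pipes: List[Tuple[str, Callable[[Any], Any]]] = []

    @property
    def component_names(self) -> List[str]:
        return [name for name, _ in self._pipes]

    def add_pipe(self, name: str, func: Callable[[Any], Any],
                 before: Optional[str] = None) -> None:
        """Append a component, or insert it ahead of the first one named `before`."""
        if before is None:
            self._pipes.append((name, func))
            return
        index = self.component_names.index(before)  # raises ValueError if absent
        self._pipes.insert(index, (name, func))

    def __call__(self, item: Any) -> Any:
        # Feed the output of each component into the next one.
        for _, func in self._pipes:
            item = func(item)
        return item


toy = ToyPipeline()
toy.add_pipe("strip", str.strip)
toy.add_pipe("upper", str.upper)
# Insert a parameterised step before "upper"; the lambda carries the argument.
toy.add_pipe("replace", lambda s: s.replace("_", " "), before="upper")

print(toy.component_names)
print(toy("  a_b  "))
```

In this toy version the argument travels inside the callable; the add_pipe call shown earlier passes it as a keyword (symbol='fm') instead.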

List the names of all components in the pipeline:

[6]:

print(processor.component_names)

['is_sif', 'to_sif', 'is_sif', 'seg', 'seg_describe']

Apply the pipeline to the document:

[7]:

processor(item)

False
True
{'t': 3, 'f': 1, 'g': 0, 'm': 1}

[7]:

<bound method Kernel.raw_input of <ipykernel.ipkernel.IPythonKernel object at 0x7ff65898ab80>>

Task inference only¶

Specify a task name to run task inference with the default model.

[ ]:

processor = pipeline(task="property-prediction")
processor(item)

Specify a task name and supply a custom model to run task inference with that model.

[1]:

from EduNLP.ModelZoo.rnn import ElmoLMForPropertyPrediction
from EduNLP.Pretrain import ElmoTokenizer

Prepare the custom model.

[6]:

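# Directory holding the pretrained ELMo property-prediction checkpoint
pretrained_pp_dir = "examples/test_model/elmo/elmo_pp"
tokenizer = ElmoTokenizer.from_pretrained(pretrained_pp_dir)
model = ElmoLMForPropertyPrediction.from_pretrained(pretrained_pp_dir)
model.eval()  # switch the model to evaluation mode before inference
# Sample item in SIF notation (kept in the original Chinese)
text = '有公式$\\FormFigureID{wrong1?}$和公式$\\FormFigureBase64{wrong2?}$，如图$\\FigureID{088f15ea-8b7c-11eb-897e-b46bfc50aa29}$,若$x,y$满足约束条件$\\SIFSep$，则$z=x+7 y$的最大值为$\\SIFBlank$'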

[EduNLP, INFO] All the weights of ElmoLMForPropertyPrediction were initialized from the model checkpoint at examples/test_model/elmo/elmo_pp.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ElmoLMForPropertyPrediction for predictions without further training.

[7]:

pl = pipeline(task='property-prediction', model=model, tokenizer=tokenizer)
print(pl([text, text]))

[{'property': 0.4843716621398926}, {'property': 0.4843716621398926}]
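The shape of that result (one dict per input text, with identical inputs giving identical scores) can be reproduced with a stand-alone sketch. All names here (make_task_pipeline, toy_tokenize, toy_score) are hypothetical, and the scoring function is an arbitrary stand-in for a model forward pass; nothing below calls EduNLP or reflects its internals.

```python
# Toy illustration only: every name below is hypothetical; nothing here calls EduNLP.
from typing import Callable, Dict, List


def make_task_pipeline(
    tokenize: Callable[[str], List[str]],
    score: Callable[[List[str]], float],
    key: str = "property",
) -> Callable[[List[str]], List[Dict[str, float]]]:
    """Return a callable that maps a batch of texts to one result dict per text."""

    def run(texts: List[str]) -> List[Dict[str, float]]:
        return [{key: score(tokenize(text))} for text in texts]

    return run


def toy_tokenize(text: str) -> List[str]:
    # Whitespace split standing in for a real tokenizer.
    return text.split()


def toy_score(tokens: List[str]) -> float:
    # Stand-in for a model: fraction of tokens longer than three characters.
    if not tokens:
        return 0.0
    return sum(len(token) > 3 for token in tokens) / len(tokens)


toy_pl = make_task_pipeline(toy_tokenize, toy_score)
results = toy_pl(["find the area", "find the area"])
print(results)
```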


Combining preprocessing and task inference¶

This is essentially the two parts above applied in sequence, for example:

[ ]:

processor = pipeline(task="property-prediction", preprocess=['is_sif', 'to_sif', 'is_sif', 'seg_describe'])
processor(item)
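As a rough mental model of this sequential combination, the sketch below runs a list of preprocessing steps and then hands the result to a single inference function. The helper compose and the toy steps are hypothetical and chosen only for illustration; this is an assumption about the general idea, not EduNLP's code.

```python
# Toy illustration only: compose and its arguments are hypothetical names.
from typing import Any, Callable, Sequence


def compose(preprocess: Sequence[Callable[[Any], Any]],
            infer: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Run each preprocessing step in order, then apply the inference function."""

    def run(item: Any) -> Any:
        for step in preprocess:
            item = step(item)
        return infer(item)

    return run


# Two trivial preprocessing steps followed by a trivial "inference" step.
toy_processor = compose([str.strip, str.lower], lambda s: {"length": len(s)})
combined = toy_processor("  ABC_  ")
print(combined)
```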