LLM_Evaluator

A simple program for evaluating large language models.

Python 3.8
torch 1.13.1+cu117
transformers 4.33.2
accelerate 0.26.1
tqdm 4.66.1
openai 0.28

Other required files

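Download the GLM model and place it in the ./THUDM/chatglm-6b folder.
Download the GLM2 model and place it in the ./THUDM/chatglm2-6b folder.
Fine-tuned LoRA models can be placed in the ./lora folder; they apply to ChatGLM2.
Fine-tuned P-Tuning models can be placed in the ./ptuning folder; they apply to ChatGLM.
Training data follows the C-Eval format and goes in the ./data folder; file names correspond to subject_name in eval.py.
In addition to the C-Eval datasets, the code adds a 'qa' dataset, placed in the ./data/qa folder. It is a question-answering dataset that is not multiple choice.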
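
As a rough illustration of what "C-Eval format" means for a multiple-choice file, here is a minimal sketch. It assumes the usual C-Eval CSV columns (id, question, A, B, C, D, answer); the helper names load_rows and format_example and the "Answer:" prompt wording are hypothetical and are not taken from eval.py.

```python
import csv
import io

# Assumed C-Eval column names for the four options.
CHOICES = ["A", "B", "C", "D"]


def load_rows(fileobj):
    """Read a C-Eval style CSV (header row included) into a list of dicts."""
    return list(csv.DictReader(fileobj))


def format_example(row, include_answer=False):
    """Turn one row into a prompt: the question, four options, then an answer slot."""
    lines = [row["question"]]
    for choice in CHOICES:
        lines.append(f"{choice}. {row[choice]}")
    if include_answer:
        # Used for worked examples where the gold answer is shown.
        lines.append("Answer: " + row["answer"])
    else:
        # Left open for the model to complete.
        lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    # A made-up row, used only to demonstrate the layout.
    sample = "id,question,A,B,C,D,answer\n0,1+1=?,1,2,3,4,B\n"
    rows = load_rows(io.StringIO(sample))
    print(format_example(rows[0], include_answer=True))
```

The 'qa' dataset is not multiple choice, so it would need a different formatter; its exact columns are not stated here.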

Run

python eval.py --model_name chatglm --cuda_device 0 --finetune ptuning1

Arguments

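--model_name: model name; one of chatglm, chatglm2
--cuda_device: GPU index
--finetune: name of the fine-tuned model, i.e. the name of its folder under the lora or ptuning folder
--few_shot: use few-shot mode with a small number of examples (optional)
--ntrain: number of few-shot examples (optional)
--cot: use chain-of-thought (optional)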
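
The arguments above could be declared roughly as follows. This is a sketch, not the parser in eval.py: the defaults, the choice to make --few_shot and --cot boolean flags, and the helper resolve_finetune_dir are assumptions. The folder mapping (LoRA for chatglm2, P-Tuning for chatglm) follows the file layout described earlier.

```python
import argparse


def build_parser():
    """Declare the command-line options listed in this README (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Evaluate a ChatGLM model")
    parser.add_argument("--model_name", choices=["chatglm", "chatglm2"], default="chatglm")
    parser.add_argument("--cuda_device", type=str, default="0")
    parser.add_argument("--finetune", type=str, default=None)
    parser.add_argument("--few_shot", action="store_true")
    parser.add_argument("--ntrain", type=int, default=5)
    parser.add_argument("--cot", action="store_true")
    return parser


def resolve_finetune_dir(model_name, finetune):
    """Hypothetical helper: map --finetune to a folder under ./lora or ./ptuning."""
    if finetune is None:
        return None
    # LoRA checkpoints apply to ChatGLM2, P-Tuning checkpoints to ChatGLM.
    base = "lora" if model_name == "chatglm2" else "ptuning"
    return f"./{base}/{finetune}"


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.model_name, resolve_finetune_dir(args.model_name, args.finetune))
```

With this sketch, the Run command shown earlier would resolve the fine-tuned checkpoint to ./ptuning/ptuning1.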