LLM_Evaluator

A simple program to evaluate large language models.

Recommended Requirements

  • Python 3.8
  • torch 1.13.1+cu117
  • transformers 4.33.2
  • accelerate 0.26.1
  • tqdm 4.66.1
  • openai 1.10.0

Other Required Files

  • Download the GLM model and place it in the ./THUDM/chatglm-6b folder
  • Download the GLM2 model and place it in the ./THUDM/chatglm2-6b folder
  • A fine-tuned LoRA model can be placed in the ./lora folder and applied to ChatGLM2
  • A fine-tuned P-Tuning model can be placed in the ./ptuning folder and applied to ChatGLM
  • Data in C-Eval format should be placed in the ./data folder; the file names correspond to subject_name in eval.py
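Since the data files follow the C-Eval layout (CSV rows with an id, a question, four choices A–D, and the gold answer label), loading one subject file can be sketched as below. The sample row and the `load_subject` helper are illustrative assumptions, not code from this repository:

```python
import csv
import io

# A hypothetical C-Eval-style row: id, question, four choices, gold answer label.
SAMPLE = (
    "id,question,A,B,C,D,answer\n"
    "0,1+1 equals which of the following?,1,2,3,4,B\n"
)

def load_subject(fileobj):
    """Read C-Eval-format rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(fileobj))

rows = load_subject(io.StringIO(SAMPLE))
print(rows[0]["question"])  # the question text
print(rows[0]["answer"])    # gold choice label
```

In practice the file object would come from opening ./data/<subject_name>_val.csv or similar, matching the subject_name used in eval.py.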

Run

python eval.py --model_name chatglm --cuda_device 0 --finetune ptuning1

Arguments

  • --model_name: model name; one of chatglm or chatglm2
  • --cuda_device: GPU index
  • --finetune: name of the fine-tuned model, i.e. the folder name under ./lora or ./ptuning
  • --few_shot: evaluate with a small number of examples (few-shot, optional)
  • --ntrain: number of few-shot examples (optional)
  • --cot: use chain-of-thought prompting (optional)
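The command-line interface above can be mirrored with a minimal argparse sketch. This is an illustration of the flags, not the actual parser in eval.py; the defaults chosen here are assumptions:

```python
import argparse

def build_parser():
    # Mirrors the README's flags; defaults are assumptions, not eval.py's.
    p = argparse.ArgumentParser(description="Evaluate an LLM on C-Eval-style data.")
    p.add_argument("--model_name", choices=["chatglm", "chatglm2"], required=True)
    p.add_argument("--cuda_device", type=int, default=0)   # GPU index
    p.add_argument("--finetune", default=None)             # folder under ./lora or ./ptuning
    p.add_argument("--few_shot", action="store_true")      # few-shot evaluation
    p.add_argument("--ntrain", type=int, default=5)        # number of few-shot examples
    p.add_argument("--cot", action="store_true")           # chain-of-thought prompting
    return p

# Parse the same arguments as the example run command above.
args = build_parser().parse_args(
    ["--model_name", "chatglm", "--cuda_device", "0", "--finetune", "ptuning1"]
)
print(args.model_name, args.cuda_device, args.finetune)
```

Flags not passed keep their defaults, so --few_shot and --cot behave as opt-in switches.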