# LLM_Evaluator
A simple program for evaluating large language models.

## Recommended Requirements

- Python 3.8
- torch 1.13.1+cu117
- transformers 4.33.2
- accelerate 0.26.1
- tqdm 4.66.1
- openai 0.28

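The pinned versions above can be installed with pip; a minimal sketch, assuming a CUDA 11.7 environment (the extra index URL is PyTorch's official wheel index for cu117 builds):

```shell
pip install torch==1.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install transformers==4.33.2 accelerate==0.26.1 tqdm==4.66.1 openai==0.28
```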
## Additional Required Files

- Download the [ChatGLM model](https://hf-mirror.com/THUDM/chatglm-6b) and place it in the `./THUDM/chatglm-6b` directory.
- Download the [ChatGLM2 model](https://hf-mirror.com/THUDM/chatglm2-6b) and place it in the `./THUDM/chatglm2-6b` directory.
- Fine-tuned LoRA checkpoints can be placed in the `./lora` directory; they can be applied to ChatGLM2.
- Fine-tuned P-Tuning checkpoints can be placed in the `./ptuning` directory; they can be applied to ChatGLM.
- Data files follow the C-Eval format and are placed in the `./data` directory; file names correspond to `subject_name` in `eval.py`.
- In addition to the C-Eval datasets, the code adds a `qa` dataset under `./data/qa` for open-ended (non-multiple-choice) question answering.

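To illustrate the expected data layout, here is a minimal sketch of turning one C-Eval-style row into a multiple-choice prompt. The column names (`id`, `question`, `A`–`D`, `answer`) are assumptions based on the public C-Eval CSV release, not taken from `eval.py`, and the sample row is invented:

```python
import csv
import io

# Hypothetical C-Eval-style CSV content (columns assumed from the
# public C-Eval release: id, question, A, B, C, D, answer).
SAMPLE = """id,question,A,B,C,D,answer
0,1+1 equals which of the following?,1,2,3,4,B
"""

def build_prompt(row):
    """Format one CSV row as a multiple-choice question."""
    choices = "\n".join(f"{k}. {row[k]}" for k in "ABCD")
    return f"{row['question']}\n{choices}\nAnswer:"

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(build_prompt(rows[0]))
```

Real subject files under `./data` would be read the same way, with the file name matching `subject_name` in `eval.py`.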
## Run

```bash
python eval.py --model_name chatglm --cuda_device 0 --finetune ptuning1
```

## Arguments

- `--model_name`: model name; one of `chatglm`, `chatglm2`
- `--cuda_device`: GPU index
- `--finetune`: name of the fine-tuned checkpoint, i.e. the subdirectory under `./lora` or `./ptuning`
- `--few_shot`: use few-shot prompting (optional)
- `--ntrain`: number of few-shot examples (optional)
- `--cot`: use chain-of-thought prompting (optional)
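The flags above could be declared with `argparse` roughly as follows. This is a hypothetical sketch of how `eval.py` might parse them; the flag names match this README, but the defaults and help strings are assumptions:

```python
import argparse

def build_parser():
    """Sketch of an argument parser matching the flags documented above."""
    p = argparse.ArgumentParser(description="Evaluate a ChatGLM-family model")
    p.add_argument("--model_name", choices=["chatglm", "chatglm2"], required=True,
                   help="which base model to evaluate")
    p.add_argument("--cuda_device", type=int, default=0,
                   help="GPU index to run on")
    p.add_argument("--finetune", default=None,
                   help="checkpoint subdirectory under ./lora or ./ptuning")
    p.add_argument("--few_shot", action="store_true",
                   help="use few-shot prompting")
    p.add_argument("--ntrain", type=int, default=5,
                   help="number of few-shot examples (default is assumed)")
    p.add_argument("--cot", action="store_true",
                   help="use chain-of-thought prompting")
    return p

# Parse the same flags as the Run example above.
args = build_parser().parse_args(
    ["--model_name", "chatglm", "--cuda_device", "0", "--finetune", "ptuning1"]
)
print(args.model_name, args.cuda_device, args.finetune)
```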