Timestamp 🔥
- [2024/10] Released a new version of SciAssess.
- [2024/08] Added Ernie4 (Baidu Inc.) to the SciAssess Benchmark.
- [2024/06] Added more annotated data to the SciAssess Benchmark and verified it on various models.
- [2024/05] Added Deepseek+PyPDF and Command-R-Plus+PyPDF to the SciAssess Benchmark.
- [2024/05] Added Claude3+PyPDF, Qwen-api+PyPDF, Moonshot, and Skylark+PyPDF to the SciAssess Benchmark.
- [2024/05] Added Uni-Smart Nano to the SciAssess Benchmark.
- [2024/04] Officially released the SciAssess Benchmark and tested several baseline LLMs (Uni-Smart Pro, Gpt4-Withpdf, Gpt3.5-Withpdf).

| Model | Biology | Chemistry | Material | Medicine | Average |
| --- | --- | --- | --- | --- | --- |

| Model | MMLU Pro Biology | Biology Chart QA | Chemical Entities Recognition | Compound Disease Recognition | Disease Entities Recognition | Gene Disease Function |
| --- | --- | --- | --- | --- | --- | --- |

| Model | MMLU Pro Chemistry | Electrolyte Table QA | Polymer Chart QA | Polymer Composition QA | Reactant QA | Reaction Mechanism QA |
| --- | --- | --- | --- | --- | --- | --- |

| Model | Material QA | Alloy Chart QA | Temperature QA | Sample Differentiation | Treatment Sequence |
| --- | --- | --- | --- | --- | --- |

| Model | MMLU Pro Health | Drug Chart QA | Tag to Molecule | Markush to Molecule | Molecule in Document |
| --- | --- | --- | --- | --- | --- |