Leaderboard


Benchmarking LLM Proficiency on SciAssess
  • Timestamp 🔥
    • [2024/10] We released a new version of SciAssess.
    • [2024/08] Added Ernie4 (Baidu Inc.) to the SciAssess Benchmark.
    • [2024/06] Added more annotated data to the SciAssess Benchmark and verified the update on various models.
    • [2024/05] Added Deepseek+PyPDF and Command-R-Plus+PyPDF to the SciAssess Benchmark.
    • [2024/05] Added Claude3+PyPDF, Qwen-api+PyPDF, Moonshot, and Skylark+PyPDF to the SciAssess Benchmark.
    • [2024/05] Added Uni-Smart Nano to the SciAssess Benchmark.
    • [2024/04] We officially released the SciAssess Benchmark and tested several baseline LLMs (Uni-Smart Pro, Gpt4-Withpdf, Gpt3.5-Withpdf).

Model | Biology | Chemistry | Material | Medicine | Average
Model | MMLU Pro Biology | Biology Chart QA | Chemical Entities Recognition | Compound Disease Recognition | Disease Entities Recognition | Gene Disease Function
Model | MMLU Pro Chemistry | Electrolyte Table QA | OLED Property Extraction | Polymer Chart QA | Polymer Composition QA | Polymer Property Extraction | Solubility Extraction | Reactant QA | Reaction Mechanism QA
Model | Material QA | Alloy Chart QA | Composition Extraction | Temperature QA | Sample Differentiation | Treatment Sequence
Model | MMLU Pro Health | Affinity Extraction | Drug Chart QA | Tag to Molecule | Markush to Molecule | Molecule in Document