AI Solutions

Discover and compare the best AI tools, rated by the community

LLMEval

focuses on understanding how large language models (LLMs) perform in various scenarios and on analyzing the results from an interpretability perspective.

0 reviews

FELM

API

a meta-benchmark that evaluates how well factuality evaluators assess the outputs of large language models (LLMs).

0 reviews

DreamBench++

API

a benchmark for evaluating the performance of large language models (LLMs) in various tasks related to both textual and visual imagination.

0 reviews

MMToM-QA

API

a multimodal question-answering benchmark designed to evaluate AI models' cognitive ability to understand human beliefs and goals.

0 reviews

PubMedQA

API

a biomedical question-answering benchmark designed for answering research-related questions using PubMed abstracts.

0 reviews

MMedBench

API

a benchmark that evaluates large language models' ability to answer medical questions across multiple languages.

0 reviews

TAT-DQA

API

a large-scale Document Visual Question Answering (VQA) dataset designed for complex document understanding, particularly in financial reports.

0 reviews

SuperLim

API

a Swedish language understanding benchmark that evaluates natural language processing (NLP) models on various tasks such as argumentation analysis, semantic similarity, and textual entailment.

0 reviews

SciBench

API

a benchmark designed to evaluate large language models (LLMs) on solving complex, college-level scientific problems from domains like chemistry, physics, and mathematics.

0 reviews

SuperBench

API

a benchmark platform for evaluating large language models (LLMs) across a range of tasks, with a focus on natural language understanding, reasoning, and generalization.

0 reviews

We-Math

API

a benchmark that evaluates large multimodal models (LMMs) on their ability to perform human-like mathematical reasoning.

0 reviews

VisualWebArena

API

a benchmark designed to assess the performance of multimodal web agents on realistic visually grounded tasks.

0 reviews

WHOOPS!

API

a benchmark dataset testing AI's ability to reason about visual commonsense through images that defy normal expectations.

0 reviews

Tune Studio

API

Playground for devs to finetune & deploy LLMs

0 reviews

Guardrails.ai

API

AI tool from awesome-llm

0 reviews

MPT-7B

API

AI tool from awesome-llm

0 reviews

Chainlit

API

AI tool from awesome-llm

0 reviews

Arthur Shield

API

AI tool from awesome-llm

0 reviews

Grok-1-314B-MoE

API

AI tool from awesome-llm

0 reviews

Weights & Biases

API

AI tool from awesome-llm

0 reviews

Llama 3.2-1|3|11|90B

API

AI tool from awesome-llm

0 reviews

MLflow

API

an open-source framework for the end-to-end machine learning lifecycle that helps developers track experiments, evaluate models and prompts, deploy models, and add observability through tracing.

0 reviews

Gemma2-9|27B

API

AI tool from awesome-llm

0 reviews

CS25-Transformers United

API

AI tool from awesome-llm

0 reviews
