AI Solutions
Discover and compare the best AI tools, rated by the community
Roadmaps featuring essential concepts, learning methods, and the tools to put them into practice.
Curated List of AI Apps for productivity
AI Animal Explorer is an Omniverse extension that enables creators to quickly prototype unique 3D animal meshes.
The in-person certificate courses are not free, but all of the content is available on Fast.ai as MOOCs.
A robust introduction to the subject and the foundation for a Data Analyst “nanodegree” certification sponsored by Facebook and MongoDB.
A pioneering benchmark specifically designed to assess honesty in LLMs comprehensively.
Evaluates an LLM's ability to call external functions and tools.
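Benchmarks of this kind typically compare the model's emitted call against a reference call. Below is a minimal sketch of that scoring step, assuming a simplified exact-match check and a hypothetical JSON schema ({"name", "arguments"}); real harnesses also validate arguments against the tool's schema and may accept several equivalent calls.

import json

def score_tool_call(model_output: str, expected: dict) -> bool:
    """Return True if the model emitted the expected function name and arguments."""
    try:
        call = json.loads(model_output)  # expected shape: {"name": ..., "arguments": {...}}
    except json.JSONDecodeError:
        return False  # unparsable output counts as a miss
    if not isinstance(call, dict):
        return False
    return (
        call.get("name") == expected["name"]
        and call.get("arguments") == expected["arguments"]
    )

# Hypothetical test case:
expected = {"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}
model_output = '{"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}'
print(score_tool_call(model_output, expected))  # True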
A Challenging, Contamination-Free LLM Benchmark.
An Automatic Evaluator for Instruction-following Language Models using the Nous benchmark suite.
A benchmark designed to evaluate large language models in the legal domain.
A benchmark designed to evaluate large language models (LLMs) specifically on their ability to answer real-world coding-related questions.
A benchmark evaluating QA methods that operate over a mixture of heterogeneous input sources (KB, text, tables, infoboxes).
A comprehensive benchmarking platform designed to evaluate large models' mathematical abilities across 20 fields and nearly 30,000 math problems.
CompassRank is dedicated to exploring the most advanced language and visual models, offering a comprehensive, objective, and neutral evaluation reference for industry and research.
A ground-truth-based dynamic benchmark derived from off-the-shelf benchmark mixtures that evaluates LLMs with a model ranking highly correlated with Chatbot Arena (0.96) while running locally and quickly (6% of the time and cost of running MMLU).
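Correlation claims like the 0.96 figure above are usually rank correlations between the benchmark's per-model scores and Chatbot Arena ratings for the same models. A minimal sketch of that computation, using placeholder model names and numbers rather than real results:

from scipy.stats import spearmanr

# Placeholder scores from a local benchmark and Arena-style ratings (not real data).
local_scores = {"model-a": 71.2, "model-b": 65.4, "model-c": 58.9, "model-d": 49.7}
arena_ratings = {"model-a": 1250, "model-b": 1190, "model-c": 1145, "model-d": 1060}

models = sorted(local_scores)
rho, pvalue = spearmanr(
    [local_scores[m] for m in models],
    [arena_ratings[m] for m in models],
)
print(f"Spearman correlation with Arena ratings: {rho:.2f} (p={pvalue:.3f})")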
A benchmark that evaluates large language models on a variety of multimodal reasoning tasks, including language, natural and social sciences, physical and social commonsense, temporal reasoning, algebra, and geometry.