Papers

Deep Learning and Collective Intelligence | Paper: Collective intelligence for deep learning: A survey of recent developments

This is a summary of the paper "Collective intelligence for deep learning: A survey of recent developments," which explores the relationship between deep learning and collective intelligence.

Understanding GPT-2 | Paper Summary: Language Models are Unsupervised Multitask Learners

This is a summary of the GPT-2 paper "Language Models are Unsupervised Multitask Learners."

Understanding Organoid Intelligence | Paper Notes: Organoid intelligence (OI): the new frontier in biocomputing and intelligence-in-a-dish

This is a summary of the review paper "Organoid intelligence (OI): the new frontier in biocomputing and intelligence-in-a-dish," which covers developments in Organoid Intelligence (OI).

Understanding the First GPT | Paper Notes: Improving Language Understanding by Generative Pre-Training

This is a summary of the first GPT paper, "Improving Language Understanding by Generative Pre-Training."

Reading the Transformer Paper: Attention Is All You Need

This is a summary of the seminal paper "Attention Is All You Need," which introduced the Transformer architecture.

Understanding Sakana.ai's Evolutionary Model Merging | Paper Notes: Evolutionary Optimization of Model Merging Recipes

This is a summary of the paper "Evolutionary Optimization of Model Merging Recipes," which describes Sakana.ai's evolutionary model merging approach.

LLM and Brain Theory: Differences and Similarities with Active Inference

This article explores the differences and similarities between Active Inference under the Free Energy Principle and Large Language Models (LLMs), based on the paper "Predictive Minds: LLMs As Atypical Active Inference Agents."

How to Read Research Papers: Learning from Andrew Ng

This post shares techniques for reading research papers, drawn from Andrew Ng's lecture video.

Understanding 1-bit LLMs | Paper Notes: The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

This is a summary of the paper "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits."

LLM-Based 3D Modeling in Blender: Trying Out MeshGen/LLaMA-Mesh

Exploring LLaMA-Mesh, an LLM that outputs 3D model data, and its Blender add-on, MeshGen. We'll also try using ChatGPT for 3D modeling.