Papers
This is a summary of the paper "Collective intelligence for deep learning: A survey of recent developments," which explores the relationship between deep learning and collective intelligence.
This is a summary of the GPT-2 paper "Language Models are Unsupervised Multitask Learners."
This is a summary of the review paper "Organoid intelligence (OI): the new frontier in biocomputing and intelligence-in-a-dish," which covers developments in Organoid Intelligence (OI).
This is a summary of the first GPT paper, "Improving Language Understanding by Generative Pre-Training."
This is a summary of the seminal paper "Attention Is All You Need," which introduced the Transformer architecture.
This is a summary of the paper "Evolutionary Optimization of Model Merging Recipes," which describes Sakana.ai's evolutionary model merging approach.
This article explores the differences and similarities between Active Inference, as framed by the Free Energy Principle, and LLMs (Large Language Models), based on the paper "Predictive Minds: LLMs As Atypical Active Inference Agents."
This article shares techniques for reading research papers, drawn from Andrew Ng's lecture video.
This is a summary of the paper "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits."
This article explores LLaMA-Mesh, an LLM that outputs 3D model data, and its Blender add-on, MeshGen. We'll also try using ChatGPT for 3D modeling.