llm | Angel Vyas

I build things, break things, and write what I learn.

Currently exploring:

LLM Data Preprocessing
Nov 8, 2025 · 15 min read · llm
Preprocessing text data is one of the most important steps in training Large Language Models (LLMs). The first step in this process is tokenization: the act of converting raw …
LLM Introduction
Nov 8, 2025 · 1 min read · llm
What is an LLM? LLM stands for Large Language Model. An LLM is a type of artificial intelligence model designed to understand, generate, and interact using human language. It learns patterns, …
Positional & Input Embeddings (Data Preprocessing)
Nov 7, 2025 · 5 min read · llm
📘 3. Positional Embeddings 🧠 Why Do We Need Positional Embeddings? In the embedding layer, the same token always gets mapped to the same vector representation. That means the model naturally has no idea about …
Token Embeddings (Data Preprocessing)
Nov 5, 2025 · 6 min read · llm
🧠 2. Token Embeddings in LLMs Token embeddings are also often called vector embeddings or word embeddings. 🔤 From Tokens to Token Embeddings Before a model can “read” anything, it first breaks text into smaller …

LLM Data Preprocessing