Thursday, June 11, 2026

Interactive AI Learning

Polo Club of Data Science (Georgia Tech)

Website:
https://poloclub.github.io/

The Polo Club team has created some of the most impressive interactive explainers in modern AI education. Their projects combine animations, visualizations, and user interaction to make complex machine learning concepts intuitive.

Recommended Explorables

| Project | Why It Is Worth Exploring |
| --- | --- |
| CNN Explainer | One of the best visual explanations of Convolutional Neural Networks ever produced. |
| Transformer Explainer | Interactive walkthrough of attention mechanisms and transformer architectures. |
| CNN 101 | Beginner-friendly introduction to how convolutional networks process images. |
| Interactive Classification | Demonstrates classification concepts through direct experimentation. |
| Communicating with Interactive Articles | Shows how interactive storytelling can improve technical communication. |

Fred Hohman

Website:
https://fredhohman.com/

Fred Hohman is widely recognized for his work in machine learning visualization, human-centered AI, and interactive systems for explainable AI. His work demonstrates how complex models can become more understandable when paired with effective visual interfaces.

Recommended Reading

| Article | Focus Area |
| --- | --- |
| Communicating with Interactive Articles | Explains how interactive documents improve technical learning and engagement. |
| Interactive Scalable Interfaces for Machine Learning Interpretability | Focuses on making machine learning systems understandable through visualization. |
| Parametric Press | A collection of beautifully designed interactive storytelling experiences. |

Google AI Explorables

Google has also invested significantly in interactive educational content through its AI Explorables initiative and the PAIR (People + AI Research) team.

Recommended Resources

| Resource | Purpose |
| --- | --- |
| AI Explorables | Interactive visual explanations covering important AI and ML concepts. |
| PAIR Interactive Visualizations | Tools and demonstrations designed to improve AI transparency and understanding. |

Final Thoughts

These projects demonstrate an important lesson for technical education: understanding complex systems often requires more than words.

Interactive explanations allow readers to experiment, visualize internal mechanisms, and build intuition that is often difficult to achieve through static text alone.

Whether you are learning about convolutional neural networks, transformers, explainable AI, or machine learning interpretability, these resources represent some of the finest examples of technical communication available today.

Wednesday, June 10, 2026

Junk Computers, a Big Dream, and the Turnaround: The Extraordinary Story of Two Engineers Who Shaped Modern Distributed Computing

The story of two engineers who not only helped a startup grow into a tech behemoth, but also shaped the entire field of modern distributed computing.

When people think about Google, they usually think of Larry Page, Sergey Brin, search engines, artificial intelligence, Android, Gmail, and billions of users. Very few people know the names Jeff Dean and Sanjay Ghemawat.

Yet behind Google's rise lies one of the most important engineering partnerships in technology history.

Their friendship, trust, and technical brilliance solved a problem that threatened to limit Google's growth. In doing so, they created technologies that would later influence almost every modern cloud platform.

The Problem: Google Had More Ambition Than Money

In the late 1990s and early 2000s, Google faced a difficult challenge.

The company needed enormous computing power to crawl the web, build search indexes, and answer millions of user queries. Traditional wisdom suggested buying expensive enterprise-grade servers from major hardware vendors.

Google could have spent around $800,000 on a relatively small number of premium servers.

Instead, the company made a radical decision.

It spent roughly $250,000 buying large numbers of cheap, often used, commodity PCs.

On paper, this looked reckless.

These machines failed frequently. Hard disks crashed. Power supplies died. Network cards malfunctioned. Individual computers could not be trusted.

Most companies tried to eliminate hardware failures.

Google decided to assume hardware failures were inevitable.

That single decision changed the future of computing.

A Different Philosophy

The challenge was obvious.

If your infrastructure is built from unreliable machines, how do you build a reliable service?

The answer came from two engineers: Jeff Dean and Sanjay Ghemawat.

Rather than relying on expensive hardware, they wrote software that could automatically detect failures, recover data, redistribute work, and continue operating even when individual machines died.

"Hardware will fail. Design software that expects it."

Today this idea sounds normal.

At the time, it was revolutionary.

The Birth of Google File System (GFS)

The first major breakthrough was the Google File System (GFS).

Instead of storing data on a single expensive machine, GFS spread data across many cheap computers.

Multiple copies of each piece of data were stored throughout the cluster.

If one machine failed, another copy was available.

Users never noticed.

The system continued operating.

For Google, hardware failures became routine events rather than disasters.

This was one of the earliest demonstrations that software could provide reliability even when the underlying hardware could not.
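
The replication idea described above can be illustrated with a small toy model. This is a minimal sketch under stated assumptions, not GFS itself: the `ToyCluster` class, the node names, and the choice of three replicas are all hypothetical, and real concerns such as re-replication after a failure or metadata management are left out.

```python
import random


class ToyCluster:
    """Toy replicated chunk store: each chunk is copied to several nodes."""

    def __init__(self, node_names, replicas=3, seed=0):
        self.nodes = {name: {} for name in node_names}  # node -> {chunk_id: data}
        self.alive = set(node_names)
        self.replicas = replicas
        self.rng = random.Random(seed)

    def write(self, chunk_id, data):
        # Place copies of the chunk on distinct live nodes.
        targets = self.rng.sample(sorted(self.alive), self.replicas)
        for name in targets:
            self.nodes[name][chunk_id] = data
        return targets

    def fail(self, name):
        # Simulate a machine dying; its copy becomes unreachable.
        self.alive.discard(name)

    def read(self, chunk_id):
        # Any surviving replica can serve the read.
        for name in sorted(self.alive):
            if chunk_id in self.nodes[name]:
                return self.nodes[name][chunk_id]
        raise IOError(f"all replicas of {chunk_id!r} are lost")


cluster = ToyCluster(["pc-1", "pc-2", "pc-3", "pc-4", "pc-5"], replicas=3)
holders = cluster.write("chunk-0001", b"crawled page bytes")
cluster.fail(holders[0])  # one machine holding a copy crashes
cluster.fail(holders[1])  # a second one crashes as well
recovered = cluster.read("chunk-0001")  # still served by the third copy
```

The read succeeds because the caller never depends on a specific machine, only on at least one copy surviving.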

The Birth of MapReduce

Storing data was only half the problem.

Google also needed a way to process enormous amounts of information.

Imagine analyzing billions of web pages.

One machine could never do it fast enough.

Dean and Ghemawat developed MapReduce.

The idea was elegant.

  1. Break a massive problem into thousands of smaller tasks.
  2. Distribute those tasks across many machines.
  3. Process everything in parallel.
  4. Combine the results.

If a machine failed halfway through the job, the software simply reassigned the work to another machine.

The computation continued.

Again, software compensated for hardware failures.

MapReduce became one of the most influential distributed computing models ever created.

Many modern big-data systems trace their roots directly to these ideas.
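
The four steps above, together with the reassignment of failed work, can be sketched as a single-process word count. This is an illustrative toy, not Google's implementation: the function names and the `failing_attempts` mechanism for simulating crashed machines are assumptions made for the example, and the "parallel" tasks run one after another here.

```python
from collections import defaultdict


def map_words(document):
    # Map step: emit a (word, 1) pair for every word in one document.
    return [(word.lower(), 1) for word in document.split()]


def reduce_counts(word, counts):
    # Reduce step: combine all values emitted for one key.
    return word, sum(counts)


def run_task(task, payload, failing_attempts, max_attempts=3):
    # Re-run a task when the (simulated) machine running it dies.
    for attempt in range(max_attempts):
        if (task.__name__, payload, attempt) in failing_attempts:
            continue  # this attempt "crashed"; reassign the work
        return task(payload)
    raise RuntimeError("task failed on every attempt")


def map_reduce(documents, failing_attempts=frozenset()):
    grouped = defaultdict(list)
    for doc in documents:  # conceptually, each document goes to its own machine
        for word, one in run_task(map_words, doc, failing_attempts):
            grouped[word].append(one)  # shuffle: group values by key
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


docs = ["the web is big", "the index is bigger"]
# Pretend the first attempt at mapping docs[0] was lost with its machine.
crashes = {("map_words", docs[0], 0)}
word_counts = map_reduce(docs, failing_attempts=crashes)
```

Because a map task is a pure function of its input, re-running it on another machine gives the same result, which is what makes reassignment safe.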

The Foundation of Modern Data Platforms

The concepts pioneered by Dean and Ghemawat did not stop with GFS and MapReduce.

Their work inspired entire generations of distributed systems.

Technologies such as Hadoop, Spark, cloud storage platforms, and large-scale analytics systems were influenced by the architectural principles they introduced.

The modern world of big data stands on foundations they helped create.

What began as a practical solution for running Google on inexpensive hardware eventually transformed the entire technology industry.

A Friendship Built on Trust

Technical brilliance alone does not explain the story.

The remarkable part is how Jeff Dean and Sanjay Ghemawat worked together.

For years, they operated as an extraordinarily effective engineering partnership.

  • Each trusted the other's judgment.
  • Each understood the other's strengths.
  • They challenged ideas.
  • They refined designs.
  • They solved problems together.

Many legendary achievements in technology are attributed to individuals.

Google's infrastructure revolution was the product of collaboration.

Their friendship created an environment where ambitious ideas could be tested, improved, and executed at an extraordinary pace.

The Invisible Heroes of Google

  • Users saw a search box.
  • Advertisers saw a growing platform.
  • Investors saw a rapidly expanding company.

Behind the scenes, Dean and Ghemawat built the machinery that made Google's growth possible.

Without scalable infrastructure, Google's search engine could not have handled the explosive growth of the internet.

Without fault-tolerant systems, operating at Google's scale would have been prohibitively expensive.

Without distributed computing, processing the world's information would have remained a dream rather than a reality.

Larry Page and Sergey Brin gave Google its vision.

Jeff Dean and Sanjay Ghemawat helped make that vision scalable.

Lessons for Every Startup

1. Constraints Can Create Innovation

Google could not afford unlimited amounts of premium hardware.

Instead of treating this as a disadvantage, it became the catalyst for innovation.

Sometimes limitations force better solutions than abundance.

2. Software Can Be More Valuable Than Hardware

Many organizations try to solve problems by buying better equipment.

Google solved its problem by writing better software.

The resulting innovation was far more valuable than any hardware purchase could have been.

3. Great Companies Need Great Partnerships

Technology history often focuses on founders and CEOs.

But many transformative breakthroughs come from trusted partnerships between engineers.

The friendship between Jeff Dean and Sanjay Ghemawat reminds us that collaboration can be as powerful as individual genius.

Conclusion

Google's rise was not powered by expensive machines.

It was powered by a radical idea: accept that computers will fail and design software that keeps working anyway.

Jeff Dean and Sanjay Ghemawat turned that idea into reality.

By building systems that transformed unreliable hardware into reliable infrastructure, they enabled Google to scale from a promising startup into one of the most influential companies in history.

Their story is not merely about technology.

It is a story about friendship, trust, ingenuity, and the belief that great software can overcome seemingly impossible constraints.

Sometimes the people who change the world are not the ones on stage.

They are the engineers quietly building the foundations beneath it.

Learn More About the Engineers Behind Google's Infrastructure Revolution

Jeff Dean

Jeff Dean is one of the most influential engineers in computing history. Over more than two decades at Google, he has contributed to many of the systems that enabled Google to scale globally, including MapReduce, Bigtable, TensorFlow, and numerous large-scale machine learning systems.

LinkedIn Profile:
Jeff Dean (Google Chief Scientist)

Sanjay Ghemawat

Sanjay Ghemawat is a distinguished Google engineer and one of the principal architects behind Google File System (GFS), MapReduce, and Bigtable.

LinkedIn Search:
Sanjay Ghemawat LinkedIn Search Results

Sanjay Ghemawat maintains a much lower public profile than Jeff Dean, and a widely accessible public LinkedIn profile is not readily available.

Essential Reading: Google's Engineering Philosophy

This collection of articles reflects the practical engineering culture that engineers like Jeff Dean and Sanjay Ghemawat helped establish at Google.

Key Lesson:

Optimizing a small benchmark is not the same as optimizing a real-world system.

The article explains why engineers should focus on end-to-end system performance rather than isolated measurements. This philosophy mirrors the approach that led Google to build systems like GFS and MapReduce: solving problems at scale rather than chasing small local optimizations.

Original Research Papers Worth Reading

Together, these papers form the intellectual foundation of much of today's cloud computing ecosystem.

Spring Boot Interceptors vs .NET Action Filters

Spring Boot Interceptors and .NET Action Filters are closely equivalent in purpose, design, and behavior. Both let you run code before or after an incoming request reaches your core business logic (your controllers) without cluttering the controllers themselves.

Key Takeaway
If you are familiar with ASP.NET Core Action Filters, you can think of a Spring MVC Interceptor as almost the same concept implemented in the Spring ecosystem.

How They Match Up (Concept by Concept)

If you are coming from .NET to Spring Boot, here is how the concepts map directly to one another:

| Feature / Concept | .NET (ASP.NET Core) | Spring Boot (Spring MVC) |
| --- | --- | --- |
| The Component | ActionFilterAttribute or IActionFilter | HandlerInterceptor |
| Before the Controller Executes | OnActionExecuting() | preHandle() |
| After the Controller Executes (Before View) | OnActionExecuted() | postHandle() |
| After Everything is Done (Cleanup / Exceptions) | OnResultExecuted() or Middleware cleanup | afterCompletion() |
| Registration | Registered globally, via Controllers, or Dependency Injection | Registered via a WebMvcConfigurer configuration class |

Typical Use Cases

Both frameworks commonly use Interceptors / Action Filters for the following cross-cutting concerns:

  • Logging and Metrics — Tracking request execution times, request tracing, and performance metrics.
  • Authentication & Authorization — Verifying tokens, sessions, permissions, and access rights before the controller executes.
  • Request Manipulation — Enriching the request context with additional metadata such as correlation IDs, tenant information, or tracing identifiers.
  • Auditing — Capturing user activity and recording security-relevant operations.
  • Monitoring — Integrating with observability platforms such as Prometheus, Grafana, Application Insights, or OpenTelemetry.

The Subtle Difference: Interceptors vs Filters

In both ecosystems, developers sometimes confuse Interceptors with Filters. Understanding where each component sits in the request processing pipeline is important.

Important:
Filters operate closer to the raw HTTP layer, while Interceptors / Action Filters operate closer to the MVC framework layer where controller metadata is available.

In .NET

  • Middleware handles raw HTTP requests very early in the request pipeline.
  • Action Filters execute later inside the MVC framework.
  • Action Filters have access to controller metadata, action parameters, model binding information, and execution context.

In Spring Boot

  • Servlet Filters execute early and process raw HTTP requests and responses.
  • Handler Interceptors execute later after Spring MVC maps the request to a controller.
  • Interceptors are aware of the selected controller handler and MVC execution context.

Pipeline Comparison

ASP.NET Core

HTTP Request
    ↓
Middleware
    ↓
Action Filter
    ↓
Controller Action
    ↓
Response

Spring Boot

HTTP Request
    ↓
Servlet Filter
    ↓
Handler Interceptor
    ↓
Controller
    ↓
Response

Final Conclusion

Spring MVC Interceptor ≈ ASP.NET Core Action Filter

Both are framework-aware request interception mechanisms that allow developers to execute logic:
  • Before controller execution
  • After controller execution
  • After request completion
  • Without polluting business logic
The primary difference is not in capability but in terminology and framework implementation details.
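
The ordering of the three hooks can be modeled in a few lines. The sketch below is a language-neutral illustration written in Python, not the Spring or ASP.NET Core API: `RecordingInterceptor`, `dispatch`, and the dictionary-based request and response are hypothetical stand-ins. It assumes the Spring-style contract in which the "before" hook can reject a request, the "after" hook runs only on success, and the completion hook runs even when the controller raises.

```python
class RecordingInterceptor:
    """Stand-in for the three hooks; records the order in which they run."""

    def __init__(self, log):
        self.log = log

    def pre_handle(self, request):
        # Runs before the controller; returning False short-circuits the request.
        self.log.append("pre_handle")
        return "token" in request

    def post_handle(self, request, response):
        # Runs only when the controller returned normally.
        self.log.append("post_handle")

    def after_completion(self, request, error):
        # Runs last, even if the controller raised: a place for cleanup.
        self.log.append("after_completion")


def dispatch(request, controller, interceptor):
    if not interceptor.pre_handle(request):
        return {"status": 401}
    error = None
    try:
        response = controller(request)
        interceptor.post_handle(request, response)
        return response
    except Exception as exc:
        error = exc
        return {"status": 500}
    finally:
        interceptor.after_completion(request, error)


def broken_controller(request):
    raise ValueError("boom")


ok_log, denied_log, failed_log = [], [], []
ok = dispatch({"token": "abc"}, lambda request: {"status": 200}, RecordingInterceptor(ok_log))
denied = dispatch({}, lambda request: {"status": 200}, RecordingInterceptor(denied_log))
failed = dispatch({"token": "abc"}, broken_controller, RecordingInterceptor(failed_log))
```

The controller functions contain no logging or authorization code, which is the point of moving cross-cutting concerns into an interceptor or filter.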

Encoder/Decoders/Transformers and Stable Diffusion

Table 1: The Big Picture

| Term | Purpose | Input | Output | Example |
| --- | --- | --- | --- | --- |
| Encoder | Compress / understand data | Raw data | Latent representation (embedding) | BERT, Sentence-BERT, CLIP Text Encoder |
| Decoder | Generate or reconstruct data | Latent representation | Output data | GPT, VAE Decoder |
| Autoencoder | Learn compressed representations | Input data | Reconstructed input | Image Autoencoder |
| Autodecoder | Learn latent vectors directly | Learned latent code | Output data | DeepSDF, Neural Shape Models |

Table 2: Encoder vs Decoder

| Aspect | Encoder | Decoder |
| --- | --- | --- |
| Main Goal | Understanding | Generation / Reconstruction |
| Direction | Input → Latent | Latent → Output |
| Typical Output | Embedding | Text, Image, Audio, etc. |
| Used For | Search, Retrieval, Classification | Generation, Reconstruction |
| Example | BERT | GPT |

Table 3: Common Encoder Examples

| Model | Architecture | Input | Output | Purpose |
| --- | --- | --- | --- | --- |
| BERT | Transformer Encoder | Text | Embedding | Understanding |
| Sentence-BERT | Transformer Encoder | Text | Sentence Embedding | Semantic Search |
| E5 | Transformer Encoder | Text | Embedding | RAG Retrieval |
| BGE | Transformer Encoder | Text | Embedding | Vector Search |
| CLIP Text Encoder | Transformer Encoder | Text | Text Embedding | Text-to-Image |
| ResNet | CNN Encoder | Image | Feature Vector | Vision Tasks |
| ViT | Transformer Encoder | Image | Image Embedding | Vision Tasks |

Table 4: Is an Embedding Model an Encoder?

| Model Type | Encoder? | Example |
| --- | --- | --- |
| Embedding Model | Yes (usually) | BERT, E5, BGE |
| RAG Embedding Model | Yes | E5, BGE |
| GPT | No (Decoder-only) | GPT-4, Llama |
| CLIP Text Encoder | Yes | Stable Diffusion |

Rule of Thumb:

Embedding Model ≈ Encoder

Table 5: Decoder Types

| Decoder Type | Input | Output | Example |
| --- | --- | --- | --- |
| VAE Decoder | Latent Vector | Image | Stable Diffusion VAE |
| CNN Decoder | Feature Maps | Segmentation Mask / Image | U-Net |
| RNN Decoder | Context Vector | Sequence | Old Translation Models |
| Transformer Decoder | Previous Tokens | Next Token | GPT, Llama |
| Diffusion Decoder* | Noise | Image | Stable Diffusion |

Note: Diffusion Decoder is not a formal category; it is commonly used informally.

Table 6: Transformer Decoder vs Generic Decoder

| Feature | Decoder | Transformer Decoder |
| --- | --- | --- |
| Meaning | General Concept | Specific Architecture |
| Purpose | Latent → Output | Sequence Generation |
| Attention Mechanism | Optional | Yes |
| Token-by-Token Generation | Not Required | Yes |
| Example | VAE Decoder | GPT |

Relationship Hierarchy

Decoder
├── VAE Decoder
├── CNN Decoder
├── RNN Decoder
└── Transformer Decoder
      ├── GPT
      ├── Llama
      ├── Gemini
      └── Claude

Table 7: Can Transformer Decoders Work Only on Text?

| Data Type | Can Use Transformer Decoder? | Example |
| --- | --- | --- |
| Text | Yes | GPT |
| Images (tokenized) | Yes | ImageGPT |
| Audio | Yes | AudioLM |
| Music | Yes | MusicLM |
| Protein Sequences | Yes | ProGen |
| Video (tokenized) | Yes | Various Video Transformers |

Better Definition

Transformer Decoder = Sequence Generator

NOT

Transformer Decoder = Text Generator

Table 8: How GPT (Decoder-Only) is Trained

Training Sentence:

I love Kubernetes

| Input | Target |
| --- | --- |
| I | love |
| I love | Kubernetes |
| I love Kubernetes | <EOS> |

Loss Function
CrossEntropy(PredictedToken, ActualToken)
Thus a decoder does have a target: the next token.
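
As a rough illustration of the training table above, the input/target pairs and the per-token loss can be written out directly. This is a minimal sketch: the helper names and the probability values are made up for the example, and a real model would produce a distribution over its whole vocabulary.

```python
import math


def next_token_pairs(tokens, eos="<EOS>"):
    # Each prefix of the sentence is an input; the token that follows is the target.
    sequence = tokens + [eos]
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


def cross_entropy(predicted_probs, actual_token):
    # Negative log-probability the model assigned to the correct next token.
    return -math.log(predicted_probs[actual_token])


pairs = next_token_pairs(["I", "love", "Kubernetes"])
# pairs == [(["I"], "love"),
#           (["I", "love"], "Kubernetes"),
#           (["I", "love", "Kubernetes"], "<EOS>")]

# Hypothetical model output for the prefix ["I", "love"].
probs = {"Kubernetes": 0.5, "Docker": 0.25, "<EOS>": 0.25}
loss = cross_entropy(probs, "Kubernetes")  # -ln(0.5), roughly 0.693
```

The loss shrinks toward zero as the model puts more probability on the actual next token, which is the training signal for a decoder-only model.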

Table 9: Autoencoder vs Autodecoder

| Feature | Autoencoder | Autodecoder |
| --- | --- | --- |
| Encoder Present? | Yes | No |
| Decoder Present? | Yes | Yes |
| Latent Vector Source | Produced by Encoder | Directly Learned |
| Typical Use | Compression, Denoising | 3D Shapes, Neural Fields |
| Example | Variational Autoencoder | DeepSDF |

Autoencoder Flow

Input
 ↓
Encoder
 ↓
Latent
 ↓
Decoder
 ↓
Reconstructed Input

Autodecoder Flow

Learned Latent Code
        ↓
      Decoder
        ↓
      Output
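
To make "directly learned" concrete, here is a deliberately tiny autodecoder. It is a sketch under strong simplifying assumptions, not how DeepSDF is implemented: the decoder is a single scale factor `w`, each sample gets one scalar latent code, and both are fitted by plain gradient descent on squared error. There is no encoder anywhere; the codes are free parameters optimized together with the decoder.

```python
def decode(w, z):
    # Shared decoder: here just a single learned scale factor.
    return w * z


def train_autodecoder(samples, steps=500, lr=0.05):
    w = 1.0                       # decoder parameter, shared by all samples
    codes = [0.1] * len(samples)  # one free latent code per sample, no encoder
    for _ in range(steps):
        residuals = [decode(w, z) - x for z, x in zip(codes, samples)]
        # Gradients of sum((w * z - x) ** 2) with respect to w and each code.
        grad_w = sum(2 * r * z for r, z in zip(residuals, codes))
        grad_codes = [2 * r * w for r in residuals]
        w -= lr * grad_w
        codes = [z - lr * g for z, g in zip(codes, grad_codes)]
    return w, codes


samples = [2.0, -1.0, 0.5]
w, codes = train_autodecoder(samples)
reconstructed = [decode(w, z) for z in codes]  # close to the original samples
```

An autoencoder would replace the free `codes` list with an encoder function that computes each code from its input.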

Table 10: Stable Diffusion Components

| Component | Type | Purpose |
| --- | --- | --- |
| Text Encoder | Transformer Encoder | Understand Prompt |
| U-Net | Diffusion Network | Denoising |
| VAE Encoder | Encoder | Compress Images |
| VAE Decoder | Decoder | Reconstruct Images |
| Scheduler | Control Logic | Manage Denoising Steps |

Table 11: When is the VAE Encoder Used in Stable Diffusion?

| Operation | VAE Encoder Used? |
| --- | --- |
| Model Training | Yes |
| Text → Image | No |
| Image → Image | Yes |
| Inpainting | Yes |
| Outpainting | Yes |

Table 12: Stable Diffusion (Training)

Image
  ↓
VAE Encoder
  ↓
Image Latent
  ↓
Add Noise
  ↓
U-Net
  ↓
Predict Noise
During training, Stable Diffusion learns how to remove noise from latent image representations.

Table 13: Stable Diffusion (Text-to-Image Inference)

Prompt
   ↓
Text Encoder
   ↓
Text Embeddings
                +
Random Latent Noise
                ↓
             U-Net
                ↓
         Clean Latent
                ↓
           VAE Decoder
                ↓
              Image
Important Observation

VAE Encoder is NOT used during standard Text-to-Image generation.
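
The control flow can be traced with a toy pipeline that records which components run. Everything here is a stand-in: `run_pipeline` and its arithmetic are invented for illustration and perform no real encoding or denoising. Only the order of the steps, and the fact that the VAE encoder appears solely when an input image is supplied, reflect the description above.

```python
import random


def run_pipeline(prompt, init_image=None, steps=4, seed=0):
    """Toy control flow only: every component is a stand-in, not a real model."""
    rng = random.Random(seed)
    trace = []  # names of the components invoked, in order

    trace.append("text_encoder")
    text_embedding = [float(len(word)) for word in prompt.split()]

    if init_image is None:
        # Text-to-image: start from pure random latent noise.
        latent = [rng.gauss(0.0, 1.0) for _ in range(4)]
    else:
        # Image-to-image: compress the input image, then add noise to the latent.
        trace.append("vae_encoder")
        latent = [pixel / 255.0 + rng.gauss(0.0, 0.5) for pixel in init_image]

    for _ in range(steps):
        trace.append("unet")
        # Stand-in "noise prediction", nudged by the prompt embedding.
        predicted_noise = [0.5 * v - 0.01 * sum(text_embedding) for v in latent]
        # Stand-in scheduler step: remove the predicted noise from the latent.
        latent = [v - n for v, n in zip(latent, predicted_noise)]

    trace.append("vae_decoder")
    image = [max(0, min(255, round(v * 255))) for v in latent]
    return image, trace


_, txt2img_trace = run_pipeline("a red bicycle")
_, img2img_trace = run_pipeline("a red bicycle", init_image=[10, 200, 30, 90])
```

Comparing the two traces shows the only structural difference between the modes: `vae_encoder` is present for image-to-image and absent for text-to-image.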

Table 14: BERT vs GPT vs Stable Diffusion

| Feature | BERT | GPT | Stable Diffusion |
| --- | --- | --- | --- |
| Architecture | Encoder-only | Decoder-only | Diffusion + VAE |
| Input | Text | Text | Text Prompt |
| Output | Embedding | Text | Image |
| Primary Goal | Understanding | Generation | Image Generation |
| Uses Embeddings? | Yes | Internally | Yes |
| Uses VAE? | No | No | Yes |
| Uses Transformer Decoder? | No | Yes | No |
| Uses Transformer Encoder? | Yes | No | Text Encoder Only |

Tuesday, June 9, 2026

K8S - Important Reference Articles on this Blog

The following articles provide a neat and concise reference for creating YAML files for some of the most commonly used Kubernetes resources.

What makes these references particularly useful is that they focus exclusively on the essential YAML elements required for each resource, presented in a compact tabular format that is easy to understand and quick to use during day-to-day Kubernetes administration and development.

Why These References Are Useful
  • Quick lookup for Kubernetes YAML structure.
  • Focuses only on important fields and attributes.
  • Easy to use during interviews, certification preparation, and production work.
  • Avoids lengthy official documentation when you only need a YAML reference.
  • Provides concise information that is difficult to find in a single place elsewhere on the internet.

Available Kubernetes YAML References

| Reference Article | Link |
| --- | --- |
| K8S Reference - StatefulSet YAML | Open Article |
| K8S Reference - Service YAML | Open Article |
| K8S Reference - PersistentVolumeClaim (PVC) YAML | Open Article |
| K8S Reference - StorageClass YAML | Open Article |
| K8S Reference - PersistentVolume (PV) YAML | Open Article |

Recommended Reading Order
  1. StorageClass YAML
  2. PersistentVolume (PV) YAML
  3. PersistentVolumeClaim (PVC) YAML
  4. Service YAML
  5. StatefulSet YAML
This sequence helps build a complete understanding of how storage and networking components work together inside a stateful Kubernetes application.

K8S Reference - StatefulSet YAML

The table below summarizes all important elements used when defining a Kubernetes StatefulSet. Unlike Deployments, StatefulSets provide stable network identities, persistent storage, and ordered deployment behavior, making them ideal for databases and other stateful applications.

| Element | Required / Optional | Description | Example Syntax |
| --- | --- | --- | --- |
| apiVersion | Required | The API group and version for workloads. For StatefulSets, this is apps/v1. | apiVersion: apps/v1 |
| kind | Required | Defines the resource type. | kind: StatefulSet |
| metadata.name | Required | The unique name of the StatefulSet. Pod names are derived from it. | name: mysql |
| spec.serviceName | Required | The name of the Headless Service that manages the stable DNS and network identity of the StatefulSet Pods. | serviceName: "my-db-service" |
| spec.replicas | Optional | The number of desired Pod instances. Defaults to 1. | replicas: 3 |
| spec.selector | Required | Tells the StatefulSet controller which Pods it owns and should manage. The selector must match the labels defined inside template.metadata.labels. | selector: { matchLabels: { app: my-database } } |
| spec.template | Required | The Pod blueprint used to create each StatefulSet Pod. This contains labels, container definitions, images, environment variables, ports, volumes, and all standard Pod settings. | Standard Pod spec structure |
| spec.volumeClaimTemplates | Optional (Highly Recommended) | An array of PVC templates. Instead of sharing one volume, every Pod automatically receives its own dedicated PersistentVolumeClaim (for example: data-mysql-0, data-mysql-1, data-mysql-2). | volumeClaimTemplates: [ { metadata: { name: data } } ] |

Important: A StatefulSet almost always works together with a Headless Service and one or more PersistentVolumeClaims. These three components provide stable DNS names, persistent storage, and predictable Pod identities.
Example Pod Names Created:

mysql-0
mysql-1
mysql-2
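
The naming pattern behind these Pod and PVC names can be written down as a short helper. This is a sketch for illustration: `statefulset_identities` is a hypothetical function, and the `default` namespace and `cluster.local` cluster domain are assumptions (both are common defaults, but a cluster can be configured differently).

```python
def statefulset_identities(name, service_name, replicas, claim_templates,
                           namespace="default", cluster_domain="cluster.local"):
    identities = []
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"  # <statefulset-name>-<ordinal>
        identities.append({
            "pod": pod,
            # Stable DNS name provided through the Headless Service.
            "dns": f"{pod}.{service_name}.{namespace}.svc.{cluster_domain}",
            # One PVC per claim template: <template-name>-<pod-name>.
            "pvcs": [f"{claim}-{pod}" for claim in claim_templates],
        })
    return identities


ids = statefulset_identities("mysql", "my-db-service", 3, ["data"])
# ids[0] == {"pod": "mysql-0",
#            "dns": "mysql-0.my-db-service.default.svc.cluster.local",
#            "pvcs": ["data-mysql-0"]}
```

Because the ordinal is part of every name, a Pod that is rescheduled keeps the same identity and reattaches to the same claim.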

K8S Reference - Service YAML

The table below summarizes all important elements used when defining a Kubernetes Service resource.

| Element | Required / Optional | Description | Example Syntax |
| --- | --- | --- | --- |
| apiVersion | Required | The API version. Services belong to the core API group, so this is always v1. | apiVersion: v1 |
| kind | Required | Defines the resource type. | kind: Service |
| metadata.name | Required | The DNS name that other applications inside the cluster will use to talk to this service. | name: my-db-service |
| spec.ports | Required | A list of network ports to expose. Includes the port (the Service's port) and targetPort (the Pod's actual application port). | ports: [ { port: 80, targetPort: 8080 } ] |
| spec.selector | Optional (Recommended) | A key-value label pair used to target which Pods receive traffic. Crucial for connecting the Service to your application. | selector: { app: my-database } |
| spec.type | Optional | Controls how the Service is exposed: ClusterIP (internal only), NodePort (exposes via host ports), or LoadBalancer (cloud provider external IP). Defaults to ClusterIP. | type: ClusterIP |
| spec.clusterIP | Optional | Can be set to None to create a Headless Service, which is required when pairing with a StatefulSet to handle direct pod addressing. | clusterIP: None |

Tip: The most commonly used Service types are ClusterIP, NodePort, and LoadBalancer. For StatefulSets, remember that a Headless Service (clusterIP: None) is typically required to provide stable network identities to Pods.
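
How spec.selector and spec.ports work together can be shown with a small model of label matching. This is a simplified sketch, not the Kubernetes implementation: the Pod and Service dictionaries are hypothetical, and the matching rule modeled here is the basic equality-based one, where a Pod is selected when it carries every key/value pair in the selector.

```python
def select_pods(selector, pods):
    # A Pod is selected when it carries every key/value pair in the selector.
    return [pod["name"] for pod in pods
            if all(pod["labels"].get(key) == value
                   for key, value in selector.items())]


pods = [
    {"name": "mysql-0", "labels": {"app": "my-database", "tier": "db"}},
    {"name": "web-0", "labels": {"app": "frontend"}},
]
service = {
    "selector": {"app": "my-database"},
    "ports": [{"port": 80, "targetPort": 8080}],
}

# Traffic sent to the Service port is forwarded to targetPort on each selected Pod.
endpoints = [(name, port["targetPort"])
             for name in select_pods(service["selector"], pods)
             for port in service["ports"]]
# endpoints == [("mysql-0", 8080)]
```

Extra labels on a Pod (such as `tier` above) do not prevent a match; only the keys named in the selector are compared.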
