Tuesday, 24 March 2026

📊 Data Preprocessing in Data Science: Why Cleaning Data Matters

When we talk about data science, most people immediately think of machine learning models.

But in reality, a large portion of the work happens before the model is even built.

This step is called data preprocessing.


🧠 What is Data Preprocessing?

Data preprocessing is the process of:

  • cleaning data
  • transforming data
  • preparing it for analysis or modeling

Raw data is rarely usable in its original form.

It often contains:

  • missing values
  • inconsistent formats
  • duplicate records
  • irrelevant features



🔍 Why Preprocessing is Important

Even the best algorithm cannot fix poor-quality data.

For example:

  • Missing values can break models
  • Inconsistent formats lead to wrong analysis
  • Outliers can distort predictions

A simple model on clean data often performs better than a complex model on messy data.


🧩 Common Steps in Data Preprocessing




1️⃣ Handling Missing Values

Missing data is very common.

Options include:

  • removing rows
  • filling with mean/median
  • using interpolation

Example (Python)

import pandas as pd

df = pd.read_csv("data.csv")

# Fill missing values in 'age' with the column mean
df['age'] = df['age'].fillna(df['age'].mean())

2️⃣ Removing Duplicates

Duplicate data can bias results.

df.drop_duplicates(inplace=True)

3️⃣ Encoding Categorical Variables

Machine learning models work with numbers, not text.

Example:

df = pd.get_dummies(df, columns=['city'])

4️⃣ Feature Scaling

Some algorithms require data to be on similar scales.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
df[['salary']] = scaler.fit_transform(df[['salary']])

5️⃣ Handling Outliers

Outliers can distort models.

Example:

  • remove extreme values
  • cap values
  • use robust methods
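The capping option above can be sketched with pandas using the interquartile range (IQR). The column name and values here are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"salary": [40_000, 45_000, 50_000, 52_000, 48_000, 1_000_000]})

# Compute the interquartile range (IQR)
q1 = df["salary"].quantile(0.25)
q3 = df["salary"].quantile(0.75)
iqr = q3 - q1

# Cap values that fall outside 1.5 * IQR of the quartiles
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["salary"] = df["salary"].clip(lower=lower, upper=upper)
```

Capping keeps every row (unlike removal) while limiting the influence of extreme values such as the 1,000,000 entry.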



📊 Real-World Example

Suppose you are building a model to predict house prices.

The raw dataset may contain:

  • missing values in price
  • inconsistent formats
  • duplicate entries
  • extreme outliers

After preprocessing:

  • missing values handled
  • clean numerical data
  • standardized features

Only then is the data ready for modeling.
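Putting the steps together, a minimal preprocessing pipeline for a hypothetical house-price dataset might look like this (column names and values are invented for illustration):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data: a missing price, a duplicate row, an extreme outlier
df = pd.DataFrame({
    "price": [250_000, None, 310_000, 310_000, 9_000_000],
    "city":  ["Austin", "Dallas", "Austin", "Austin", "Dallas"],
})

df = df.drop_duplicates()                               # remove duplicate records
df["price"] = df["price"].fillna(df["price"].median())  # fill missing values
df = pd.get_dummies(df, columns=["city"])               # encode the categorical column

# Standardize the numeric feature
scaler = StandardScaler()
df[["price"]] = scaler.fit_transform(df[["price"]])
```

Each step mirrors one of the sections above; a real pipeline would also handle outliers and validate formats before modeling.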

Sunday, 22 March 2026

Exploring Oracle Database 26ai: 5 Features That Stand Out

Databases have evolved far beyond simple data storage systems. Modern databases are expected to support AI workloads, analytics, application development, and secure data management—all within a single platform.

Oracle Database 26ai reflects this evolution. It introduces several features designed to make databases more intelligent, scalable, and developer-friendly.

In this post, I’ll highlight five interesting capabilities of Oracle 26ai that caught my attention while exploring recent demos and documentation.


1️⃣ Hybrid Vector Index – Enabling AI-Driven Search

One of the most exciting additions in modern databases is vector search.

Vector search allows databases to store and query embeddings generated by machine learning models. Instead of matching exact words, the system can find results based on semantic similarity.

This capability is essential for building:

  • AI assistants

  • semantic search systems

  • recommendation engines

  • Retrieval Augmented Generation (RAG) applications

Oracle 26ai introduces the Hybrid Vector Index, which combines vector similarity search with traditional SQL filtering.

Example

Suppose we store product descriptions and their embeddings.

CREATE TABLE products (
  id          NUMBER,
  description VARCHAR2(500),
  embedding   VECTOR(768)
);

To search for products similar to a query embedding:

SELECT id, description
FROM products
ORDER BY VECTOR_DISTANCE(embedding, :query_vector)
FETCH FIRST 5 ROWS ONLY;

The hybrid vector index improves performance by combining vector search with structured filtering.

Example:

SELECT id, description
FROM products
WHERE category = 'electronics'
ORDER BY VECTOR_DISTANCE(embedding, :query_vector)
FETCH FIRST 5 ROWS ONLY;

This allows AI search and structured queries to work together efficiently.




2️⃣ JSON Relational Duality – Bridging Documents and Tables

Applications often use JSON-based APIs, while databases traditionally store data in relational tables.

Oracle introduced JSON Relational Duality, allowing data to be viewed both as:

  • relational tables

  • JSON documents

without duplicating the data.

This makes it easier to build modern applications while keeping relational integrity.

Example

Create a table storing customer data.

CREATE TABLE customers (
  id   NUMBER PRIMARY KEY,
  name VARCHAR2(100),
  city VARCHAR2(100)
);

The same data can be exposed as a JSON document.

SELECT JSON_OBJECT(
         'id'   VALUE id,
         'name' VALUE name,
         'city' VALUE city
       )
FROM customers;

This flexibility allows developers to interact with the database in the format most suitable for their application.




3️⃣ Secure Data Redaction – Protecting Sensitive Information

Security and privacy are critical for modern data systems, especially when dealing with personal or financial information.

Oracle provides data redaction capabilities to prevent sensitive data from being exposed to unauthorized users.

Instead of returning real values, the database can automatically mask or redact sensitive fields.

Example

Suppose we want to hide credit card numbers from certain users.

BEGIN
  DBMS_REDACT.ADD_POLICY(
    object_schema       => 'HR',
    object_name         => 'PAYMENTS',
    column_name         => 'CARD_NUMBER',
    policy_name         => 'REDACT_CARD',
    function_type       => DBMS_REDACT.PARTIAL,
    function_parameters => DBMS_REDACT.REDACT_CCN16_F12
  );
END;
/

When queried, the column might appear as:

****-****-****-1234

This feature helps enforce data protection policies directly inside the database.




4️⃣ Domain Types – Improving Data Consistency

Another interesting feature is Domain Types, which allow developers to define reusable data definitions with constraints.

This helps maintain data consistency across multiple tables.

Example

Define a domain for email addresses.

CREATE DOMAIN email_domain AS VARCHAR2(255)
  CHECK (REGEXP_LIKE(email_domain,
         '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'));

Now the domain can be reused across tables.

CREATE TABLE users (
  id    NUMBER,
  email email_domain
);

This ensures every table using this domain follows the same validation rules.




5️⃣ Built-in GraphQL Support

Modern applications often use GraphQL APIs instead of traditional REST APIs.

Oracle Database now supports GraphQL queries directly within the database layer, allowing developers to expose data through GraphQL without building separate middleware.

This reduces complexity in application architectures.

Example GraphQL query:

{
  customers {
    id
    name
    city
  }
}

The database can resolve this query and return structured data directly.

This capability helps simplify data access for modern web and mobile applications.




Why These Features Matter

Looking at these features together reveals an interesting trend.

Modern databases are evolving to support multiple workloads:

Capability       Purpose
---------------  ----------------------------------
Vector Search    AI and semantic search
JSON Duality     Modern API-friendly data access
Data Redaction   Security and compliance
Domain Types     Data quality and consistency
GraphQL          Simplified application integration

Instead of relying on multiple separate systems, many of these capabilities are now built directly into the database engine.


Final Thoughts

Oracle Database 26ai reflects a broader shift in data platforms.

Databases are no longer just repositories for structured tables—they are becoming intelligent data platforms capable of supporting AI, analytics, and modern application architectures.

Features like vector search, JSON duality, and built-in security controls show how database systems are adapting to the needs of modern applications.

For developers and data professionals, understanding these capabilities can open up new possibilities for building AI-powered and data-driven systems.

Many of these capabilities were introduced in Oracle 23ai and continue to evolve in Oracle 26ai as part of Oracle’s AI-focused database platform.

Thursday, 5 March 2026

📊 Analysis vs Analytics: Understanding the Foundation of Data-Driven Decisions

In conversations about data, two terms often appear together: analysis and analytics.

Although they sound similar, they represent slightly different concepts.

Understanding this difference is important before exploring more advanced topics like predictive or prescriptive analytics.


🔍 What is Analysis?

Analysis refers to the detailed examination of something in order to understand its structure, components, or meaning.

It is usually focused on a specific problem or dataset.

In simple terms:

Analysis means breaking something down into smaller parts to understand it better.

Examples

  • Examining financial statements to understand company performance

  • Investigating why website traffic dropped last week

  • Studying customer feedback to identify common complaints

Analysis is often manual or investigative, and it answers questions like:

  • What happened?

  • What patterns exist in this data?


📈 What is Analytics?

Analytics is broader than analysis.

Analytics is the systematic computational analysis of data using tools, algorithms, and statistical methods to discover patterns and generate insights.

Unlike traditional analysis, analytics typically involves:

  • automated tools

  • statistical models

  • machine learning techniques

  • large datasets

Analytics aims to transform raw data into actionable insights for decision-making.


Example

A company might:

  • Analyze last quarter’s sales report manually

  • Use analytics tools to automatically detect trends and predict future demand

So while analysis is a process, analytics is often a system or discipline that uses data technologies to perform analysis at scale.


🧠 Simple Comparison

Aspect     Analysis                        Analytics
---------  ------------------------------  --------------------------------------------
Scope      Focused investigation           Broader discipline
Approach   Manual or exploratory           Systematic and computational
Tools      Basic tools or manual review    Statistical models, AI, analytics platforms
Goal       Understand a specific problem   Extract insights and support decision-making

📊 The Four Types of Data Analytics

Once data is processed through analytics methods, organizations typically apply four levels of insight.

These levels represent increasing sophistication in how data is used.

1️⃣ Descriptive Analytics — What Happened?

Descriptive analytics summarizes historical data to understand past events.

Examples:

  • sales reports

  • website traffic dashboards

  • financial summaries

It provides a snapshot of past performance.
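As a tiny illustration, descriptive analytics can be as simple as summarizing historical sales with pandas (the figures here are made up):

```python
import pandas as pd

sales = pd.DataFrame({
    "month":   ["Jan", "Feb", "Mar"],
    "revenue": [12_000, 15_000, 11_000],
})

# Summarize what happened: totals, averages, and the best month
total = sales["revenue"].sum()
average = sales["revenue"].mean()
best_month = sales.loc[sales["revenue"].idxmax(), "month"]
```

No modeling is involved; the output simply describes past performance.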




2️⃣ Diagnostic Analytics — Why Did It Happen?

Diagnostic analytics investigates causes and relationships within the data.

Techniques include:

  • correlation analysis

  • root cause investigation

  • drill-down reporting

Example:
Understanding why customer churn increased last month.





3️⃣ Predictive Analytics — What Will Happen?

Predictive analytics uses statistical models and machine learning to forecast future outcomes.

Examples:

  • sales forecasting

  • demand prediction

  • fraud detection models

This stage introduces data science techniques.
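A minimal sketch of the idea, using a toy linear model to forecast the next period's sales (the numbers are invented, and real forecasting would involve far more than four data points):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Historical periods (1..4) and their sales figures
X = np.array([[1], [2], [3], [4]])
y = np.array([100.0, 110.0, 120.0, 130.0])

# Fit a linear trend to the past data
model = LinearRegression()
model.fit(X, y)

# Forecast the next period
forecast = model.predict(np.array([[5]]))[0]
```

The model learns the historical trend and extrapolates it forward, which is the core move of predictive analytics.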




4️⃣ Prescriptive Analytics — What Should We Do?

Prescriptive analytics goes further by recommending optimal actions based on predictions.

Examples:

  • dynamic pricing recommendations

  • supply chain optimization

  • personalized product suggestions

Here analytics begins to guide decisions automatically.



📊 The Analytics Maturity Ladder

These four analytics types often represent an organization’s data maturity progression.

Level         Question Answered
------------  -------------------
Descriptive   What happened?
Diagnostic    Why did it happen?
Predictive    What will happen?
Prescriptive  What should we do?

Organizations gradually move from understanding past data to making future-oriented decisions.


🌱 Final Thoughts

While analysis focuses on understanding specific data problems, analytics represents a broader discipline that uses computational methods to extract insights from large datasets.

Together, they form the foundation of modern data-driven decision-making.

Understanding these concepts is the first step toward deeper fields such as data science, machine learning, and artificial intelligence.


You can check out the related blogs here:

What is Data Science

Types of Data Explained




