Foundation Phase: Migrating from On-Prem Oracle Database to MongoDB on Azure VM
This guide is written for DBA support teams that must successfully migrate mission-critical, very large databases (10TB–100TB) from on-premises Oracle to MongoDB running on Azure Virtual Machines.
We will use simple English, real-world steps, and free tools wherever possible.
1. Understanding the Big Picture (Very Important First Step)
Before doing anything technical, you must understand one truth:
Oracle and MongoDB are completely different database systems
Oracle Database = Relational Database (tables, rows, columns, joins)
MongoDB = NoSQL Document Database (JSON-like documents, flexible schema)
This means:
You are not just migrating data
You are also transforming how the data is structured
This is called:
Schema Transformation (Relational → Document Model)
2. Migration Strategy Overview (Simple Roadmap)
Think of the migration in 3 major phases:
Phase 1: BEFORE Migration
- Assessment
- Planning
- Environment setup
- Schema design
Phase 2: DURING Migration
- Data extraction
- Transformation
- Data loading
- Validation
Phase 3: AFTER Migration
- Performance tuning
- Backup & disaster recovery
- Monitoring
- Optimization
3. Step-by-Step BEFORE Migration Tasks
Step 1: Assess Your Oracle Database (Discovery Phase)
This is the most critical step. If you skip this, the migration will fail.
3.1 Identify What You Have
Run this in Oracle:
-- Database size
SELECT SUM(bytes)/1024/1024/1024 AS size_gb FROM dba_data_files;
-- List schemas
SELECT username FROM dba_users;
-- Table sizes
SELECT owner, segment_name, SUM(bytes)/1024/1024/1024 AS size_gb
FROM dba_segments
WHERE segment_type='TABLE'
GROUP BY owner, segment_name
ORDER BY size_gb DESC;
This helps you:
Identify largest tables (priority for migration)
Understand data distribution
3.2 Identify Complex Objects
MongoDB (Community Edition) does NOT support:
Stored procedures
Triggers (change streams can cover some of these use cases)
Complex multi-table joins ($lookup handles only simple cases)
Run:
-- Stored procedures
SELECT object_name FROM dba_objects WHERE object_type='PROCEDURE';
-- Triggers
SELECT trigger_name FROM dba_triggers;
You must plan to move this logic into:
Application layer (Java, Python, etc.)
Or MongoDB aggregation pipelines
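As a sketch of the second option, a hypothetical stored procedure that sums paid order amounts per customer could become an aggregation pipeline. The pipeline is just a list of stage documents; with pymongo you would pass it to db.orders.aggregate(pipeline). The field names and the "PAID" status filter below are illustrative, not from the source schema.

```python
# Hypothetical replacement for an Oracle procedure that totals
# order amounts per customer. Each dict is one pipeline stage.
pipeline = [
    {"$match": {"status": "PAID"}},      # like a WHERE clause
    {"$group": {                          # like GROUP BY customer_id
        "_id": "$customer_id",
        "total_amount": {"$sum": "$amount"},
        "order_count": {"$sum": 1},
    }},
    {"$sort": {"total_amount": -1}},      # like ORDER BY total DESC
]
```

The same stages can be prototyped interactively in MongoDB Compass before being moved into application code.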
3.3 Identify Data Relationships
Relational DB uses:
Foreign keys
Joins
Run:
SELECT a.table_name, a.constraint_name, c.column_name, c.position
FROM dba_constraints a
JOIN dba_cons_columns c
ON a.constraint_name = c.constraint_name
WHERE a.constraint_type = 'R';
These relationships must be converted into:
Embedded documents OR
References in MongoDB
Step 2: Choose the Right MongoDB Data Model
This is where many fail.
Relational DB Example (Oracle)
Customers table
Orders table
MongoDB Design Options
Option 1: Embedded (Best for performance)
{
  "customer_id": 1,
  "name": "John",
  "orders": [
    { "order_id": 101, "amount": 200 },
    { "order_id": 102, "amount": 300 }
  ]
}
Option 2: Referenced (for very large data)
Customer:
{
  "customer_id": 1,
  "name": "John"
}
Orders:
{
  "order_id": 101,
  "customer_id": 1,
  "amount": 200
}
Rule of thumb:
Use embedded for small, frequently accessed data
Use reference for large or independent data
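The embedded option above can be produced mechanically from relational rows. A minimal sketch, assuming customer and order rows shaped like the examples in this section (the function name embed_orders is hypothetical):

```python
# Fold relational rows (customers + orders) into embedded
# MongoDB-style documents, grouping orders by their foreign key.
customers = [{"customer_id": 1, "name": "John"}]
orders = [
    {"order_id": 101, "customer_id": 1, "amount": 200},
    {"order_id": 102, "customer_id": 1, "amount": 300},
]

def embed_orders(customers, orders):
    # Index orders by customer_id so each customer lookup is O(1)
    by_customer = {}
    for o in orders:
        by_customer.setdefault(o["customer_id"], []).append(
            {"order_id": o["order_id"], "amount": o["amount"]}
        )
    # Attach each customer's orders as an embedded array
    return [
        {**c, "orders": by_customer.get(c["customer_id"], [])}
        for c in customers
    ]

docs = embed_orders(customers, orders)
```

The referenced option needs no such step: each table's rows simply become documents in their own collection.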
Step 3: Set Up Azure Virtual Machines
We will use:
Azure Virtual Machine (IaaS)
Install MongoDB manually (free)
3.1 Create Azure VM
Recommended:
OS: Ubuntu 22.04
VM Size:
For 10TB: at least 16 vCPU, 64GB RAM
For 100TB: 32+ vCPU, 128GB+ RAM
Use Premium SSD or Ultra Disk
3.2 Configure Storage (Critical for Performance)
MongoDB needs fast disk.
Format and mount the data disk (verify the device name with lsblk first, and add an /etc/fstab entry so the mount survives a reboot):
sudo mkfs.ext4 /dev/sdc
sudo mkdir /data
sudo mount /dev/sdc /data
3.3 Install MongoDB (Free Community Edition)
The mongodb package in Ubuntu's default repositories is outdated. Install the official mongodb-org package from MongoDB's repository instead:
curl -fsSL https://www.mongodb.com/static/pgp/server-7.0.asc | sudo gpg --dearmor -o /usr/share/keyrings/mongodb-server-7.0.gpg
echo "deb [ signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/7.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list
sudo apt-get update
sudo apt-get install -y mongodb-org
Start service:
sudo systemctl start mongod
sudo systemctl enable mongod
Step 4: Set Up MongoDB Replica Set (HADR Basics)
For mission-critical systems, you MUST use High Availability.
Minimum setup:
3 nodes (Primary + 2 Secondary)
4.1 Enable Replication
Edit config:
sudo nano /etc/mongod.conf
Add:
replication:
  replSetName: "rs0"
Restart:
sudo systemctl restart mongod
4.2 Initialize Replica Set
Connect with the MongoDB shell (the legacy mongo shell was removed in MongoDB 6.0):
mongosh
Run:
rs.initiate()
Add nodes:
rs.add("node2:27017")
rs.add("node3:27017")
Now you have:
Automatic failover
High availability (HADR)
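Applications should connect with a URI that lists all replica set members, so the driver can fail over automatically. A minimal sketch, assuming the first node's hostname is node1 (only node2 and node3 are named in the steps above); with pymongo you would pass this string to MongoClient:

```python
# Build a replica-set connection string for the three nodes.
# "node1" is an assumed hostname for the node where rs.initiate()
# was run; "rs0" matches the replSetName configured above.
nodes = ["node1:27017", "node2:27017", "node3:27017"]
uri = "mongodb://{hosts}/?replicaSet=rs0".format(hosts=",".join(nodes))
```

With replicaSet in the URI, the driver discovers the current primary and reroutes writes after a failover without application changes.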
Step 5: Choose Free Migration Tools (Very Important)
Here are the most commonly used free tools:
5.1 Oracle Data Export Tools
Oracle Data Pump (expdp / impdp)
SQL*Plus
Oracle SQL Developer
5.2 MongoDB Migration Tools (Free)
mongoimport
mongodump / mongorestore
MongoDB Compass (GUI)
MongoDB Relational Migrator (Free tool)
Best combination:
Oracle → CSV (using SQL Developer)
CSV → MongoDB (using mongoimport)
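For the CSV → MongoDB leg, one mongoimport invocation per exported file works well, and building the command in Python makes it easy to script over many files. A sketch, assuming illustrative database and collection names (in practice you would run the command with subprocess.run(cmd, check=True)):

```python
# Assemble a mongoimport command for one exported CSV file.
# --headerline tells mongoimport to use the first CSV row as
# field names; db/collection names here are illustrative.
def build_import_cmd(csv_path, db="shopdb", collection="customers"):
    return [
        "mongoimport",
        "--db", db,
        "--collection", collection,
        "--type", "csv",
        "--headerline",
        "--file", csv_path,
    ]

cmd = build_import_cmd("customers.csv")
```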
Step 6: Extract Data from Oracle
6.1 Export to CSV
Example:
SET COLSEP ','
SET PAGESIZE 0
SET HEADING OFF
SET FEEDBACK OFF
SPOOL customers.csv
SELECT * FROM customers;
SPOOL OFF;
For large tables (10TB+), DO NOT export all at once.
Use batching by primary-key range (a plain rownum filter would return the same first rows on every run):
SELECT * FROM customers WHERE customer_id BETWEEN 1 AND 1000000;
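The key ranges for those batched queries can be generated up front. A minimal sketch, assuming the table has a numeric primary key whose bounds you fetch with SELECT MIN(customer_id), MAX(customer_id) in Oracle:

```python
# Yield (lo, hi) primary-key ranges so each batch can be exported
# with a bounded query like:
#   SELECT * FROM customers WHERE customer_id BETWEEN :lo AND :hi
def id_batches(min_id, max_id, batch_size):
    lo = min_id
    while lo <= max_id:
        hi = min(lo + batch_size - 1, max_id)
        yield (lo, hi)
        lo = hi + 1

batches = list(id_batches(1, 2_500_000, 1_000_000))
```

Each range is independent, so batches can also be exported in parallel or retried individually after a failure.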
Step 7: Prepare Data for MongoDB (Transformation Phase)
This step is often ignored but critical.
7.1 Convert to JSON Format
Example transformation:
From CSV:
1,John,USA
To JSON:
{ "id": 1, "name": "John", "country": "USA" }
7.2 Use Python for Automation
import csv
import json

# Read the exported CSV into a list of row dictionaries
with open('customers.csv') as f:
    reader = csv.DictReader(f)
    data = list(reader)

# Note: DictReader returns every value as a string; convert numeric
# fields (e.g. int(row['id'])) before loading if types matter.
with open('customers.json', 'w') as f:
    json.dump(data, f)
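That script holds the whole file in memory and writes one big JSON array, which does not scale to multi-terabyte exports. A streaming variant that writes newline-delimited JSON (a format mongoimport reads directly) keeps memory flat; the int_fields default below is an assumption about which columns are numeric:

```python
import csv
import json

# Convert CSV to newline-delimited JSON one row at a time, so memory
# use stays constant regardless of file size. Known numeric columns
# are cast from string to int (DictReader yields only strings).
def csv_to_ndjson(src, dst, int_fields=("id",)):
    with open(src, newline="") as fin, open(dst, "w") as fout:
        for row in csv.DictReader(fin):
            for field in int_fields:
                if field in row:
                    row[field] = int(row[field])
            fout.write(json.dumps(row) + "\n")
```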
Step 8: Plan for Large Data Migration (10TB–100TB)
This is where real-world complexity begins.
8.1 Use Parallel Migration
Split data:
By ID ranges
By date
By partitions
8.2 Example Strategy
Node 1 → Customers
Node 2 → Orders
Node 3 → Transactions
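The per-table split above can be driven from a small coordinator script. A sketch using a thread pool, where migrate_table is a placeholder for your real extract-transform-load routine (here it just reports completion):

```python
from concurrent.futures import ThreadPoolExecutor

# Run one migration job per table in parallel, mirroring the
# customers / orders / transactions split above.
def migrate_table(table):
    # Placeholder: extract from Oracle, transform, load into MongoDB
    return f"{table}: done"

tables = ["customers", "orders", "transactions"]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(migrate_table, tables))
```

pool.map preserves input order, so results line up with the tables list for reporting.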
8.3 Use Compression
gzip customers.json
Step 9: Network Planning (Often Ignored but Critical)
For large migrations:
Use Azure VPN or ExpressRoute
Avoid public internet
Step 10: Risk Planning
Common failures:
Network interruption
Disk full
Data corruption
Timeout errors
Mitigation Strategy
Always migrate in small batches
Always validate data after each batch
Always keep Oracle as fallback
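Per-batch validation can be as simple as comparing row counts plus an order-independent checksum over key fields. A sketch, assuming the two inputs come from an Oracle query and a MongoDB find() for the same key range (field names are illustrative):

```python
import hashlib

# Order-independent checksum: XOR the SHA-256 of each row's key
# fields, so Oracle and MongoDB need not return rows in the same order.
def batch_checksum(rows, key_fields=("customer_id", "amount")):
    digest = 0
    for row in rows:
        payload = "|".join(str(row[f]) for f in key_fields)
        digest ^= int(hashlib.sha256(payload.encode()).hexdigest(), 16)
    return digest

def batch_matches(oracle_rows, mongo_docs):
    # A batch passes only if both the count and the checksum agree
    return (len(oracle_rows) == len(mongo_docs)
            and batch_checksum(oracle_rows) == batch_checksum(mongo_docs))
```

A failed batch is then re-extracted and re-loaded on its own, without touching batches that already passed.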