Sanjay Tanwani1 and Amit Kanojia2

1 School of Computer Science & IT, Indore, India

2 Department of Computer Science, M.J. Govt. Girls PG College, Indore, India

 

Abstract- With the rapid growth in the volume, complexity, variety and velocity of data in organizations, the need to handle unstructured data is increasing continuously. NoSQL databases are well suited to big data applications. The enormous amount of data generated on the web is highly unstructured in nature. Relational databases are designed to manage structured data and are not capable of managing unstructured data and high data volumes. This paper presents a comparative analysis of Oracle Database and MongoDB, a NoSQL document-oriented database management system. The comparison covers key features, theoretical differences and restrictions, and focuses on basic CRUD operations in MongoDB.

 

Key Words- Big data, NoSQL, MongoDB, RDBMS, CRUD

 

I. Introduction

The term NoSQL was first introduced by Carlo Strozzi in 1998. NoSQL stands for “Not Only SQL”. The rapid growth of data, and the massive amount of data produced every day by web and business applications, has become hard for an RDBMS to handle. This has increased interest in alternatives to RDBMS. NoSQL databases are defined as distributed, horizontally scalable and open source [5].

 

Relational database management systems define a fixed schema, and data is inserted strictly according to that schema. NoSQL databases are built to allow the insertion of data without a predefined schema, which makes it easy to make significant application changes in real time and makes development faster. NoSQL databases are high-performance, scalable systems [1]. It is difficult to handle both the size of the data and concurrent actions on the data within a standard RDBMS. Some of the reasons to employ NoSQL techniques are scalability, high availability, support for distributed architecture, flexible schema, varied data structures, fault tolerance and consistency.

 

MongoDB is an open source project maintained by the company 10gen. It is a document-oriented, schema-less database that stores data in BSON (Binary JSON) format. Unlike an RDBMS, MongoDB can deal with structured, semi-structured and unstructured data. MongoDB documents can vary in structure, and fields can vary from document to document. Similar documents are stored in collections; a collection corresponds to a table and a document corresponds to a record. MongoDB can add, remove or change a field of a document without affecting other documents in the same collection. This avoids expensive ALTER TABLE operations, which can lead to redesigning the entire set of schemas and migrating the existing database to the new schema.
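
The per-document flexibility described above can be illustrated with a minimal sketch. It assumes plain Python dictionaries as stand-ins for BSON documents and a list as a stand-in for a collection; no MongoDB server or driver is involved, and the helper `add_field` and the sample values are hypothetical.

```python
# Plain Python dicts stand in for BSON documents; a list stands in for a
# collection. This is an illustration only, not the MongoDB API.
accounts = [
    {"_id": 1, "name": "abc", "age": 26, "address": "indore"},
    {"_id": 2, "name": "xyz", "age": 31},  # this document has no "address"
]


def add_field(collection, doc_id, field, value):
    """Add or change one field on the document whose _id equals doc_id."""
    for doc in collection:
        if doc["_id"] == doc_id:
            doc[field] = value
            return True
    return False


# Only document 2 gains the new field; document 1 is left untouched,
# and no collection-wide schema change is needed.
add_field(accounts, 2, "email", "xyz@example.com")

for doc in accounts:
    print(sorted(doc.keys()))
```

In a relational table the same change would require altering the table definition for every row, which is the cost the paragraph above refers to.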

 

A MongoDB document holds all data for a given record in a single document, whereas in relational databases the data for a single record is spread across different tables. Data in MongoDB is therefore more localized, which reduces the need to JOIN separate tables [3]. Joins are avoided in MongoDB by embedding documents within a document. The result is increased performance and scalability, as a single read from the database can retrieve the entire document. MongoDB also provides horizontal scalability through a technique called auto-sharding, which spreads data across nodes so that the impact of a single node failure is reduced. Most research studies report that MongoDB is much faster than MS SQL in writing (inserts/updates) and reading (retrieval) [1].
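
The difference between joining tables and reading one embedded document can be sketched as follows. This is an illustration under stated assumptions: Python lists and dictionaries stand in for tables and documents, and the `customers`/`orders` data and both loader functions are hypothetical.

```python
# Relational-style layout: one logical record is split across two "tables".
customers = [{"customer_id": 1, "name": "abc"}]
orders = [
    {"order_id": 10, "customer_id": 1, "item": "book"},
    {"order_id": 11, "customer_id": 1, "item": "pen"},
]


def load_customer_with_join(customer_id):
    """Rebuild one customer record by matching rows from both tables."""
    customer = next(c for c in customers if c["customer_id"] == customer_id)
    items = [o["item"] for o in orders if o["customer_id"] == customer_id]
    return {"name": customer["name"], "items": items}


# Document-style layout: the orders are embedded in the customer document.
customer_docs = {
    1: {"name": "abc", "orders": [{"item": "book"}, {"item": "pen"}]},
}


def load_customer_document(customer_id):
    """A single lookup returns the whole record, embedded orders included."""
    doc = customer_docs[customer_id]
    return {"name": doc["name"], "items": [o["item"] for o in doc["orders"]]}


print(load_customer_with_join(1))
print(load_customer_document(1))
```

Both functions return the same record; the embedded version reaches it with one lookup instead of scanning a second table.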

 

II. NoSQL Databases (Classification)

 

NoSQL databases are classified as [6]:

i. Document-oriented store

ii. Key-value store

iii. Column-oriented store

iv. Graph-oriented store

 

A. Document-Oriented

Document-oriented stores are similar to key-value stores, with the distinction that the values are visible and can be queried. Data formats such as JSON or XML are used to store document-oriented datasets. Document stores provide a flexible schema, so documents are not required to hold the same information or follow the same schema. Unlike a key-value store, a document store offers indexing and querying based on values. These databases store their data in the form of documents. Each document is identified by a unique key and consists of keys and values, much as in key-value databases. Document stores are schema-free and variable in nature [6][14].
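
The distinction between opaque values and queryable values can be sketched in a few lines. The example assumes in-memory Python structures rather than any real database; `build_index`, `find` and the sample documents are hypothetical names introduced for illustration.

```python
# Key-value store: the value is opaque, so lookups go through the key only.
kv_store = {"user:1": b'{"name": "abc", "city": "indore"}'}

# Document store: values are structured, so fields can be indexed and queried.
documents = [
    {"_id": 1, "name": "abc", "city": "indore"},
    {"_id": 2, "name": "xyz", "city": "bhopal"},
    {"_id": 3, "name": "pqr", "city": "indore"},
]


def build_index(docs, field):
    """Map each value of `field` to the _ids of the documents holding it."""
    index = {}
    for doc in docs:
        if field in doc:
            index.setdefault(doc[field], []).append(doc["_id"])
    return index


def find(docs, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


city_index = build_index(documents, "city")
print(city_index)
print(find(documents, city="indore"))
```

Documents that lack the indexed field are simply skipped, which is how a value-based index can coexist with a flexible schema.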

 

Other characteristics of document-oriented stores are horizontal scalability and sharding across the cluster nodes. Examples of document-oriented stores are MongoDB, Amazon DynamoDB, CouchDB, Couchbase, MarkLogic, OrientDB, RethinkDB, Cloudant, RavenDB and Microsoft Azure DocumentDB [6].

 

B. Key-Value

A key-value store, as the name suggests, is a combination of two entities: keys and values. It is one of the traditional database models from which the other NoSQL databases developed. It has a concrete application programming interface (API) and allows its users to store data in a schemaless manner. The stored data has two parts. The key is a unique identifier for a particular data entry; a key must not be repeated, so no two entries share the same key. The value is the data that the key points to [14].

 

The key-value store is the least complex storage paradigm among NoSQL databases. Key-value stores provide the best performance on basic CRUD (Create, Read, Update and Delete) operations. They also provide scalability and sharding across cluster nodes. Sharding is a horizontal partitioning technique used to partition a large amount of data into smaller, easily manageable parts (shards). However, key-value databases are less flexible for querying and indexing complex and connected data. Queries in this category are usually based on keys rather than values. Examples of key-value stores are Redis, Memcached, Riak KV, Hazelcast, Ehcache, OrientDB, Aerospike and Amazon SimpleDB [6].
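
Key-based CRUD and sharding can be sketched together. The class below is a toy illustration, not the design of any product named above: it assumes simple hash-modulo placement, whereas real systems may use range-based or consistent hashing, and the class and key names are hypothetical.

```python
import zlib


class ShardedKeyValueStore:
    """Toy key-value store that spreads keys over a fixed number of shards."""

    def __init__(self, shard_count=3):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # CRC32 is stable across runs, so a key always maps to the same shard.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key, value):
        """Create or update the entry stored under key."""
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        """Read the value stored under key, or default if it is absent."""
        return self._shard_for(key).get(key, default)

    def delete(self, key):
        """Delete the entry; return True if it existed."""
        return self._shard_for(key).pop(key, None) is not None


store = ShardedKeyValueStore(shard_count=3)
for i in range(100):
    store.put(f"user:{i}", {"id": i})

print([len(shard) for shard in store.shards])
print(store.get("user:42"))
```

Every operation needs only the key to locate the right shard, which is also why queries on values are awkward in this model.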

 

C. Column-Oriented

Column-oriented databases are also referred to as column-family databases. Column-oriented stores are suitable when there is a need to handle large amounts of sparse data. Column stores in NoSQL are essentially hybrid row/column stores, unlike pure relational column databases. Although they make use of columnar extensions, they store data in an extensively distributed architecture rather than in tables. Columns are grouped according to the relationship of the data. In column stores, each key is associated with one or more attributes (columns). A column-oriented data store stores its data in such a way that it can be aggregated rapidly with less I/O activity. It focuses on high scalability in data storage. The data is stored in the sorted sequence of the column family.
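
Why a columnar layout reduces the work needed for aggregation can be shown with a small sketch. It assumes in-memory Python structures only; the sample rows and the helper `to_columns` are hypothetical, and real column-family stores add distribution and on-disk sorting that this example does not model.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"user_id": 1, "city": "indore", "balance": 5200},
    {"user_id": 2, "city": "bhopal", "balance": 1800},
    {"user_id": 3, "city": "indore", "balance": 7400},
]


def to_columns(records):
    """Regroup records so that all values of one attribute sit together."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every full record is touched to total one attribute.
total_from_rows = sum(record["balance"] for record in rows)

# Column layout: only the "balance" column is read.
total_from_columns = sum(columns["balance"])

print(total_from_rows, total_from_columns)
```

Both totals are equal; the difference lies in how much unrelated data has to be read to compute them.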

 

Compared with row-oriented databases, column-oriented databases have better capabilities to manage data and storage space. Horizontal scalability is one of their prominent characteristics. Prominent use cases of column-oriented databases include blogging and event logging. Examples of column-oriented stores are HBase, Accumulo, Hypertable, Google Cloud Bigtable, Sqrrl, ScyllaDB and MapR-DB [6][14].

 

D. Graph-Oriented

Graph databases evolved from graph theory and are designed to represent entities and their relationships as nodes and edges respectively: nodes act as the objects and edges act as the relationships between the objects. Graph databases replace relational tables with structured relational graphs of interconnected key-value pairings. The graph also holds properties related to the nodes. It uses a technique called index-free adjacency, i.e. every node holds a direct pointer to its adjacent nodes. Millions of records can be traversed using this technique. In a graph database, the focus is on the relations established between data using pointers. Graph databases provide schema-less and efficient storage of semi-structured data. Queries are expressed as traversals, which can make graph databases faster than relational databases for connected data. They are easy to scale and whiteboard friendly. Graph databases support ACID properties and support rollback [14]. Because graphs have expressive power and strong modeling characteristics, scenarios from the real world can be represented as graphs and modeled in a graph database as well. Graph data can be queried more efficiently because intensive joins are not necessarily required in graph query languages [6].
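
A traversal over directly linked nodes can be sketched as below. The example assumes an in-memory adjacency list as a stand-in for index-free adjacency; the node names and the function `reachable_within` are hypothetical, and no graph database or query language is involved.

```python
from collections import deque

# Each node holds direct references to its neighbours (its adjacency list),
# so a traversal follows pointers instead of joining tables.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def reachable_within(graph, start, max_hops):
    """Breadth-first traversal: nodes at most max_hops edges from start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return {node: h for node, h in hops.items() if node != start}


print(reachable_within(graph, "alice", 1))
print(reachable_within(graph, "alice", 2))
```

The cost of each step depends on the number of neighbours of the current node, not on the total number of records, which is the property the paragraph above attributes to traversal-based queries.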

 

Fig. 1 NoSQL database types

III. Comparison - Oracle and MongoDB

 

MongoDB is a NoSQL database management system released in 2009. It stores data as JSON-like documents with dynamic schemas (the format is called BSON). NoSQL is a class of database management systems that differs from traditional relational databases in that data is not stored using fixed table schemas. Its main purpose is to serve as the database system for huge web-scale applications, where such systems outperform traditional relational databases.

 

MongoDB focuses on four factors: flexibility, power, speed and ease of use. It supports indexing and offers drivers for multiple programming languages. The database model of MongoDB is schemaless and document-oriented, whereas Oracle Database supports the relational model. Oracle Database has a standard query language, SQL, while MongoDB is queried through API calls.

 

MongoDB has aggregation functions. A built-in map-reduce function can be used to aggregate large amounts of data. MongoDB also accepts larger values: Oracle Database supports a maximum value size of 4 KB, whereas MongoDB has a maximum value size of 16 MB. The integrity model used by Oracle Database is ACID, while MongoDB uses BASE. MongoDB offers consistency, durability and conditional atomicity. Oracle Database offers integrity features that MongoDB does not, such as isolation, transactions, referential integrity and revision control. In terms of distribution, both MongoDB and Oracle Database are horizontally scalable and support data replication. While MongoDB offers sharding support, Oracle Database does not. Both MongoDB and Oracle Database are cross-platform database management systems. Oracle Database was written in C++, C and Java, while MongoDB was written in C++. MongoDB is free to use, while a licence is needed to use Oracle Database [17].
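
The map-reduce style of aggregation mentioned above can be sketched in plain Python. This is an illustration of the general technique only, not MongoDB's own map-reduce interface; the functions `map_phase`, `reduce_phase` and `map_reduce` are hypothetical, and the field names `dep_wid` and `balance` are borrowed from the query examples in this paper.

```python
from collections import defaultdict

accounts = [
    {"dep_wid": "D", "balance": 6000},
    {"dep_wid": "W", "balance": 1500},
    {"dep_wid": "D", "balance": 2500},
]


def map_phase(doc):
    """Emit one (key, value) pair per document."""
    yield doc["dep_wid"], doc["balance"]


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return sum(values)


def map_reduce(docs, mapper, reducer):
    """Group the mapper output by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


totals = map_reduce(accounts, map_phase, reduce_phase)
print(totals)
```

Because each document is mapped independently and each key is reduced independently, the two phases can in principle be distributed across shards.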

 

A. Features of MongoDB

•       MongoDB provides high performance.

•       It has a rich query language, supports all major CRUD operations, and provides aggregation features.

•       MongoDB provides high availability through automatic replication. Data is restored from a backup (replica) in case of server failure.

•       It provides an automatic failover mechanism.

•       Sharding is a major feature that makes horizontal scalability possible.

•       A record in MongoDB is a document.

•       A database holds collections of documents.

B. Advantages of MongoDB

•       MongoDB is simple and very easy to install and set up.

•       MongoDB is a schema-less database.

•       The document query language supported by MongoDB plays a vital role in supporting dynamic queries.

•       It is very easy to scale.

•       No complex joins are needed in MongoDB, because data is stored in BSON format as key-value pairs.

•       It uses internal memory to store data, which makes data access faster.

•       Performance can be enhanced more easily in MongoDB than in relational databases.

•       There is no need to map application objects to data objects.

•       MongoDB supports sharding, which results in horizontal scaling, whereas relational databases support vertical scaling.

 

Table 2 Comparison of MongoDB and Oracle [14]

| Key Feature | Oracle | MongoDB |
|---|---|---|
| Data model | Stores data in the form of tables. Follows a fixed schema structure. | Follows a document-based model for representing data. It is schema-less and can handle unstructured data efficiently. |
| Scalability | Provides both vertical and horizontal scalability. | Provides effective horizontal scalability. |
| Transaction reliability | Follows the ACID rule and is hence more reliable. | Follows the BASE rule. |
| Complexity | More complex | Less complex |
| Security | Very secure mechanism | Less secure |
| Crash recovery | Ensures crash recovery through its ACID properties. | Depends on replication as a backup to recover from a crash. |
| Cloud | Not suitable for cloud applications | Suitable for cloud applications |
| Big data handling | Unable to handle the big data problem | Designed to deal with the big data problem effectively |

 

IV. CRUD Operations

 

This section focuses on the basic CRUD operations. Two databases, one in Oracle and one in MongoDB, are created to compare the way that data is created, selected, inserted and deleted in each [21]. MongoDB is a fast-responding database management system, and it is a good choice when a simple database with very fast responses is needed. MongoDB supports all major CRUD operations and provides aggregation features. The major CRUD operations are as follows:

 

Table 3 CRUD Operations

| Operation | Oracle | MongoDB |
|---|---|---|
| Create table | CREATE TABLE accounts (id NUMBER, first_name VARCHAR(64) NULL, last_name VARCHAR(45) NULL, PRIMARY KEY (id)); | db.accounts.insert({name: "abc", age: 26, address: "indore"}) |
| Delete a table | DROP TABLE accounts; | db.accounts.drop() |
| Insert | INSERT INTO accounts (name, age, address) VALUES ('abc', 26, 'indore') | db.accounts.insert({name: "abc", age: 26, address: "indore"}) |
| Select | SELECT * FROM accounts | db.accounts.find() |
| Select fields | SELECT first_name, last_name FROM accounts | db.accounts.find({}, {first_name: 1, last_name: 1}) |
| Conditional select | SELECT * FROM accounts WHERE dep_wid = 'D' AND balance > 5000 | db.accounts.find({dep_wid: "D", balance: {$gt: 5000}}) |
| Ordered select ascending | SELECT * FROM accounts ORDER BY user_id ASC | db.accounts.find({}).sort({user_id: 1}) |
| Ordered select descending | SELECT * FROM accounts ORDER BY user_id DESC | db.accounts.find({}).sort({user_id: -1}) |
| Select with count | SELECT COUNT(*) FROM accounts | db.accounts.count() |

Update

update table student set
section=”F”  where marks