Course Description

470.694 - Big Data Management Systems

This course introduces students to big data management systems such as Apache Hadoop, MongoDB, Amazon Web Services (AWS), and Microsoft Azure. It covers the basics of the Apache Hadoop platform and its ecosystem: the Hadoop Distributed File System (HDFS); MapReduce; YARN, Hadoop's cluster resource manager; and common big data tools such as Pig (a procedural data-flow language for parallel computation on Hadoop), Hive (a declarative, SQL-like language for running Hadoop jobs), and HBase (a distributed, column-oriented NoSQL database built on HDFS). The course also covers MongoDB, a popular NoSQL document database whose flexible schema gives developers considerable freedom in how they store and use data. Finally, we examine the cloud computing model with respect to virtualization, multitenancy, privacy, security, and cloud data management.

Prerequisite: 470.763 Database Management Systems