# BAD601 Big Data Analytics

This repository contains the implementations and documentation for the Big Data Analytics laboratory experiments.
The main objective is to understand the fundamentals of Big Data, the Hadoop ecosystem, and related tools through hands-on experiments.

| Sl. No | Experiment | Link |
|---|---|---|
| 1 | Install Hadoop and implement the following file management tasks in HDFS: adding files and directories; retrieving files; deleting files and directories. Hint: A typical Hadoop workflow creates data files (such as log files) elsewhere and copies them into HDFS using the HDFS command-line utilities. | Install |
| 2 | Develop a MapReduce program to implement Matrix Multiplication. | MM |
| 3 | Develop a MapReduce program that mines weather data and displays appropriate messages indicating the weather conditions of the day. | WDA |
| 4 | Develop a MapReduce program to find the tags associated with each movie by analyzing MovieLens dataset. | MTA |
| 5 | Implement MongoDB functions: Count, Sort, Limit, Skip, Aggregate. | MDB |
| 6 | Develop Pig Latin scripts to sort, group, join, project, and filter the data. | PIG |
| 7 | Use Hive to create, alter, and drop databases, tables, views, functions, and indexes. | |
| 8 | Implement a Word Count program in Hadoop and Spark. | WCH WCS |
| 9 | Use CDH (Cloudera Distribution for Hadoop) and HUE (Hadoop User Interface) to analyze data and generate reports for sample datasets. | |
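
Experiment 2 is sketched below as a minimal, local Python version of the classic one-pass MapReduce matrix multiplication. It simulates the map, shuffle, and reduce phases in memory rather than using Hadoop's Java API. It assumes both matrices arrive as sparse `(row, col, value)` triples. The function names are illustrative and are not taken from the experiment's code. Every mapper output is keyed by an output cell `(i, k)`. As a result, one reducer call sees all the `A[i][j]` and `B[j][k]` entries it needs and sums their products over the shared index `j`.

```python
from collections import defaultdict
from itertools import chain


def map_phase(name, entries, m, p):
    """Map step: key every matrix entry by the output cells C[i][k] it feeds.

    `entries` are sparse (row, col, value) triples; A is m x n, B is n x p.
    """
    for row, col, value in entries:
        if name == "A":
            # A[i][j] is needed by C[i][k] for every column k of B.
            for k in range(p):
                yield (row, k), ("A", col, value)
        else:
            # B[j][k] is needed by C[i][k] for every row i of A.
            for i in range(m):
                yield (i, col), ("B", row, value)


def shuffle(pairs):
    """Group mapper output by key, as Hadoop's shuffle phase would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: C[i][k] = sum over j of A[i][j] * B[j][k]."""
    a = {j: v for name, j, v in values if name == "A"}
    b = {j: v for name, j, v in values if name == "B"}
    # Only indices j present in both sparse inputs contribute a product.
    return key, sum(a[j] * b[j] for j in a.keys() & b.keys())


def matrix_multiply(a_entries, b_entries, m, p):
    """Multiply sparse A (m x n) by sparse B (n x p); returns {(i, k): value}."""
    mapped = chain(map_phase("A", a_entries, m, p), map_phase("B", b_entries, m, p))
    return dict(reduce_phase(key, values) for key, values in sorted(shuffle(mapped).items()))
```

For example, `matrix_multiply(a, b, 2, 2)` multiplies `[[1, 2], [3, 4]]` by `[[5, 6], [7, 8]]` when both are passed as triples. The result is `{(0, 0): 19, (0, 1): 22, (1, 0): 43, (1, 1): 50}`. Zeros are omitted from the sparse input, so a cell that receives no records at all is absent from the result; its value is zero.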

Tools and technologies used:

- Hadoop (HDFS, MapReduce, YARN)
- MongoDB
- Apache Pig
- Apache Hive
- Apache Spark
- Apache Kafka
- Python or Java (for writing MapReduce and Spark jobs)

Getting started:

- Install Hadoop and configure it in pseudo-distributed or fully distributed (cluster) mode.
- Start the HDFS and YARN daemons by running `start-dfs.sh` followed by `start-yarn.sh`.
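
Once the daemons are running, the Hadoop word count from experiment 8 can be prototyped with Hadoop Streaming. Streaming runs any executable that reads records on stdin and writes tab-separated key/value lines to stdout. The sketch below is an assumption about how such a job might look, not the repository's implementation. The file name `wordcount.py` and its `map`/`reduce` argument are hypothetical. The reducer relies on its input being sorted by key: Hadoop's shuffle guarantees this on a cluster, and `sort` reproduces it locally.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts of each word.

    Assumes the input is sorted by key, so all records for one word are
    adjacent (Hadoop's shuffle/sort phase guarantees this).
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run(stage_name):
    """Stream stdin through the chosen stage ('map' or 'reduce') to stdout."""
    stage = {"map": mapper, "reduce": reducer}[stage_name]  # KeyError on any other name
    for record in stage(sys.stdin):
        print(record)


# Only act when a stage name is given, e.g. `python3 wordcount.py map`.
if __name__ == "__main__" and len(sys.argv) == 2:
    run(sys.argv[1])
```

To test locally, run `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`. On the cluster, the same script can be passed to the streaming JAR, roughly `hadoop jar <path-to>/hadoop-streaming-*.jar -files wordcount.py -mapper "python3 wordcount.py map" -reducer "python3 wordcount.py reduce" -input <hdfs-input> -output <hdfs-output>`. The JAR's location depends on the installation.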