Introduction to Hadoop and MapReduce Notes
The main challenges with data are that it arrives very fast and comes from many different places.
The three V's
Apache Flume – Architecture of Flume NG | Cloudera Developer Blog
CDH - distribution of Apache Hadoop and related projects.
Hadoop Streaming
Hadoop Storage Format
Introducing Parquet: Efficient Columnar Storage for Apache Hadoop | Cloudera Developer Blog
hadoop - Storage format in HDFS - Stack Overflow
Terms
The Final
Much of the information below comes from a Google Doc that was somewhat hidden in the course wiki and was not linked from the final's instructions. The doc can also be found in the class forums, but rather than simply reference it, I wanted to copy some of its notes here, since it did not completely help me. To avoid the pain, just download a Hadoop VM and follow the steps below.
Python Resources
Example use of "continue" statement in Python? - Stack Overflow
4. More Control Flow Tools — Python v2.7.6 documentation
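One place the `continue` statement tends to show up in this context is a Hadoop Streaming mapper written in Python, which reads records line by line and skips the ones it cannot parse. The following is only a minimal sketch under assumptions that are not from the course material: the input is taken to be tab-separated with exactly two fields, and the names `map_lines` and `main` are hypothetical.

```python
import sys


def map_lines(lines):
    """Yield (key, value) pairs from tab-separated lines.

    Assumption: a well-formed record has exactly two tab-separated fields.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            # Malformed record: skip it and move on to the next line.
            continue
        key, value = fields
        yield key, value


def main():
    # A streaming mapper reads records from standard input and
    # writes tab-separated key/value pairs to standard output.
    for key, value in map_lines(sys.stdin):
        print("{0}\t{1}".format(key, value))
```

A script used as a mapper would call `main()` at the bottom of the file. Keeping the parsing in `map_lines` makes it possible to try the skip logic on a plain list of strings before running it on a cluster.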
Using Oracle VirtualBox
- Download it from http://content.udacity-data.com/courses/ud617/Cloudera-Udacity-Training-VM-4.1.1.c.zip. Warning: the zipped file size is 1.7 GB.
- The MD5sum file can be found here: http://content.udacity-data.com/courses/ud617/Cloudera-Udacity-Training-VM-4.1.1.c.zip.md5
- Unzip it. Warning: the unzipped size is 4.2 GB.
- Download and unzip data sets from:
- Download and install VirtualBox from https://www.virtualbox.org/wiki/Downloads
- Create a new virtual machine by pressing the ‘New’ button.
- Choose a name and set ‘Type’ to ‘Linux’.
- Press Next
- Select memory size for the VM.
- Press Next
- Select ‘Use an existing virtual hard drive file’, click the button to browse to the directory where you unzipped the provided VM image, and press ‘Create’.
- Start the VM!
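Since the download is large and an MD5sum file is published next to it, it can be worth checking the archive before unzipping it. The sketch below is one way to do that in Python; the function names are hypothetical, and it assumes the `.md5` file begins with the hex digest (the usual `md5sum` layout), which I have not verified for this particular file.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte file never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(archive_path, md5_path):
    """Compare the archive's digest with the first token of the .md5 file.

    Assumption: the .md5 file starts with the expected hex digest.
    """
    with open(md5_path) as handle:
        expected = handle.read().split()[0].lower()
    return md5_of_file(archive_path) == expected
```

For example, `matches_md5_file("Cloudera-Udacity-Training-VM-4.1.1.c.zip", "Cloudera-Udacity-Training-VM-4.1.1.c.zip.md5")` should return `True` for an intact download, assuming both files were saved in the current directory under those names.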