Introduction to Hadoop and MapReduce Notes
Class Listing on Udacity

What is big data
A subjective term. It mostly refers to data that is difficult to process on a small machine, not necessarily to a large amount of data.
Challenges: data arrives very fast and from many different sources.
The three V's: Volume, Variety, Velocity.

References
  When to use HBase and when to use Hive - Stack Overflow
  Apache Flume – Architecture of Flume NG | Cloudera Developer Blog
  CDH - distribution of Apache Hadoop and related projects
  Hadoop Streaming
  Hadoop storage formats:
    Introducing Parquet: Efficient Columnar Storage for Apache Hadoop | Cloudera Developer Blog
    hadoop - Storage format in HDFS - Stack Overflow

Terms
  NameNode
  MapReduce
  Shuffle and Sort
  Apache Sqoop
  Apache Nutch

The Final
Much of this information below is on a Google doc that was somewhat hidden in the course wiki b...