In my last few posts, I covered the basics of Hadoop and HDFS, along with a little MapReduce. The emails I received from readers were very appreciative, and many of you have become interested in Big Data and Hadoop.
To help further, I would like to point you to a few books and web resources you can start reading. This post is dedicated to them.
If you are reading my articles and are interested in learning Hadoop, you are probably already familiar with the power of Big Data and why people are going gaga over it.
You can refer to these short articles about Big Data, HDFS, and MapReduce:
- Why use Hadoop? Any other solution to Large-Scale Data?
- Hadoop Distributed File System (HDFS)
- Let's start with Map-Reduce
- Hello Hadoop: Welcoming Parallel Processing
You may also like to read Pre-requisites for getting started with Big Data Technologies before diving in.
Now, on to the main topic of this post.
The first book I would recommend is: Hadoop: The Definitive Guide, 3rd Edition, by Tom White. I started my Big Data journey with this book, and believe me, it is the best resource if you are new to the Big Data world. It is elegantly written and explains the concepts topic by topic. It also uses a weather dataset as an example that is carried through almost the whole book to help you understand how things work in Hadoop.
The second book, which I enjoy reading and also find very helpful, is: Hadoop in Practice by Alex Holmes. Hadoop in Practice collects 85 battle-tested examples and presents them in a problem/solution format. It balances conceptual foundations with practical recipes for key problem areas like data ingress and egress, serialization, and LZO compression. You'll explore each technique step by step, learning how to build a specific solution along with the thinking that went into it. As a bonus, the book's examples create a well-structured and understandable codebase you can tweak to meet your own needs.
The third one, which is written in a really simple style, is: Hadoop in Action by Chuck Lam. Hadoop in Action teaches readers how to use Hadoop and write MapReduce programs. The intended readers are programmers, architects, and project managers who have to process large amounts of data offline. Hadoop in Action will lead the reader from obtaining a copy of Hadoop to setting it up in a cluster and writing data analytic programs.
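Note: this book uses the old Hadoop MapReduce API (the org.apache.hadoop.mapred package), so its examples look different from code written against the newer org.apache.hadoop.mapreduce API.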
And lastly, if you are more into the administrative side, you can go for Hadoop Operations by Eric Sammer. Although it touches on development, this book is mainly about administering and maintaining large clusters that hold large data sets in a production environment. Eric Sammer, Principal Solution Architect at Cloudera, shows you the particulars of running Hadoop in production, from planning, installing, and configuring the system to providing ongoing maintenance.
Well, these are the books you can refer to for a better conceptual understanding of the Hadoop framework and for practical, hands-on experience with it.
Apart from these books, if you want to dig into the API, you can see the Hadoop API Docs here. Also very useful is: Data-Intensive Text Processing with MapReduce.
I hope you find these books and resources helpful for understanding Hadoop and its power in depth.
If you have any questions or want a specific tutorial on Hadoop, you can request it by email. I will try to get back to you as soon as possible :)