Hadoop Architecture Explained !!!

Connecting Hive Server with JAVA

Use the following command to start the Hive server.

$HIVE_HOME/bin/hive --service hiveserver

Once the server is up, create a regular Java project in Eclipse.

Add the following libraries to your project.

1. hadoop-core-1.1.2.jar, which you can find in the $HADOOP_HOME directory

2. All the jar files in $HIVE_HOME/lib
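
A quick way to confirm that the jars were added correctly is to check whether the Hive driver class named later in this post can be loaded. This is only a minimal sketch; the class ClasspathCheck and its method are hypothetical helpers, not part of Hive or Hadoop.

```java
// Hypothetical helper (not part of Hive or Hadoop): reports whether a class
// can be found on the classpath of the running program.
final class ClasspathCheck {
    private ClasspathCheck() {}

    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = "org.apache.hadoop.hive.jdbc.HiveDriver";
        System.out.println(driver + " on classpath: " + isOnClasspath(driver));
    }
}
```

If this prints false from inside the Eclipse project, the jars from $HIVE_HOME/lib are probably missing from the build path.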

You can use two types of drivers to connect to Hive.

1. JDBC Driver

Connection string : jdbc:odbc://localhost:10000/default
Driver class : sun.jdbc.odbc.JdbcOdbcDriver

2. HIVE Driver

Connection string : jdbc:hive://localhost:10000/default
Driver class : org.apache.hadoop.hive.jdbc.HiveDriver

* In place of localhost you can use the IP address of the machine running the Hive server.
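
The Hive connection string follows a fixed pattern, so it can be assembled from its parts. The sketch below assumes the jdbc:hive:// form shown above; the class HiveUrl and its method are hypothetical helpers, not part of Hive, and the IP address is only an example value.

```java
// Hypothetical helper (not part of Hive): assembles a connection string in the
// jdbc:hive://<host>:<port>/<database> form used in this post.
final class HiveUrl {
    private HiveUrl() {}

    static String of(String host, int port, String database) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "jdbc:hive://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        System.out.println(HiveUrl.of("localhost", 10000, "default"));
        // Example IP address; replace it with the address of your Hive server.
        System.out.println(HiveUrl.of("192.168.1.10", 10000, "default"));
    }
}
```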

Here is a sample program that connects to Hive and displays its tables.

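import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveClient {
	private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

	public static void main(String[] args) throws SQLException {
		try {
			// Load the Hive driver so that DriverManager can find it.
			Class.forName(driverName);
		} catch (ClassNotFoundException e) {
			// The driver jar is not on the classpath; nothing else can work.
			e.printStackTrace();
			System.exit(1);
		}
		Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
		Statement stmt = con.createStatement();

		ResultSet res;
		String sql;
		sql = "show tables";
		System.out.println("Running: " + sql);
		res = stmt.executeQuery(sql);
		// Each row of the result holds one table name in its first column.
		while (res.next()) {
			System.out.println(res.getString(1));
		}
		con.close();
	}
}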


If you have downloaded the Hive stable release (currently the one released on 18-Dec-2012), make sure you have closed all Hive connections except the Hive server. The default embedded Derby metastore typically allows only one process to open it at a time.

Make sure your metastore_db folder has the correct user access permissions.


Loading delimited data into hive table

Create a Hive table that parses delimited data by using the following query, replacing <delimiter> with the quoted delimiter of your file, such as ','.

create table testdata(col1 string, col2 string) row format delimited fields terminated by <delimiter> stored as textfile;
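
When the table is created from a Java client, the delimiter has to be substituted into the query text. The sketch below shows one way to do that; the class DelimitedTableDdl and its method are hypothetical helpers, and the delimiter is expected in the form it should appear inside the HiveQL quotes (for example , or \t).

```java
// Hypothetical helper (not part of Hive): fills the <delimiter> placeholder of
// the create-table query used in this post.
final class DelimitedTableDdl {
    private DelimitedTableDdl() {}

    static String createTable(String table, String delimiter) {
        // A single quote would end the HiveQL string literal early, so reject it.
        if (delimiter == null || delimiter.isEmpty() || delimiter.contains("'")) {
            throw new IllegalArgumentException("unsupported delimiter");
        }
        return "create table " + table + "(col1 string, col2 string)"
                + " row format delimited fields terminated by '" + delimiter + "'"
                + " stored as textfile";
    }

    public static void main(String[] args) {
        System.out.println(createTable("testdata", ","));
        // "\\t" is the two characters \t, which is how a tab is written in the query.
        System.out.println(createTable("testdata", "\\t"));
    }
}
```

The returned string has no trailing semicolon, since it is meant to be passed to a JDBC Statement rather than typed at the hive> prompt.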

After that use the following command to load the data.

load data local inpath '/tmp/input.txt' into table testdata;

Verify that the data was loaded properly by selecting each column. A sample query is given below.

select col2 from testdata;

Basic Hadoop 1.1.2 Program

Download the input data file from here

Extract and keep the data file at “/usr/demodata/”

Compile and run the following program.

Note : This assumes that Hadoop 1.1.2 is installed and running properly.


import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;

import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {

	// Mapper: passes every key/value pair through unchanged.
	public static class MapClass extends MapReduceBase implements Mapper<Object,Object,Object,Object>{

		public void map(Object key, Object value,
				OutputCollector<Object, Object> output, Reporter reporter)
				throws IOException {
			output.collect(key, value);
		}
	}

	// Reducer: joins all values that share a key into one comma-separated string.
	public static class Reduce extends MapReduceBase  implements Reducer<Object,Object,Object,Object>{

		public void reduce(Object key, Iterator<Object> values,
				OutputCollector<Object, Object> output, Reporter reporter)
				throws IOException {
			String csv = "";
			while (values.hasNext()) {
				if(csv.length() > 0) csv+=",";
				csv += values.next().toString();
			}
			output.collect(key,new Text(csv));
		}
	}

	public static void main(String[] args) throws Exception {
		int res = ToolRunner.run(new Configuration(), new MyJob(), args);
		System.exit(res);
	}

	public int run(String[] args) throws Exception{
		// Use the configuration that ToolRunner passed in.
		Configuration conf = getConf();
		JobConf job = new JobConf(conf,MyJob.class);

		Path in = new Path("/usr/demodata/cite75_99.txt");
		Path out = new Path("/usr/demodata/output");

		FileInputFormat.setInputPaths(job, in);
		FileOutputFormat.setOutputPath(job, out);

		job.setJobName("Test Job");
		job.setMapperClass(MapClass.class);
		job.setReducerClass(Reduce.class);

		job.setInputFormat(KeyValueTextInputFormat.class);
		job.setOutputFormat(TextOutputFormat.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);
		// Assumption: the input file is comma-separated (the default separator is a tab).
		job.set("key.value.separator.in.input.line", ",");

		JobClient.runJob(job);

		return 0;
	}
}
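
To see what the job computes without a cluster, the grouping can be imitated in plain Java. This is only an in-memory sketch of the idea, assuming comma-separated "key,value" input lines; the class GroupByKeySketch is hypothetical and uses no Hadoop classes, and unlike Hadoop it keeps the values of a key in input order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the job above (no Hadoop involved).
final class GroupByKeySketch {
    private GroupByKeySketch() {}

    static Map<String, String> run(List<String> lines) {
        // TreeMap keeps keys sorted, loosely mirroring the sorted keys a reducer sees.
        Map<String, String> result = new TreeMap<>();
        for (String line : lines) {
            // "Map" step: split at the first comma and pass key and value through.
            int sep = line.indexOf(',');
            String key = sep < 0 ? line : line.substring(0, sep);
            String value = sep < 0 ? "" : line.substring(sep + 1);

            // "Reduce" step: append the value to the key's comma-separated list.
            String csv = result.getOrDefault(key, "");
            if (csv.length() > 0) csv += ",";
            csv += value;
            result.put(key, csv);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not taken from the real input file.
        System.out.println(run(Arrays.asList("3,1", "1,a", "3,2")));
    }
}
```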

Loading Data in Hive


Apache Hive supports analysis of large datasets stored in Hadoop-compatible file systems such as the Amazon S3 filesystem. It provides an SQL-like language called HiveQL while maintaining full support for map/reduce. To accelerate queries, it provides indexes, including bitmap indexes.

For installing hadoop please follow our post here

Loading data from flat files to Hive Tables :

 hive> LOAD DATA LOCAL INPATH './examples/files/out_001223.txt' OVERWRITE INTO TABLE sample2;

The keyword ‘overwrite’ signifies that existing data in the table is deleted. If the ‘overwrite’ keyword is omitted, data files are appended to existing data sets.

 hive> LOAD DATA LOCAL INPATH './examples/files/out_23.txt' OVERWRITE INTO TABLE sample PARTITION (ds='2008-08-15');
 hive> LOAD DATA LOCAL INPATH './examples/files/out_122.txt' OVERWRITE INTO TABLE sample PARTITION (ds='2008-08-08');

The two LOAD statements above load data into two different partitions of the table sample. Table sample must be created as partitioned by the key ds for this to succeed.
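
When these statements are issued from a Java client, the optional OVERWRITE keyword and PARTITION clause can be added conditionally. The sketch below only builds the statement text; the class LoadDataStatement and its parameters are hypothetical and not part of Hive.

```java
// Hypothetical helper (not part of Hive): builds the text of a LOAD DATA
// statement in the form used in this post.
final class LoadDataStatement {
    private LoadDataStatement() {}

    // partitionSpec is the text inside PARTITION (...), or null for no partition.
    static String build(String localPath, String table, boolean overwrite, String partitionSpec) {
        StringBuilder sql = new StringBuilder("LOAD DATA LOCAL INPATH '")
                .append(localPath).append("' ");
        if (overwrite) {
            // With OVERWRITE, existing data in the target is deleted first.
            sql.append("OVERWRITE ");
        }
        sql.append("INTO TABLE ").append(table);
        if (partitionSpec != null) {
            sql.append(" PARTITION (").append(partitionSpec).append(")");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("./examples/files/out_001223.txt", "sample2", false, null));
        System.out.println(build("./examples/files/out_23.txt", "sample", true, "ds='2008-08-15'"));
    }
}
```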


Basics of Hive QL

Creating Hive tables :

 hive> CREATE TABLE sample ( col1 INT, col2 STRING);
 hive> CREATE TABLE sample2 ( col1 INT, col2 STRING) PARTITIONED BY (ds STRING);

Browsing through them :

hive> SHOW TABLES '.*e';
hive> DESCRIBE sample2;
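
The pattern '.*e' selects the tables whose names end with e. As a rough illustration, assuming the pattern is applied to the whole table name as a Java regular expression, the same filtering can be tried in plain Java; the class ShowTablesPattern is a hypothetical helper and does not talk to Hive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Hive): filters table names with a Java
// regular expression that must match the whole name.
final class ShowTablesPattern {
    private ShowTablesPattern() {}

    static List<String> matching(String pattern, List<String> tableNames) {
        Pattern compiled = Pattern.compile(pattern);
        List<String> result = new ArrayList<>();
        for (String name : tableNames) {
            if (compiled.matcher(name).matches()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "sample" ends with e, "sample2" does not.
        System.out.println(matching(".*e", Arrays.asList("sample", "sample2")));
    }
}
```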

Altering tables :

 hive> ALTER TABLE sample ADD COLUMNS (new_col INT);
 hive> ALTER TABLE sample2 ADD COLUMNS (new_col2 INT COMMENT 'a comment');
 hive> ALTER TABLE sample RENAME TO sample1;

Dropping tables :

 hive> DROP TABLE sample1;

Installing HIVE


To look at the overview of HIVE, please click here

STEP 1 : Download Hive

Download the stable release from Apache Hive.

STEP 2 : Unzip

Copy the Hive tar file into the required folder and run the following command to extract it.

tar -xzvf hive-x.y.z.tar.gz

STEP 3 : Setting hive path

  • export HIVE_HOME=/opt/hive
  • export PATH=$PATH:$HIVE_HOME/bin

STEP 4 : Log out and log in again as the same user. Check whether the settings were applied by using echo $HIVE_HOME

STEP 5 : Check whether Hadoop is running; if not, start it.

STEP 6 : Run the following four commands

$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse

STEP 7 : Running HIVE

$HIVE_HOME/bin/hive


DONE : Now your playground is ready.

NOTE : Hive runs on top of Hadoop, so install Hadoop before you start installing Hive. Also ensure that all required ports are open.
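
As a complement to the echo check in STEP 4, the same setting can be verified from Java. This is a minimal sketch, assuming the Hive launcher script lives at bin/hive under HIVE_HOME as in STEP 3; the class HiveHomeCheck is a hypothetical helper, not part of Hive.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Hive): checks that HIVE_HOME points at a
// directory containing the bin/hive launcher script.
final class HiveHomeCheck {
    private HiveHomeCheck() {}

    static Path launcherPath(String hiveHome) {
        return Paths.get(hiveHome, "bin", "hive");
    }

    static boolean looksInstalled(String hiveHome) {
        return hiveHome != null
                && !hiveHome.isEmpty()
                && Files.isRegularFile(launcherPath(hiveHome));
    }

    public static void main(String[] args) {
        String hiveHome = System.getenv("HIVE_HOME");
        System.out.println("HIVE_HOME = " + hiveHome);
        System.out.println("bin/hive found: " + looksInstalled(hiveHome));
    }
}
```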

