
Hadoop Architecture Explained !!!

Read More

Connecting Hive Server with Java

Use the following command to start the Hive server.

$HIVE_HOME/bin/hive --service hiveserver

Once the server is up, create a regular Java project in Eclipse.

Add the following libraries to your project.

1. hadoop-core-1.1.2.jar, which you can find in the $HADOOP_HOME directory

2. all the jar files residing in $HIVE_HOME/lib

You can use two types of drivers to connect to Hive.

1. JDBC-ODBC Bridge Driver
connection string : jdbc:odbc://localhost:10000/default
Driver class : sun.jdbc.odbc.JdbcOdbcDriver

2. HIVE Driver

connection string : jdbc:hive://localhost:10000/default
Driver class : org.apache.hadoop.hive.jdbc.HiveDriver

* In place of localhost you can use an IP address.

Here is a sample program that connects to Hive and displays the tables in it.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveClient {
  private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

  public static void main(String[] args) throws SQLException {
    // Register the Hive JDBC driver; exit if it is not on the classpath.
    try {
      Class.forName(driverName);
    } catch (ClassNotFoundException e) {
      e.printStackTrace();
      System.exit(1);
    }

    // Connect to the Hive server; no credentials are required by default.
    Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
    Statement stmt = con.createStatement();

    // List all tables in the default database.
    String sql = "show tables";
    System.out.println("Running: " + sql);
    ResultSet res = stmt.executeQuery(sql);
    while (res.next()) {
      System.out.println(res.getString(1));
    }

    con.close();
    System.out.println("End");
  }
}
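
You can also compile and run the client from the command line. A minimal sketch, assuming the jars listed above are in their default locations (adjust the paths for your installation):

javac -cp "$HIVE_HOME/lib/*" HiveClient.java
java -cp ".:$HIVE_HOME/lib/*:$HADOOP_HOME/hadoop-core-1.1.2.jar" HiveClient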

NOTE :

If you have downloaded the Hive stable release (which, at the time of writing, was released on 18-Dec-2012), make sure you have closed all Hive connections except the Hive server.

Make sure your metastore_db folder has the correct user access permissions.

 

Read More

SQL or NoSQL

 

Read More

Loading delimited data into hive table

Create a table in Hive that can automatically parse delimited data by using the following query.

create table testdata(col1 string, col2 string) row format delimited fields terminated by <delimiter> stored as textfile;
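
For example, for comma-delimited data the query would be (a sketch; the table and column names are illustrative):

create table testdata(col1 string, col2 string) row format delimited fields terminated by ',' stored as textfile;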

After that use the following command to load the data.

load data local inpath '/tmp/input.txt' into table testdata;

Verify whether the data loaded properly by selecting data from each column. A sample query is given below.

select col2 from testdata;
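
For instance, with a comma delimiter and a /tmp/input.txt containing the following (illustrative) lines:

a1,b1
a2,b2

the query above would return b1 and b2.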
Read More

Installing WordPress locally on Windows

Read More

Google’s 2004 paper which kicked off the MapReduce revolution

Download (PDF)

Read More

Basic Hadoop 1.1.2 Program

Download the input data file (cite75_99.txt) from here

Extract it and place the data file at “/usr/demodata/”

Compile and run the following program.

Note : This assumes that Hadoop 1.1.2 is installed and running properly.
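
The program below simply regroups the input: KeyValueTextInputFormat splits each line at the first comma into a key and a value, the mapper passes each pair through unchanged, and the reducer joins all values seen for a key into a single comma-separated string. For example, with (illustrative) input lines

3858241,956203
3858241,1324234

the job emits one output line: the key 3858241, a tab, and the value 956203,1324234.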

 

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;

import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {

	public static class MapClass extends MapReduceBase implements Mapper<Text, Text, Text, Text> {

		@Override
		public void map(Text key, Text value,
				OutputCollector<Text, Text> output, Reporter reporter)
				throws IOException {
			// Identity mapper: emit every (key, value) pair unchanged.
			output.collect(key, value);
		}
	}

	public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, Text> {

		@Override
		public void reduce(Text key, Iterator<Text> values,
				OutputCollector<Text, Text> output, Reporter reporter)
				throws IOException {
			// Join all values for this key into one comma-separated string.
			String csv = "";
			while (values.hasNext()) {
				if (csv.length() > 0) csv += ",";
				csv += values.next().toString();
			}
			output.collect(key, new Text(csv));
		}
	}

	public static void main(String[] args) throws Exception {
		int res = ToolRunner.run(new Configuration(), new MyJob(), args);
		System.exit(res);
	}

	public int run(String[] args) throws Exception {
		// Reuse the configuration that ToolRunner passed in.
		Configuration conf = getConf();
		JobConf job = new JobConf(conf, MyJob.class);

		Path in = new Path("/usr/demodata/cite75_99.txt");
		Path out = new Path("/usr/demodata/output");

		FileInputFormat.setInputPaths(job, in);
		FileOutputFormat.setOutputPath(job, out);

		job.setJobName("Test Job");
		job.setMapperClass(MapClass.class);
		job.setReducerClass(Reduce.class);

		// Split each input line into key and value at the first separator character.
		job.setInputFormat(KeyValueTextInputFormat.class);
		job.setOutputFormat(TextOutputFormat.class);

		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);
		job.set("key.value.separator.in.input.line", ",");

		JobClient.runJob(job);

		return 0;
	}

}
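
To build and run the job from the command line (a minimal sketch; the jar name and the classes directory are illustrative):

mkdir classes
javac -classpath $HADOOP_HOME/hadoop-core-1.1.2.jar -d classes MyJob.java
jar -cvf myjob.jar -C classes .
$HADOOP_HOME/bin/hadoop jar myjob.jar MyJob

The results appear under /usr/demodata/output in part-00000 style files.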
Read More

Creating JAR File in Eclipse


STEP 1 : Select the project that you want to export.

STEP 2 : Select File, or right-click on the project, and click Export.

STEP 3 : Open the Java folder and select JAR file.

[Screenshot : the Export window]

 

STEP 4 : Select the source files and any other required files or folders that you want to include in the JAR, click Browse, select the folder where the JAR should be created, enter the JAR file name, and click OK.

[Screenshot : the JAR export settings]

STEP 5 : Click Finish, and the JAR creation is DONE.
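
Alternatively, the same JAR can be produced from the command line with the jar tool (a sketch, assuming your compiled classes are in the project's bin folder):

jar -cvf myproject.jar -C bin .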

 

 

Read More

Be an Extraordinary Man !!

“One machine can do the work of fifty ordinary men. No machine can do the work of one extraordinary man.”

-Elbert Hubbard, The Roycroft Dictionary and Book of Epigrams, 1923

Read More

Loading Data in Hive


Apache Hive supports analysis of large datasets stored in Hadoop-compatible file systems such as the Amazon S3 filesystem. It provides an SQL-like language called HiveQL while maintaining full support for map/reduce. To accelerate queries, it provides indexes, including bitmap indexes.

To install Hadoop, please follow our post here

Loading data from flat files to Hive Tables :

 hive> LOAD DATA LOCAL INPATH './examples/files/out_001223.txt' OVERWRITE INTO TABLE sample2;

The keyword ‘overwrite’ signifies that existing data in the table is deleted. If the ‘overwrite’ keyword is omitted, data files are appended to existing data sets.
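
For example, to append new data instead of replacing it (the file name is illustrative):

 hive> LOAD DATA LOCAL INPATH './examples/files/out_001224.txt' INTO TABLE sample2;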

 hive> LOAD DATA LOCAL INPATH './examples/files/out_23.txt' OVERWRITE INTO TABLE sample PARTITION (ds='2008-08-15');
 hive> LOAD DATA LOCAL INPATH './examples/files/out_122.txt' OVERWRITE INTO TABLE sample PARTITION (ds='2008-08-08');

The two LOAD statements above load data into two different partitions of the table sample. The table sample must be created partitioned by the key ds for this to succeed.
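
For the two partitioned loads above to succeed, sample would therefore need a DDL along these lines (a sketch):

 hive> CREATE TABLE sample ( col1 INT, col2 STRING) PARTITIONED BY (ds STRING);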

 

Basics of HiveQL

Creating Hive tables :

 hive> CREATE TABLE sample ( col1 INT, col2 STRING);
 hive> CREATE TABLE sample2 ( col1 INT, col2 STRING) PARTITIONED BY (ds STRING);

Browsing through them :

hive> SHOW TABLES;
hive> SHOW TABLES '.*e';
hive> DESCRIBE sample2;

Altering tables :

 hive> ALTER TABLE sample ADD COLUMNS (new_col INT);
 hive> ALTER TABLE sample2 ADD COLUMNS (new_col2 INT COMMENT 'a comment');
 hive> ALTER TABLE sample RENAME TO sample1;

Dropping tables :

 hive> DROP TABLE sample1;
Read More