Apache Hive supports analysis of large datasets stored in Hadoop-compatible file systems such as the Amazon S3 filesystem. It provides an SQL-like language called HiveQL while maintaining full support for MapReduce. To accelerate queries, it provides indexes, including bitmap indexes.

To install Hadoop, please follow our post here

Loading data from flat files into Hive tables:

 hive> LOAD DATA LOCAL INPATH './examples/files/out_001223.txt' OVERWRITE INTO TABLE sample2;

The OVERWRITE keyword signifies that any existing data in the table is deleted first. If OVERWRITE is omitted, the data files are appended to the table's existing data.
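For example, to append instead of replacing, the same statement can be issued without OVERWRITE (reusing the file from the example above):

 hive> LOAD DATA LOCAL INPATH './examples/files/out_001223.txt' INTO TABLE sample2;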

 hive> LOAD DATA LOCAL INPATH './examples/files/out_23.txt' OVERWRITE INTO TABLE sample PARTITION (ds='2008-08-15');
 hive> LOAD DATA LOCAL INPATH './examples/files/out_122.txt' OVERWRITE INTO TABLE sample PARTITION (ds='2008-08-08');

The two LOAD statements above load data into two different partitions of the table sample. For this to succeed, sample must have been created with a PARTITIONED BY (ds STRING) clause.
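To verify which partitions now exist, Hive's SHOW PARTITIONS command can be used; after the two loads above it should list ds=2008-08-08 and ds=2008-08-15:

 hive> SHOW PARTITIONS sample;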

 

Basics of HiveQL

Creating Hive tables:

 hive> CREATE TABLE sample (col1 INT, col2 STRING) PARTITIONED BY (ds STRING);
 hive> CREATE TABLE sample2 (col1 INT, col2 STRING);

Browsing through tables:

hive> SHOW TABLES;
hive> SHOW TABLES '.*e';
hive> DESCRIBE sample2;

SHOW TABLES lists all tables; with a quoted pattern (a Java regular expression), it lists only matching tables, so '.*e' matches tables whose names end in 'e'. DESCRIBE lists a table's columns.

Altering tables:

 hive> ALTER TABLE sample ADD COLUMNS (new_col INT);
 hive> ALTER TABLE sample2 ADD COLUMNS (new_col2 INT COMMENT 'a comment');
 hive> ALTER TABLE sample RENAME TO sample1;
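The effect of these ALTER statements can be checked with DESCRIBE; for example, after the rename above, the following should list col1, col2 and new_col:

 hive> DESCRIBE sample1;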

Dropping tables:

 hive> DROP TABLE sample2;

Note that for managed tables, DROP TABLE removes both the table's metadata and its data.