Document Version 1.0
Copyright © 2015 beijing.beijing.012@gmail.com
Keywords:
Spring Data for Hadoop, NoSQL, HBase, Java, Maven
Table of Contents
1. Create a simple Maven project "spring-hbase" in Eclipse
2. Prepare project's pom.xml file
3. Prepare Spring configuration file
4. Write code accessing HBase
5. Run the sample
NoSQL databases can be interesting even when we are not dealing with petabytes of data. For a relational database holding terabytes of data, a query may take so long that it becomes unacceptable for a performance-critical system, even after we index the tables, optimize the queries, and run VACUUM every day. For one of my concrete problems I tried a solution with HBase and Spring Data for Hadoop, with success and with far better performance. Still, I had trouble getting my first Spring Data application to run, so this post shows a simple running sample of Spring Data for Hadoop. I hope it helps someone facing more or less the same difficulties.
2. Prepare project's pom.xml file
The necessary dependencies include the Spring, Spring Data, HBase, and Hadoop jars.
In case some jars cannot be found in the Maven central repository, you can install them manually in an enterprise repository or in your local repository.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>wangs</groupId>
<artifactId>spring-hbase</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>spring-hbase</name>
<repositories>
<repository>
<id>repo1.maven.org/maven2</id>
<url>http://repo1.maven.org/maven2</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-hadoop</artifactId>
<version>2.2.0.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-core</artifactId>
<version>4.1.7.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-context</artifactId>
<version>4.1.7.RELEASE</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.0.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>1.0.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-hadoop2-compat</artifactId>
<version>1.0.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-protocol</artifactId>
<version>1.0.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.htrace</groupId>
<artifactId>htrace-core</artifactId>
<version>3.1.0-incubating</version>
</dependency>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-all</artifactId>
<version>4.0.23.Final</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.5.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.5.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.5.1</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
</dependencies>
</project>
3. Prepare Spring configuration file
With Spring Data for Hadoop, the connection to HBase is managed by Spring. This includes:
- the Hadoop configuration for HBase and
- an HbaseTemplate, the unified access point to HBase
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:context="http://www.springframework.org/schema/context"
xmlns:hdp="http://www.springframework.org/schema/hadoop"
xmlns:p="http://www.springframework.org/schema/p"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">
<context:component-scan base-package="springdt" />
<context:property-placeholder location="hbase.properties"/>
<hdp:configuration id="hadoopConfiguration">
fs.defaultFS=hdfs://127.0.0.1:9000
</hdp:configuration>
<hdp:hbase-configuration configuration-ref="hadoopConfiguration" zk-quorum="127.0.0.1" zk-port="2181"/>
<bean id="hbaseTemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
<property name="configuration" ref="hbaseConfiguration"/>
</bean>
<bean id="hBaseService" class="springdt.HBaseService"/>
</beans>
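The `<context:property-placeholder>` line expects an hbase.properties file on the classpath. In this sample the ZooKeeper settings are hardcoded in the XML, but they could be externalized; a hypothetical hbase.properties (the keys below are my own names, not anything Spring requires) might look like:

```properties
# Hypothetical keys; they would be referenced from the XML
# as ${hbase.zk.quorum} and ${hbase.zk.port}.
hbase.zk.quorum=127.0.0.1
hbase.zk.port=2181
```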
The "hBaseService" bean is the only application bean in this sample. We will show its code right away.
4. Write code accessing HBase
In this sample we create a table in HBase called "report" and store some data in it. The table has one column family called "data" with a single column named "file". The value of "file" is a file name, which could be a report name in real life.
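For reference, the equivalent schema and a sample row can also be created by hand in the HBase shell (commands only; the shell's output is omitted here):

```
create 'report', 'data'
put 'report', 'row0', 'data:file', 'report24.csv-0'
scan 'report'
```

The Java code below does the same thing programmatically, for 1000 rows.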
package springdt;
import javax.inject.Inject;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.support.AbstractApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;
import org.springframework.data.hadoop.hbase.HbaseTemplate;
import org.springframework.data.hadoop.hbase.TableCallback;
import org.springframework.stereotype.Service;
/**
* Demonstrating how to access HBase using Spring Data for Hadoop.
*
* @author swang
*
*/
@Service
public class HBaseService {
@Autowired
private Configuration hbaseConfiguration;
@Inject
private HbaseTemplate hbTemplate;
// Table info
final String tableName = "report";
final String columnFamilyData = "data";
final String colFile = "file";
final String rowNamePattern = "row";
final String value = "report24.csv-";
/**
*
* @throws Exception
*/
public void run() throws Exception {
// 1. create table
createTable();
// 2. add data entry
addData();
}
/**
* Creates HBase table
*
* @throws Exception
*/
public void createTable() throws Exception {
    HBaseAdmin admin = new HBaseAdmin(hbaseConfiguration);
    try {
        if (admin.tableExists(tableName)) {
            admin.disableTable(tableName);
            admin.deleteTable(tableName);
        }
        HTableDescriptor tableDes = new HTableDescriptor(tableName);
        HColumnDescriptor cf1 = new HColumnDescriptor(columnFamilyData);
        tableDes.addFamily(cf1);
        admin.createTable(tableDes);
    } finally {
        admin.close(); // release the connection held by the admin client
    }
}
/**
* Adds data entry for report.
*/
private void addData() {
hbTemplate.execute(tableName, new TableCallback<Boolean>() {
public Boolean doInTable(HTableInterface table) throws Throwable {
for (int i = 0; i < 1000; i++) {
Put p = new Put(Bytes.toBytes(rowNamePattern + i));
p.add(Bytes.toBytes(columnFamilyData),
Bytes.toBytes(colFile), Bytes.toBytes(value + i));
table.put(p);
}
return true;
}
});
}
public static void main(String[] args) throws Exception {
AbstractApplicationContext ctx = new ClassPathXmlApplicationContext(
"SpringBeans.xml");
HBaseService hBaseService = (HBaseService) ctx.getBean("hBaseService");
hBaseService.run();
}
}
The "SpringBeans.xml" we created in step 3 should be put in "src/main/resources", so that it can be found on the classpath at runtime.
5. Run the sample
We assume you have HBase running on top of Hadoop. Make sure the HBase version is compatible with your Hadoop version. In my case I have "hbase-1.0.1.1" and "hadoop-2.7.0".
Now start HBase server.
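If the servers are not yet running, they can be started from their respective install directories with the standard scripts (paths are examples for a local single-node setup):

```
# start HDFS, from the Hadoop install directory
sbin/start-dfs.sh
# start HBase, from the HBase install directory
bin/start-hbase.sh
```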
Run "HBaseService" in Eclipse as a Java application. In the Eclipse console you will see output like this:
2015-07-25 15:10:52 INFO ClientCnxn:852 - Socket connection established to 127.0.0.1/127.0.0.1:2181, initiating session
2015-07-25 15:10:52 INFO ClientCnxn:1235 - Session establishment complete on server 127.0.0.1/127.0.0.1:2181, sessionid = 0x14ec4b8cf30000e, negotiated timeout = 90000
2015-07-25 15:10:53 INFO HBaseAdmin:978 - Started disable of report
2015-07-25 15:10:55 INFO HBaseAdmin:1033 - Disabled report
2015-07-25 15:10:55 INFO HBaseAdmin:738 - Deleted report
"HBaseService" should have created 1000 entries in the HBase table "report". Check this with the HBase shell:
hbase(main):003:0> scan "report"
ROW COLUMN+CELL
row0 column=data:file, timestamp=1437823593865, value=report24.csv-0
row1 column=data:file, timestamp=1437823593897, value=report24.csv-1
...
row999 column=data:file, timestamp=1437823603565, value=report24.csv-999
1000 row(s) in 2.0600 seconds
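One detail worth noting in the scan: HBase returns rows in lexicographic byte order of the row key, so unpadded numeric suffixes ("row2", "row10", "row100", ...) do not come back in numeric order. Zero-padding the suffix fixes this. A minimal plain-Java sketch, independent of HBase (the class and method names are my own):

```java
import java.util.Arrays;
import java.util.List;

// Why zero-padded row keys matter: HBase compares row keys as raw bytes,
// i.e. lexicographically, and a scan returns rows in that order.
public class RowKeyOrder {

    // Zero-pad the numeric suffix so lexicographic order equals numeric order.
    static String padded(int i) {
        return String.format("row%04d", i);
    }

    public static void main(String[] args) {
        // Unpadded keys, sorted the way an HBase scan would return them:
        List<String> unpadded = Arrays.asList("row2", "row10", "row100");
        unpadded.sort(null); // natural String order = lexicographic
        System.out.println(unpadded); // [row10, row100, row2]

        // Padded keys sort in the expected numeric order:
        List<String> good = Arrays.asList(padded(2), padded(10), padded(100));
        good.sort(null);
        System.out.println(good); // [row0002, row0010, row0100]
    }
}
```

In the sample above the keys "row0" to "row999" happen to scan in a readable order only because the output is abridged; with padded keys the scan order is guaranteed to match the insertion index.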
Now we are done with the simple example. Have fun!