HBase Put Example – Java Client API: CRUD Operations in HBase

The put operation has two variations: the first works on a single row and the second on a list of rows. We will first look at the put operation on single rows.

The methods below store data in an HBase table:


void put(Put put) throws IOException
void put(List&lt;Put&gt; puts) throws IOException

The put method expects one or more Put objects; a Put instance can be created with one of these constructors:


Put(byte[] row)
Put(byte[] row, long ts)
Put(byte[] rowArray, int rowOffset, int rowLength)
Put(byte[] rowArray, int rowOffset, int rowLength, long ts)

We need to pass a row key to create a Put instance. A row in HBase is identified by a unique row key, which is a Java byte[] array. We can choose any row key we like, but a well-designed row key is important for HBase query performance.
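Row keys are usually built from strings with HBase's Bytes utility, which simply encodes the string as UTF-8. The standalone class below is an illustration of that behaviour, not the real org.apache.hadoop.hbase.util.Bytes implementation:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowKeyDemo {

    // Mirrors what org.apache.hadoop.hbase.util.Bytes.toBytes(String) does:
    // encode the string as UTF-8 bytes.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] rowKey = toBytes("row1");
        // Row keys are compared lexicographically as unsigned bytes, which is
        // why fixed-width or zero-padded keys often scan more predictably.
        System.out.println(Arrays.toString(rowKey)); // [114, 111, 119, 49]
        System.out.println(new String(rowKey, StandardCharsets.UTF_8)); // row1
    }
}
```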

Let's create an HBase connection using the connection factory. Below is an example that adds data to the table test.


package com.learn.hbase.client;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutHBaseClient {

    public static void main(String[] args) throws IOException {

        Configuration conf = HBaseConfiguration.create();

        /*
         * A cluster connection encapsulating lower level individual
         * connections to actual servers and a connection to zookeeper.
         * Connections are instantiated through the ConnectionFactory class.
         * The lifecycle of the connection is managed by the caller, who has
         * to close() the connection to release the resources. The connection
         * object contains logic to find the master, locate regions out on
         * the cluster, keeps a cache of locations and then knows how to
         * re-calibrate after they move. The individual connections to
         * servers, meta cache, zookeeper connection, etc are all shared by
         * the Table and Admin instances obtained from this connection.
         *
         * Connection creation is a heavy-weight operation. Connection
         * implementations are thread-safe, so that the client can create a
         * connection once, and share it with different threads. Table and
         * Admin instances, on the other hand, are light-weight and are not
         * thread-safe. Typically, a single connection per client application
         * is instantiated and every thread will obtain its own Table
         * instance. Caching or pooling of Table and Admin is not
         * recommended.
         */

        /*
         * ConnectionFactory is a non-instantiable class that manages
         * creation of Connections. Managing the lifecycle of the Connections
         * to the cluster is the responsibility of the caller. From a
         * Connection, Table implementations are retrieved with
         * Connection.getTable(TableName).
         */

        Connection connection = ConnectionFactory.createConnection(conf);
        Table table = connection.getTable(TableName.valueOf("test"));
        try {

            /*
             * Put operations for a single row. To perform a Put, instantiate
             * a Put object with the row to insert to and, for each column to
             * be inserted, call addColumn.
             */

            Put put1 = new Put(Bytes.toBytes("row1"));
            Put put2 = new Put(Bytes.toBytes("row2"));
            Put put3 = new Put(Bytes.toBytes("row3"));

            put1.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual1"),
                    Bytes.toBytes("ValueOneForPut1Qual1"));
            put2.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual1"),
                    Bytes.toBytes("ValueOneForPut2Qual1"));
            put3.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual1"),
                    Bytes.toBytes("ValueOneForPut2Qual1"));

            put1.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual2"),
                    Bytes.toBytes("ValueOneForPut1Qual2"));
            put2.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual2"),
                    Bytes.toBytes("ValueOneForPut2Qual2"));
            put3.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual2"),
                    Bytes.toBytes("ValueOneForPut3Qual3"));

            table.put(put1);
            table.put(put2);
            table.put(put3);

        } finally {
            table.close();
            connection.close();
        }
    }
}

Let's run the above code and verify whether the data gets inserted into the table. We will open the HBase shell and run the below command:


scan 'test'


We will see the below result:


hbase(main):020:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:qual1, timestamp=1536563371778, value=ValueOneForPut1Qual1
row1 column=cf:qual2, timestamp=1536563371778, value=ValueOneForPut1Qual2
row2 column=cf:qual1, timestamp=1536563371788, value=ValueOneForPut2Qual1
row2 column=cf:qual2, timestamp=1536563371788, value=ValueOneForPut2Qual2
row3 column=cf:qual1, timestamp=1536563371791, value=ValueOneForPut2Qual1
row3 column=cf:qual2, timestamp=1536563371791, value=ValueOneForPut3Qual3
3 row(s) in 0.0180 seconds


Let's update row1's qual1 column as below. After setting the new value with addColumn, we call table.put(put1) again to persist the change:


put1.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual1"),
        Bytes.toBytes("ValueOneForPut1Qual1Updated"));
table.put(put1);

Let's run the below command to verify the update:


hbase(main):022:0> get 'test', 'row1'
COLUMN CELL
cf:qual1 timestamp=1536563493015, value=ValueOneForPut1Qual1Updated
cf:qual2 timestamp=1536563371778, value=ValueOneForPut1Qual2
1 row(s) in 0.0050 seconds

HBase stores multiple versions of each cell, using a timestamp for each version and storing the versions in descending timestamp order. Each timestamp is a long integer measured in milliseconds. When we put a value into HBase, we can either explicitly provide a timestamp or omit it, in which case the RegionServer fills it in when the put operation is performed. The number of versions HBase retains per cell is configurable on the column family (older releases defaulted to three versions, recent releases to one), and we can access all retained versions of a cell if required.

Scan and Get return only the latest version by default, because HBase stores versions in descending time order and the maximum-versions parameter of a query defaults to one; this can be changed by raising the maximum-versions setting on the Scan or Get.
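The versioning behaviour above can be sketched with a toy model: values keyed by timestamp in descending order, capped at a maximum number of versions, with the default read returning only the newest entry. The class below is an illustration of the semantics only, not HBase's actual cell storage:

```java
import java.util.Comparator;
import java.util.TreeMap;

// Toy model of a single HBase cell's version store. Versions are kept in
// descending timestamp order and capped at maxVersions, and a plain get
// returns only the newest version, mirroring default Get/Scan behaviour.
public class VersionedCell {

    private final int maxVersions;
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Comparator.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        // Evict the oldest version once the cap is exceeded.
        while (versions.size() > maxVersions) {
            versions.remove(versions.lastKey());
        }
    }

    // A default read returns only the newest version.
    String get() {
        return versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(3);
        cell.put(1000L, "v1");
        cell.put(2000L, "v2");
        cell.put(3000L, "v3");
        cell.put(4000L, "v4"); // evicts the 1000L version
        System.out.println(cell.get());             // v4
        System.out.println(cell.versions.keySet()); // [4000, 3000, 2000]
    }
}
```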

We can also pass a list of Puts to the Table instance instead of adding one Put at a time. Below is the code:


package com.learn.hbase.client;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutList {

    public static void main(String[] args) throws IOException {

        Configuration conf = HBaseConfiguration.create();

        Connection connection = ConnectionFactory.createConnection(conf);
        Table table = connection.getTable(TableName.valueOf("test"));

        try {

            Put put1 = new Put(Bytes.toBytes("row1"));
            Put put2 = new Put(Bytes.toBytes("row2"));
            Put put3 = new Put(Bytes.toBytes("row3"));

            put1.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual1"),
                    Bytes.toBytes("ValueOneForPut1Qual1"));
            put2.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual1"),
                    Bytes.toBytes("ValueOneForPut2Qual1"));
            put3.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual1"),
                    Bytes.toBytes("ValueOneForPut2Qual1"));

            List<Put> list = new ArrayList<>();
            list.add(put1);
            list.add(put2);
            list.add(put3);

            table.put(list);

        } finally {
            table.close();
            connection.close();
        }
    }
}

Since we are inserting many Puts at a time, there is a possibility that one of the Puts may fail. In that case the error is reported back to the client, while the Puts with valid data are still added to the table. Internally, the servers iterate over all operations and try to apply them. The failed ones are returned, and the client reports the remote error using a RetriesExhaustedWithDetailsException, with details such as how many operations failed and how many times it retried the erroneous modifications.
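The iterate-and-collect semantics described above can be sketched as follows. The code below is only a simulation of the behaviour using an in-memory map as the "table"; in a real client the failures surface as a RetriesExhaustedWithDetailsException rather than a returned list, and the Op class here is purely illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy simulation of batch-put semantics: iterate over all operations,
// apply the valid ones regardless of other failures, and collect the
// failed ones so they can be reported back to the caller.
public class BatchPutSemantics {

    static final class Op {
        final String row;
        final String value;
        Op(String row, String value) { this.row = row; this.value = value; }
    }

    static List<Op> applyAll(Map<String, String> store, List<Op> ops) {
        List<Op> failed = new ArrayList<>();
        for (Op op : ops) {
            if (op.row == null || op.row.isEmpty()) {
                failed.add(op);              // invalid op: reported back, not applied
            } else {
                store.put(op.row, op.value); // valid op: applied regardless of others
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        List<Op> failed = applyAll(store, List.of(
                new Op("row1", "v1"),
                new Op("", "bad"),   // fails validation
                new Op("row3", "v3")));
        System.out.println(store.size());  // 2 valid puts were applied
        System.out.println(failed.size()); // 1 failure was returned
    }
}
```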