
HBase High-Performance Write Test

[Date: 2016-03-16] Source: CSDN Blog  Author: dhtx_wzgl's blog

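  0 Test Environment

  Cluster software environment for this test:

  Hadoop-1.2.1, Zookeeper-3.4.6, HBase-0.94.8, JDK 7, CentOS 7.

  Cluster hardware environment for this test:

  Two physical PCs, each with two quad-core CPUs, 32 GB of memory, and 10 TB of disk;

  Three physical PCs, each with two quad-core CPUs, 8 GB of memory, and 10 TB of disk.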

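  1 Test Plan and Analysis of Results

  Note: in each test below, every thread reads its own data file, so the number of threads equals the number of data files.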
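  The one-thread-per-file arrangement can be sketched as follows. This is a minimal sketch, not the loader used in the test: LoaderHarness, loadOneFile, and loadAll are hypothetical names, the "files" are in-memory lists, and the actual write is replaced by a counter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch of a loader harness: one worker thread per input "file". */
public class LoaderHarness {

    /** Hypothetical stand-in for the write path; counts the non-empty records in one file. */
    public static long loadOneFile(List<String> lines) {
        long written = 0;
        for (String line : lines) {
            if (!line.isEmpty()) {
                written++; // a real loader would build a Put from the line and send it here
            }
        }
        return written;
    }

    /** Starts one thread per file and returns the total number of records processed. */
    public static long loadAll(List<List<String>> files) {
        ExecutorService pool = Executors.newFixedThreadPool(files.size());
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (List<String> file : files) {
                Callable<Long> task = () -> loadOneFile(file);
                pending.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : pending) {
                total += f.get(); // wait for each worker to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("loader thread failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
            Arrays.asList("r1", "r2", "r3"),
            Arrays.asList("r4", "r5"));
        long start = System.nanoTime();
        long total = loadAll(files);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d records in %.3f s%n", total, seconds);
    }
}
```

  Dividing the record count by the elapsed time gives the records/s figures used in the tables below.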

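  1.1 WAL

  Test goal: measure how much the WAL (write-ahead log) affects write performance.

  Test principle: disabling the WAL is not recommended, but it does improve performance. Before writing data, HBase first persists the operation to the WAL, so that after a failure it can replay the WAL records to recover data that had not yet been persisted.

  Test conditions: 4 threads + 12 MB Write Buffer Size + 120 RPC Handlers

  Test data sample: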

  京EA31276 1640751143 CAM14357485 2 2 5 0 1 0 2016-01-2309:25:01 20160123092501023 JGJ59034 0 0

  Test results:

  WAL state       Time (s)    Throughput (MB/s)
  WAL enabled     408         0.56
  WAL disabled    41          5.6

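  Test conclusion: write performance with the WAL disabled is about ten times higher than with it enabled (41 s versus 408 s), so when the data can tolerate partial loss, disabling the WAL is recommended.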
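  As a quick check of the figures above, the arithmetic can be written out. This is only a sketch: the class and method names are illustrative, and the inputs are the four numbers from the results table.

```java
/** Worked arithmetic for the WAL results table. */
public class WalSpeedup {

    /** Ratio of elapsed times: how many times faster the second run finished. */
    public static double speedup(double slowSeconds, double fastSeconds) {
        return slowSeconds / fastSeconds;
    }

    /** Data volume implied by a throughput and a duration, in MB. */
    public static double impliedVolumeMb(double mbPerSecond, double seconds) {
        return mbPerSecond * seconds;
    }

    public static void main(String[] args) {
        System.out.printf("speedup: %.2fx%n", speedup(408, 41));
        System.out.printf("volume with WAL enabled:  %.1f MB%n", impliedVolumeMb(0.56, 408));
        System.out.printf("volume with WAL disabled: %.1f MB%n", impliedVolumeMb(5.6, 41));
    }
}
```

  Both runs imply roughly 230 MB written, so the two rows describe the same workload and the speedup of just under 10x comes from the elapsed time alone.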

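  1.2 Optimal RPC Handler Count for HBase

  Test goal: find the most suitable number of RPC Handlers for this cluster.

  Test principle: this setting defines the number of RPC Handlers on each Region Server. A Region Server receives and processes external requests through its RPC Handlers, so raising the handler count can, to a degree, increase HBase's capacity to accept requests. More handlers are not always better, however; the right number depends on the node's hardware.

  Test conditions: auto flush and WAL disabled, Write Buffer Size 12 MB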

  Parameter configuration:

  Set hbase.regionserver.handler.count in hbase-site.xml:

  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value>
  </property>

  Test data sample:

  京EA31276 1640751143 CAM14357485 2 2 5 0 1 0 2016-01-2309:25:01 20160123092501023 JGJ59034 0 0

  Test results:

  Threads    Time (s)    Records written    RPC Handlers    Throughput (MB/s)
  4          49          2.3 million        10              4.7
  4          45          2.3 million        200             5.1
  4          44          2.3 million        100             5.2
  4          46          2.3 million        50              5
  4          45          2.3 million        150             5.1
  4          41          2.3 million        120             5.6

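  Test conclusion: in this test environment, increasing the RPC Handler count improves write performance up to 120; beyond 120, performance starts to decline. The optimal RPC Handler count for HBase on this cluster is therefore about 120.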
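  The selection behind this conclusion amounts to picking the run with the shortest elapsed time. A small sketch, with illustrative class and method names and the elapsed times copied from the results table:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Picks the handler count with the shortest elapsed time. */
public class BestHandlerCount {

    /** Returns the key whose value (elapsed seconds) is smallest. */
    public static int fastest(Map<Integer, Integer> secondsByHandlers) {
        int best = -1;
        int bestSeconds = Integer.MAX_VALUE;
        for (Map.Entry<Integer, Integer> e : secondsByHandlers.entrySet()) {
            if (e.getValue() < bestSeconds) {
                bestSeconds = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    /** Elapsed seconds per handler count, copied from the results table. */
    public static Map<Integer, Integer> measured() {
        Map<Integer, Integer> m = new LinkedHashMap<>();
        m.put(10, 49);
        m.put(50, 46);
        m.put(100, 44);
        m.put(120, 41);
        m.put(150, 45);
        m.put(200, 45);
        return m;
    }

    public static void main(String[] args) {
        System.out.println("fastest handler count: " + fastest(measured()));
    }
}
```

  The whole range spans only 41 to 49 seconds, so the difference between neighbouring settings is small compared with the effect of the WAL.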

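  1.3 Optimal Write Buffer Size for HBase

  Test goal: find the most suitable Write Buffer Size for this cluster.

  Test principle: the HBase client submits data to the Region Server only after the buffered data reaches the configured threshold. The benefit is fewer RPC round trips.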
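  The effect of client-side buffering on the number of round trips can be illustrated with a small simulation. This is a simplified sketch and not HBase's implementation: WriteBufferSketch is a hypothetical class, sizes are plain byte counts rather than the heap sizes a real client tracks, and the 100-byte record size is an assumption.

```java
/** Simulates a client-side write buffer and counts the batches it sends. */
public class WriteBufferSketch {
    private final long thresholdBytes;
    private long bufferedBytes = 0;
    private int flushCount = 0;

    public WriteBufferSketch(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    /** Buffers one record; flushes once the buffered size exceeds the threshold. */
    public void put(int recordBytes) {
        bufferedBytes += recordBytes;
        if (bufferedBytes > thresholdBytes) {
            flush();
        }
    }

    /** Sends everything buffered so far as one batch (one simulated round trip). */
    public void flush() {
        if (bufferedBytes > 0) {
            flushCount++;
            bufferedBytes = 0;
        }
    }

    public int flushCount() {
        return flushCount;
    }

    /** Counts the round trips needed to send the given number of equally sized records. */
    public static int roundTrips(long thresholdBytes, int records, int recordBytes) {
        WriteBufferSketch buffer = new WriteBufferSketch(thresholdBytes);
        for (int i = 0; i < records; i++) {
            buffer.put(recordBytes);
        }
        buffer.flush(); // send the remainder, as a client would before closing
        return buffer.flushCount();
    }

    public static void main(String[] args) {
        int records = 100_000;
        int recordBytes = 100; // assumed record size, for illustration only
        System.out.println("no buffering: " + roundTrips(0, records, recordBytes));
        System.out.println("1 MB buffer:  " + roundTrips(1024 * 1024, records, recordBytes));
    }
}
```

  With these assumed sizes, 100,000 records need 100,000 round trips without buffering and 10 with a 1 MB buffer. A larger buffer also holds more unsent data in client memory, which is one reason the best size has to be measured rather than simply maximized.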

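  Test conditions: auto flush and WAL disabled, using the following statements:

  table.setAutoFlush(false); // disable auto flush

  put.setWriteToWAL(false); // disable the WAL

  Test command: java -jar SendData_60L.jar <thread count> <BufferSize> <flushSize> <table name>

  Test data sample: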

  京EA31276 2016-01-23 09:25;0 116.7739757 39.9181911 ;1 116.1156204 39.3061622 ;2 116.2767457 39.6845126 ;3 116.68283 39.785515 ;4 116.5265639 39.1413711 ;5 116.5716852 39.7667081 ;6 116.265372 39.4464143 ;7 116.4854542 39.2365814 ;8 116.8428374 39.2240418 ;9 116.2548630 39.6447603 ;10 116.118823 39.964609 ;11 116.4857220 39.3285316 ;12 116.6228221 39.2873718 ;13 116.3668650 39.1239610 ;14 116.1466091 39.7767223 ;15 116.3834822 39.5707098 ;16 116.4568834 39.7347751 ;17 116.1185462 39.44076 ;18 116.9274841 39.3211428 ;19 116.9648843 39.6537633 ;20 116.2809645 39.6609038 ;21 116.6651957 39.4583201 ;22 116.4475937 39.6767574 ;23 116.6229150 39.6681265 ;24 116.8879526 39.94014 ;25 116.9352276 39.166974 ;26 116.3062773 39.4250674 ;27 116.5362078 39.5918600 ;28 116.655801 39.3478595 ;29 116.7911087 39.3105966 ;30 116.8989259 39.3485116 ;31 116.3451064 39.6591314 ;32 116.3620885 39.8758627 ;33 116.6254488 39.262504 ;34 116.6791192 39.5431246 ;35 116.706863 39.8259232 ;36 116.5606280 39.6052318 ;37 116.8849616 39.9964014 ;38 116.400404 39.6563677 ;39 116.5915861 39.3331378 ;40 116.2414020 39.5078832 ;41 116.1466572 39.2394249 ;42 116.5861500 39.7719353 ;43 116.4466894 39.7355696 ;44 116.1568285 39.5417675 ;45 116.5938800 39.4523919 ;46 116.9459531 39.9213514 ;47 116.9611731 39.7998314 ;48 116.9873481 39.2377567 ;49 116.8488279 39.8495391 ;50 116.4287103 39.4727894 ;51 116.7807852 39.2478749 ;52 116.4720670 39.189716 ;53 116.5181192 39.6537954 ;54 116.5004762 39.4856760 ;55 116.5962549 39.1443630 ;56 116.2558550 39.4980812 ;57 116.675138 39.6755529 ;58 116.289822 39.2409512 ;59 116.906777 39.7238894

  Test results:

  Threads    Buffer Size (MB)    Flush Size (records)    Throughput (records/s)
  4          5                   10000                   240240
  4          6                   10000                   205920
  4          7                   10000                   240240
  4          8                   10000                   205920
  4          9                   10000                   240240
  4          10                  10000                   205920
  4          11                  10000                   205920
  4          12                  10000                   240240
  4          15                  10000                   205920
  4          20                  10000                   205920
  5          5                   10000                   300300
  5          6                   10000                   257400
  5          7                   10000                   257400
  5          8                   10000                   225225
  5          9                   10000                   257400
  5          10                  10000                   257400
  5          11                  10000                   225225
  5          12                  10000                   225225
  5          15                  10000                   257400
  5          20                  10000                   257400
  8          5                   10000                   360360
  8          6                   10000                   320320
  8          7                   10000                   360360
  8          8                   10000                   320320
  8          9                   10000                   320320
  8          10                  10000                   320320
  8          11                  10000                   360360
  8          12                  10000                   320320
  8          15                  10000                   320320
  8          20                  10000                   320320
  9          5                   10000                   324324
  9          6                   10000                   324324
  9          7                   10000                   324324
  9          8                   10000                   324324
  9          9                   10000                   360360
  9          10                  10000                   360360
  9          11                  10000                   324324
  9          12                  10000                   360360
  9          15                  10000                   360360
  9          20                  10000                   294840
  10         5                   10000                   360360
  10         6                   10000                   360360
  10         7                   10000                   360360
  10         8                   10000                   360360
  10         9                   10000                   360360
  10         10                  10000                   400400
  10         11                  10000                   400400
  10         12                  10000                   400400
  10         15                  10000                   327600
  10         20                  10000                   360360
  11         5                   10000                   360360
  11         6                   10000                   396396
  11         7                   10000                   396396
  11         8                   10000                   360360
  11         9                   10000                   360360
  11         10                  10000                   396396
  12         5                   10000                   432432
  12         6                   10000                   393120
  12         7                   10000                   393120
  12         8                   10000                   432432
  12         9                   10000                   432432
  12         10                  10000                   393120
  14         10                  10000                   388080
  14         9                   10000                   420420
  14         8                   10000                   388080
  14         7                   10000                   420420
  14         6                   10000                   420420
  14         5                   10000                   420420
  20         5                   10000                   400400
  20         6                   10000                   423952
  20         7                   10000                   379326
  20         8                   10000                   379326
  20         9                   10000                   379326
  20         10                  10000                   423952
  20         11                  10000                   379326
  19         5                   10000                   380380
  19         6                   10000                   380380
  19         7                   10000                   402755
  19         8                   10000                   402755
  19         9                   10000                   402755
  19         10                  10000                   402755
  19         11                  10000                   456456
  19         15                  10000                   402755
  18         5                   10000                   432432
  18         6                   10000                   432432
  18         7                   10000                   432432
  18         8                   10000                   405405
  18         9                   10000                   405405
  18         10                  10000                   405405
  18         11                  10000                   405405
  17         5                   10000                   437580
  17         6                   10000                   408408
  17         7                   10000                   437580
  17         8                   10000                   437580
  17         9                   10000                   408408
  17         10                  10000                   408408
  16         5                   10000                   443520
  16         6                   10000                   411840
  16         7                   10000                   411840
  16         8                   10000                   443520
  16         9                   10000                   384384
  16         10                  10000                   411840
  15         5                   10000                   415800
  15         6                   10000                   450450
  15         7                   10000                   443520
  15         8                   10000                   450450
  15         9                   10000                   491400
  15         10                  10000                   415800
  13         5                   10000                   425880
  13         6                   10000                   425880
  13         7                   10000                   390390
  13         8                   10000                   360360
  13         9                   10000                   425880
  13         10                  10000                   390390

  Test conclusion: the results show that on this cluster the best throughput (491400 records/s) was obtained with 15 threads and a bufferSize of 9 MB.

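  1.4 Number of RegionServers in the HBase Cluster

  Test goal: observe how the number of RegionServers affects write performance.

  Test principle: write pressure is generally concentrated on the RegionServers, so for a fixed write volume, adding RegionServers reduces the load on each server.

  Test conditions: 4 threads + 10 MB Write Buffer Size + 120 RPC Handlers.

  Test data sample: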

  京EA31276 2016-01-23 09:25;0 116.7739757 39.9181911 ;1 116.1156204 39.3061622 ;2 116.2767457 39.6845126 ;3 116.68283 39.785515 ;4 116.5265639 39.1413711 ;5 116.5716852 39.7667081 ;6 116.265372 39.4464143 ;7 116.4854542 39.2365814 ;8 116.8428374 39.2240418 ;9 116.2548630 39.6447603 ;10 116.118823 39.964609 ;11 116.4857220 39.3285316 ;12 116.6228221 39.2873718 ;13 116.3668650 39.1239610 ;14 116.1466091 39.7767223 ;15 116.3834822 39.5707098 ;16 116.4568834 39.7347751 ;17 116.1185462 39.44076 ;18 116.9274841 39.3211428 ;19 116.9648843 39.6537633 ;20 116.2809645 39.6609038 ;21 116.6651957 39.4583201 ;22 116.4475937 39.6767574 ;23 116.6229150 39.6681265 ;24 116.8879526 39.94014 ;25 116.9352276 39.166974 ;26 116.3062773 39.4250674 ;27 116.5362078 39.5918600 ;28 116.655801 39.3478595 ;29 116.7911087 39.3105966 ;30 116.8989259 39.3485116 ;31 116.3451064 39.6591314 ;32 116.3620885 39.8758627 ;33 116.6254488 39.262504 ;34 116.6791192 39.5431246 ;35 116.706863 39.8259232 ;36 116.5606280 39.6052318 ;37 116.8849616 39.9964014 ;38 116.400404 39.6563677 ;39 116.5915861 39.3331378 ;40 116.2414020 39.5078832 ;41 116.1466572 39.2394249 ;42 116.5861500 39.7719353 ;43 116.4466894 39.7355696 ;44 116.1568285 39.5417675 ;45 116.5938800 39.4523919 ;46 116.9459531 39.9213514 ;47 116.9611731 39.7998314 ;48 116.9873481 39.2377567 ;49 116.8488279 39.8495391 ;50 116.4287103 39.4727894 ;51 116.7807852 39.2478749 ;52 116.4720670 39.189716 ;53 116.5181192 39.6537954 ;54 116.5004762 39.4856760 ;55 116.5962549 39.1443630 ;56 116.2558550 39.4980812 ;57 116.675138 39.6755529 ;58 116.289822 39.2409512 ;59 116.906777 39.7238894

  Test results:

  RegionServers    Throughput (10,000 records/s)
  2                13.3
  3                23.2
  4                26.5

  Test conclusion: adding RegionServers increases write speed.
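  How close this scaling is to linear can be worked out from the three measurements. The sketch below uses illustrative names and takes the two-server run as the baseline; that choice of baseline is an assumption, not something stated in the test.

```java
/** Worked arithmetic for the RegionServer scaling results. */
public class RegionServerScaling {

    /** Throughput per server, in the same unit as the input. */
    public static double perServer(double throughput, int servers) {
        return throughput / servers;
    }

    /** Measured speedup divided by the ideal (linear) speedup from a baseline run. */
    public static double scalingEfficiency(double baseThroughput, int baseServers,
                                           double throughput, int servers) {
        double measured = throughput / baseThroughput;
        double ideal = (double) servers / baseServers;
        return measured / ideal;
    }

    public static void main(String[] args) {
        int[] servers = {2, 3, 4};
        double[] throughput = {13.3, 23.2, 26.5}; // 10,000 records/s, from the results table
        for (int i = 0; i < servers.length; i++) {
            System.out.printf("%d servers: %.2f per server, efficiency vs 2 servers %.2f%n",
                servers[i],
                perServer(throughput[i], servers[i]),
                scalingEfficiency(throughput[0], servers[0], throughput[i], servers[i]));
        }
    }
}
```

  By this measure, four servers deliver almost exactly twice the throughput of two, so within this small range the scaling is close to linear.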

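  1.5 Reducing the Byte Size of HBase Column Names

  Test goal: measure how the space taken by column names affects write performance.

  Test principle: fewer bytes in a column name means less total data in each cell.

  Test conditions: 8 threads + 11 MB Write Buffer Size + 120 RPC Handlers + 60 columns per HBase row.

  Test data sample: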

  京EA31276 2016-01-23 09:25;0 116.7739757 39.9181911 ;1 116.1156204 39.3061622 ;2 116.2767457 39.6845126 ;3 116.68283 39.785515 ;4 116.5265639 39.1413711 ;5 116.5716852 39.7667081 ;6 116.265372 39.4464143 ;7 116.4854542 39.2365814 ;8 116.8428374 39.2240418 ;9 116.2548630 39.6447603 ;10 116.118823 39.964609 ;11 116.4857220 39.3285316 ;12 116.6228221 39.2873718 ;13 116.3668650 39.1239610 ;14 116.1466091 39.7767223 ;15 116.3834822 39.5707098 ;16 116.4568834 39.7347751 ;17 116.1185462 39.44076 ;18 116.9274841 39.3211428 ;19 116.9648843 39.6537633 ;20 116.2809645 39.6609038 ;21 116.6651957 39.4583201 ;22 116.4475937 39.6767574 ;23 116.6229150 39.6681265 ;24 116.8879526 39.94014 ;25 116.9352276 39.166974 ;26 116.3062773 39.4250674 ;27 116.5362078 39.5918600 ;28 116.655801 39.3478595 ;29 116.7911087 39.3105966 ;30 116.8989259 39.3485116 ;31 116.3451064 39.6591314 ;32 116.3620885 39.8758627 ;33 116.6254488 39.262504 ;34 116.6791192 39.5431246 ;35 116.706863 39.8259232 ;36 116.5606280 39.6052318 ;37 116.8849616 39.9964014 ;38 116.400404 39.6563677 ;39 116.5915861 39.3331378 ;40 116.2414020 39.5078832 ;41 116.1466572 39.2394249 ;42 116.5861500 39.7719353 ;43 116.4466894 39.7355696 ;44 116.1568285 39.5417675 ;45 116.5938800 39.4523919 ;46 116.9459531 39.9213514 ;47 116.9611731 39.7998314 ;48 116.9873481 39.2377567 ;49 116.8488279 39.8495391 ;50 116.4287103 39.4727894 ;51 116.7807852 39.2478749 ;52 116.4720670 39.189716 ;53 116.5181192 39.6537954 ;54 116.5004762 39.4856760 ;55 116.5962549 39.1443630 ;56 116.2558550 39.4980812 ;57 116.675138 39.6755529 ;58 116.289822 39.2409512 ;59 116.906777 39.7238894

  Test results:

  Total column-name size      Throughput (records/s)
  Before reduction (480 B)    320320
  After reduction (240 B)     320320

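  Test conclusion: changing the space taken by column names has almost no effect on performance, possibly because the few bytes saved per column are too small relative to the total size of each cell.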

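  1.6 Value Size per Column

  Test goal: measure how the size of each column value affects write performance.

  Test conditions: 4 threads + 11 MB Write Buffer Size + 120 RPC Handlers + 60 columns per HBase row.

  Test data samples:

  Size (0.026 KB)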

  京EA31276 2016-01-2309:25:00 116.7739757 39.9181911 ;

  Size (0.126 KB)

  京EA31276 2016-01-2309:25:00 116.7739757QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ 39.9181911QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ ;

  Size (0.526 KB)

  京EA31276 2016-01-2309:25:00 116.7739757QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ39.9181911QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ;

  Size (1.026 KB)

  京EA31276 2016-01-2309:25:00 116.7739757QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ39.9181911QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ;

  Test results:

  Column value size    Throughput (records/s)    Throughput (MB/s)
  0.026 KB             240240                    6.24
  0.126 KB             160160                    20.6
  0.526 KB             80080                     40.2
  1.026 KB             43680                     43.7

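  Test conclusion: with the same number of columns, the larger each column's value, the higher the write throughput in MB/s, even though fewer records are written per second.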
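  The relation between the two throughput columns can be approximated as records/s multiplied by the listed value size. The sketch below assumes exactly that, with 1 MB taken as 1000 KB; both the formula and the unit convention are assumptions, since the test does not say how the MB/s column was obtained.

```java
/** Approximates MB/s as records/s times the listed value size. */
public class ThroughputConversion {

    /** Converts records/s and a size in KB to MB/s, taking 1 MB = 1000 KB. */
    public static double mbPerSecond(long recordsPerSecond, double sizeKb) {
        return recordsPerSecond * sizeKb / 1000.0;
    }

    public static void main(String[] args) {
        double[] sizeKb = {0.026, 0.126, 0.526, 1.026};       // from the results table
        long[] recordsPerSecond = {240240, 160160, 80080, 43680};
        double[] reportedMb = {6.24, 20.6, 40.2, 43.7};
        for (int i = 0; i < sizeKb.length; i++) {
            System.out.printf("%.3f KB: computed %.2f MB/s, reported %.1f MB/s%n",
                sizeKb[i], mbPerSecond(recordsPerSecond[i], sizeKb[i]), reportedMb[i]);
        }
    }
}
```

  For the first row the product is about 6.25 MB/s against a reported 6.24 MB/s; the other rows come within about 5 percent of the reported figures. The trend is the same either way: the record rate falls more slowly than the value size grows, so the byte rate rises.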

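  1.7 Number of Columns

  Test goal: measure how the number of columns affects write performance.

  Test principle: when each row stores the same values, more columns mean more extra data per row, such as timestamps and column identifiers.

  Test conditions: 10 threads + 11 MB Write Buffer Size + 120 RPC Handlers + the same total size for every HBase row.

  Test data sample: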

  京EA31276 2016-01-23 09:25;0 116.7739757 39.9181911 ;1 116.1156204 39.3061622 ;2 116.2767457 39.6845126 ;3 116.68283 39.785515 ;4 116.5265639 39.1413711 ;5 116.5716852 39.7667081 ;6 116.265372 39.4464143 ;7 116.4854542 39.2365814 ;8 116.8428374 39.2240418 ;9 116.2548630 39.6447603 ;10 116.118823 39.964609 ;11 116.4857220 39.3285316 ;12 116.6228221 39.2873718 ;13 116.3668650 39.1239610 ;14 116.1466091 39.7767223 ;15 116.3834822 39.5707098 ;16 116.4568834 39.7347751 ;17 116.1185462 39.44076 ;18 116.9274841 39.3211428 ;19 116.9648843 39.6537633 ;20 116.2809645 39.6609038 ;21 116.6651957 39.4583201 ;22 116.4475937 39.6767574 ;23 116.6229150 39.6681265 ;24 116.8879526 39.94014 ;25 116.9352276 39.166974 ;26 116.3062773 39.4250674 ;27 116.5362078 39.5918600 ;28 116.655801 39.3478595 ;29 116.7911087 39.3105966 ;30 116.8989259 39.3485116 ;31 116.3451064 39.6591314 ;32 116.3620885 39.8758627 ;33 116.6254488 39.262504 ;34 116.6791192 39.5431246 ;35 116.706863 39.8259232 ;36 116.5606280 39.6052318 ;37 116.8849616 39.9964014 ;38 116.400404 39.6563677 ;39 116.5915861 39.3331378 ;40 116.2414020 39.5078832 ;41 116.1466572 39.2394249 ;42 116.5861500 39.7719353 ;43 116.4466894 39.7355696 ;44 116.1568285 39.5417675 ;45 116.5938800 39.4523919 ;46 116.9459531 39.9213514 ;47 116.9611731 39.7998314 ;48 116.9873481 39.2377567 ;49 116.8488279 39.8495391 ;50 116.4287103 39.4727894 ;51 116.7807852 39.2478749 ;52 116.4720670 39.189716 ;53 116.5181192 39.6537954 ;54 116.5004762 39.4856760 ;55 116.5962549 39.1443630 ;56 116.2558550 39.4980812 ;57 116.675138 39.6755529 ;58 116.289822 39.2409512 ;59 116.906777 39.7238894

  Test results:

  Columns    Stored column value size    Throughput (records/s)    Throughput (MB/s)
  60         0.026 KB                    360360                    9.36
  1          1.55 KB                     1201200                   31.2

  Test conclusion: when each row stores the same amount of data, the fewer the columns, the faster the writes.
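  The per-cell overhead behind this result can be estimated from the KeyValue layout, in which every cell carries its own lengths, row key, family, qualifier, timestamp, and type alongside the value. The sketch below assumes that classic layout (without tags); the row key, family, and qualifier lengths are hypothetical values chosen for illustration, and the 1560-byte payload is 60 values of 26 bytes, roughly the 1.55 KB in the table.

```java
/** Estimates serialized row size for a payload split across a number of cells. */
public class CellOverhead {

    // Fixed bytes per cell: key length (4) + value length (4) + row length (2)
    // + family length (1) + timestamp (8) + key type (1).
    public static final int FIXED_BYTES = 4 + 4 + 2 + 1 + 8 + 1;

    /** Serialized size of one cell in bytes, given the lengths of its parts. */
    public static long cellBytes(int rowKey, int family, int qualifier, int value) {
        return FIXED_BYTES + rowKey + family + qualifier + value;
    }

    /** Size of a row whose payload is split evenly across the given number of cells. */
    public static long rowBytes(int columns, int payloadBytes,
                                int rowKey, int family, int qualifier) {
        return columns * cellBytes(rowKey, family, qualifier, payloadBytes / columns);
    }

    public static void main(String[] args) {
        int payload = 1560;                          // 60 values of 26 bytes each
        int rowKey = 10, family = 1, qualifier = 2;  // hypothetical lengths
        long wide = rowBytes(60, payload, rowKey, family, qualifier);
        long narrow = rowBytes(1, payload, rowKey, family, qualifier);
        System.out.println("60 columns: " + wide + " bytes per row");
        System.out.println("1 column:   " + narrow + " bytes per row");
    }
}
```

  Under these assumptions the 60-column row serializes to 3540 bytes and the single-column row to 1593 bytes, so more than half of the wide row is per-cell overhead. The row key is repeated in every cell, which is why it counts toward that overhead.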

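  2 Test Summary

  This test relied mainly on tuning the HBase cluster configuration, optimizing the table schema, and getting the most out of the hardware. Based on the results above, the cluster's best write performance for data without special processing is about 300,000 records/s; for specially processed data, the best write performance reached 1.2 million records/s. Encouraging as that result is, it does not account for the time spent processing the data. The current data-processing method is still very inefficient, so improving data-processing efficiency is key to actually achieving writes of a million records per second on this cluster.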




