tensorflow alexnet performance test

Posting some tests I ran recently.

Test 1:
tutorials/image/alexnet/alexnet_benchmark.py:
Network structure (output shape of each layer):
conv1   [128, 56, 56, 64]
pool1   [128, 27, 27, 64]
conv2   [128, 27, 27, 192]
pool2   [128, 13, 13, 192]
conv3   [128, 13, 13, 384]
conv4   [128, 13, 13, 256]
conv5   [128, 13, 13, 256]
pool5   [128, 6, 6, 256]

batch size=128
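The shapes above can be reproduced with simple padding arithmetic. A minimal sketch, assuming the benchmark's usual configuration (224x224 input, conv1 11x11 stride 4 with SAME padding, 3x3 stride-2 VALID max-pools, all other convs stride 1 with SAME padding):

```python
import math

def conv_same(n, stride):
    # 'SAME' padding: output size = ceil(n / stride), independent of kernel size
    return math.ceil(n / stride)

def pool_valid(n, kernel, stride):
    # 'VALID' padding: output size = ceil((n - kernel + 1) / stride)
    return math.ceil((n - kernel + 1) / stride)

n = 224                  # input image size (assumed)
n = conv_same(n, 4)      # conv1: 11x11, stride 4, SAME  -> 56
print('conv1', n)
n = pool_valid(n, 3, 2)  # pool1: 3x3, stride 2, VALID   -> 27
print('pool1', n)
n = conv_same(n, 1)      # conv2: 5x5, stride 1, SAME    -> 27
n = pool_valid(n, 3, 2)  # pool2: 3x3, stride 2, VALID   -> 13
print('pool2', n)
n = conv_same(n, 1)      # conv3/conv4/conv5: 3x3, SAME  -> 13
n = pool_valid(n, 3, 2)  # pool5: 3x3, stride 2, VALID   -> 6
print('pool5', n)
```

The spatial sizes come out to 56, 27, 13, and 6, matching the table.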

E5 1630-v3, 4 cores:
 step 0, duration = 1.412
 Forward across 10 steps, 1.334 +/- 0.039 sec / batch
 step 0, duration = 4.181
 Forward-backward across 10 steps, 4.175 +/- 0.170 sec / batch

GTX 970:
 step 0, duration = 0.046
 Forward across 1 steps, 0.046 +/- 0.000 sec / batch
 step 0, duration = 0.137
 Forward-backward across 1 steps, 0.137 +/- 0.000 sec / batch

K20m:
  Forward across 100 steps, 0.093 +/- 0.001 sec / batch
  Forward-backward across 100 steps, 0.253 +/- 0.001 sec / batch


Test 2: full training under model/slim:
alexnet v2 batch size=128
K20m: 0.35s
gtx 970: 0.2s
E5 1630-v3: 27s

inception v2: batch size=32
K20m: 0.83s
gtx 970: 0.52s


There are a few more results I haven't posted. I keep wondering whether I made a mistake somewhere... it's just too strange.
First, different CPUs perform about the same. In particular, a 20-core machine is roughly as fast as an ordinary consumer 4-core one.
Second, the performance gap between CPU and GPU is far larger than I imagined: 30-40x.
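That factor is consistent with the Test 1 numbers; a quick check using the E5 1630-v3 and GTX 970 per-batch times from above:

```python
# Per-batch durations (seconds) copied from the Test 1 results above
cpu_fwd, gpu_fwd = 1.334, 0.046
cpu_fwd_bwd, gpu_fwd_bwd = 4.175, 0.137

print('forward speedup: %.1fx' % (cpu_fwd / gpu_fwd))                   # ~29x
print('forward-backward speedup: %.1fx' % (cpu_fwd_bwd / gpu_fwd_bwd))  # ~30x
```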

(I will keep updating this.)

