The TFLOPS score of a graphics card depends on the type of data being pushed through it. Because it is "quick and dirty", neural network processing tends to be done with half-precision (FP16) data types.

Consumer graphics cards are typically aimed at single-precision data (32-bit, FP32), which is the format most commonly used for gaming.

A graphics card may also list a double-precision (FP64) TFLOPS value. At best this is half the FP32 figure, since each value occupies twice as many bytes, and on consumer cards it is often far lower still.
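The byte sizes behind these ratios are fixed by the IEEE 754 formats; this minimal Python sketch just reads them off the standard `struct` format codes (`e` for half, `f` for single, `d` for double):

```python
import struct

# IEEE 754 storage sizes: half ('e'), single ('f'), double ('d').
fp16 = struct.calcsize('e')  # 2 bytes
fp32 = struct.calcsize('f')  # 4 bytes
fp64 = struct.calcsize('d')  # 8 bytes

print(fp16, fp32, fp64)  # prints: 2 4 8 -- each step doubles the data moved per value
```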

This might lead you to believe that FP16 data should be processed twice as fast as FP32, but this is not always the case. The internal registers of the processing units are 32-bit, so to work twice as fast the hardware must pack two FP16 values into a single FP32 register. Not all graphics cards can perform this packing, and those that cannot see no benefit from the smaller data type. There was a minor furore some years back over graphics card manufacturers "hobbling" consumer graphics cards' FP16 performance in order to push developers to the workstation or scientific cards, which did actually have this optimisation.
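The packing idea can be illustrated in plain Python: two FP16 values fit exactly into the 4 bytes of one 32-bit register slot. The values here are arbitrary examples chosen to be exactly representable in FP16:

```python
import struct

a, b = 1.5, -2.0                       # two half-precision values
packed = struct.pack('<ee', a, b)      # 2 + 2 bytes = one 32-bit word
assert len(packed) == 4

word, = struct.unpack('<I', packed)    # the same bits viewed as a single 32-bit register
lo, hi = struct.unpack('<ee', packed)  # unpacking recovers both values losslessly
assert (lo, hi) == (1.5, -2.0)
```

Hardware with this optimisation applies one instruction to both halves of the register at once (CUDA exposes this as `half2` operations); hardware without it must widen each FP16 value to FP32 first, losing the 2x advantage.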

Largely, which card is "best" depends on your dataset, and that is something only you can know. TFLOPS gives a good indication of processing power, but you have to know what type of data you have first (FP16/FP32) and whether your card is optimised to do double the work on the smaller data type.

Memory bandwidth is also a factor: higher bandwidth means less time spent waiting on data.
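A back-of-the-envelope transfer-time estimate makes the point; the 25-million-parameter model and 500 GB/s bandwidth below are made-up numbers for illustration:

```python
# Hypothetical model and card (both numbers assumed).
params = 25_000_000   # parameter count
bandwidth = 500e9     # memory bandwidth in bytes/s

for name, nbytes in (("FP32", 4), ("FP16", 2)):
    total = params * nbytes
    seconds = total / bandwidth
    print(f"{name}: {total / 1e6:.0f} MB, {seconds * 1e3:.2f} ms per full read")
# prints:
# FP32: 100 MB, 0.20 ms per full read
# FP16: 50 MB, 0.10 ms per full read
```

Halving the bytes per value halves the time a memory-limited operation spends waiting on those values.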

For more information I recommend reading NVIDIA's Deep Learning SDK documentation, which states:

Half precision (also known as FP16) data, compared to higher precision FP32 or FP64, reduces memory usage of the neural network, allowing training and deployment of larger networks, and FP16 data transfers take less time than FP32 or FP64 transfers.

Single precision (also known as FP32 or 32-bit) is a common floating point format (`float` in C-derived programming languages), while the 64-bit format is known as double precision (`double`).

Deep Neural Networks (DNNs) have led to breakthroughs in a number of areas, including image processing and understanding, language modeling, language translation, speech processing, game playing, and many others. DNN complexity has been increasing to achieve these results, which in turn has increased the computational resources required to train these networks. One way to lower the required resources is to use lower-precision arithmetic, which has the following benefits.

### Decrease the required amount of memory

Half-precision floating point format (FP16) uses 16 bits, compared to 32 bits for single precision (FP32). Lowering the required memory enables training of larger models or training with larger mini-batches.
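As a rough sketch of the mini-batch effect (all figures below are assumed, not measured): with a fixed activation-memory budget, halving the bytes per value roughly doubles the mini-batch that fits.

```python
budget = 8 * 1024**3          # 8 GiB of activation memory (assumed)
acts_per_sample = 50_000_000  # activation values per sample (assumed)

fp32_batch = budget // (acts_per_sample * 4)  # samples that fit in FP32
fp16_batch = budget // (acts_per_sample * 2)  # samples that fit in FP16
print(fp32_batch, fp16_batch)  # prints: 42 85 -- FP16 fits roughly twice the mini-batch
```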

### Shorten the training or inference time

Execution time can be sensitive to memory or arithmetic bandwidth. **Half-precision halves the number of bytes accessed, thus reducing the time spent in memory-limited layers**. NVIDIA GPUs offer up to **8x more half precision arithmetic throughput when compared to single-precision, thus speeding up math-limited layers**.

Largely, what you need depends on the model you are training. You need to find out whether your model is memory-constrained (capacity or bandwidth) or math-constrained and make a decision based on that.
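One way to make that call is a roofline-style check: compare a layer's arithmetic intensity (FLOPs per byte moved) against the card's machine balance (peak FLOP/s divided by bytes/s of bandwidth). All figures below are invented for illustration:

```python
def bound(flops, bytes_moved, peak_flops, peak_bw):
    """Classify an operation as memory- or math-limited (roofline model)."""
    intensity = flops / bytes_moved  # FLOPs performed per byte moved
    balance = peak_flops / peak_bw   # FLOPs the card can do per byte fetched
    return "math-limited" if intensity > balance else "memory-limited"

# Hypothetical card: 20 TFLOPS of FP16 compute, 500 GB/s of bandwidth.
card = dict(peak_flops=20e12, peak_bw=500e9)  # machine balance = 40 FLOPs/byte

# Elementwise add in FP16: 1 FLOP per 6 bytes (two reads + one write).
print(bound(1, 6, **card))     # prints: memory-limited

# Large matrix multiply: thousands of FLOPs per byte fetched.
print(bound(4000, 1, **card))  # prints: math-limited
```

Memory-limited layers gain from FP16's smaller transfers; math-limited layers gain only if the card also has the doubled (or better) FP16 arithmetic throughput.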