📊 Token Overhead Analysis: HoT vs CoT

This analysis compares token usage between Chain-of-Thought (CoT) and Hypothetical-Optimal-Thinking (HoT) responses across five datasets and two models.

- Datasets Analyzed: 5
- Avg Input (Question) Tokens: 1140
- Avg HoT/CoT Ratio: 4.01x
- Avg Reformatted Q Overhead: 61.8%
Per-row token-distribution chart (stacked bars of CoT tokens, HoT answer tokens, and HoT reformatted-question tokens) not reproduced in the table below.
| Dataset | Model | Question (Input) Tokens | CoT Tokens (avg) | HoT Total Tokens | HoT Reformatted Q Tokens | HoT Answer Tokens | HoT/CoT Ratio | Reformat Q Overhead % |
|---|---|---|---|---|---|---|---|---|
| drop_cencus | g-pro-002 | 343 | 62 | 469 | 361 | 105 | 7.56x | 77.1% |
| drop_break | g-pro-002 | 345 | 72 | 481 | 371 | 108 | 6.68x | 77.0% |
| bbeh_spatial_reasoning | n-llama405b | 1445 | 594 | 1810 | 1200 | 608 | 3.05x | 66.3% |
| bbeh_time_arithmetic | g-pro-002 | 324 | 498 | 796 | 256 | 537 | 1.60x | 32.2% |
| bbeh_shuffle_objects | g-pro-002 | 3244 | 911 | 1073 | 608 | 462 | 1.18x | 56.6% |
| Mean ± Std | - | 1140 ± 1136 | 427 ± 324 | 926 ± 495 | 559 ± 340 | 364 ± 215 | 4.01x ± 2.63 | 61.8% ± 16.7 |
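The table does not state how the derived columns are computed, but the figures are consistent with Ratio = HoT total tokens / CoT tokens and Overhead % = reformatted-question tokens as a share of HoT total tokens, with the Mean ± Std row using population statistics over the five datasets. The sketch below is a minimal illustration under those assumptions (row values are copied from the table; the script itself is not part of the original analysis) and reproduces the headline 4.01x and 61.8% figures within rounding.

```python
# Sketch of how the derived columns could be computed.
# Assumed definitions (consistent with the table's figures, not stated in it):
#   Ratio      = HoT total tokens / CoT tokens
#   Overhead % = HoT reformatted-question tokens / HoT total tokens
#   Mean ± Std = population statistics over the five datasets (ddof = 0)
from statistics import mean, pstdev

# (dataset, model, question, cot, hot_total, hot_reformat_q, hot_answer)
rows = [
    ("drop_cencus",            "g-pro-002",    343,  62,  469,  361, 105),
    ("drop_break",             "g-pro-002",    345,  72,  481,  371, 108),
    ("bbeh_spatial_reasoning", "n-llama405b", 1445, 594, 1810, 1200, 608),
    ("bbeh_time_arithmetic",   "g-pro-002",    324, 498,  796,  256, 537),
    ("bbeh_shuffle_objects",   "g-pro-002",   3244, 911, 1073,  608, 462),
]

# Per-dataset derived columns
for name, _, _, cot, hot_total, reformat_q, _ in rows:
    ratio = hot_total / cot                   # e.g. 469 / 62  -> 7.56x
    overhead = 100 * reformat_q / hot_total   # e.g. 361 / 469 -> ~77%
    print(f"{name:24s} ratio={ratio:.2f}x overhead={overhead:.1f}%")

# Summary row (Mean ± Std), matching the headline figures within rounding
ratios = [r[4] / r[3] for r in rows]
overheads = [100 * r[5] / r[4] for r in rows]
print(f"HoT/CoT ratio: {mean(ratios):.2f}x ± {pstdev(ratios):.2f}")
print(f"Reformatted-Q overhead: {mean(overheads):.1f}% ± {pstdev(overheads):.1f}")
```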