Token usage comparison between Chain-of-Thought (CoT) and Hypothetical-Optimal-Thinking (HoT) responses. All counts are tokens, averaged per dataset. Ratio is the HoT total divided by the CoT average; the overhead column is the fraction of each HoT response spent reformatting the question (HoT Reformat Q / HoT Total).
| Dataset | Model | Question (Input) | CoT Avg | HoT Total | HoT Reformat Q | HoT Answer | Ratio (HoT/CoT) | Reformat Overhead (%) |
|---|---|---|---|---|---|---|---|---|
| drop_cencus | g-pro-002 | 343 | 62 | 469 | 361 | 105 | 7.56x | 77.1 |
| drop_break | g-pro-002 | 345 | 72 | 481 | 371 | 108 | 6.68x | 77.0 |
| bbeh_spatial_reasoning | n-llama405b | 1445 | 594 | 1810 | 1200 | 608 | 3.05x | 66.3 |
| bbeh_time_arithmetic | g-pro-002 | 324 | 498 | 796 | 256 | 537 | 1.60x | 32.2 |
| bbeh_shuffle_objects | g-pro-002 | 3244 | 911 | 1073 | 608 | 462 | 1.18x | 56.6 |
| Mean ± Std | - | 1140 ± 1136 | 427 ± 324 | 926 ± 495 | 559 ± 340 | 364 ± 215 | 4.01x ± 2.63 | 61.8 ± 16.7 |
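As a sanity check on the derived columns, the sketch below recomputes the ratio and overhead from the raw token counts. The column semantics (ratio = HoT total / CoT average; overhead = reformatting share of the HoT total) are inferred from the data, and last-digit differences against the table can arise from per-example averaging or rounding in the source.

```python
# Recompute the derived columns of the table from the raw token counts.
# Assumed definitions (inferred, not stated in the source):
#   ratio    = hot_total / cot_avg
#   overhead = 100 * hot_reformat_q / hot_total
from statistics import mean

# (dataset, question, cot_avg, hot_total, hot_reformat_q, hot_answer)
rows = [
    ("drop_cencus",             343,  62,  469,  361, 105),
    ("drop_break",              345,  72,  481,  371, 108),
    ("bbeh_spatial_reasoning", 1445, 594, 1810, 1200, 608),
    ("bbeh_time_arithmetic",    324, 498,  796,  256, 537),
    ("bbeh_shuffle_objects",   3244, 911, 1073,  608, 462),
]

ratios = [total / cot for _, _, cot, total, _, _ in rows]
overheads = [100 * reformat / total for _, _, _, total, reformat, _ in rows]

for (name, *_), r, o in zip(rows, ratios, overheads):
    print(f"{name:24s} ratio={r:.2f}x  overhead={o:.1f}%")
print(f"mean ratio    = {mean(ratios):.2f}x")
print(f"mean overhead = {mean(overheads):.1f}%")
```

Under these definitions the recomputed values match the table to within rounding (e.g. mean ratio 4.01x, mean overhead 61.8%).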