Token usage comparison between Chain-of-Thought (CoT) and Hypothetical-Optimal-Thinking (HoT) responses. All counts are tokens, averaged per dataset. Ratio is the HoT total divided by the CoT average; the overhead column is the fraction of each HoT response spent reformatting the question (HoT Reformat Q / HoT Total).
| Dataset | Model | Question (Input) | CoT Avg | HoT Total | HoT Reformat Q | HoT Answer | Ratio (HoT/CoT) | Reformat Overhead (%) |
|---|---|---|---|---|---|---|---|---|
| drop_cencus | g-pro-002 | 343 | 62 | 469 | 361 | 105 | 7.56x | 77.1 |
| drop_break | g-pro-002 | 345 | 72 | 481 | 371 | 108 | 6.68x | 77.0 |
| bbeh_spatial_reasoning | n-llama405b | 1445 | 594 | 1810 | 1200 | 608 | 3.05x | 66.3 |
| bbeh_time_arithmetic | g-pro-002 | 324 | 498 | 796 | 256 | 537 | 1.60x | 32.2 |
| bbeh_shuffle_objects | g-pro-002 | 3244 | 911 | 1073 | 608 | 462 | 1.18x | 56.6 |
| Mean ± Std | - | 1140 ± 1136 | 427 ± 324 | 926 ± 495 | 559 ± 340 | 364 ± 215 | 4.01x ± 2.63 | 61.8 ± 16.7 |
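As a sanity check on the derived columns, the sketch below recomputes the ratio and overhead from the raw token counts. The column semantics (ratio = HoT total / CoT average; overhead = reformatting share of the HoT total) are inferred from the data, and last-digit differences against the table can arise from per-example averaging or rounding in the source.

```python
# Recompute the derived columns of the table from the raw token counts.
# Assumed definitions (inferred, not stated in the source):
#   ratio    = hot_total / cot_avg
#   overhead = 100 * hot_reformat_q / hot_total
from statistics import mean

# (dataset, question, cot_avg, hot_total, hot_reformat_q, hot_answer)
rows = [
    ("drop_cencus",             343,  62,  469,  361, 105),
    ("drop_break",              345,  72,  481,  371, 108),
    ("bbeh_spatial_reasoning", 1445, 594, 1810, 1200, 608),
    ("bbeh_time_arithmetic",    324, 498,  796,  256, 537),
    ("bbeh_shuffle_objects",   3244, 911, 1073,  608, 462),
]

ratios = [total / cot for _, _, cot, total, _, _ in rows]
overheads = [100 * reformat / total for _, _, _, total, reformat, _ in rows]

for (name, *_), r, o in zip(rows, ratios, overheads):
    print(f"{name:24s} ratio={r:.2f}x  overhead={o:.1f}%")
print(f"mean ratio    = {mean(ratios):.2f}x")
print(f"mean overhead = {mean(overheads):.1f}%")
```

Under these definitions the recomputed values match the table to within rounding (e.g. mean ratio 4.01x, mean overhead 61.8%).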