Force reasoning for any LLM

This is a simple proof of concept that gets any LLM (Large Language Model) to reason before giving its response. This interface uses the Qwen/Qwen2-1.5B-Instruct model, which is not a reasoning model. The method simply forces a few "reasoning" steps with prefixes to help the model improve its answer.
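The prefix-forcing idea can be sketched as staged generation: each stage's prompt ends with a fixed prefix that the model must continue, and the reasoning produced in the first stage is fed back before asking for the final answer. The prefix wording and the two-stage split below are illustrative assumptions, not the Space's exact code.

```python
# Illustrative sketch of forcing "reasoning" with prefixes.
# The prefix strings and stage count are assumptions for illustration.
REASONING_PREFIX = "Let's think step by step:"
ANSWER_PREFIX = "Final answer:"

def build_stage_prompt(question, reasoning=None):
    """Build the prompt for one generation stage.

    Stage 1 (reasoning is None): end the prompt with the reasoning
    prefix so the model continues by reasoning.
    Stage 2: feed the captured reasoning back and force the
    final-answer prefix.
    """
    if reasoning is None:
        return f"{question}\n{REASONING_PREFIX}"
    return f"{question}\n{REASONING_PREFIX} {reasoning}\n{ANSWER_PREFIX}"
```

With a library like Hugging Face transformers, each prompt would be tokenized and passed to `model.generate`; the text the model appends after the reasoning prefix is captured and reinjected into the second-stage prompt.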

See my related article here: Make any model reasoning

Tweaking

Reasoning tokens: 50 to 1024
Answer tokens: 50 to 1024
Temperature: 0.1 to 1

Using a smaller number of tokens for the reasoning steps makes the model answer faster, but it may not be able to go deep enough in its reasoning. A good value is 100 to 512.

Using a smaller number of tokens for the final answer makes the model less verbose, but it may not be able to give a complete answer. A good value is 512 to 1024.
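The two token budgets map naturally onto separate `max_new_tokens` limits for the two generation stages. A minimal sketch, assuming the Hugging Face transformers `generate` API; the concrete values are just picks from the suggested ranges above:

```python
# Separate token budgets for the two stages (values are examples
# taken from the suggested ranges in the text above).
reasoning_kwargs = {"max_new_tokens": 256}  # suggested range: 100-512
answer_kwargs = {"max_new_tokens": 768}     # suggested range: 512-1024

# With transformers this would look roughly like:
# reasoning_ids = model.generate(**reasoning_inputs, **reasoning_kwargs)
# answer_ids = model.generate(**answer_inputs, **answer_kwargs)
```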

Do sample makes the model pick the next token by sampling instead of always taking the most likely one. It's usually better to leave it checked.

Temperature indicates how "creative" the model can be. 0.7 is a common value. If you set it too high (like 1.0), the model can become incoherent. With a low value (like 0.3), the model will produce very predictable answers.
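What temperature does is easy to show with plain math: the logits are divided by the temperature before the softmax, so a low temperature sharpens the distribution (more predictable) and a high one flattens it (more creative). A self-contained sketch with illustrative logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature before the softmax:
    # T < 1 sharpens the distribution, T > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for three candidate tokens.
logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.3)
high = softmax_with_temperature(logits, 1.0)
# At low temperature the top token takes almost all the probability mass,
# which is why low temperatures give very predictable answers.
```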

This interface can run on a personal computer with 6 GB of VRAM (e.g. an NVIDIA 3050/3060 laptop GPU). Feel free to fork the application and try other instruct models.