
A Cursory Test Of 3 Open Source AI Large Language Models: DeepSeek R1 vs Qwen 3.5 vs Gemma 3

A few weeks ago we wrote a blog post on how you can run an AI LLM locally on your machine. Since then we've tried out two more models. Our primary goal was to check the performance of each model on a basic CPU-only machine; no GPUs anywhere. How would each of the models hold up when used by a less technically inclined individual? We installed the smallest to mid-sized variants; the really huge models were ignored here since we assumed not many average users would sacrifice gigabytes of disk space for them. The machine we used: a base Core i7 machine with 32GB of RAM and 1TB of disk space. The models we ran: DeepSeek R1, Qwen 3.5 and Gemma 3.

We chose these models since they did not consume much disk space. NOTE: This is not in-depth technical testing; we are just approaching things as an average computer user would.

Now, for those who do not want to read through this entire article, here is what we found about each of the models:

  1. DeepSeek R1 has superior reasoning capabilities. This has been stated across many AI testing blogs; its reasoning is on par with Gemini 2.5 Pro and ChatGPT 4. However, it did put some strain on our machine while it thought through a prompt. The fans were really whirring when it was in use.
  2. Qwen 3.5 has almost similar reasoning capabilities to DeepSeek, but its 'thinking' process uses an internal checklist of some sort. Its system resource usage was in between that of DeepSeek R1 and Gemma 3; not too hard on the machine but also not too easy on it.
  3. Gemma 3 had the best resource usage amongst the three, but with a sad catch: its reasoning capabilities were sub-par.

Now, on to the meat of the article. We prompted each model with a question about RLC circuits, their usage, and the ordinary differential equations (ODEs) that represent them. The results were stunning and revealed just how each model solved the problem.
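For readers curious about the maths behind the question, here is the standard form (our own reference note, not any model's output). Assuming a series RLC circuit driven by a source voltage V(t), Kirchhoff's voltage law gives, in terms of the capacitor charge q(t):

```latex
% Series RLC circuit: KVL in terms of the charge q(t), with current i(t) = dq/dt
L \frac{d^{2}q}{dt^{2}} + R \frac{dq}{dt} + \frac{1}{C}\,q = V(t)
```

Whether the circuit's response is underdamped, critically damped or overdamped depends on the sign of R² − 4L/C, the discriminant of the characteristic equation.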

DeepSeek R1

DeepSeek R1 is an AI model created by the Chinese startup DeepSeek. It uses an "innovative Mixture-of-Experts (MoE) architecture, which allows for efficient inference while maintaining high performance". What really makes DeepSeek interesting is its "self-learning" capability, i.e. when used, it can infer and learn from user prompting over time. What we noticed is that it can actually reason right up to the best condensed and simplest solution, while the other two (especially Qwen) would sometimes get stuck in endless loops of meta-thinking.

DeepSeek's response was brilliant. Its 'thinking' mirrored almost human-like reasoning. It took some seconds before starting to think, then took some time thinking through the problem, and finally streamed the solution. The smaller 1.5b model was the quickest, but its output was not good enough for highly technical prompts. The larger DeepSeek models responded more slowly but with better results.


Qwen 3.5

Qwen 3.5 is an Alibaba product. The unique thing about it is that it lacks a 'self-learning' component, but this does not mean it is handicapped in any way. It responded quite well to our RLC circuit question, complete with the 'how-to' for solving the ODEs of the circuit. One thing we noticed is that its 'thinking' follows a checklist: Problem -> Constraint -> Solution ... The time taken to start reasoning was shorter than DeepSeek's, and its resource usage was a bit better than DeepSeek R1's.
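Incidentally, you can nudge any of these models towards that same checklist structure yourself via the prompt. A minimal sketch (entirely our own illustration; the function name and template are hypothetical, not anything Qwen exposes):

```python
def build_checklist_prompt(problem: str, constraints: list[str]) -> str:
    """Format a prompt asking the model to reason Problem -> Constraints -> Solution."""
    lines = [
        "Work through this in three labelled steps.",
        f"Problem: {problem}",
        "Constraints:",
    ]
    # One bullet per constraint, mirroring the checklist style we observed
    lines += [f"- {c}" for c in constraints]
    lines.append("Solution: show the final answer last.")
    return "\n".join(lines)

prompt = build_checklist_prompt(
    "Derive the ODE for a series RLC circuit.",
    ["CPU-only machine", "keep the derivation short"],
)
print(prompt)
```

You would then pass the resulting string to whichever local model you are running.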

One more thing: when it comes to following instructions strictly, Qwen 3.5 beat both DeepSeek R1 and Gemma 3. 

Gemma 3

Gemma 3 is a family of open-weight models built by Google on Gemini technology. Gemma 3's speed of response amazed us; it was pretty fast. It lacks the 'thinking' step that the previous two models have, and it did not put as much strain on our machine's resources while in use. Its deficiencies become apparent in multi-turn contexts, where it loses the train of thought, so you might need to watch out for that.

In some instances it produced subpar responses and sometimes failed to follow instructions correctly. Due to its lack of 'thinking', one could not tell how it reached an answer unless you challenged it in a second prompt.

Use Cases of Each Model

So where can each of these models shine? Which would be the most appropriate use case for each?

First, while these smaller models can be used for coding tasks, they are not the best at it, and we do not recommend using them for coding. However, they are very good at content generation tasks, quick references and answers, scoping out ideas and the like.

  • DeepSeek R1 - If you are looking at more technical writing or deep technical reasoning on scientific topics, DeepSeek R1 should be your top choice.
  • Qwen 3.5 excels in web content creation, point-by-point analyses and general creative writing. If prompted appropriately it will not only generate really good web copy, it can also generate the corresponding META descriptions and web content layout.
  • Gemma 3 excels in quickly generating initial rough drafts, quick simple references and simpler logical analyses with fewer turns. While its output might not always be good enough on its own, it can be fed into the other models to refine and improve.
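That last idea, a fast model drafting and a stronger one polishing, can be sketched as a tiny two-stage pipeline. The function names are our own and the model calls are stand-in callables; swap in however you invoke your local models (e.g. Gemma 3 for the draft, DeepSeek R1 for the refinement):

```python
from typing import Callable

def draft_then_refine(task: str,
                      draft: Callable[[str], str],
                      refine: Callable[[str], str]) -> str:
    """Fast model produces a rough draft; a stronger model polishes it."""
    rough = draft(f"Write a quick rough draft: {task}")
    return refine(f"Improve this draft, fixing errors and gaps:\n{rough}")

# Example wiring with placeholder callables standing in for real model calls
result = draft_then_refine(
    "explain RLC resonance",
    draft=lambda p: f"[draft] {p}",
    refine=lambda p: f"[refined] {p}",
)
print(result)
```

The same shape works for any draft/refine pairing among the three models.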

There is still a lot more that we will need to uncover as we try them out in our day-to-day activities. 

Published: Sunday, 5th Apr 2026 | Last Modified: Sunday, 5th Apr 2026
