Advancements in Large Language Models Through Statistical Comparison of Phi-4 and Qwen

Bajaj, Shalini Bhaskar (2025) Advancements in Large Language Models Through Statistical Comparison of Phi-4 and Qwen. International Journal of Innovative Science and Research Technology, 10 (9): 25sep899. pp. 1463-1469. ISSN 2456-2165

Abstract

The paper investigates and compares the performance of two language models, Phi-4 and Qwen, using a comprehensive evaluation framework that assesses them on multiple metrics: generated text length, token count, response time, and readability. To make the evaluation robust, we apply an array of statistical techniques: ANOVA, Welch's t-test, and Levene's test, together with the non-parametric Mann-Whitney U and Kruskal-Wallis tests. This multi-layered approach allows a more detailed comparison of the models and highlights subtle differences in their output behaviors and performance profiles. The analysis reveals that Phi-4 generates detailed and varied responses, as evidenced by its longer texts and higher token counts, indicating its strength in applications that require comprehensive, in-depth information. Qwen, by contrast, consistently shows significantly lower latency and higher readability, which makes it well suited to real-time conversation, where speed and clarity are paramount. These distinct characteristics highlight a trade-off between response variety and efficiency, suggesting that the optimal choice of model depends on the specific needs of the task. For instance, Phi-4 may be advantageous for generating reports or explaining content, whereas Qwen is more appropriate for virtual assistant applications that require quick responses and communication.
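The abstract names the statistical tests but not how they are computed. The following is a minimal, standard-library-only sketch of each test statistic, assuming one independent sample of a single metric (for example, response time per prompt) for each model. All function names and sample values are hypothetical illustrations, not the paper's code or data. P-values are omitted because they require the t, F, and chi-squared distributions, which the Python standard library does not provide; SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`, `levene`, `f_oneway`, `mannwhitneyu`, and `kruskal` return them. Note that Levene's statistic here is mean-centred, whereas SciPy's `levene` defaults to median centring (the Brown-Forsythe variant), and the Kruskal-Wallis H below omits the tie correction that SciPy applies.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def one_way_anova_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def levene_w(*groups):
    """Levene's W (mean-centred): the ANOVA F computed on absolute deviations from group means."""
    return one_way_anova_f(*[[abs(x - mean(g)) for x in g] for g in groups])


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs where a wins, counting ties as one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for the tied block i..j
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def kruskal_h(*groups):
    """Kruskal-Wallis H on pooled ranks (no tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]  # ranks belonging to this group
        total += sum(r) ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical response times in seconds, for illustration only (not data from the paper).
phi4_latency = [4.1, 3.8, 5.2, 4.6, 4.9]
qwen_latency = [1.2, 1.5, 1.1, 1.4, 1.3]

t, df = welch_t(phi4_latency, qwen_latency)
print(f"Welch t = {t:.2f}, df = {df:.1f}")
print(f"ANOVA F = {one_way_anova_f(phi4_latency, qwen_latency):.2f}")
print(f"Levene W = {levene_w(phi4_latency, qwen_latency):.2f}")
print(f"Mann-Whitney U = {mann_whitney_u(phi4_latency, qwen_latency):.1f}")
print(f"Kruskal-Wallis H = {kruskal_h(phi4_latency, qwen_latency):.2f}")
```

Welch's t-test is the natural choice when Levene's test suggests the two models' metrics have unequal variances, and the rank-based Mann-Whitney U and Kruskal-Wallis tests serve as checks that do not assume normally distributed metrics.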
