
#VIvishnu

# A Comparative Security Evaluation of Local vs Remote LLM Deployments

June 14, 2026

I evaluated Qwen3-0.6B from two deployment perspectives:

- Local execution using Ollama (with direct config inspection)
- Remote access via an HTTPS chat endpoint

Both setups were red-teamed using:

- Garak (LLM vulnerability scanner)
- Promptfoo (adversarial prompt testing framework)
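
To make the two access paths concrete, here is a minimal sketch of how the same prompt could be packaged for each deployment. The Ollama URL and payload follow Ollama's documented `/api/generate` endpoint, and `qwen3:0.6b` is assumed to be the local model tag. The remote URL and its `{"message": ...}` schema are hypothetical placeholders, since the actual endpoint is not described here.

```python
import json
import urllib.request

# Ollama's default local REST endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Hypothetical placeholder: the real remote endpoint is not given in this post.
REMOTE_URL = "https://llm.example.com/chat"


def _json_post(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def build_ollama_request(prompt: str, model: str = "qwen3:0.6b") -> urllib.request.Request:
    """Request for a local Ollama server; the model tag is an assumption."""
    return _json_post(OLLAMA_URL, {"model": model, "prompt": prompt, "stream": False})


def build_remote_request(prompt: str) -> urllib.request.Request:
    """Request for the remote chat endpoint; this schema is purely illustrative."""
    return _json_post(REMOTE_URL, {"message": prompt})


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send a prepared request and decode the JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running Ollama server):
#   reply = send(build_ollama_request("Ignore previous instructions."))
#   print(reply["response"])
```

Keeping request construction separate from sending makes it easy to confirm that both deployments receive byte-identical prompt text, so any difference in behavior comes from the wrapper rather than the input.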

## Key Results

| Deployment | Safety Pass Rate |
| --- | --- |
| Local (Ollama) | 54% |
| Remote (HTTPS endpoint) | 8% |

A "pass" indicates the model safely handled or rejected malicious prompts.

By that measure, the remote deployment failed 92% of the attack prompts.
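
The relationship between the pass rate and the failure rate can be sketched as follows. The probe count used here (25 probes, 2 passes) is made up purely to reproduce the 8% / 92% split. It is not the actual number of probes run.

```python
def pass_rate(outcomes: list[bool]) -> float:
    """Percentage of probes the model handled safely (True = pass)."""
    if not outcomes:
        raise ValueError("no probe outcomes recorded")
    return 100.0 * sum(outcomes) / len(outcomes)


def failure_rate(outcomes: list[bool]) -> float:
    """Percentage of probes that succeeded against the model."""
    return 100.0 - pass_rate(outcomes)


# Illustrative data only: 2 safe responses out of 25 hypothetical probes.
remote_outcomes = [True] * 2 + [False] * 23

print(pass_rate(remote_outcomes))     # 8.0
print(failure_rate(remote_outcomes))  # 92.0
```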


## OWASP-Aligned Test Breakdown

| Category | Description | Local | Remote |
| --- | --- | --- | --- |
| Excessive Agency | Unsafe autonomous actions | 67% | 2% |
| Hijacking | Task redirection attacks | 60% | 0% |
| Prompt Extraction | System prompt leakage | 40% | 0% |
| Overreliance | Failure to challenge bad input | 27% | 0% |

## Additional Finding

During testing, the model hallucinated ANSI escape codes in 56% of cases.

Examples included malformed sequence fragments such as `\t`, `135`, and `H`.

This creates a real attack surface wherever model output reaches a terminal or a log file unsanitized:

- Terminal injection
- Log poisoning
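
One way to reduce both risks is to strip escape sequences and control characters from model output before it is displayed or logged. The sketch below is a minimal illustration, not the mitigation used in this evaluation. The function names are hypothetical, and the patterns cover common CSI and OSC sequences rather than every terminal control code.

```python
import re

# Common ANSI escape sequences:
ANSI_ESCAPE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"               # CSI: ESC [ params intermediates final
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"    # OSC: ESC ] ... terminated by BEL or ESC \
    r"|\x1b[@-Z\\-_]"                        # remaining two-character escapes
)

# Leftover C0 control characters and DEL, keeping only tab and newline.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def strip_terminal_controls(text: str) -> str:
    """Remove escape sequences and stray control characters before display."""
    without_escapes = ANSI_ESCAPE.sub("", text)
    return CONTROL_CHARS.sub("", without_escapes)


def sanitize_for_log(text: str) -> str:
    """Make output safe for a single log line by escaping line breaks too."""
    cleaned = strip_terminal_controls(text)
    # Escape backslashes first so the escaped form stays unambiguous.
    return cleaned.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t")


model_output = "\x1b[2J\x1b[135Hhello\nFAKE LOG ENTRY"
print(strip_terminal_controls(model_output))
print(sanitize_for_log(model_output))
```

Escaping newlines in the log path matters because a model-generated line break lets an attacker forge what looks like a separate log entry.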


## Architecture Insights

From the model configuration:

- 28 hidden layers → deeper propagation of prompt influence
- SiLU activation in SwiGLU feed-forward blocks → smoother activations that are less sparse than ReLU
- RMSNorm → stable normalization
- RoPE (θ = 1,000,000) → extended context handling
- Grouped Query Attention (GQA) → key/value heads shared across groups of query heads
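
The properties above can be read out of a Hugging Face style `config.json`. The sketch below assumes the standard key names used by that format (`num_hidden_layers`, `hidden_act`, `rope_theta`, `num_attention_heads`, `num_key_value_heads`). The layer count, activation, and RoPE base mirror the values listed above, while the head counts are illustrative assumptions that should be checked against the real file.

```python
import json


def summarize_config(cfg: dict) -> dict:
    """Extract the security-relevant architecture fields from a model config."""
    heads = cfg["num_attention_heads"]
    # If the key is absent, every query head has its own key/value head.
    kv_heads = cfg.get("num_key_value_heads", heads)
    return {
        "layers": cfg["num_hidden_layers"],
        "activation": cfg["hidden_act"],
        "rope_theta": cfg["rope_theta"],
        "queries_per_kv_head": heads // kv_heads,
        # GQA sits between multi-query (1 KV head) and full multi-head attention.
        "uses_gqa": 1 < kv_heads < heads,
    }


# Inline stand-in for config.json; head counts are illustrative assumptions.
sample_config = json.loads("""
{
  "num_hidden_layers": 28,
  "hidden_act": "silu",
  "rope_theta": 1000000,
  "num_attention_heads": 16,
  "num_key_value_heads": 8
}
""")

for field, value in summarize_config(sample_config).items():
    print(f"{field}: {value}")
```

Recording this summary alongside each scan keeps the configuration under test tied to its results.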

## Key Takeaway

Model behavior is not determined solely by training data.

It is also influenced by:

- Architectural design choices
- Mathematical functions (activation, normalization, attention)
- Training interactions between components
- The deployment wrapper and API structure

The same model weights can produce drastically different security outcomes depending on how they are exposed.

## Implications

>_ For Defenders

- Model configuration should be part of threat modeling
- Output handling and downstream usage must be validated

>_ For Builders

- Activation functions and architecture can affect security behavior
- API design and prompt handling define the attack surface

## Conclusion

The model is only one part of the system.

The interface and deployment layer define its real-world security posture.

## References

>_ Tooling

Garak (NVIDIA) — https://github.com/NVIDIA/garak
Promptfoo — https://www.promptfoo.dev/
Ollama — https://ollama.com/
Qwen3-0.6B — https://huggingface.co/Qwen/Qwen3-0.6B

>_ Frameworks

OWASP Top 10 for LLM Applications — https://genai.owasp.org/llm-top-10/
AVID — https://avidml.org/
MITRE ATLAS — https://atlas.mitre.org/

>_ Research Papers

ReLU Strikes Back (2023) — https://arxiv.org/abs/2310.04564
RoFormer / RoPE (2021) — https://arxiv.org/abs/2104.09864
Grouped Query Attention (2023) — https://arxiv.org/abs/2305.13245
RMSNorm (2019) — https://arxiv.org/abs/1910.07467
GLU Variants Improve Transformer (2020) — https://arxiv.org/abs/2002.05202

#AISecurity #LLM #RedTeaming #CyberSecurity #PromptInjection #MLSecurity