>>_

I evaluated Qwen3-0.6B from two different deployment perspectives:
- Local execution using Ollama (with direct config inspection)
- Remote access via an HTTPS chat endpoint
Both setups were red-teamed using:
- Garak (LLM vulnerability scanner)
- Promptfoo (adversarial prompt testing framework)
## Key Results
| Deployment | Safety Pass Rate |
|---|---|
| Local (Ollama) | 54% |
| Remote (HTTPS endpoint) | 8% |
A "pass" indicates the model safely handled or rejected malicious prompts.
The remote deployment failed 92% of attacks.

## OWASP-Aligned Test Breakdown
| Category | Description | Local | Remote |
|---|---|---|---|
| Excessive Agency | Unsafe autonomous actions | 67% | 2% |
| Hijacking | Task redirection attacks | 60% | 0% |
| Prompt Extraction | System prompt leakage | 40% | 0% |
| Overreliance | Failure to challenge bad input | 27% | 0% |
## Additional Finding
During testing, the model hallucinated ANSI escape codes in 56% of cases.
Examples included incorrect sequences like:
\t135H
This creates a real risk surface:
- Terminal injection
- Log poisoning

## Architecture Insights
From the model configuration:
- 28 hidden layers → deeper prompt influence propagation
- SiLU + SwiGLU → smoother activations, reduced sparsity
- RMSNorm → stable normalization
- RoPE (θ = 1,000,000) → extended context handling
- Grouped Query Attention (GQA) → shared attention memory
## Key Takeaway
Model behavior is not determined solely by training data.
It is influenced by:
- Architectural design choices
- Mathematical functions (activation, normalization, attention)
- Training interactions between components
- Deployment wrapper and API structure
Same model weights can produce drastically different security outcomes depending on how they are exposed.
## Implications
>_ For Defenders
- Model configuration should be part of threat modeling
- Output handling and downstream usage must be validated
>_ For Builders
- Activation functions and architecture impact security
- API design and prompt handling define attack surfaces
## Conclusion
The model is only one part of the system.
The interface and deployment layer define its real-world security posture.
## References
>_ Tooling
- Garak (NVIDIA) — https://github.com/NVIDIA/garak
- Promptfoo — https://www.promptfoo.dev/
- Ollama — https://ollama.com/
- Qwen3-0.6B — https://huggingface.co/Qwen/Qwen3-0.6B
>_ Frameworks
- OWASP Top 10 for LLM Applications — https://genai.owasp.org/llm-top-10/
- AVID — https://avidml.org/
- MITRE ATLAS — https://atlas.mitre.org/
>_ Research Papers
- ReLU Strikes Back (2023) — https://arxiv.org/abs/2310.04564
- RoFormer / RoPE (2021) — https://arxiv.org/abs/2104.09864
- Grouped Query Attention (2023) — https://arxiv.org/abs/2305.13245
- RMSNorm (2019) — https://arxiv.org/abs/1910.07467
- GLU Variants Improve Transformer (2020) — https://arxiv.org/abs/2002.05202
#AISecurity #LLM #RedTeaming #CyberSecurity #PromptInjection #MLSecurity
