
Generative AI models running in your own infrastructure

AI: Activities for all ages and subjects

Madrid, Spain, May 11, 2026

Jesus M. Gonzalez-Barahona

https://jgbarah.github.io/presentations/


What’s in a generative model


A wide spectrum


Model kinds


Classes of “openness”

Model Openness Framework (figure)

https://arxiv.org/abs/2403.13784


Classes of “openness”


For each of them…

The “four freedoms” from “What is Free Software?”


Some specific aspects


Why does this matter?


Behind-app model


Google NotebookLM (screenshot)


DeepL (screenshot)


DeepL (screenshot)


HuggingFace Spaces (screenshot)


Behind-app model


Directly accessible model


OpenRouter (screenshot)


Groq (screenshot)


Directly accessible model


Available weights model

In some cases referred to as “open weight models”


Available weights model


Available weights models examples


Available weights models examples


Available weights models examples


Open weight model

Open Weight Definition

Open-Weight AI Models: What They Are, and Why OpenAI’s Next Move Matters


Open weight model


Open weight model example


Open source model

Open Source AI Definition

What are Open Weights?

Proposal – Interpretation of DFSG on Artificial Intelligence (AI) Models


Open source model


Open source model examples


Reproducible (libre) model


Reproducible (libre) model


Reproducible (libre) model examples


Reproducible (libre) model examples


Model kinds


| Kind | Access | Model Control | Data Control | Autonomy | Trust |
|---|---|---|---|---|---|
| Behind-app | App-defined | None | None | None | None |
| Directly accessible | API restrictions | API restrictions | None | None | None |
| Available weights | With conditions | With conditions | Complete | With conditions | None |
| Open weight | Use as you want | Deep control | Complete | Study restricted | None |
| Open source | Use as you want | Deep control | Complete | Detailed study restricted | Partial |
| Reproducible | Use as you want | Deep control | Complete | Complete | Complete |

Reproducibility in AI research

Reproducible AI: Why it Matters & How to Improve it

Guidelines for Empirical Studies in Software Engineering involving Large Language Models


Ethical model

Depends on what is considered “ethical”


LLM Responsible AI Rankings (screenshot)

LLM Responsible AI Rankings


Open issues


Self-hostable models


Minimum for running in your infra

At least “available weights”, provided the inference code is also available

Self-hostable models


Self-hostable models


Model kinds


HuggingFace Models (screenshot)


Advantages of self-hostable


Disadvantages of self-hostable


Advantages of self-hosting


Disadvantages of self-hosting

Technical skills required!


Equipment requirements

You can also deploy in a cloud-based host


Economic aspects

You have to do the math
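The kind of math to do, as a minimal sketch with purely illustrative numbers (these are not real prices; plug in your own API pricing, hosting costs and expected volume):

```python
# Purely illustrative numbers -- replace with real prices and your real volume.
api_price_per_mtok = 0.50          # $/million tokens via a hosted API (hypothetical)
gpu_host_per_month = 300.0         # $/month for a GPU server (hypothetical)
tokens_per_month = 1_000_000_000   # expected monthly usage

# Cost of the same volume through the hosted API
api_cost = tokens_per_month / 1e6 * api_price_per_mtok
print(f"API: ${api_cost:.0f}/month vs self-hosting: ${gpu_host_per_month:.0f}/month")

# Volume above which self-hosting is cheaper (ignoring staff time!)
break_even = gpu_host_per_month / api_price_per_mtok * 1e6
print(f"Self-hosting wins above ~{break_even:.0f} tokens/month")
```

Note that this ignores the cost of the technical skills and maintenance time mentioned above, which often dominates.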


Locally runnable via API


Quantization

HuggingFace Guide on Quantization

GGUF: Structure and Usage
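Why quantization matters for self-hosting, in back-of-the-envelope form: the memory needed just to hold the weights scales linearly with bits per weight. A rough sketch (weights only, ignoring activations, KV cache and runtime overhead):

```python
# Approximate memory needed for the weights of an LLM, by bit width.
# This ignores activations, KV cache and engine overhead.

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Return approximate weight storage in gigabytes (decimal GB)."""
    return n_params * bits_per_weight / 8 / 1e9

N = 7e9  # a "7B" model
for label, bits in [("fp16", 16), ("8-bit (Q8)", 8), ("4-bit (Q4)", 4)]:
    print(f"{label}: ~{weight_memory_gb(N, bits):.1f} GB")
# fp16 ~14 GB, Q8 ~7 GB, Q4 ~3.5 GB: quantization is what makes
# many models fit on consumer GPUs or in ordinary RAM.
```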


Finetuning

What is fine-tuning?
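One reason fine-tuning is affordable on modest hardware: low-rank methods such as LoRA train two small matrices instead of each full weight matrix. A sketch of the parameter-count arithmetic (the hidden size and rank below are typical illustrative values, not tied to any specific model):

```python
# Why LoRA-style fine-tuning is cheap: for a d_out x d_in weight matrix,
# LoRA trains B (d_out x r) and A (r x d_in) instead of the full matrix.

def full_params(d_out: int, d_in: int) -> int:
    return d_out * d_in

def lora_params(d_out: int, d_in: int, r: int) -> int:
    return r * (d_out + d_in)

d = 4096   # hidden size typical of a 7B-class model (illustrative)
r = 8      # a common LoRA rank
print(full_params(d, d))     # 16_777_216 params in the full matrix
print(lora_params(d, d, r))  # 65_536 trainable params (~0.4% of the full matrix)
```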


HuggingFace: Quantizations and finetunes

HuggingFace Adaptations (screenshot)


Civit.AI: Images & videos

https://civitai.com/


Inference engines


Frameworks for LLMs

Both can use local models, or models reached via an HTTP API


Chat / assistant frontends

Most of them also provide an HTTP API
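Many of these frontends and engines expose an OpenAI-compatible `/v1/chat/completions` endpoint, so any client can talk to them. A minimal stdlib-only sketch; the base URL and model name below are assumptions that depend on what you deployed:

```python
import json
import urllib.request

def chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Hypothetical local deployment; adjust URL and model to your setup.
req = chat_request("http://localhost:8080", "gemma3:1b", "Why is the sky blue?")
# urllib.request.urlopen(req)  # uncomment once a server is running
```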


Ollama: how to run

```shell
curl -fsSL https://ollama.com/install.sh | sh   # install Ollama
ollama serve                                    # start the local server
ollama run gemma3:1b                            # interactive chat with a model
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:1b",
  "prompt": "Why is the sky blue?"
}'
```
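The same `/api/generate` call can be made from Python with only the standard library; a sketch (the send is commented out so it only runs once `ollama serve` is up):

```python
import json
import urllib.request

# Same request as the curl example, built in Python.
payload = {
    "model": "gemma3:1b",
    "prompt": "Why is the sky blue?",
    "stream": False,   # ask for one JSON object instead of a stream of chunks
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# With `ollama serve` running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```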

Using Ollama to host an LLM on CPU-only equipment to enable a local chatbot and LLM API


Open WebUI: how to run

```shell
uv venv --python 3.11
uv pip install open-webui
uv run open-webui serve
```

Now, open http://localhost:8080


Open WebUI (screenshot)


Jan: how to run

```shell
sudo dpkg -i Jan_0.6.9_amd64.deb   # install the .deb package
Jan                                # launch the app
```

Jan (screenshot)


Other self-hostable generative models


Producing images

A Guide to Open-Source Image Generation Models

Text-to-image Arena


Producing video


Text to video and image (apps & finetunes)


Speech to text

Whisper (MIT License)

```shell
uv venv
uv pip install openai-whisper
uv run whisper speech.wav --language Spanish
```

Or from Python:

```python
#!/usr/bin/python3
import whisper

model = whisper.load_model('tiny')
transcription = model.transcribe('recording.wav')
print(transcription['text'])
```

Text to speech

```shell
tts --text "Texto" \
  --model_name tts_models/es/mai/tacotron2-DDC \
  --out_path speech.wav
```

Text to speech (2)

Exploring the World of Open-Source Text-to-Speech Models


Other random models


Other applications


Other applications (2)


Other applications


Open training datasets


Benchmarks


Bonus track

Some interesting tools


Bonus track 2

Are LLMs deterministic?


Why does this matter?


Are LLMs deterministic?


Inference engine

It is “regular”, deterministic software…

except when it tries to be random
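Where that randomness lives can be shown with a toy sampler: token probabilities come from a temperature-scaled softmax over the logits, and the only nondeterminism is the RNG that draws from them. A self-contained sketch (not any engine's actual code):

```python
import math
import random

def softmax(logits, temperature):
    """Temperature-scaled softmax: low temperature sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(logits, temperature, rng):
    """Draw one token index according to the scaled probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.5]

# Fixing the seed makes the "random" sampling fully reproducible.
rng1, rng2 = random.Random(42), random.Random(42)
a = [sample(logits, 0.7, rng1) for _ in range(5)]
b = [sample(logits, 0.7, rng2) for _ in range(5)]
print(a == b)  # True

# Very low temperature collapses toward the argmax (near-greedy decoding).
print(sample(logits, 0.01, random.Random(0)))  # 0, the highest-logit token
```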


Inference engine: controlling randomness (API parameters)
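For example, these are the sampling knobs as named in Ollama's `options` field (OpenAI-compatible APIs expose similar top-level parameters); the model and prompt are just placeholders:

```python
# Sampling parameters in an Ollama /api/generate request body.
payload = {
    "model": "gemma3:1b",
    "prompt": "Why is the sky blue?",
    "options": {
        "temperature": 0,  # greedy decoding: always pick the most likely token
        "seed": 42,        # fix the RNG, for reproducibility when temperature > 0
        "top_k": 40,       # consider only the 40 most likely tokens
        "top_p": 0.9,      # ...restricted to the top 90% of probability mass
    },
}
```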


Inference engine: controlling randomness


Inference engine: the balance

Both can be combined


Supporting software


Hardware


References


Beware!


Summary