gpt-oss-20b-sonnet-flavored-v1
A LoRA fine-tune of openai/gpt-oss-20b that transfers Claude Sonnet 4.6's response style to GPT-OSS-20B. Training was final-answer-only, so the base model's reasoning and capabilities are retained, along with its flaws.
This repo contains both the merged HF safetensors (in hf/) and a GGUF ladder with MXFP4 experts preserved natively across all variants.
What changed and what didn't
This LoRA targets response style only: phrasing, tone, structure, register. It does not target reasoning capability, factual knowledge, or numerical calibration. For factual or numerical tasks, the underlying GPT-OSS-20B's calibration limits apply: this model will produce confidently wrong answers in the same situations the base model does, just phrased in Sonnet's voice. Style is not a substitute for capability.
The MoE experts (~19B of 20B parameters) are unchanged from base GPT-OSS-20B. The LoRA modified attention projections only, at rank 16. Reasoning happens through the same expert routing as the base model.
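To give a sense of how small a rank-16 adapter on attention projections is next to ~19B frozen expert parameters, here is a rough parameter count. The layer shapes (hidden size 2880, 64 query heads and 8 key/value heads of dimension 64, 24 layers) and the projection names `q_proj`/`k_proj`/`v_proj`/`o_proj` are assumptions about a gpt-oss-20b-like architecture, not values read from this repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Projection:
    name: str
    d_in: int
    d_out: int


def lora_params(proj: Projection, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) beside a frozen weight."""
    return rank * (proj.d_in + proj.d_out)


# Assumed shapes (hypothetical, for illustration only).
HIDDEN = 2880
Q_DIM = 64 * 64    # 64 query heads x head_dim 64
KV_DIM = 8 * 64    # 8 key/value heads x head_dim 64
NUM_LAYERS = 24
RANK = 16          # rank stated in this model card

ATTENTION = [
    Projection("q_proj", HIDDEN, Q_DIM),
    Projection("k_proj", HIDDEN, KV_DIM),
    Projection("v_proj", HIDDEN, KV_DIM),
    Projection("o_proj", Q_DIM, HIDDEN),
]

per_layer = sum(lora_params(p, RANK) for p in ATTENTION)
total = per_layer * NUM_LAYERS

print(f"LoRA params per layer: {per_layer:,}")
print(f"LoRA params total:     {total:,}")
```

Under these assumptions the adapter adds on the order of 8M trainable parameters, a tiny fraction of the model, which is consistent with the claim that only style-level behavior is being adjusted.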
Response Comparisons
Prompt 1: Coffee shop subsidy policy critique
Prompt:
A city planner argues that because neighborhoods with an abundance of boutique coffee shops have lower crime rates, the city should heavily subsidize new coffee shops in high-crime areas to make them safer. What are the specific logical and statistical flaws in this policy proposal?
Base GPT-OSS-20B:
Below is a concise but comprehensive checklist of the **logical** and **statistical** errors that make the proposal to "heavy-subsidize boutique coffee shops in high-crime neighborhoods so they become safer" scientifically weak at best and policy-dangerous at worst.
---
## 1. Logical Flaws
| # | Flaw | Why it matters |
|---|------|----------------|
| **1** | **Correlation ≠ Causation** | The planner assumes that because coffee-shop-dense areas *correlate* with lower crime, the shops *cause* the drop. No evidence is presented that coffee shops drive crime reductions. |
| **2** | **Reverse causality / self-selection** | Business owners tend to open stores in places that are already safe, accessible and have enough disposable income. Lower crime may *enable* cafés, not the other way around. |
| **3** | **Ecological fallacy** | The observation is made at the neighborhood level; it cannot be applied to individual neighborhoods or even to a single coffee shop's effect on crime in its immediate vicinity. |
| **4** | **Missing causal mechanism** | The argument gives no plausible path – e.g., "more foot traffic → more eyes on the street → fewer crimes" – and ignores counter-effects (e.g., increased commercial activity may invite shoplifting or vandalism). |
| **5** | **Oversimplification of crime dynamics** | Crime is driven by a bundle of factors (poverty, unemployment, policing, social networks, housing quality, etc.). A single intervention (coffee shops) cannot reasonably be expected to shift that equilibrium. |
| **6** | **Neglecting opportunity cost** | Subsidizing cafés diverts public funds that might better reduce crime through proven measures (community policing, after-school programs, affordable housing, mental-health services). |
| **7** | **Potentially anti-equity** | Coffee-shop subsidies may accelerate gentrification, raise rents, and displace the very residents the policy is supposed to protect. |
| **8** | **No testable hypothesis** | The proposal does not specify an observable effect size, a target variable, or a way to evaluate success (e.g., crime statistics before/after, foot-traffic counts). |
---
## 2. Statistical Flaws
| # | Flaw | Why it matters |
|---|------|----------------|
| **1** | **Omitted-variable bias** | Socio-economic status, police presence, education, and community cohesion are all correlated with both coffee-shop density and crime. Ignoring them confounds the estimated relationship. |
| **2** | **Multicollinearity** | Variables like median income, property values, and employment levels often co-vary with coffee-shop density. In a regression that includes only coffee-shops, coefficients are unstable. |
| **3** | **Ecological data quality** | Crime counts may be under-reported in low-policing areas; cafés are often "commercial clusters" that attract visitors, leading to *reporting bias* (more crimes recorded simply because more eyes are watching). |
| **4** | **Small-sample / aggregation bias** | Neighborhoods with few shops and few crimes yield noisy estimates. Aggregating across heterogeneous census tracts can mask local variations. |
| **5** | **Spurious correlation** | Random chance or a lurking variable (e.g., historic districts) may produce a spurious link. Correlation coefficients lack context without a causal model. |
| **6** | **Regression to the mean** | High-crime neighborhoods tend to fluctuate; a temporary dip could be misattributed to coffee-shop openings. |
| **7** | **No causal inference design** | The data (presumably observational) is unsuited to estimate causality without instrumental variables, difference-in-differences, or randomized trials. |
| **8** | **Endogeneity of subsidies** | If subsidies themselves influence where coffee shops open (e.g., subsidizing a neighborhood raises its attractiveness), then coffee-shop density becomes endogenously determined by the policy you are evaluating – classic "simultaneity" problem. |
---
## 3. Practical Policy Issues
1. **Implementation barriers** – High-crime neighborhoods often lack the foot traffic, infrastructure, or supply chain needed for a profitable café; subsidies alone may not generate business.
2. **Accountability & measurement** – How will "safety" be measured? Crime indices are noisy; alternative metrics (perceived safety, community cohesion) may be harder to quantify.
3. **Unintended outcomes** – More cafés could increase traffic congestion, noise, and risk of small-scale shoplifting or vandalism, potentially offsetting any benefit.
4. **Gentrification risk** – Café openings often precede rising rents; displaced residents may move to poorer neighborhoods, possibly spreading crime rather than containing it.
5. **Legal & regulatory hurdles** – Zoning, licensing, and permitting processes can delay or thwart even a subsidized venture.
---
## 4. A Better Evidence-Based Approach
| Step | What to Do | Why |
|------|------------|-----|
| **1** | **Collect high-resolution data** on crime, foot traffic, income, policing, and business density at the block or establishment level. | Avoids ecological fallacy; captures the true environment surrounding a coffee shop. |
| **2** | **Use quasi-experimental designs** (difference-in-differences, instrumental variables, or regression discontinuity) to isolate the impact of cafés on crime. | Controls for confounders and endogeneity. |
| **3** | **Pilot subsidies** in a small set of comparable high-crime neighborhoods while leaving control groups untouched. | Allows measurement of causal effect and cost-effectiveness before scaling. |
| **4** | **Include complementary interventions** (community centers, policing reforms, affordable housing) in the evaluation. | Recognizes that coffee shops are likely a small part of a larger toolbox for crime reduction. |
| **5** | **Set explicit metrics** (e.g., 10-year trend in residential burglary rates, perceived safety survey scores). | Provides clear accountability and a way to revise or abandon the policy if it fails. |
---
### Bottom line
The planner's proposal hinges on a **spurious correlation** that has not been verified by a rigorous causal analysis. It ignores established drivers of crime, introduces potential displacement harms, and misallocates public resources without a defensible evidence base. Prior to spending money on subsidies, the city should conduct an empirical study that either validates or refutes the coffee-shop-crime link under controlled conditions. Only then can policymakers decide whether cafés should be part of a comprehensive crime-reduction strategy.
Sonnet-flavored:
This is a classic policy proposal worth rigorously examining. I'll break it into several categories of flaw so it's clear what's wrong:
## The Fundamental Mistake: Correlation ≠ Causation
The planner assumes coffee shops *cause* lower crime, but the data actually points to the opposite
- High-income communities have both more disposable income and better policing/resources
- Retail density may be **resulting from** socioeconomic conditions that also reduce crime
- The underlying factor (community investment, employment, cohesion) could cause **both** outcomes
## Selection Bias Specifically Hard on Urban Planning
The planner ignores what the data actually tells them
- Coffee shops *choose* where profit exists - they're predicting unobserved variables (income level)
- What predicts coffee shop density is probably the *real* predictor of low crime
## Omitted Variable Problem
Multiple confounders simultaneously:
- Police presence/contact hours
- Neighborhood socioeconomic status
- Demographic composition
- Street lighting and design
- Community organization strength
The regressions planners use are likely missing all the actual drivers.
## Partial Substitution Fallacy
Even if coffee shops *actually* somewhat reduce crime,
the planner assumes each new shop works like **all** existing shops in coffee-dense neighborhoods, ignoring dose-response nuance.
## The Gentrification Accusation Gets It Wrong (Ironically)
This specific policy is the exact mechanism planners fear:
- New commercial districts attract higher income residents
- Property values rise
- Original residents are displaced before any safety benefits reach them
This addresses symptoms in transformed neighborhoods rather than problems unchanged
## Cost vs. Effectiveness Issues
The planner focuses on "creating coffee shops," not measurable outcomes
- What evidence supports one shop = X level reduction?
- Wouldn't same funding better support improved lighting, increased police shifts, job programs?
- Opportunity costs aren't being considered
## Bad Counterfactual Reasoning
The planner treats present coffee shop concentration as evidence about absent shops:
- This directionality is suspect because the baseline condition varies across neighborhoods
## Summary of What Should Be Asked Instead
- RCTs: Does adding a shop *in a controlled setting* reduce crime?
- Panel data with lagged independent variables
- Identifying strategies (difference-in-differences after comparable policy changes)
- Cost/benefit against alternative interventions
The strongest criticism is that this appeals to warm comfort images while the actual economics points elsewhere - which planners use to justify precisely this sort of top-down commercialism.
Prompt 2: Hibernation premise-questioning
Prompt:
If a technological breakthrough allowed humans to safely and comfortably hibernate for up to six months a year to save money and energy, how would this alter global labor markets, real estate, and the structure of the traditional education system?
Base GPT-OSS-20B:
**What would it mean for humans to be able to "hibernate" safely for half the year?**
If the technology were as reliable, inexpensive, and comfortable as a good mattress, it would act like a new category of **time-share** for the human body: 6 months in a low-energy state and 6 months of normal activity. That simple shift would ripple through every sector that depends on human presence – workplaces, housing markets, and schools. Below is a map of likely transformations, grouped by the three domains you asked about.
---
## 1. Global Labor Markets
| Feature | Current Reality | With 6-month Hibernation | Likely Shift |
|---------|-----------------|-------------------------|--------------|
| **Annual labor supply** | 52 weeks ≈ 2,080 hrs/year (full-time) | ~1,040 hrs/year | Supply per worker halves. The economy has to either get the same output with fewer people or pay more for shorter hours. |
| **Wages & Pay-Per-Hour** | Stable hourly rate; most workers earn 40 hrs/week | Hourly rates might rise as supply shrinks, or employers pay a premium for "awake" periods | We could see a two-tier wage structure: "awake wage" versus "hibernation-allowance." |
| **Hours worked per employer** | 5-day week; on-call shifts, overtime | Employers may restructure to 6-month blocks: e.g., "active quarter" followed by a "pause." | Predictable quarterly patterns give firms a built-in rest cycle. |
| **Job design & staffing models** | Continuous cycles of production, service, or creative work | Rotational teams that keep a project going while others sleep; plug-and-play gig workers for the 6 months when most of the workforce is "asleep." | Many short-term contracts, hybrid tele-work + on-site "shift-swap" schedules. |
| **Automation & AI** | Machines take over repetitive or hazardous tasks, but still need human oversight | During hibernation, AI and robots can run facilities autonomously; the "awake" workforce can focus on oversight, strategy, or value-adding tasks that only humans do. | Accelerated AI deployment in sectors that require 24/7 uptime; humans become supervisors rather than operators. |
| **Workforce-flexibility & Gig Economy** | Growing but still anchored in weekly contracts | Perfect fit for gig workers: people can "log on" during the 6 active months and take the next 6 months off. | Gig platforms may create monthly/quarterly "turn-over cycles" where each worker's shift is known in advance. |
| **Cross-border labor & immigration** | Visa restrictions tied to continuous employment | Companies can hire globally for short bursts, reducing visa bottlenecks. | Potential rise in remote work visas; countries may open shorter-term "researcher" or "contractor" permits. |
| **Equity / Stratification Risks** | Overtime drives inequality | If hibernation is expensive, those who cannot afford it may work longer, increasing burnout and health disparities. | Policy may need to guard against a class of "perpetual laborers." |
### Bottom line
- **Supply shock** – fewer hours per worker → higher wages or more workers hired.
- **Demand shift** – firms need to plan around the predictable rhythm of workers' "on" and "off" periods.
- **Productivity gains** – deeper rest could improve focus, reduce errors, and heighten creativity during awake periods.
- **New industries** – hibernation maintenance, cryogenic storage leasing, sleep-health tech, AI-supervision firms.
---
## 2. Real Estate
| Feature | Current Reality | With 6-month Hibernation | Likely Shift |
|---------|-----------------|--------------------------|--------------|
| **Occupancy rates** | 24 hrs/day, full occupancy; houses consume heating, cooling, utilities constantly | Half the time consumption drops to near zero | Residential space can be re-valued by "active" days rather than sheer square footage. |
| **Property taxes & insurance** | Often based on property value and full-year occupancy | Potential for tax rebates or reduced premiums for homes used only half the year; new "hibernation-friendly" insurance products. | Policy adjustments to reward lower utility consumption. |
| **Urban density** | High density in cities to serve constant demand for commuting & services | Less daily traffic, fewer commuters → city centers could pivot to cultural, recreational, or retail uses during active months. | Some neighborhoods may become "sleep hubs" with communal hibernation pods. |
| **Home design** | Focus on all-year comfort, heating/cooling systems | New design elements: integrated cryo-pods, isolated thermal control zones, ultra-low-energy architecture for extended dark periods. | Architecture firms may offer "hibernation-ready" homes as a niche market. |
| **Commercial real estate** | 9-to-5 office occupancy; warehouses operate around the clock | Offices can be reduced in size or repurposed: one "awake" coworking space per 6-month block, with additional robotic labs running during hibernation. | Increase in mixed-use developments that host both human offices and autonomous manufacturing. |
| **Resort & vacation markets** | People travel during their free time (often winter or summer) | "Winter to bed" people may shift vacations to the active months; conversely, tourism could expand into the hibernation months if visitors are comfortable spending nights in "sleep stations." | Travel industry might create "sleep-lifted" packages: stay at a hibernation facility while the host sleeps. |
| **Infrastructure** | Power grid must supply 24/7 loads | Significant opportunity for micro-grids, battery storage, and demand-response systems that take advantage of long, low-load periods. | Renewable-energy projects may become more viable with predictable "night" curtailment periods. |
### Bottom line
- **Reduced utility footprint** – cheaper homes, lower energy taxes, new property valuations.
- **Design innovation** – specialized pods, adaptive architecture, and modular housing that can be "offline."
- **Urban re-imagining** – lower commuter traffic, more public spaces, potential shift from high-density office cores to low-density mixed-use hubs.
---
## 3. Traditional Education System
| Feature | Current Reality | With 6-month Hibernation | Likely Shift |
|---------|-----------------|--------------------------|--------------|
| **Academic calendar** | Fall-spring semesters + summer break (~4 months) | Two "active" halves (e.g., Jan–June and July–Dec); each half could become a full academic year. | Schools may adopt a **bi-annual model**: compact, intensive periods of 2–3 months separated by a hibernation-break. |
| **School hours & days** | 5 days/week, 6–8 hrs/day | Potential for *all-day* immersive modules during the active period (e.g., 12-hour "lab" sessions). | Accelerated curricula that cram subjects into tighter windows. |
| **Distance vs. in-person** | Shifted heavily during COVID, but core remains physical | Lowered necessity for continuous on-campus presence → more hybrid / virtual learning when students are awake. | Massive adoption of "synchronous learning" platforms that allow real-time engagement only during the 6 months, with recorded modules stored for later completion. |
| **Student lifecycle** | 18–22 yrs for undergrad; structured apprenticeship tracks | Possible *dual-track* programs: a "sleep-break" year after freshman year to fund tuition or life experience. | Universities may offer *"Sleep-Break Credits"* that count toward graduation. |
| **Faculty workload** | Continuous tenure cycles, semester-based contracts | Reduced teaching load: faculty can focus on research or mentorship during the 6-month window; meanwhile AI tutors fill gaps. | More emphasis on "compressed teaching" and micro-teaching units. |
| **Teacher pay & job security** | Annual contracts, some reliance on overtime | Possible shift to quarterly or semi-annual contracts with built-in sabbaticals. | Rise in *"Instructor-A-to-Z"* part-time roles, more emphasis on project-based teaching. |
| **Curriculum design** | Spread over 4 years with prerequisites | Shorter, modular courses that can be taken in a single 2–3 month stint; prerequisites become "boot-camp" modules. | "Stackable" credits – learners acquire packages of knowledge within each active half. |
| **Student life & co-curricular** | Sports, clubs, internships spread throughout the year | Clubs may operate only during active periods, but the break can be used for internships or research projects. | New *"Intercalated Summer Internships"* that replace long "summer breaks." |
| **Higher education financing** | Tuition + scholarships; rising debt burden | Tuition costs might fall (less per-hour instruction), but also student life expenses may rise if people stay in dorms for the whole active period. | Debt consolidation may shift toward *"hibernate-interest"* packages: lower cost during sleeping months. |
### Bottom line
- **Intensification** – learning compressed into fewer but denser periods.
- **Flexibility** – more modular, stackable courses that align with the two active windows.
- **Tech-driven teaching** – AI-facilitated tutoring between periods, reduced need for continuous faculty presence.
- **Policy changes** – accreditation bodies may adjust standards to reflect new calendar structures.
---
## What Makes These Shifts Work (or Not)?
1. **Adoption Rate**
- If only a subset of the population hibernates, we'll see *segmented* labor markets and housing usage: some houses stay live, others become "sleep lofts."
- High adoption could trigger near-universal "half-year work" norms.
2. **Cost & Accessibility**
- If cryo-pods are expensive, only the wealthy will hibernate; widening the gap between "wake" and "hibernation" labor pools.
- Subsidies or public sectors might need to intervene to avoid exploitation.
3. **Health & Safety Regulations**
- Need for standards: carbon-dioxide levels, monitoring devices, emergency protocols if a hibernation fails.
- Occupational health laws will need new guidelines for "continuous rest cycles."
4. **Cultural Acceptance**
- Some industries (healthcare, emergency services) may resist any downtime; others may embrace it.
- Public sentiment may shape policy: mass public-service workers might lobby against mandatory sleep breaks.
5. **Technological Reliability**
- The best-case scenario presumes error-free hibernation.
- Failure modes could lead to emergency work shifts, eroding trust.
---
## A Quick "Future-Scenario" Sketch
1. **The Workplace:**
*A software firm runs three 6-month cycles a year. Two teams are on duty simultaneously, each covering a 12-hour shift. Between cycles, the whole staff sleeps in cryo-pods in a dedicated facility built beneath the office. AI bots handle routine deployments while humans supervise.*
2. **The Home:**
*Families invest in "smart-sleep" houses that automatically adjust temperature and lighting for hibernation months. Each house has a private pod, but houses are shared with neighbors who are active when you're asleep.*
3. **The Classroom:**
*A university's biology program now consists of 4 intense "boot camps," each lasting 3 weeks. Students attend all four during the active period and sleep through the rest. Between boot camps, they work on lab projects or internships, guided by AI-based mentors.*
---
## Final Takeaway
A breakthrough that lets humans hibernate half the year would **halve the annual human-time input** for every sector that currently relies on continuous presence. The resulting shifts could be powerful:
- **Labour:** *Supply compression → wage pressure, new gig structures, automated oversight.*
- **Housing:** *Lower energy use → re-priced real estate, specialized sleep-pods, redesigned urban cores.*
- **Education:** *Intensified, modular curricula, stronger reliance on tech tutors, re-structured calendars.*
Everything would revolve around two "seasons": the **awake season** (high activity, high productivity) and the **sleep season** (low demand, low consumption). The balance between these seasons would set the rhythm of economies worldwide. Whether societies choose to dance to that rhythm – or fight against it – will depend on the technology's cost, accessibility, and the values we attach to continuous labor versus restorative rest.
Sonnet-flavored:
Honest assessment requires recognizing this creates fundamentally opposing pressures:
**What's saving money vs. what's causing new costs**
Savings would likely be outweighed or redirected rather than creating a surplus - enormous structural disruption hasn't eliminated conflict (automation did similarly but slowly)
## Labor Markets
**Actual productivity loss**
- 6 months active per year reduces labor supply by half *if work can't be simply compressed*
- Gig economy collapses without continuous availability
- Supervision that was redundant between shifts becomes necessary
- Medical/critical monitoring would require replacement -> drives *more* automation concurrently
- The research productivity puzzle: projects extended, funding cycles disrupted
**Who benefits anyway?**
- Low-wage workers using seasonal unemployment barely once may not see savings
- Pre-retirement cohorts already commuting - may opt-out entirely
- This could compress the labor force, expanding precariat status
**Honest counterarguments**
- Work-life productivity research suggests current continuous hours have diminishing returns
- Time shift could enable staggered process cycles, reducing queueing/lags
- But these are subtle effects requiring architectural redesign
## Real Estate
**Market disruptions with mixed winners/losers**
- Short-term rentals doomed -> net loss for cities dependent on tourist turnover
- Unforced: shorter occupancy duration increases depreciation pressure - capital tied up faster
- Office space: significant underutilization but potentially because manufacturing/production moved online before change, allowing gradual compression
- **But commercial needs persist** while humans aren't working
## Education
**Logical structural changes less compelling**
- Education could redistribute over the year, but existing curriculum structured around seasonal work cycles will resist modification
- Onboarding retreats replaced by interruptions in learning continuity
- Physical campuses genuinely compress values
**What would meaningfully change?**
- Less pressure on "enforced attendance" for non-essential classes -> real cultural shift required
## Significantly Harder Problems Hidden
- Age discrimination: older workers could claim greater need for biomedical intervention, raising liability costs for employers
- Global south inequality multiplies (automation + institutional dependency creates technology debt)
- Retiree patterns collapse - pension systems restructured beyond obvious replacement
## What Most Analyses Get Wrong
Assuming savings just becomes extra leisure time, whereas frameworks show:
- Occupational scarcity reactions distort markets away from productive direction
- Institutional dependency on regular labor supply creates regressive burden on low-income users
**Bottom line:** Energy and money savings are peripheral in economic modeling. Labor output contraction would likely produce the dominant macro effect. **I'd expect partial or uneven adoption for years before any genuine net increase in capacity.**
What's your interest here?
Files
| File | Type | Purpose |
|---|---|---|
| hf/gpt-oss-20b-sonnet-flavored-v1/ | safetensors | Merged HF format with MXFP4 experts native; for transformers / vLLM |
| gpt-oss-20B-sonnet-flavored-v1-Q8_0.gguf | GGUF | Q8_0 non-experts, MXFP4 experts |
| gpt-oss-20B-sonnet-flavored-v1-Q6_K.gguf | GGUF | Q6_K non-experts, MXFP4 experts |
| gpt-oss-20B-sonnet-flavored-v1-Q5_K_M.gguf | GGUF | Q5_K_M non-experts (with fallbacks, see note), MXFP4 experts |
| gpt-oss-20B-sonnet-flavored-v1-BF16.gguf | GGUF | Reference BF16 non-experts, MXFP4 experts |
Training Details
| Detail | Value |
|---|---|
| Base model | openai/gpt-oss-20b |
| Architecture | MoE: 20B total / 3.6B active, 32 experts, top-4 |
| Method | LoRA (rank 16) via Unsloth |
| Training dataset | Sonnet 4.6 distilled responses by TeichAI |
| Training type | Final-answer-only (no reasoning targeting) |
The LoRA adapter is not included in this v1 release. Future versions will publish the standalone adapter alongside the merged weights.
Quantization Details
| Detail | Value |
|---|---|
| Quants | BF16, Q8_0, Q6_K, Q5_K_M (all with native MXFP4 experts) |
| Quantized by | jorge-erdb |
| Method | llama.cpp with explicit MXFP4 expert preservation |
MXFP4 experts preserved natively across the entire ladder. The MoE expert tensors (~19B of 20B params) remain in their original MXFP4 format in every quant. Only non-expert tensors (router, attention, embeddings, layernorms) vary in precision across the ladder. No experts are decompressed and requantized at any point.
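Because only the non-expert tensors change precision, file sizes across the ladder should differ by much less than the quant names suggest. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes the split stated above (about 19B expert parameters and about 1B non-expert parameters) and nominal bits-per-weight figures for each format. `estimate_gb` is a hypothetical helper, and real files will differ because of metadata, tensors kept at higher precision, and the Q5_K_M fallbacks.

```python
# Rough size estimate per quant, under the assumptions named in the text above.
# Nominal bits per weight (block formats include their per-block scales):
#   MXFP4: 4-bit values + one 8-bit scale per 32 values  = 4.25
#   Q8_0 : 8-bit values + one 16-bit scale per 32 values = 8.5
#   Q6_K : 6.5625
#   Q5_K : 5.5 (Q5_K_M mixes tensor types, so this is only a nominal figure)

EXPERT_PARAMS = 19e9       # assumption taken from "~19B of 20B params"
NON_EXPERT_PARAMS = 1e9    # assumption: the remainder
MXFP4_BPW = 4.25

NON_EXPERT_BPW = {"BF16": 16.0, "Q8_0": 8.5, "Q6_K": 6.5625, "Q5_K_M": 5.5}


def estimate_gb(non_expert_bpw: float) -> float:
    """Estimate total size in GB (10^9 bytes) for a given non-expert precision."""
    bits = EXPERT_PARAMS * MXFP4_BPW + NON_EXPERT_PARAMS * non_expert_bpw
    return bits / 8 / 1e9


for name, bpw in NON_EXPERT_BPW.items():
    print(f"{name:7s} ~{estimate_gb(bpw):.1f} GB")
```

Under these assumptions the expert share dominates every file, so choosing a lower quant saves on the order of a gigabyte rather than halving the download.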
Download
pip install -U "huggingface_hub[cli]"
# BF16 (full precision non-experts, MXFP4 experts)
huggingface-cli download jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf --include "*BF16*.gguf" --local-dir ./
# Q8_0
huggingface-cli download jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf --include "*Q8_0.gguf" --local-dir ./
# Q6_K
huggingface-cli download jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf --include "*Q6_K.gguf" --local-dir ./
# Q5_K_M
huggingface-cli download jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf --include "*Q5_K_M.gguf" --local-dir ./
# Merged HF safetensors (for transformers / vLLM)
huggingface-cli download jorge-erdb/gpt-oss-20b-sonnet-flavored-gguf --include "hf/*" --local-dir ./
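To check which file an `--include` pattern would pick before starting a multi-gigabyte download, the patterns can be tried locally. This is a minimal sketch that assumes the CLI applies shell-style glob matching comparable to Python's `fnmatch`. The file names come from the Files table above, and `select` is a hypothetical helper, not part of `huggingface_hub`.

```python
from fnmatch import fnmatchcase

# GGUF file names as listed in the Files table.
REPO_FILES = [
    "gpt-oss-20B-sonnet-flavored-v1-Q8_0.gguf",
    "gpt-oss-20B-sonnet-flavored-v1-Q6_K.gguf",
    "gpt-oss-20B-sonnet-flavored-v1-Q5_K_M.gguf",
    "gpt-oss-20B-sonnet-flavored-v1-BF16.gguf",
]


def select(pattern: str, files: list[str] = REPO_FILES) -> list[str]:
    """Return the files a glob-style include pattern would match (case-sensitive)."""
    return [name for name in files if fnmatchcase(name, pattern)]


for pattern in ("*BF16*.gguf", "*Q8_0.gguf", "*Q6_K.gguf", "*Q5_K_M.gguf"):
    print(pattern, "->", select(pattern))
```

Each of the four patterns used in the commands above matches exactly one GGUF file in this list, so no command pulls more than one quant.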
Credits
- LoRA training & quantization: jorge-erdb
- Distilled training data: TeichAI - Sonnet 4.6 distillation
- Base model: OpenAI - GPT-OSS-20B
- Style source: Anthropic - Claude Sonnet 4.6 (indirectly via distillation)
- Tooling: Unsloth, llama.cpp
Try gpt-oss · Guides · Model card · OpenAI blog
Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.
We're releasing two flavors of these open models:
- gpt-oss-120b: for production, general purpose, high reasoning use cases that fit into a single 80GB GPU (like NVIDIA H100 or AMD MI300X) (117B parameters with 5.1B active parameters)
- gpt-oss-20b: for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters)
Both models were trained on our harmony response format and should only be used with the harmony format as it will not work correctly otherwise.
This model card is dedicated to the smaller gpt-oss-20b model. Check out gpt-oss-120b for the larger model.
Highlights
- Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk, ideal for experimentation, customization, and commercial deployment.
- Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
- Full chain-of-thought: Gain complete access to the model's reasoning process, facilitating easier debugging and increased trust in outputs. It's not intended to be shown to end users.
- Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
- Agentic capabilities: Use the models' native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
- MXFP4 quantization: The models were post-trained with MXFP4 quantization of the MoE weights, making gpt-oss-120b run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the gpt-oss-20b model run within 16GB of memory. All evals were performed with the same MXFP4 quantization.
Inference examples
Transformers
You can use gpt-oss-120b and gpt-oss-20b with Transformers. If you use the Transformers chat template, it will automatically apply the harmony response format. If you use model.generate directly, you need to apply the harmony format manually using the chat template or use our openai-harmony package.
To get started, install the necessary dependencies to set up your environment:
pip install -U transformers kernels torch
Once set up, you can run the model with the snippet below:
from transformers import pipeline
import torch
model_id = "openai/gpt-oss-20b"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)
messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
Alternatively, you can run the model via Transformers Serve to spin up an OpenAI-compatible webserver:
transformers serve
transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-20b
Learn more about how to use gpt-oss with Transformers.
vLLM
vLLM recommends using uv for Python dependency management. You can use vLLM to spin up an OpenAI-compatible webserver. The following command will automatically download the model and start the server.
uv pip install --pre vllm==0.10.1+gptoss \
--extra-index-url https://wheels.vllm.ai/gpt-oss/ \
--extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
--index-strategy unsafe-best-match
vllm serve openai/gpt-oss-20b
Learn more about how to use gpt-oss with vLLM.
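Once the server is up, any OpenAI-compatible client can talk to it. The sketch below uses only the Python standard library and assumes the server is listening on `localhost:8000`, which is vLLM's default unless a different host or port is passed. `build_chat_request` and `send` are hypothetical helper names, and `max_tokens=256` is an arbitrary illustrative value.

```python
import json
import urllib.request

# Assumption: `vllm serve` is running locally on its default port.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt: str, model: str = "openai/gpt-oss-20b") -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(send(build_chat_request("What is the capital of France?")))` should print the reply. The `model` field has to match the name the server was started with.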
PyTorch / Triton
To learn about how to use this model with PyTorch and Triton, check out our reference implementations in the gpt-oss repository.
Ollama
If you are trying to run gpt-oss on consumer hardware, you can use Ollama by running the following commands after installing Ollama.
# gpt-oss-20b
ollama pull gpt-oss:20b
ollama run gpt-oss:20b
Learn more about how to use gpt-oss with Ollama.
LM Studio
If you are using LM Studio you can use the following commands to download.
# gpt-oss-20b
lms get openai/gpt-oss-20b
Check out our awesome list for a broader collection of gpt-oss resources and inference partners.
Download the model
You can download the model weights from the Hugging Face Hub directly with the Hugging Face CLI:
# gpt-oss-20b
huggingface-cli download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/
pip install gpt-oss
python -m gpt_oss.chat model/
Reasoning levels
You can adjust the reasoning level that suits your task across three levels:
- Low: Fast responses for general dialogue.
- Medium: Balanced speed and detail.
- High: Deep and detailed analysis.
The reasoning level can be set in the system prompts, e.g., "Reasoning: high".
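As a minimal sketch of that convention, the helper below prepends a system message of the form `Reasoning: <level>` to a chat. `build_messages` is a hypothetical name, and the resulting list can be passed to any chat interface that accepts role/content messages.

```python
VALID_LEVELS = ("low", "medium", "high")


def build_messages(user_prompt: str, reasoning: str = "medium") -> list[dict]:
    """Return chat messages with a system prompt that sets the reasoning level."""
    if reasoning not in VALID_LEVELS:
        raise ValueError(f"reasoning must be one of {VALID_LEVELS}, got {reasoning!r}")
    return [
        {"role": "system", "content": f"Reasoning: {reasoning}"},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("Explain quantum mechanics clearly and concisely.", reasoning="high")
print(messages[0]["content"])
```

Validating the level up front avoids silently sending a system prompt the model was not trained to interpret.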
Tool use
The gpt-oss models are excellent for:
- Web browsing (using built-in browsing tools)
- Function calling with defined schemas
- Agentic operations like browser tasks
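For function calling, a "defined schema" typically means a JSON Schema description of each tool's arguments. The sketch below uses the OpenAI-style `tools` layout and a made-up `get_weather` tool. Both the tool and the `parse_arguments` helper are hypothetical, and whether a given server passes tool definitions through to the model depends on that server.

```python
import json

# Hypothetical tool definition in the OpenAI-style function-calling layout.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def parse_arguments(raw_arguments: str, tool: dict = GET_WEATHER_TOOL) -> dict:
    """Decode a tool call's JSON arguments and check that required keys are present."""
    arguments = json.loads(raw_arguments)
    required = tool["function"]["parameters"]["required"]
    missing = [key for key in required if key not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return arguments


print(parse_arguments('{"city": "Paris"}'))
```

Checking the model's arguments against the schema before calling the real function keeps malformed tool calls from reaching application code.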
Fine-tuning
Both gpt-oss models can be fine-tuned for a variety of specialized use cases.
This smaller model gpt-oss-20b can be fine-tuned on consumer hardware, whereas the larger gpt-oss-120b can be fine-tuned on a single H100 node.
Citation
@misc{openai2025gptoss120bgptoss20bmodel,
title={gpt-oss-120b & gpt-oss-20b Model Card},
author={OpenAI},
year={2025},
eprint={2508.10925},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2508.10925},
}