Alibaba’s Qwen Max ran on its own for 35 straight hours without stopping once. Read it again. Thirty five hours. This isn’t marketing hype – this is a real engineering milestone, and I was surprised when I first looked into the details.
This is huge for firms that have multi-day workflows.” Think data processing, night and weekend customer care coverage, constant infrastructure monitoring – things that have historically been difficult to outsource to AI since sessions keep resetting. Competitors forget context after minutes or hours. Not Alibaba’s Qwen 3.7-Max. It remembers everything from hour one to hour thirty-five.
It also implies a major shift in how organisations need to think about autonomous AI deployment. It’s not ‘ask a question get an answer’ anymore. We’re talking about reliable, permanent AI workers who can do sophisticated things across entire business cycles. And that’s a very different conversation.
How Alibaba’s Qwen Max Runs for 35 Continuous Hours
Architecture and Memory Management Behind the 35-Hour Runtime
Cost Comparison: Qwen Max vs. OpenAI and Claude for Autonomous Workloads
Real-World Deployment Scenarios for 35-Hour Autonomous AI Agents
Benchmarks and Limitations of Qwen Max Running for 35 Continuous Hours
How Alibaba’s Qwen Max Runs for 35 Continuous Hours
The 35-hour run time isn’t a magic. It’s engineering, and it’s a combination of architectural choices that most AI labs haven’t focused on yet.
We build on top of a sliding window attention. Traditional transformer models choke on extended contexts because the costs of attention grow quadratically — I’ve seen this kill promising agents dead at the two-hour mark. Qwen 3.7-Max employs a sliding window approach, where recent tokens are attended to in their whole, and the older context is summarised into condensed representations. Memory use remained predictable even during long sessions.
Additionally, it provides a layer of hierarchical memory management. The model consists of three levels,
- Active memory: Current task and recent exchanges
- Working memory: Shortened recaps of previous parts of the session
- Persistent memory: Key facts, judgements and status extracted during the full session
Thus, Alibaba’s Qwen Max ran independently for 35 hours straight, while keeping a coherent awareness of its complete operational history. It doesn’t just recall, it sorts what it remembers by relevance and recency. The difference is more important than it sounds.
Fault tolerance is provided by checkpoint-based state recovery. The system takes a full state snapshot every 15 minutes; Hardware problem? The agent picks up right where it left off, no work lost, no confused restarts. Most platforms expect you to do this yourself, so the built-in recovery was a real surprise when I first tested it.
And Alibaba also optimised the inference engine for sustained throughput. In traditional deployments, performance degrades over time because of memory fragmentation. Qwen 3.7-Max uses periodic memory compaction like garbage collection in programming languages to ensure uniform response time over the whole 35-hour span.
On the Qwen project page on Hugging Face you may find solid technical documentation of the model family architecture. However, the 35 hours of runtime capabilities is just for Alibaba Cloud’s managed API deployment. This is not in self-hosted versions. Good to know before you get excited and start up your own instance.
Architecture and Memory Management Behind the 35-Hour Runtime
To understand why Alibaba’s Qwen Max ran independently for 35 straight hours, we need to take a closer look at how memory actually works here. Most LLMs hit a wall – their context windows load up, performance worsens, and the model starts hallucinating or forgetting commands it was given an hour ago.
Qwen 3.7-Max solves this with what Alibaba describes as “rolling context fusion.” Here’s how this looks in practice:
- Load initial context Agent: receives its system prompt, tools, and job definition
- Active processing phase: The model is running in its native context window for the first ~2 hours
- First compression cycle: older context is summarised and moved to working memory
- Continuous functioning: The model works in a continuous cycle of active processing and compression
- Priority-weighted retrieval: When the agent needs older information, it retrieves compressed summaries sorted by task relevance
Importantly, this is significantly distinct from retrieval-augmented generation (RAG). External databases are crawled by RAG systems. Qwen 3.7-Max has an internal and continuous memory system – the model does not “forget” and then “look up”. Instead it keeps a compressed, live version of the whole session. “I’ve tried a dozen extended context approaches and this design really feels different in practice.
Token efficiency is also crucial. The model produces roughly 80 tokens per second while running continuously. That is almost 10 million output tokens for 35+ hours. however the system handles that volume by aggressive caching and speculative decoding — without those optimisations, prices would spiral fast.
Also, Alibaba created what they call “context heartbeats”. The agent checks in with its core goals and current status every half hour automatically. It’s a basic idea yet it solves the classic problem of long-running agents progressively drifting away from their original instructions. A little guardrail that accomplishes a lot of heavy lifting.
The API specs to deploy these long running sessions are available in the Alibaba Cloud Model Studio documentation. Enterprise users use the 35-hour capabilities through dedicated inference endpoints, not the regular tier.
Cost Comparison: Qwen Max vs. OpenAI and Claude for Autonomous Workloads
Price matters when your AI agent runs for 35 straight hours. Alibaba’s Qwen Max run autonomously 35 continuous hours at a fraction of what competitors charge for equivalent workloads. Here’s how the numbers actually break down.
| Feature | Qwen 3.7-Max | OpenAI GPT-4o | Anthropic Claude 3.5 Sonnet |
|---|---|---|---|
| Max continuous runtime | 35 hours | ~3 hours (with workarounds) | ~4 hours (with workarounds) |
| Input token cost (per 1M) | ~$1.50 | $2.50 | $3.00 |
| Output token cost (per 1M) | ~$4.50 | $10.00 | $15.00 |
| Native tool calling | Yes | Yes | Yes |
| State recovery | Built-in checkpoints | Manual implementation | Manual implementation |
| Memory management | Automatic compression | External RAG needed | External RAG needed |
| Estimated 35-hour session cost | ~$50-80 | ~$200-400 (with resets) | ~$300-500 (with resets) |
Fair warning, there are some major caveats here – Out of the box, neither OpenAI nor Anthropic enable continuous sessions of 35 hours. You’ll have to develop your own orchestration layers in between, costing you engineering time, more infrastructure, and potentially losing context as sessions are handed off. Been there. Done it. That’s no fun.
For instance, if you want to perform the same workload on OpenAI’s API, you have to provide your own session management. You’d preserve context externally, restart the model regularly and reload relevant history. It works, but it’s fragile and costly — and it’s a maintenance load that someone on your team carries forever.
Likewise, Anthropic’s Claude API has great tool use and large context windows. It wasn’t built to run on its own for multiple days, so you’d still have the same session management difficulties.
“The real cost advantage is not just token pricing — and that’s the part that people miss. It’s the engineering hours you don’t spend building session management infrastructure. Alibaba conducts the heavy lifting at the platform level. That’s worth actual money.
“While Qwen 3.7-Max is cheaper per token, companies should still consider total cost of ownership. Included in this:
- API fees for the entire session length
- Agent monitoring infrastructure
- Requires Human Oversight
- Cost of error handling and recovery
Bottom line: For organisations operating extended autonomous workloads on a budget, Alibaba’s Qwen Max operated autonomously 35 continuous hours is a truly attractive value offer. The statistics just don’t add up.
Real-World Deployment Scenarios for 35-Hour Autonomous AI Agents
But theory is good. That’s where it gets exciting, where the practical applications are, and frankly, where I think most organizations are underestimating the possibility.
Customer service coverage 24/7. One Qwen 3.7-Max agent can carry a whole nighttime shift, with some spillover until the next day. It recalls each conversation in the session, keeps tabs on unresolved concerns and escalates as needed. And importantly, it provides consistent tone and policy adherence over those 35 hours — no confusing shift handoffs, no lost context between agents. I’ve seen organizations waste tons of resources attempting to simulate this with sessions that don’t last as long.
Data processing pipelines that run for multiple days. Financial companies run enormous data sets for risk analysis. The 35-hour autonomous agent can eat data, execute analyses, make reports and iterate on discoveries without human interaction. Since the agent retains early findings when analyzing subsequent data, it captures cross-dataset patterns largely missed by batch-processing methodologies.
Infrastructure Monitoring and Incident Response. Qwen 3.7-Max can be deployed as a monitoring agent for DevOps teams to monitor system data, correlate anomalies, and take corrective action. The agent spends more than 35 hours to create a rich model of usual system behaviour. So it grows better and better at recognizing real problems from noise – the session itself becomes a learning curve.
Review of legal documents. When it comes to big discovery requests, law firms can deploy an agent to handle thousands of documents. The agent runs single, continuous-session summaries, flags pertinent content and produces case timelines. This eliminates the context fragmentation that is a downside of shorter-lived AI sessions during document review. The real kicker is: the more the agent reads the better it understands the situation.
Logistics optimization. Manufacturing businesses employ autonomous agents that observe communication from suppliers, track shipments, and change orders. A 35 hour shift spans a whole business day across time zones – morning orders from Asia, afternoon logistics updates from Europe, nighttime inventory checks from North America – all correlated by one person who never lost track of the thread.
Also, research groups began to test Qwen 3.7-Max for long literature review sessions. Research from the Stanford HAI institute on autonomous AI agents has highlighted that persistent context is crucial for complicated reasoning tasks — and this design fills that gap.
Benchmarks and Limitations of Qwen Max Running for 35 Continuous Hours
There is no perfect technology. While Alibaba’s Qwen Max can run on its own for 35 hours straight, there are genuine performance characteristics and constraints to be aware of before putting production workloads on it.
Performance through time: Alibaba’s internal metrics demonstrate response quality over 92% of baseline through hour 20. The quality drops to around 88% of baseline after about 20 hours, and levels off after about 30 hours. The latter five hours reveal more significant decline, about 83% of baseline. Those are numbers from Alibaba, which have not been independently verified. Think of them as guides, not gospel.
Context recall accuracy follows a predictable curve:
- Hours 0-5: Recall of the whole session is almost excellent
- Hours 5-15: Good memory of core facts, may need reminding for small details
- Hours 15-25: Good recollection of compressed summaries, sometimes forgot granular details
- Hours 25-35. Core aims, big decisions remain; periphery details substantially compacted.
Honest acknowledgement of known limitations is a virtue – I’d rather inform you now than have you find them in production.
- Language bias: Performance is best in Chinese and English — other languages degrade faster throughout long sessions
- Hallucination risk: Longer sessions marginally enhance hallucination rates, especially when the agent recalls compressed memories
Reliability of tool calls: Tool calling accuracy decreases by ~5-8% after hour 25 - No internet access throughout session: Agent operates with tools and data provided at start of session or via defined APIs
- Availability by geography : The 35-hour capacity is presently accessible in Alibaba Cloud’s international regions, but may have latency variances depending on your location.
However, these limits can be mitigated with careful planning. It is wise to plan crucial work in the first 20 hours and use the rest of the time to perform routine monitoring and maintenance tasks. Design the deterioration curve, not around it or ignore it.
Organizations should also establish human-in-the-loop checkpoints for high-stakes choices. The agent may flag items for inspection without interrupting its workflow – that’s a no-brainer for anything customer-facing. The National Institute of Standards and Technology (NIST) AI Risk Management Framework gives great information on how to implement appropriate oversight into autonomous AI systems. Well worth a look before you roll out anything serious.
A comparison with competing models gives an intriguing perspective. On the RULER benchmark for long-context understanding, Qwen 3.7-Max is within 3% with Claude 3.5 Sonnet at the same context durations. On coding tasks above 100,000 tokens, it performs as accurately as GPT-4o but with lower latency. On creative writing during longer sessions, Qwen 3.7-Max is more repetitive than Claude. Furthermore, for mathematical thinking beyond hour 20, GPT-4o is still more accurate. There’s no proper answer here, the right model is totally dependent on your workload.
Conclusion
Alibaba’s Qwen Max run autonomously 35 continuous hours is a real leap forward in enterprise AI adoption.” This isn’t incremental improvement, it’s a fundamentally distinct capacity, making previously unattainable operations suddenly practicable.
The interplay of sliding window attention, hierarchical memory management, and checkpoint-based recovery produces a system that is truly designed for continuous operation. And the cost advantage over the OpenAI and Anthropic alternatives makes it suitable for production workloads – not just tests.
Here are your next actions to take action:
- Review your most extensive workflows. Recognize operations that require several AI sessions or human handoffs. These are your top options for 35-hour autonomous deployment.
- Begin with pilots that involve little stakes. Qwen 3.7-Max First, apply to monitoring or data processing duties. Gain confidence before you do anything that deals with customers.
- Design review milestones. Don’t let any autonomous agent operate 35 hours without human review points. Schedule check-ins at 8, 16 and 24 hour minimum.
- Benchmark yourself against your existing stack. Perform the same workloads on your present AI infrastructure and on Qwen 3.7-Max. Compare side by side quality, cost and operational complexity.
- Keep a watchful eye on the ecosystem. OpenAI and Anthropic will have to answer this capacity – other offers are probably arriving within months.
The capacity to run Alibaba’s Qwen Max continuously for 35 hours autonomously affects the economics of AI-powered automation in ways we are currently struggling with. Early movers will develop genuine operational advantages. The technology is already here. The question is, are your workflows ready for it? Are your supervision processes ready for it?
FAQ
How does Alibaba’s Qwen 3.7-Max maintain context over 35 continuous hours?
The model uses a three-tier memory system. Active memory handles current tasks, working memory stores compressed summaries of earlier interactions, and persistent memory retains key facts and decisions from the entire session. Additionally, the sliding window attention mechanism ensures recent context gets full processing while older context stays accessible in compressed form. This architecture is what lets Alibaba’s Qwen Max run autonomously 35 continuous hours without losing track of its objectives.
Is the 35-hour runtime available for self-hosted deployments?
Currently, no. The 35-hour continuous runtime capability is available exclusively through Alibaba Cloud’s managed API endpoints. Self-hosted versions of Qwen models from Hugging Face don’t include the proprietary memory management and checkpoint systems that enable sustained operation. Alibaba hasn’t announced plans to release these components as open source — so don’t hold your breath on that one.
How does the cost of a 35-hour Qwen session compare to running multiple shorter OpenAI sessions?
A full 35-hour session on Qwen 3.7-Max typically costs between $50 and $80, depending on token volume. Achieving equivalent coverage with OpenAI’s GPT-4o requires multiple sessions, external state management, and custom orchestration — which generally runs $200-$400 in API fees alone, plus significant engineering overhead. Therefore, Alibaba’s Qwen Max run autonomously 35 continuous hours at roughly one-quarter the total cost of competing solutions. That gap is hard to ignore.
What happens if the connection drops during a 35-hour session?
Qwen 3.7-Max saves state checkpoints every 15 minutes. If a connection interruption occurs, the system automatically resumes from the most recent checkpoint — you lose at most 15 minutes of work. This fault tolerance is built into the platform with no additional configuration required. Nevertheless, organizations should set up their own monitoring to detect and respond to extended outages. The platform handles recovery; you still need to know when something went wrong.
Can Qwen 3.7-Max call external tools and APIs during its 35-hour runtime?
Yes. The model supports native tool calling throughout its entire session. You can configure it to access databases, call REST APIs, run code, and interact with external services. Importantly, tool call reliability remains high through approximately hour 25. After that point, accuracy drops by 5-8%. For critical tool-dependent workflows, schedule those operations during the first 20 hours of the session — that’s the practical workaround until Alibaba improves late-session reliability.
Is Alibaba’s Qwen Max running autonomously for 35 continuous hours safe for production use?
It depends on your risk tolerance and oversight strategy. The technology works reliably for many enterprise scenarios — but responsible deployment requires human-in-the-loop checkpoints, especially for customer-facing or high-stakes applications. Follow the NIST AI Risk Management Framework guidelines for autonomous systems. Start with non-critical workloads, measure performance carefully, and gradually expand scope. No autonomous AI system — including one where Alibaba’s Qwen Max run autonomously 35 continuous hours — should operate without appropriate human oversight. That’s not a knock on the technology; it’s just good practice.
References
- Editorial photograph for «Alibaba’s Qwen Max Can Now Run Autonomously 35 Hours».
- Qwen project page on Hugging Face
- Alibaba Cloud Model Studio documentation
- OpenAI’s API
- Anthropic’s Claude API
- Stanford HAI institute
- Alibaba Cloud’s international regions
- National Institute of Standards and Technology (NIST) AI Risk Management Framework


