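Company Profile

Chongqing Gauss Intelligent Computing Technology Co., Ltd. ("Gauss Intelligent Computing") specializes in intelligent computing. Riding the growth of the AI industry, the company focuses on building a solid foundation of computing infrastructure so that AI capabilities can be made widely accessible at scale and put to work in industry. Drawing on its resources and capital, it combines full-stack capabilities in technical innovation, engineering construction, and operations management, and concentrates on the intelligent computing industry to support the application of intelligent technologies and industrial upgrading.

Industry Investment: RMB 30 billion+
Projects Delivered: 20+
Clients Served: 100+
Partners: 200+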
Core Business
AI Industry Investment
High-Standard Infrastructure Construction
Full-Lifecycle O&M Services
Intensive Computing Operation Services
Our Values
Pragmatism
Excellence
Win-Win
Innovation
AI Industry Investment

• Self-owned capital combined with diversified financing
• Deployment aligned with the national computing hub network
• In-house core technologies for computing scheduling and tuning

In line with the national "East Data, West Computing" strategy, we deploy computing infrastructure in key regions including Beijing-Tianjin-Hebei, the Yangtze River Delta, the Guangdong-Hong Kong-Macao Greater Bay Area, Chengdu-Chongqing, Inner Mongolia, and Xinjiang, and continue to build and operate high-performance computing resources at scale over the long term. Cumulative investment exceeds RMB 30 billion.

Working with Yale University, MIT, Tsinghua University, and other universities and research institutions, we carry out ongoing research on computing tuning, network optimization, and hardware-software adaptation to keep inference performance on mainstream models at an industry-leading level. We have developed more than 50 core technologies in-house.
High-Standard Infrastructure Construction

• Full-scope AIDC (AI data center) planning, plus design and construction of high-tier data center facilities
• Planning, construction, and tuning of clusters with more than 10,000 accelerator cards
• Integrated computing and networking with cross-region computing interconnection

We focus on full-scope AIDC planning, engineering construction, and end-to-end delivery, and can build ultra-large computing clusters: our largest single cluster exceeds 20,000P (PFLOPS), with a typical delivery cycle of 1 to 3 months. With a presence in multiple regions across China, we have completed more than 20 computing infrastructure projects, steadily strengthening the national network of computing nodes.
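Full-Lifecycle O&M Services

• 24/7 professional O&M with multi-level coordinated support
• In-house monitoring platform with intelligent early warning and automatic self-healing
• Ample spare parts and a complete maintenance system

We have built an O&M system that covers the full lifecycle of computing infrastructure. Backed by our in-house intelligent O&M platform, a tiered management mechanism, and a professional O&M team of more than 100 people, we keep annual equipment operational stability as high as 99.9%.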
Intensive Computing Operation Services

• Unified resource pool with intelligent scheduling
• End-to-end operation of dedicated computing clusters
• Customized computing solutions for different scenarios

We provide commercial operation for intelligent computing infrastructure assets, using unified scheduling and resource integration to supply computing power elastically and use it efficiently. We currently operate more than 100,000P of computing capacity, serving more than 100 industry clients across AI training, inference, big data analytics, and other scenarios.
Rapid, Full-Scope Computing Construction

Full-Scope Planning:

For clients in every industry, we design customized intelligent computing cluster solutions around business scenarios, security and compliance requirements, and cost-reduction goals. Clusters scale from thousands to tens of thousands of accelerator cards, with a choice of air-cooled or liquid-cooled equipment. They support flat or hybrid multi-tier networking with a 1:1 lossless (non-blocking) convergence ratio, come standard with 800G/1.6T high-speed interconnects, and include a complete set of networking equipment from a single source. Compatible with all types of AI workloads, the integrated deployment of multiple storage types delivers high bandwidth, IOPS in the tens of millions, microsecond-level latency, and elastic expansion to EB-scale capacity. Full-stack planning and design, combined with integrated build-and-operate delivery, ensure that clusters go live in compliance and run stably over the long term.
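A convergence (oversubscription) ratio compares the bandwidth a switch offers toward servers with the bandwidth it offers toward the next network tier; 1:1 means the two are equal, so the uplinks cannot become a bottleneck even when every server port is busy. The minimal Python sketch below computes the ratio. The port counts are hypothetical and do not describe any specific deployment.

```python
def convergence_ratio(down_ports: int, down_gbps: float,
                      up_ports: int, up_gbps: float) -> float:
    """Return total downlink bandwidth divided by total uplink bandwidth.

    1.0 means 1:1 (non-blocking); values above 1.0 mean the uplinks are
    oversubscribed and can become a bottleneck under full load.
    """
    downlink = down_ports * down_gbps   # bandwidth toward servers / NICs
    uplink = up_ports * up_gbps         # bandwidth toward the spine layer
    if uplink <= 0:
        raise ValueError("switch must have uplink capacity")
    return downlink / uplink


# Hypothetical leaf switch: 32 x 400G ports to servers, 16 x 800G to spines.
print(convergence_ratio(32, 400, 16, 800))  # 1.0, i.e. 1:1
# Hypothetical oversubscribed leaf: 48 x 400G down, 8 x 800G up.
print(convergence_ratio(48, 400, 8, 800))   # 3.0, i.e. 3:1
```

The same arithmetic applies at each tier of a multi-tier fabric; a 1:1 design requires the ratio to be 1.0 at every tier, not only at the leaf layer.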

Rapid Construction:

Backed by an end-to-end supply chain, we can deliver a complete set of computing equipment and auxiliary materials in as little as one week after the solution is finalized. An on-site engineering team of around 100 people resolves more than 90% of hardware, system, and computing faults on the spot and fine-tunes the computing network for zero packet loss, low latency, and highly synchronized lossless transmission. In-house O&M tools support large-scale automated operations, making cluster deployment more than five times as efficient. A dedicated project team of more than 30 people uses standardized processes to control schedule and construction quality, delivering cluster projects in every region to a high standard with zero delays.

Cluster Tuning and Model Inference Optimization

Cluster Tuning:

We optimize full-stack performance across server systems, RDMA networks, and storage I/O, combined with collective communication algorithms such as AllReduce and AllGather, and customize the tuning to cluster scale and business scenario. After tuning, cross-node communication latency drops by 40% to 60%, eliminating communication stalls and transmission delays, while overall server compute throughput rises by more than 30%, reducing wasted compute and freeing up cluster capacity. The tuned clusters handle high-concurrency model training and inference and run large-scale distributed jobs efficiently.
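AllReduce reduces a buffer (typically by summation) across all participating ranks and leaves the result on every rank; AllGather concatenates each rank's piece onto every rank. As a sketch of the collective itself, not of the tuning described above, the code below simulates the widely used ring AllReduce in a single Python process: a reduce-scatter phase followed by an all-gather phase. The function name and the equal-chunk assumption are illustrative choices.

```python
from typing import List

Vector = List[float]


def ring_allreduce(inputs: List[Vector]) -> List[Vector]:
    """Simulate a sum ring AllReduce; inputs[r] is rank r's local vector."""
    n = len(inputs)
    length = len(inputs[0])
    if any(len(v) != length for v in inputs):
        raise ValueError("all ranks must hold equal-length vectors")
    if length % n != 0:
        raise ValueError("this sketch assumes length is divisible by the rank count")
    size = length // n
    # chunks[r][c] is rank r's current copy of chunk c.
    chunks = [[list(v[c * size:(c + 1) * size]) for c in range(n)] for v in inputs]

    # Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) % n to its
    # right neighbour, which adds it element-wise to its own copy.
    for s in range(n - 1):
        msgs = [(r, (r - s) % n, list(chunks[r][(r - s) % n])) for r in range(n)]
        for r, c, data in msgs:
            dst = (r + 1) % n
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c], data)]
    # Rank r now holds the fully reduced chunk (r + 1) % n.

    # Phase 2, all-gather: in step s, rank r forwards chunk (r + 1 - s) % n and
    # the neighbour overwrites its stale copy with the reduced values.
    for s in range(n - 1):
        msgs = [(r, (r + 1 - s) % n, list(chunks[r][(r + 1 - s) % n])) for r in range(n)]
        for r, c, data in msgs:
            chunks[(r + 1) % n][c] = data

    # Flatten each rank's chunks back into a single vector.
    return [[x for chunk in rank_chunks for x in chunk] for rank_chunks in chunks]


ranks = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
print(ring_allreduce(ranks))
# [[111, 222, 333], [111, 222, 333], [111, 222, 333]]
```

With N ranks and a buffer of D elements, each rank sends 2(N-1) chunks of D/N elements, i.e. 2(N-1)/N × D in total (just under 2D regardless of N), which is why the ring variant is bandwidth-efficient for large messages. Production libraries such as NCCL pipeline these steps and also offer tree-based algorithms.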

Model Inference Optimization:

We optimize the entire model inference pipeline, covering compression, quantization, engine tuning, and deployment adaptation, balancing speed against accuracy. Through low-level optimization work on TensorRT and ONNX Runtime, together with operator fusion, pipeline parallelism, and multi-hardware scheduling, we address GPU memory overflow, high latency, and low compute utilization. After optimization, inference speed improves by 20% to 60%, latency falls by more than 50%, GPU memory usage drops by 30% to 70%, and compute utilization rises to above 30%. This meets both the low-latency needs of real-time inference and the throughput needs of large-scale batch inference, providing stable, efficient inference support for model deployment.
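The text does not specify which quantization scheme is used. As one common example, the sketch below assumes symmetric per-tensor INT8 quantization: each weight is mapped to an integer in [-127, 127] through a single scale factor, which cuts storage from 4 bytes (FP32) to 1 byte per value at the cost of a bounded rounding error.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization: w ~= q * scale, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid divide-by-zero on all-zero tensors
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate floating-point values."""
    return [v * scale for v in q]


weights = [0.52, -1.0, 0.25, 0.0, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
# Each value now occupies 1 byte instead of 4 (FP32), and the rounding
# error per element is at most scale / 2.
```

Production toolchains generally go further, for example with calibration data and per-channel scales; this sketch only shows where the memory savings and the accuracy trade-off come from.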

Full-Lifecycle Intelligent O&M

Intelligent O&M Platform:

Covering every stage of intelligent computing O&M, the platform integrates ten core functions: devices, monitoring, alerting, logs, statistics, inspection, scheduling, work orders, assets, and knowledge base. It provides unified management of full-stack resources, including servers, networks, storage, databases, container cloud, and data center power and environmental systems. Its capabilities include real-time monitoring, intelligent alerting, log analysis, asset ledgers, automated O&M, workflow-based work orders, topology visualization, compliance baselines, automated inspection, and O&M reporting, enabling collaboration across all O&M roles and keeping every business line running efficiently.

Tiered Support System:

We provide 24/7 O&M for data centers at the 10,000-card scale through a three-tier technical system, escalating from first-line to second-line support and then to experts, covering security, RDMA networks, storage, and all types of intelligent computing equipment. We respond within 5 minutes and close out major issues within 2 hours. Deep partnerships with leading OEMs allow us to resolve complex hardware faults quickly. A maintenance center, together with local spare-parts warehouses at each computing node, keeps annual equipment operational stability as high as 99.9%.
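The text does not define how the 99.9% operational stability rate is measured. If it is read as time-based availability over a year, the small calculation below shows the downtime such a target allows: (1 - 0.999) × 8,760 h = 8.76 hours per year. The function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 h; leap years ignored for simplicity


def allowed_downtime_hours(availability: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime compatible with a time-based availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_hours


for target in (0.999, 0.9999):
    hours = allowed_downtime_hours(target)
    print(f"{target:.2%}: {hours:.2f} h/year ({hours * 60:.0f} min)")
```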

Full-Scope Computing Operation

Intelligent Scheduling:

Built on unified platform management and orchestration, with support for multi-tenancy, large models, and heterogeneous computing, the platform provides control over assets in all regions, millisecond-level status sensing, and elastic scaling within 3 minutes. Intelligent scheduling improves computing allocation efficiency by 45% and resource reuse by 40%. With tenant isolation, quota control, fault alerting accuracy above 98%, and visualized energy-consumption and O&M monitoring, it supports the flow of model training and inference workloads in intelligent computing centers, balancing security and compliance with consolidated operations so that AI workloads in every industry run stably and resources are used optimally.
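The text mentions tenant isolation and quota control without describing the mechanism. A minimal sketch of quota-based admission, assuming a hypothetical per-tenant GPU limit, looks like this: a request is granted only if it keeps the tenant within its quota, so one tenant cannot exhaust the shared pool. The class and tenant names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TenantQuota:
    """Hypothetical per-tenant GPU quota tracker (illustrative only)."""
    limits: Dict[str, int]                               # tenant -> max GPUs
    used: Dict[str, int] = field(default_factory=dict)   # tenant -> GPUs in use

    def try_allocate(self, tenant: str, gpus: int) -> bool:
        """Grant the request only if it keeps the tenant within its quota."""
        if gpus <= 0:
            raise ValueError("request must be a positive number of GPUs")
        limit = self.limits.get(tenant, 0)   # unknown tenants get no capacity
        current = self.used.get(tenant, 0)
        if current + gpus > limit:
            return False
        self.used[tenant] = current + gpus
        return True

    def release(self, tenant: str, gpus: int) -> None:
        """Return GPUs to the tenant's quota when a job finishes."""
        current = self.used.get(tenant, 0)
        if gpus <= 0 or gpus > current:
            raise ValueError("invalid release amount")
        self.used[tenant] = current - gpus


quota = TenantQuota(limits={"team-a": 8, "team-b": 16})
print(quota.try_allocate("team-a", 6))   # True
print(quota.try_allocate("team-a", 4))   # False: would exceed 8
print(quota.try_allocate("team-b", 4))   # True: quotas are independent
```

A real scheduler would also account for GPU type, topology, and preemption; those are outside this sketch.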

Green, Low-Carbon Computing:

Using cold-plate and immersion liquid cooling, combined with rack-level liquid cooling deployment and end-to-end thermal management optimization, we remove heat from IT equipment efficiently and improve cooling efficiency. With coordinated optimization of systems and equipment, the PUE of intelligent computing clusters is kept consistently below 1.2, combining high-performance computing with low-energy operation.
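Power Usage Effectiveness (PUE), as defined by The Green Grid and ISO/IEC 30134-2, is total facility energy divided by the energy consumed by IT equipment. A PUE below 1.2 therefore means less than 0.2 kWh of cooling, power-conversion, and other overhead per kWh delivered to IT equipment. The figures in the sketch below are hypothetical.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness = total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def max_overhead_kwh(it_equipment_kwh: float, pue_target: float) -> float:
    """Non-IT energy (cooling, power conversion, lighting) allowed at a PUE target."""
    return (pue_target - 1.0) * it_equipment_kwh


# Hypothetical month: 1,000,000 kWh consumed by IT equipment.
it_kwh = 1_000_000
print(f"{max_overhead_kwh(it_kwh, 1.2):,.0f} kWh")  # overhead budget at PUE 1.2
print(pue(it_kwh + 150_000, it_kwh))                # 1.15
```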

National Layout

9 Core Regions: Beijing-Tianjin-Hebei, Yangtze River Delta, Guangdong-Hong Kong-Macao Greater Bay Area, Chengdu-Chongqing, Inner Mongolia, Guizhou, Gansu, Ningxia, Xinjiang

Western Region · Large-Scale Intelligent Computing Cluster

Scale: more than 1,000 intelligent computing devices

Construction period: 3 months

Results: system failure rate down roughly 60% or more, training efficiency up 20% to 35%, network stability up about 30%

Core challenges: The multi-tier, large-scale network involved a huge number of links with complex connectivity, which made overall topology design difficult and made it hard to keep the deployment consistent with the plan. Under an architecture integrating multiple network types, the adaptation logic was complex, and hardware-software compatibility and parameter tuning were difficult. Ensuring system stability and locating faults were also complex, making unified tuning and efficient O&M hard to achieve.

Solution: We unified the underlying scheduling and flow-control policies and carried out firmware and driver compatibility certification; assigned separate VLANs and subnets to each network to isolate traffic and keep transmission orderly; optimized PCIe lane allocation to reduce resource contention; standardized MTU settings; and improved fault circuit-breaking and self-healing mechanisms, raising system stability and O&M efficiency.
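MTU standardization is mainly a consistency check: every interface on a path should use the same configured value, or oversized frames may be dropped or fragmented. The sketch below assumes a hypothetical inventory format (host to interface to MTU) and an assumed standard of 9000 bytes; on Linux, each host's values can be read from sysfs under /sys/class/net. The function and host names are illustrative.

```python
from pathlib import Path
from typing import Dict, List, Tuple

Inventory = Dict[str, Dict[str, int]]  # host -> {interface: MTU}


def read_local_mtus(sys_net: Path = Path("/sys/class/net")) -> Dict[str, int]:
    """Read the MTU of every interface on a Linux host from sysfs."""
    return {
        dev.name: int((dev / "mtu").read_text().strip())
        for dev in sys_net.iterdir()
        if (dev / "mtu").is_file()
    }


def find_mtu_mismatches(inventory: Inventory, expected: int,
                        skip: Tuple[str, ...] = ("lo",)) -> List[Tuple[str, str, int]]:
    """List (host, interface, mtu) entries that deviate from the standard MTU."""
    return [
        (host, iface, mtu)
        for host in sorted(inventory)
        for iface, mtu in sorted(inventory[host].items())
        if iface not in skip and mtu != expected
    ]


# Hypothetical inventory gathered from two nodes; 9000 is an assumed standard.
inventory = {
    "node-01": {"lo": 65536, "eth0": 9000, "eth1": 9000},
    "node-02": {"lo": 65536, "eth0": 1500, "eth1": 9000},
}
print(find_mtu_mismatches(inventory, expected=9000))
# [('node-02', 'eth0', 1500)]
```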

Eastern Region · High-Density, Small-Scale High-Performance Computing Cluster

Construction period: 23 days

Results: anomaly detection time cut to around 5 seconds, fault self-healing recovery time under 10 seconds, and end-to-end data transmission latency reduced by about 40% to 60%

Core challenges: The network comprises five planes (parameter, storage, business, in-band, and out-of-band), each of which had to be built separately with access isolation while still coordinating with the others, so topology design, link planning, and per-node configuration were highly complex. The system's concurrency and latency requirements are far higher than those of a conventional IDC: it must support microsecond-level end-to-end packet latency and complete recovery or raise an alert within seconds.

Solution: We unified the scheduling policy for the hybrid network, applying traffic tiering, bandwidth partitioning, and priority control to keep storage and business traffic isolated, and optimized low-latency forwarding paths and congestion control. Integration with our in-house monitoring platform strengthened rapid detection and automatic suppression of PFC storms, enabling alerts and recovery within seconds.
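A PFC storm occurs when a device keeps sending priority flow control pause frames, so that upstream queues stay paused and traffic stops flowing. The text does not describe how its monitoring platform detects this; the sketch below assumes a simple watchdog-style rule, similar in spirit to the PFC watchdog features of some switch operating systems: a queue is flagged once it has received pause frames while transmitting nothing for a configurable number of consecutive polling intervals. The class name, counters, and window length are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque


@dataclass
class PfcStormWatchdog:
    """Hypothetical per-queue PFC storm detector (illustrative only)."""
    window: int = 3                                     # consecutive stalled intervals needed
    history: Deque[bool] = field(default_factory=deque)

    def observe(self, pause_frames_rx: int, tx_packets: int) -> bool:
        """Feed one polling interval's counter deltas; return True if a storm is detected."""
        stalled = pause_frames_rx > 0 and tx_packets == 0
        self.history.append(stalled)
        if len(self.history) > self.window:
            self.history.popleft()                      # keep only the last `window` samples
        return len(self.history) == self.window and all(self.history)


watchdog = PfcStormWatchdog(window=3)
samples = [(10, 0), (12, 0), (9, 0), (0, 500), (7, 0)]  # (pause frames, tx packets) per interval
print([watchdog.observe(p, t) for p, t in samples])
# [False, False, True, False, False]
```

With a one-second polling interval and a window of a few samples, such a rule would raise an alert within seconds, in line with the seconds-level target described above; the suppression action itself (for example, ignoring PFC on the affected queue) is not shown.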

Mission

Build a world-leading computing foundation and empower the continuous evolution of AI

Vision

Make computing power an accessible, efficient foundational capability that keeps unlocking AI innovation

Values

Pragmatism

Focus on real needs and tangible results

Excellence

Push computing performance and service quality to the limit

Win-Win

Build a collaborative industry ecosystem

Innovation

Keep pushing past technical boundaries

Our Team

Our core team comes from top universities and leading internet companies, with deep expertise in artificial intelligence and cloud computing.

Technical Experts: 20+
Supply Chain Team: 10+
Engineers: 100+
Construction Team: 300+