
本文我们一起以源码的册源维度来分析 Nacos 做为服务注册中心的服务注册过程,我会以服务端、码分客户端两个角度来进行分析,册源Nacos 客户端我主要是码分采用 spring-cloud-alibaba 作为核心的客户端组件。对于 Nacos 服务端我会讲解到,册源 Nacos 如何实现 AP/CP 两种模式共存的码分,以及如何区分的册源。最后还会分享我在源码调试过程中如何定位核心类的码分一点经验。
下面我先对我的册源环境做一个简单的介绍:
Jdk 1.8 nacos-server-1.4.2 spring-boot-2.3.5.RELEASE spring-cloud-Hoxton.SR8 spring-cloiud-alibab-2.2.5.RELEASE Nacos 服务架构
以 Spring-Boot 为服务基础搭建平台, Nacos 在服务架构中的码分位置如下图所示:

总的来说和 Nacos 功能类似的中间件有 Eureka、Zookeeper、册源Consul 、码分Etcd 等。册源Nacos 最大的码分特点就是既能够支持 AP、也能够支持 CP 模式,册源在分区一致性方面使用的是 Raft 协议来实现。
Nacos 客户端
服务注册客户端
添加依赖
Nacos 服务注册是客户端主动发起,站群服务器利用 Spring 启完成事件进行拓展调用服务注册方法。首先我们需要导入spring-cloud-starter-alibaba-nacos-discovery依赖:
<dependency> <groupId>com.alibaba.cloud</groupId> <artifactId>spring-cloud-starter-alibaba-nacos-discovery</artifactId> </dependency> 分析源码
对于 spring-boot 组件我们首先先找它的 META-INF/spring.factories 文件
org.springframework.boot.autoconfigure.EnableAutoConfiguration=\ com.alibaba.cloud.nacos.discovery.NacosDiscoveryAutoConfiguration,\ com.alibaba.cloud.nacos.ribbon.RibbonNacosAutoConfiguration,\ com.alibaba.cloud.nacos.endpoint.NacosDiscoveryEndpointAutoConfiguration,\ com.alibaba.cloud.nacos.registry.NacosServiceRegistryAutoConfiguration,\ com.alibaba.cloud.nacos.discovery.NacosDiscoveryClientConfiguration,\ com.alibaba.cloud.nacos.discovery.reactive.NacosReactiveDiscoveryClientConfiguration,\ com.alibaba.cloud.nacos.discovery.configclient.NacosConfigServerAutoConfiguration,\ com.alibaba.cloud.nacos.NacosServiceAutoConfiguration org.springframework.cloud.bootstrap.BootstrapConfiguration=\ com.alibaba.cloud.nacos.discovery.configclient.NacosDiscoveryClientConfigServiceBootstrapConfiguration 通过我的分析发现 NacosServiceRegistryAutoConfiguration 是咱们服务注册的核心配置类,该类中定义了三个核心的 Bean 对象:
NacosServiceRegistry NacosRegistration NacosAutoServiceRegistration NacosAutoServiceRegistration
NacosAutoServiceRegistration 实现了服务向 Nacos 发起注册的功能,它继承自抽象类 AbstractAutoServiceRegistration 。
在抽象类 AbstractAutoServiceRegistration 中实现 ApplicationContextAware、ApplicationListener 接口。在容器启动、并且上下文准备就绪过后会调用 onApplicationEvent 方法。
public void onApplicationEvent(WebServerInitializedEvent event) { bind(event); } 再调用 bind(event) 方法:
public void bind(WebServerInitializedEvent event) { ApplicationContext context = event.getApplicationContext(); if (context instanceof ConfigurableWebServerApplicationContext) { if ("management".equals(((ConfigurableWebServerApplicationContext) context) .getServerNamespace())) { return; } } this.port.compareAndSet(0, event.getWebServer().getPort()); this.start(); } 然后调用 start() 方法
public void start() { if (!isEnabled()) { if (logger.isDebugEnabled()) { logger.debug("Discovery Lifecycle disabled. Not starting"); } return; } // only initialize if nonSecurePort is greater than 0 and it isnt already running // because of containerPortInitializer below if (!this.running.get()) { this.context.publishEvent( new InstancePreRegisteredEvent(this, getRegistration())); register(); if (shouldRegisterManagement()) { registerManagement(); } this.context.publishEvent( new InstanceRegisteredEvent<>(this, getConfiguration())); this.running.compareAndSet(false, true); } } 最后调用 register(); 在内部去调用 serviceRegistry.register() 方法完成服务注册。
private final ServiceRegistry<R> serviceRegistry; protected void register() { this.serviceRegistry.register(getRegistration()); } NacosServiceRegistry
NacosServiceRegistry 类主要的目的就是实现服务注册
public void register(Registration registration) { if (StringUtils.isEmpty(registration.getServiceId())) { log.warn("No service to register for nacos client..."); return; } // 默认情况下,会通过反射返回一个 `com.alibaba.nacos.client.naming.NacosNamingService` 的实例 NamingService namingService = namingService(); // 获取 serviceId , 默认使用配置: spring.application.name String serviceId = registration.getServiceId(); // 获取 group , 默认 DEFAULT_GROUP String group = nacosDiscoveryProperties.getGroup(); // 创建 instance 实例 Instance instance = getNacosInstanceFromRegistration(registration); try { // 注册实例 namingService.registerInstance(serviceId, group, instance); log.info("nacos registry, { } { } { }:{ } register finished", group, serviceId, instance.getIp(), instance.getPort()); } catch (Exception e) { log.error("nacos registry, { } register failed...{ },", serviceId, registration.toString(), e); // rethrow a RuntimeException if the registration is failed. // issue : https://github.com/alibaba/spring-cloud-alibaba/issues/1132 rethrowRuntimeException(e); } } 我们可以看到最后调用的是 namingService.registerInstance(serviceId, group, instance); 方法。
public void registerInstance(String serviceName, String groupName, Instance instance) throws NacosException { NamingUtils.checkInstanceIsLegal(instance); String groupedServiceName = NamingUtils.getGroupedName(serviceName, groupName); if (instance.isEphemeral()) { BeatInfo beatInfo = beatReactor.buildBeatInfo(groupedServiceName, instance); beatReactor.addBeatInfo(groupedServiceName, beatInfo); } serverProxy.registerService(groupedServiceName, groupName, instance); } 然后再调用 serverProxy.registerService(groupedServiceName, groupName, instance); 方法进行服务注册,通过 beatReactor.addBeatinfo() 创建 schedule 每间隔 5s 向服务端发送一次心跳数据
public void registerService(String serviceName, String groupName, Instance instance) throws NacosException { NAMING_LOGGER.info("[REGISTER-SERVICE] { } registering service { } with instance: { }", namespaceId, serviceName, instance); final Map<String, String> params = new HashMap<String, String>(16); params.put(CommonParams.NAMESPACE_ID, namespaceId); params.put(CommonParams.SERVICE_NAME, serviceName); params.put(CommonParams.GROUP_NAME, groupName); params.put(CommonParams.CLUSTER_NAME, instance.getClusterName()); params.put("ip", instance.getIp()); params.put("port", String.valueOf(instance.getPort())); params.put("weight", String.valueOf(instance.getWeight())); params.put("enable", String.valueOf(instance.isEnabled())); params.put("healthy", String.valueOf(instance.isHealthy())); params.put("ephemeral", String.valueOf(instance.isEphemeral())); params.put("metadata", JacksonUtils.toJson(instance.getMetadata())); // POST: /nacos/v1/ns/instance 进行服务注册 reqApi(UtilAndComs.nacosUrlInstance, params, HttpMethod.POST); } 服务注册服务端
Nacos 做为服务注册中心,既可以实现AP ,也能实现 CP 架构。来维护我们服务中心的服务列表。下面是我们服务列表一个简单的数据模型示意图:

其实就和咱们 NacosServiceRegistry#registry 构建 Instance 实例的过程是服务器租用一致的。继续回到我们源码分析我们直接来看服务端的 /nacos/v1/ns/instance 接口,被定义在 InstanceController#register 方法。
服务注册
在 InstanceController#register 方法中,主要是解析 request 参数然后调用 serviceManager.registerInstance , 如果返回 ok 就表示注册成功。
@CanDistro @PostMapping @Secured(parser = NamingResourceParser.class, action = ActionTypes.WRITE) public String register(HttpServletRequest request) throws Exception { final String namespaceId = WebUtils .optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID); final String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME); NamingUtils.checkServiceNameFormat(serviceName); final Instance instance = parseInstance(request); serviceManager.registerInstance(namespaceId, serviceName, instance); return "ok"; } registerInstance 方法的调用
public void registerInstance(String namespaceId, String serviceName, Instance instance) throws NacosException { createEmptyService(namespaceId, serviceName, instance.isEphemeral()); Service service = getService(namespaceId, serviceName); if (service == null) { throw new NacosException(NacosException.INVALID_PARAM, "service not found, namespace: " + namespaceId + ", service: " + serviceName); } addInstance(namespaceId, serviceName, instance.isEphemeral(), instance); } 再调用 addInstance() 方法
@Resource(name = "consistencyDelegate") private ConsistencyService consistencyService; public void addInstance(String namespaceId, String serviceName, boolean ephemeral, Instance... ips) throws NacosException { String key = KeyBuilder.buildInstanceListKey(namespaceId, serviceName, ephemeral); Service service = getService(namespaceId, serviceName); synchronized (service) { List<Instance> instanceList = addIpAddresses(service, ephemeral, ips); Instances instances = new Instances(); instances.setInstanceList(instanceList); consistencyService.put(key, instances); } } 调用 consistencyService.put(key, instances); 刷新 service 中的所有 instance。我们通过 consistencyService 的定义可以知道它将调用 DelegateConsistencyServiceImpl 类的 put 方法。在这个地方有一个 AP/CP 模式的选择我们可以通过
@Override public void put(String key, Record value) throws NacosException { mapConsistencyService(key).put(key, value); } // AP 或者 CP 模式的选择, AP 模式采用 Distro 协议, CP 模式采用 Raft 协议。 private ConsistencyService mapConsistencyService(String key) { return KeyBuilder.matchEphemeralKey(key) ? ephemeralConsistencyService : persistentConsistencyService; } AP 模式
Nacos 默认就是采用的 AP 模式使用 Distro 协议实现。实现的接口是 EphemeralConsistencyService 对节点信息的持久化主要是调用 put 方法
@Override public void put(String key, Record value) throws NacosException { // 数据持久化 onPut(key, value); // 通知其他服务节点 distroProtocol.sync(new DistroKey(key, KeyBuilder.INSTANCE_LIST_KEY_PREFIX), DataOperation.CHANGE, globalConfig.getTaskDispatchPeriod() / 2); } 在调用 doPut 来保存数据并且发通知
public void onPut(String key, Record value) { if (KeyBuilder.matchEphemeralInstanceListKey(key)) { Datum<Instances> datum = new Datum<>(); datum.value = (Instances) value; datum.key = key; datum.timestamp.incrementAndGet(); // 数据持久化 dataStore.put(key, datum); } if (!listeners.containsKey(key)) { return; } notifier.addTask(key, DataOperation.CHANGE); } 在 notifier.addTask 主要是通过 tasks.offer(Pair.with(datumKey, action)); 向阻塞队列 tasks 中放注册实例信息。通过 Notifier#run 方法来进行异步操作以保证效率
public class Notifier implements Runnable { @Override public void run() { Loggers.DISTRO.info("distro notifier started"); for (; ; ) { try { Pair<String, DataOperation> pair = tasks.take(); handle(pair); } catch (Throwable e) { Loggers.DISTRO.error("[NACOS-DISTRO] Error while handling notifying task", e); } } } private void handle(Pair<String, DataOperation> pair) { // 省略部分代码 for (RecordListener listener : listeners.get(datumKey)) { count++; try { if (action == DataOperation.CHANGE) { listener.onChange(datumKey, dataStore.get(datumKey).value); continue; } if (action == DataOperation.DELETE) { listener.onDelete(datumKey); continue; } } catch (Throwable e) { Loggers.DISTRO.error("[NACOS-DISTRO] error while notifying listener of key: { }", datumKey, e); } } } } 如果是 DataOperation.CHANGE 类型的事件会调用 listener.onChange(datumKey, dataStore.get(datumKey).value); 其实我们的 listener 就是我们的亿华云 Service 对象。
public void onChange(String key, Instances value) throws Exception { Loggers.SRV_LOG.info("[NACOS-RAFT] datum is changed, key: { }, value: { }", key, value); for (Instance instance : value.getInstanceList()) { if (instance == null) { // Reject this abnormal instance list: throw new RuntimeException("got null instance " + key); } if (instance.getWeight() > 10000.0D) { instance.setWeight(10000.0D); } if (instance.getWeight() < 0.01D && instance.getWeight() > 0.0D) { instance.setWeight(0.01D); } } updateIPs(value.getInstanceList(), KeyBuilder.matchEphemeralInstanceListKey(key)); recalculateChecksum(); } updateIPs 方法会将服务实例信息,更新到注册表的内存中去,并且会以 udp 的方式通知当前服务的订阅者。
public void updateIPs(Collection<Instance> instances, boolean ephemeral) { Map<String, List<Instance>> ipMap = new HashMap<>(clusterMap.size()); for (String clusterName : clusterMap.keySet()) { ipMap.put(clusterName, new ArrayList<>()); } for (Instance instance : instances) { try { if (instance == null) { Loggers.SRV_LOG.error("[NACOS-DOM] received malformed ip: null"); continue; } if (StringUtils.isEmpty(instance.getClusterName())) { instance.setClusterName(UtilsAndCommons.DEFAULT_CLUSTER_NAME); } if (!clusterMap.containsKey(instance.getClusterName())) { Loggers.SRV_LOG .warn("cluster: { } not found, ip: { }, will create new cluster with default configuration.", instance.getClusterName(), instance.toJson()); Cluster cluster = new Cluster(instance.getClusterName(), this); cluster.init(); getClusterMap().put(instance.getClusterName(), cluster); } List<Instance> clusterIPs = ipMap.get(instance.getClusterName()); if (clusterIPs == null) { clusterIPs = new LinkedList<>(); ipMap.put(instance.getClusterName(), clusterIPs); } clusterIPs.add(instance); } catch (Exception e) { Loggers.SRV_LOG.error("[NACOS-DOM] failed to process ip: " + instance, e); } } for (Map.Entry<String, List<Instance>> entry : ipMap.entrySet()) { //make every ip mine List<Instance> entryIPs = entry.getValue(); // 更新服务列表 clusterMap.get(entry.getKey()).updateIps(entryIPs, ephemeral); } setLastModifiedMillis(System.currentTimeMillis()); // 推送服务订阅者消息 getPushService().serviceChanged(this); StringBuilder stringBuilder = new StringBuilder(); for (Instance instance : allIPs()) { stringBuilder.append(instance.toIpAddr()).append("_").append(instance.isHealthy()).append(","); } Loggers.EVT_LOG.info("[IP-UPDATED] namespace: { }, service: { }, ips: { }", getNamespaceId(), getName(), stringBuilder.toString()); } CP 模式
Nacos 默认就是采用的 CP 模式使用 Raft 协议实现。实现类是 PersistentConsistencyServiceDelegateImpl
首先我们先看他的 put 方法
public void put(String key, Record value) throws NacosException { checkIsStopWork(); try { raftCore.signalPublish(key, value); } catch (Exception e) { Loggers.RAFT.error("Raft put failed.", e); throw new NacosException(NacosException.SERVER_ERROR, "Raft put failed, key:" + key + ", value:" + value, e); } } 调用 raftCore.signalPublish(key, value); 主要的步骤如下
判断是否是 Leader 节点,如果不是 Leader 节点将请求转发给 Leader 节点处理; 如果是 Leader 节点,首先执行 onPublish(datum, peers.local()); 方法,内部首先通过 raftStore.updateTerm(local.term.get()); 方法持久化到文件,然后通过 NotifyCenter.publishEvent(ValueChangeEvent.builder().key(datum.key).action(DataOperation.CHANGE).build());异步更新到内存; 通过 CountDownLatch 实现了一个过半机制 new CountDownLatch(peers.majorityCount()) 只有当成功的节点大于 N/2 + 1 的时候才返回成功。 调用其他的 Nacos 节点的 /raft/datum/commit 同步实例信息。 public void signalPublish(String key, Record value) throws Exception { if (stopWork) { throw new IllegalStateException("old raft protocol already stop work"); } if (!isLeader()) { ObjectNode params = JacksonUtils.createEmptyJsonNode(); params.put("key", key); params.replace("value", JacksonUtils.transferToJsonNode(value)); Map<String, String> parameters = new HashMap<>(1); parameters.put("key", key); final RaftPeer leader = getLeader(); raftProxy.proxyPostLarge(leader.ip, API_PUB, params.toString(), parameters); return; } OPERATE_LOCK.lock(); try { final long start = System.currentTimeMillis(); final Datum datum = new Datum(); datum.key = key; datum.value = value; if (getDatum(key) == null) { datum.timestamp.set(1L); } else { datum.timestamp.set(getDatum(key).timestamp.incrementAndGet()); } ObjectNode json = JacksonUtils.createEmptyJsonNode(); json.replace("datum", JacksonUtils.transferToJsonNode(datum)); json.replace("source", JacksonUtils.transferToJsonNode(peers.local())); onPublish(datum, peers.local()); final String content = json.toString(); final CountDownLatch latch = new CountDownLatch(peers.majorityCount()); for (final String server : peers.allServersIncludeMyself()) { if (isLeader(server)) { latch.countDown(); continue; } final String url = buildUrl(server, API_ON_PUB); HttpClient.asyncHttpPostLarge(url, Arrays.asList("key", key), content, new Callback<String>() { @Override public void onReceive(RestResult<String> result) { if (!result.ok()) { Loggers.RAFT .warn("[RAFT] failed to publish data to peer, datumId={ }, peer={ }, http code={ }", datum.key, server, result.getCode()); return; } latch.countDown(); } @Override public void onError(Throwable throwable) { Loggers.RAFT.error("[RAFT] failed to publish data to peer", throwable); } @Override public void onCancel() { } }); } if (!latch.await(UtilsAndCommons.RAFT_PUBLISH_TIMEOUT, TimeUnit.MILLISECONDS)) { // only majority servers return success can we consider this update success Loggers.RAFT.error("data publish failed, caused failed to notify majority, key={ }", key); throw new IllegalStateException("data publish failed, caused failed to notify majority, key=" + key); } long end = System.currentTimeMillis(); Loggers.RAFT.info("signalPublish cost { } ms, key: { }", (end - start), key); } finally { OPERATE_LOCK.unlock(); } } 判断 AP 模式还是 CP 模式
如果注册 nacos 的 client 节点注册时 ephemeral=true,那么 nacos 集群对这个 client 节点的效果就是 ap 的采用 distro,而注册nacos 的 client 节点注册时 ephemeral=false,那么nacos 集群对这个节点的效果就是 cp 的采用 raft。根据 client 注册时的属性,ap,cp 同时混合存在,只是对不同的 client 节点效果不同
Nacos 源码调试
Nacos 启动文件
首先我们需要找到 Nacos 的启动类,首先需要找到启动的 jar.

然后我们在解压 target/nacos-server.jar
解压命令:
# 解压 jar 包 tar -zxvf nacos-server.jar # 查看 MANIFEST.MF 内容 cat META-INF/MANIFEST.MF Manifest-Version: 1.0 Implementation-Title: nacos-console 1.4.2 Implementation-Version: 1.4.2 Archiver-Version: Plexus Archiver Built-By: xiweng.yy Spring-Boot-Layers-Index: BOOT-INF/layers.idx Specification-Vendor: Alibaba Group Specification-Title: nacos-console 1.4.2 Implementation-Vendor-Id: com.alibaba.nacos Spring-Boot-Version: 2.5.0-RC1 Implementation-Vendor: Alibaba Group Main-Class: org.springframework.boot.loader.PropertiesLauncher Spring-Boot-Classpath-Index: BOOT-INF/classpath.idx Start-Class: com.alibaba.nacos.Nacos Spring-Boot-Classes: BOOT-INF/classes/ Spring-Boot-Lib: BOOT-INF/lib/ Created-By: Apache Maven 3.6.3 Build-Jdk: 1.8.0_231 Specification-Version: 1.4.2 通过 MANIFEST.MF 中的配置信息,我们可以找到 Start-Class 这个配置这个类就是 Spring-Boot 项目的启动类 com.alibaba.nacos.Nacos
Nacos 调试
通过 com.alibaba.nacos.Nacos 的启动类,我们可以通过这个类在 Idea 中进行启动,然后调试。
本文转载自微信公众号「运维开发故事」,可以通过以下二维码关注。转载本文请联系运维开发故事公众号。
