LKML Archive on lore.kernel.org
* [PATCH 1/2] IPVS: add wlib & wlip schedulers [not found] ` <Pine.LNX.4.61.0502010007060.1148@penguin.linux-vs.org> @ 2015-01-17 23:15 ` Chris Caputo 2015-01-19 23:17 ` Julian Anastasov 0 siblings, 1 reply; 9+ messages in thread From: Chris Caputo @ 2015-01-17 23:15 UTC (permalink / raw) To: Wensong Zhang, Julian Anastasov, Simon Horman; +Cc: lvs-devel, linux-kernel Wensong, this is something we discussed 10 years ago and you liked it, but it didn't actually get into the kernel. I've updated it, tested it, and would like to work toward inclusion. Thanks, Chris --- From: Chris Caputo <ccaputo@alt.net> IPVS wlib (Weighted Least Incoming Byterate) and wlip (Weighted Least Incoming Packetrate) schedulers, updated for 3.19-rc4. Signed-off-by: Chris Caputo <ccaputo@alt.net> --- diff -uprN linux-3.19-rc4-stock/net/netfilter/ipvs/Kconfig linux-3.19-rc4/net/netfilter/ipvs/Kconfig --- linux-3.19-rc4-stock/net/netfilter/ipvs/Kconfig 2015-01-11 20:44:53.000000000 +0000 +++ linux-3.19-rc4/net/netfilter/ipvs/Kconfig 2015-01-17 22:47:52.250301042 +0000 @@ -240,6 +240,26 @@ config IP_VS_NQ If you want to compile it in kernel, say Y. To compile it as a module, choose M here. If unsure, say N. +config IP_VS_WLIB + tristate "weighted least incoming byterate scheduling" + ---help--- + The weighted least incoming byterate scheduling algorithm directs + network connections to the server with the least incoming byterate + normalized by the server weight. + + If you want to compile it in kernel, say Y. To compile it as a + module, choose M here. If unsure, say N. + +config IP_VS_WLIP + tristate "weighted least incoming packetrate scheduling" + ---help--- + The weighted least incoming packetrate scheduling algorithm directs + network connections to the server with the least incoming packetrate + normalized by the server weight. + + If you want to compile it in kernel, say Y. To compile it as a + module, choose M here. If unsure, say N. + comment 'IPVS SH scheduler' config IP_VS_SH_TAB_BITS diff -uprN linux-3.19-rc4-stock/net/netfilter/ipvs/Makefile linux-3.19-rc4/net/netfilter/ipvs/Makefile --- linux-3.19-rc4-stock/net/netfilter/ipvs/Makefile 2015-01-11 20:44:53.000000000 +0000 +++ linux-3.19-rc4/net/netfilter/ipvs/Makefile 2015-01-17 22:47:35.421861075 +0000 @@ -33,6 +33,8 @@ obj-$(CONFIG_IP_VS_DH) += ip_vs_dh.o obj-$(CONFIG_IP_VS_SH) += ip_vs_sh.o obj-$(CONFIG_IP_VS_SED) += ip_vs_sed.o obj-$(CONFIG_IP_VS_NQ) += ip_vs_nq.o +obj-$(CONFIG_IP_VS_WLIB) += ip_vs_wlib.o +obj-$(CONFIG_IP_VS_WLIP) += ip_vs_wlip.o # IPVS application helpers obj-$(CONFIG_IP_VS_FTP) += ip_vs_ftp.o diff -uprN linux-3.19-rc4-stock/net/netfilter/ipvs/ip_vs_wlib.c linux-3.19-rc4/net/netfilter/ipvs/ip_vs_wlib.c --- linux-3.19-rc4-stock/net/netfilter/ipvs/ip_vs_wlib.c 1970-01-01 00:00:00.000000000 +0000 +++ linux-3.19-rc4/net/netfilter/ipvs/ip_vs_wlib.c 2015-01-17 22:47:35.421861075 +0000 @@ -0,0 +1,156 @@ +/* IPVS: Weighted Least Incoming Byterate Scheduling module + * + * Authors: Chris Caputo <ccaputo@alt.net> based on code by: + * + * Wensong Zhang <wensong@linuxvirtualserver.org> + * Peter Kese <peter.kese@ijs.si> + * Julian Anastasov <ja@ssi.bg> + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * Changes: + * Chris Caputo: Based code on ip_vs_wlc.c ip_vs_rr.c. 
+ * + */ + +/* The WLIB algorithm uses the results of the estimator's inbps + * calculations to determine which real server has the lowest incoming + * byterate. + * + * Real server weight is factored into the calculation. An example way to + * use this is if you have one server that can handle 100 Mbps of input and + * another that can handle 1 Gbps you could set the weights to be 100 and 1000 + * respectively. + */ + +#define KMSG_COMPONENT "IPVS" +#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt + +#include <linux/module.h> +#include <linux/kernel.h> + +#include <net/ip_vs.h> + +static int +ip_vs_wlib_init_svc(struct ip_vs_service *svc) +{ + svc->sched_data = &svc->destinations; + return 0; +} + +static int +ip_vs_wlib_del_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest) +{ + struct list_head *p; + + spin_lock_bh(&svc->sched_lock); + p = (struct list_head *)svc->sched_data; + /* dest is already unlinked, so p->prev is not valid but + * p->next is valid, use it to reach previous entry. + */ + if (p == &dest->n_list) + svc->sched_data = p->next->prev; + spin_unlock_bh(&svc->sched_lock); + return 0; +} + +/* Weighted Least Incoming Byterate scheduling */ +static struct ip_vs_dest * +ip_vs_wlib_schedule(struct ip_vs_service *svc, const struct sk_buff *skb, + struct ip_vs_iphdr *iph) +{ + struct list_head *p, *q; + struct ip_vs_dest *dest, *least = NULL; + u32 dr, lr = -1; + int dwgt, lwgt = 0; + + IP_VS_DBG(6, "%s(): Scheduling...\n", __func__); + + /* We calculate the load of each dest server as follows: + * (dest inbps rate) / dest->weight + * + * The comparison of dr*lwght < lr*dwght is equivalent to that of + * dr/dwght < lr/lwght if every weight is larger than zero. + * + * A server with weight=0 is quiesced and will not receive any + * new connections. + * + * In case of ties, highest weight is winner. And if that still makes + * for a tie, round robin is used (which is why we remember our last + * starting location in the linked list). 
+ */ + + spin_lock_bh(&svc->sched_lock); + p = (struct list_head *)svc->sched_data; + p = list_next_rcu(p); + q = p; + do { + /* skip list head */ + if (q == &svc->destinations) { + q = list_next_rcu(q); + continue; + } + + dest = list_entry_rcu(q, struct ip_vs_dest, n_list); + dwgt = atomic_read(&dest->weight); + if (!(dest->flags & IP_VS_DEST_F_OVERLOAD) && dwgt > 0) { + spin_lock(&dest->stats.lock); + dr = dest->stats.ustats.inbps; + spin_unlock(&dest->stats.lock); + + if (!least || + (u64)dr * (u64)lwgt < (u64)lr * (u64)dwgt || + (dr == lr && dwgt > lwgt)) { + least = dest; + lr = dr; + lwgt = dwgt; + svc->sched_data = q; + } + } + q = list_next_rcu(q); + } while (q != p); + spin_unlock_bh(&svc->sched_lock); + + if (least) { + IP_VS_DBG_BUF(6, + "WLIB: server %s:%u activeconns %d refcnt %d weight %d\n", + IP_VS_DBG_ADDR(least->af, &least->addr), + ntohs(least->port), + atomic_read(&least->activeconns), + atomic_read(&least->refcnt), + atomic_read(&least->weight)); + } else { + ip_vs_scheduler_err(svc, "no destination available"); + } + + return least; +} + +static struct ip_vs_scheduler ip_vs_wlib_scheduler = { + .name = "wlib", + .refcnt = ATOMIC_INIT(0), + .module = THIS_MODULE, + .n_list = LIST_HEAD_INIT(ip_vs_wlib_scheduler.n_list), + .init_service = ip_vs_wlib_init_svc, + .add_dest = NULL, + .del_dest = ip_vs_wlib_del_dest, + .schedule = ip_vs_wlib_schedule, +}; + +static int __init ip_vs_wlib_init(void) +{ + return register_ip_vs_scheduler(&ip_vs_wlib_scheduler); +} + +static void __exit ip_vs_wlib_cleanup(void) +{ + unregister_ip_vs_scheduler(&ip_vs_wlib_scheduler); + synchronize_rcu(); +} + +module_init(ip_vs_wlib_init); +module_exit(ip_vs_wlib_cleanup); +MODULE_LICENSE("GPL"); diff -uprN linux-3.19-rc4-stock/net/netfilter/ipvs/ip_vs_wlip.c linux-3.19-rc4/net/netfilter/ipvs/ip_vs_wlip.c --- linux-3.19-rc4-stock/net/netfilter/ipvs/ip_vs_wlip.c 1970-01-01 00:00:00.000000000 +0000 +++ linux-3.19-rc4/net/netfilter/ipvs/ip_vs_wlip.c 2015-01-17 22:47:35.421861075 +0000 @@ -0,0 +1,156 @@ +/* IPVS: Weighted Least Incoming Packetrate Scheduling module + * + * Authors: Chris Caputo <ccaputo@alt.net> based on code by: + * + * Wensong Zhang <wensong@linuxvirtualserver.org> + * Peter Kese <peter.kese@ijs.si> + * Julian Anastasov <ja@ssi.bg> + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * Changes: + * Chris Caputo: Based code on ip_vs_wlc.c ip_vs_rr.c. + * + */ + +/* The WLIP algorithm uses the results of the estimator's inpps + * calculations to determine which real server has the lowest incoming + * packetrate. + * + * Real server weight is factored into the calculation. An example way to + * use this is if you have one server that can handle 10 Kpps of input and + * another that can handle 100 Kpps you could set the weights to be 10 and 100 + * respectively. 
+ */ + +#define KMSG_COMPONENT "IPVS" +#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt + +#include <linux/module.h> +#include <linux/kernel.h> + +#include <net/ip_vs.h> + +static int +ip_vs_wlip_init_svc(struct ip_vs_service *svc) +{ + svc->sched_data = &svc->destinations; + return 0; +} + +static int +ip_vs_wlip_del_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest) +{ + struct list_head *p; + + spin_lock_bh(&svc->sched_lock); + p = (struct list_head *)svc->sched_data; + /* dest is already unlinked, so p->prev is not valid but + * p->next is valid, use it to reach previous entry. + */ + if (p == &dest->n_list) + svc->sched_data = p->next->prev; + spin_unlock_bh(&svc->sched_lock); + return 0; +} + +/* Weighted Least Incoming Packetrate scheduling */ +static struct ip_vs_dest * +ip_vs_wlip_schedule(struct ip_vs_service *svc, const struct sk_buff *skb, + struct ip_vs_iphdr *iph) +{ + struct list_head *p, *q; + struct ip_vs_dest *dest, *least = NULL; + u32 dr, lr = -1; + int dwgt, lwgt = 0; + + IP_VS_DBG(6, "%s(): Scheduling...\n", __func__); + + /* We calculate the load of each dest server as follows: + * (dest inpps rate) / dest->weight + * + * The comparison of dr*lwght < lr*dwght is equivalent to that of + * dr/dwght < lr/lwght if every weight is larger than zero. + * + * A server with weight=0 is quiesced and will not receive any + * new connections. + * + * In case of ties, highest weight is winner. And if that still makes + * for a tie, round robin is used (which is why we remember our last + * starting location in the linked list). + */ + + spin_lock_bh(&svc->sched_lock); + p = (struct list_head *)svc->sched_data; + p = list_next_rcu(p); + q = p; + do { + /* skip list head */ + if (q == &svc->destinations) { + q = list_next_rcu(q); + continue; + } + + dest = list_entry_rcu(q, struct ip_vs_dest, n_list); + dwgt = atomic_read(&dest->weight); + if (!(dest->flags & IP_VS_DEST_F_OVERLOAD) && dwgt > 0) { + spin_lock(&dest->stats.lock); + dr = dest->stats.ustats.inpps; + spin_unlock(&dest->stats.lock); + + if (!least || + (u64)dr * (u64)lwgt < (u64)lr * (u64)dwgt || + (dr == lr && dwgt > lwgt)) { + least = dest; + lr = dr; + lwgt = dwgt; + svc->sched_data = q; + } + } + q = list_next_rcu(q); + } while (q != p); + spin_unlock_bh(&svc->sched_lock); + + if (least) { + IP_VS_DBG_BUF(6, + "WLIP: server %s:%u activeconns %d refcnt %d weight %d\n", + IP_VS_DBG_ADDR(least->af, &least->addr), + ntohs(least->port), + atomic_read(&least->activeconns), + atomic_read(&least->refcnt), + atomic_read(&least->weight)); + } else { + ip_vs_scheduler_err(svc, "no destination available"); + } + + return least; +} + +static struct ip_vs_scheduler ip_vs_wlip_scheduler = { + .name = "wlip", + .refcnt = ATOMIC_INIT(0), + .module = THIS_MODULE, + .n_list = LIST_HEAD_INIT(ip_vs_wlip_scheduler.n_list), + .init_service = ip_vs_wlip_init_svc, + .add_dest = NULL, + .del_dest = ip_vs_wlip_del_dest, + .schedule = ip_vs_wlip_schedule, +}; + +static int __init ip_vs_wlip_init(void) +{ + return register_ip_vs_scheduler(&ip_vs_wlip_scheduler); +} + +static void __exit ip_vs_wlip_cleanup(void) +{ + unregister_ip_vs_scheduler(&ip_vs_wlip_scheduler); + synchronize_rcu(); +} + +module_init(ip_vs_wlip_init); +module_exit(ip_vs_wlip_cleanup); +MODULE_LICENSE("GPL"); ^ permalink raw reply [flat|nested] 9+ messages in thread
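For illustration, the selection rule both schedulers use (pick the destination with the smallest rate/weight, comparing cross-products rather than dividing) can be exercised in a stand-alone user-space sketch. Everything below is invented for the example: toy structures and arbitrary rates, not the kernel data types or any part of the patch.

    /* Minimal user-space sketch of the wlib selection rule. Illustration only;
     * structure and field names are hypothetical, not the IPVS structures.
     */
    #include <stdint.h>
    #include <stdio.h>

    struct toy_dest {
            const char *name;
            uint32_t inbps;   /* estimated incoming byterate */
            uint32_t weight;  /* 0 means quiesced */
    };

    static const struct toy_dest *pick_wlib(const struct toy_dest *d, int n)
    {
            const struct toy_dest *least = NULL;
            uint32_t lr = 0, lwgt = 0;

            for (int i = 0; i < n; i++) {
                    if (d[i].weight == 0)
                            continue;       /* quiesced server gets no new connections */
                    /* dr/dwgt < lr/lwgt  <=>  dr*lwgt < lr*dwgt (weights > 0),
                     * promoted to 64 bits so the products cannot overflow. */
                    if (!least ||
                        (uint64_t)d[i].inbps * lwgt < (uint64_t)lr * d[i].weight ||
                        (d[i].inbps == lr && d[i].weight > lwgt)) {
                            least = &d[i];
                            lr = d[i].inbps;
                            lwgt = d[i].weight;
                    }
            }
            return least;
    }

    int main(void)
    {
            /* 100 Mbps-class server at weight 100, 1 Gbps-class server at weight 1000 */
            struct toy_dest dests[] = {
                    { "rs1",  5 * 1000 * 1000,  100 },  /* load  5e6/100  = 50000 */
                    { "rs2", 40 * 1000 * 1000, 1000 },  /* load  4e7/1000 = 40000 */
            };

            printf("least loaded: %s\n", pick_wlib(dests, 2)->name); /* prints rs2 */
            return 0;
    }

With these numbers rs2 carries eight times the traffic of rs1 but is still the less loaded server relative to its weight, which is exactly the normalization the schedulers aim for.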
* Re: [PATCH 1/2] IPVS: add wlib & wlip schedulers 2015-01-17 23:15 ` [PATCH 1/2] IPVS: add wlib & wlip schedulers Chris Caputo @ 2015-01-19 23:17 ` Julian Anastasov 2015-01-20 23:21 ` [PATCH 1/3] " Chris Caputo ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Julian Anastasov @ 2015-01-19 23:17 UTC (permalink / raw) To: Chris Caputo; +Cc: Wensong Zhang, Simon Horman, lvs-devel, linux-kernel Hello, On Sat, 17 Jan 2015, Chris Caputo wrote: > From: Chris Caputo <ccaputo@alt.net> > > IPVS wlib (Weighted Least Incoming Byterate) and wlip (Weighted Least Incoming > Packetrate) schedulers, updated for 3.19-rc4. The IPVS estimator uses 2-second timer to update the stats, isn't that a problem for such schedulers? Also, you schedule by incoming traffic rate which is ok when clients mostly upload. But in the common case clients mostly download and IPVS processes download traffic only for NAT method. May be not so useful idea: use sum of both directions or control it with svc->flags & IP_VS_SVC_F_SCHED_WLIB_xxx flags, see how "sh" scheduler supports flags. I.e. inbps + outbps. Another problem: pps and bps are shifted values, see how ip_vs_read_estimator() reads them. ip_vs_est.c contains comments that this code handles couple of gigabits. May be inbps and outbps in struct ip_vs_estimator should be changed to u64 to support more gigabits, with separate patch. > Signed-off-by: Chris Caputo <ccaputo@alt.net> > --- > +++ linux-3.19-rc4/net/netfilter/ipvs/ip_vs_wlib.c 2015-01-17 22:47:35.421861075 +0000 > +/* Weighted Least Incoming Byterate scheduling */ > +static struct ip_vs_dest * > +ip_vs_wlib_schedule(struct ip_vs_service *svc, const struct sk_buff *skb, > + struct ip_vs_iphdr *iph) > +{ > + struct list_head *p, *q; > + struct ip_vs_dest *dest, *least = NULL; > + u32 dr, lr = -1; > + int dwgt, lwgt = 0; To support u64 result from 32-bit multiply we can change the vars as follows: u32 dwgt, lwgt = 0; > + spin_lock_bh(&svc->sched_lock); > + p = (struct list_head *)svc->sched_data; > + p = list_next_rcu(p); Note that dests are deleted from svc->destinations out of any lock (from __ip_vs_unlink_dest), above lock svc->sched_lock protects only svc->sched_data. So, RCU dereference is needed here, list_next_rcu is not enough. Better to stick to the list walking from the rr algorithm in ip_vs_rr.c. > + q = p; > + do { > + /* skip list head */ > + if (q == &svc->destinations) { > + q = list_next_rcu(q); > + continue; > + } > + > + dest = list_entry_rcu(q, struct ip_vs_dest, n_list); > + dwgt = atomic_read(&dest->weight); This will be dwgt = (u32) atomic_read(&dest->weight); > + if (!(dest->flags & IP_VS_DEST_F_OVERLOAD) && dwgt > 0) { > + spin_lock(&dest->stats.lock); > + dr = dest->stats.ustats.inbps; > + spin_unlock(&dest->stats.lock); > + > + if (!least || > + (u64)dr * (u64)lwgt < (u64)lr * (u64)dwgt || This will be (u64)dr * lwgt < (u64)lr * dwgt || See commit c16526a7b99c1c for 32x32 multiply. > + (dr == lr && dwgt > lwgt)) { Above check is redundant. > + least = dest; > + lr = dr; > + lwgt = dwgt; > + svc->sched_data = q; Better to update sched_data at final, see below... > + } > + } > + q = list_next_rcu(q); > + } while (q != p); if (least) svc->sched_data = &least->n_list; > + spin_unlock_bh(&svc->sched_lock); Same comments for wlip. Regards -- Julian Anastasov <ja@ssi.bg> ^ permalink raw reply [flat|nested] 9+ messages in thread
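The review's point about the 32x32 multiply (see commit c16526a7b99c1c) is easy to demonstrate outside the kernel: a product of two u32 values is computed in 32 bits and wraps, while casting one operand to u64 yields the full result. The values below are arbitrary and chosen only to trigger the wrap.

    /* Why one 64-bit promotion before the multiply matters. Illustration only. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint32_t dr = 3000000000u;      /* a large scaled byterate */
            uint32_t wgt = 1000;

            uint32_t truncated = dr * wgt;          /* wraps modulo 2^32 */
            uint64_t full = (uint64_t)dr * wgt;     /* one cast is enough: 32x32 -> 64 */

            printf("truncated: %u\n", truncated);
            printf("full     : %llu\n", (unsigned long long)full);
            return 0;
    }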
* [PATCH 1/3] IPVS: add wlib & wlip schedulers 2015-01-19 23:17 ` Julian Anastasov @ 2015-01-20 23:21 ` Chris Caputo 2015-01-22 22:06 ` Julian Anastasov 2015-01-20 23:21 ` [PATCH 2/3] " Chris Caputo 2015-01-20 23:21 ` [PATCH 3/3] " Chris Caputo 2 siblings, 1 reply; 9+ messages in thread From: Chris Caputo @ 2015-01-20 23:21 UTC (permalink / raw) To: Julian Anastasov; +Cc: Wensong Zhang, Simon Horman, lvs-devel, linux-kernel On Tue, 20 Jan 2015, Julian Anastasov wrote: > On Sat, 17 Jan 2015, Chris Caputo wrote: > > From: Chris Caputo <ccaputo@alt.net> > > > > IPVS wlib (Weighted Least Incoming Byterate) and wlip (Weighted Least Incoming > > Packetrate) schedulers, updated for 3.19-rc4. Hi Julian, Thanks for the review. > The IPVS estimator uses 2-second timer to update > the stats, isn't that a problem for such schedulers? > Also, you schedule by incoming traffic rate which is > ok when clients mostly upload. But in the common case > clients mostly download and IPVS processes download > traffic only for NAT method. My application consists of incoming TCP streams being load balanced to servers which receive the feeds. These are long lived multi-gigabyte streams, and so I believe the estimator's 2-second timer is fine. As an example: # cat /proc/net/ip_vs_stats Total Incoming Outgoing Incoming Outgoing Conns Packets Packets Bytes Bytes 9AB 58B7C17 0 1237CA2C325 0 Conns/s Pkts/s Pkts/s Bytes/s Bytes/s 1 387C 0 B16C4AE 0 > May be not so useful idea: use sum of both directions > or control it with svc->flags & IP_VS_SVC_F_SCHED_WLIB_xxx > flags, see how "sh" scheduler supports flags. I.e. > inbps + outbps. I see a user-mode option as increasing complexity. For example, keepalived users would need to have keepalived patched to support the new algorithm, due to flags, rather than just configuring "wlib" or "wlip" and it just working. I think I'd rather see a wlob/wlop version for users that want to load-balance based on outgoing bytes/packets, and a wlb/wlp version for users that want them summed. > Another problem: pps and bps are shifted values, > see how ip_vs_read_estimator() reads them. ip_vs_est.c > contains comments that this code handles couple of > gigabits. May be inbps and outbps in struct ip_vs_estimator > should be changed to u64 to support more gigabits, with > separate patch. See patch below to convert bps in ip_vs_estimator to 64-bits. Other patches, based on your feedback, to follow. Thanks, Chris From: Chris Caputo <ccaputo@alt.net> IPVS: Change inbps and outbps to 64-bits so that estimator handles faster flows. Also increases maximum viewable at user level from ~2.15Gbits/s to ~34.35Gbits/s. Signed-off-by: Chris Caputo <ccaputo@alt.net> --- diff -uprN linux-3.19-rc5-stock/include/net/ip_vs.h linux-3.19-rc5/include/net/ip_vs.h --- linux-3.19-rc5-stock/include/net/ip_vs.h 2015-01-18 06:02:20.000000000 +0000 +++ linux-3.19-rc5/include/net/ip_vs.h 2015-01-20 08:01:15.548177969 +0000 @@ -390,8 +390,8 @@ struct ip_vs_estimator { u32 cps; u32 inpps; u32 outpps; - u32 inbps; - u32 outbps; + u64 inbps; + u64 outbps; }; struct ip_vs_stats { diff -uprN linux-3.19-rc5-stock/net/netfilter/ipvs/ip_vs_est.c linux-3.19-rc5/net/netfilter/ipvs/ip_vs_est.c --- linux-3.19-rc5-stock/net/netfilter/ipvs/ip_vs_est.c 2015-01-18 06:02:20.000000000 +0000 +++ linux-3.19-rc5/net/netfilter/ipvs/ip_vs_est.c 2015-01-20 08:01:34.369840704 +0000 @@ -45,10 +45,12 @@ NOTES. - * The stored value for average bps is scaled by 2^5, so that maximal - rate is ~2.15Gbits/s, average pps and cps are scaled by 2^10. 
+ * Average bps is scaled by 2^5, while average pps and cps are scaled by 2^10. - * A lot code is taken from net/sched/estimator.c + * All are reported to user level as 32 bit unsigned values. Bps can + overflow for fast links : max speed being ~34.35Gbits/s. + + * A lot of code is taken from net/core/gen_estimator.c */ @@ -98,7 +100,7 @@ static void estimation_timer(unsigned lo u32 n_conns; u32 n_inpkts, n_outpkts; u64 n_inbytes, n_outbytes; - u32 rate; + u64 rate; struct net *net = (struct net *)arg; struct netns_ipvs *ipvs; @@ -118,23 +120,24 @@ static void estimation_timer(unsigned lo /* scaled by 2^10, but divided 2 seconds */ rate = (n_conns - e->last_conns) << 9; e->last_conns = n_conns; - e->cps += ((long)rate - (long)e->cps) >> 2; + e->cps += ((s64)rate - (s64)e->cps) >> 2; rate = (n_inpkts - e->last_inpkts) << 9; e->last_inpkts = n_inpkts; - e->inpps += ((long)rate - (long)e->inpps) >> 2; + e->inpps += ((s64)rate - (s64)e->inpps) >> 2; rate = (n_outpkts - e->last_outpkts) << 9; e->last_outpkts = n_outpkts; - e->outpps += ((long)rate - (long)e->outpps) >> 2; + e->outpps += ((s64)rate - (s64)e->outpps) >> 2; + /* scaled by 2^5, but divided 2 seconds */ rate = (n_inbytes - e->last_inbytes) << 4; e->last_inbytes = n_inbytes; - e->inbps += ((long)rate - (long)e->inbps) >> 2; + e->inbps += ((s64)rate - (s64)e->inbps) >> 2; rate = (n_outbytes - e->last_outbytes) << 4; e->last_outbytes = n_outbytes; - e->outbps += ((long)rate - (long)e->outbps) >> 2; + e->outbps += ((s64)rate - (s64)e->outbps) >> 2; spin_unlock(&s->lock); } spin_unlock(&ipvs->est_lock); ^ permalink raw reply [flat|nested] 9+ messages in thread
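The arithmetic being widened here is an exponential moving average: every 2 seconds the byte delta is scaled (shifted left by 4 over a 2-second window, i.e. bytes/s scaled by 2^5) and folded into the running rate with a 1/4 smoothing factor. A rough stand-alone model of just that arithmetic, ignoring the kernel timer and locking, and using invented traffic numbers:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint64_t last_inbytes = 0, inbps = 0;   /* inbps stored scaled by 2^5 */
            uint64_t bytes_per_tick = 200 * 1000 * 1000;    /* 100 MB/s over 2 s */

            for (int tick = 1; tick <= 8; tick++) {
                    uint64_t n_inbytes = last_inbytes + bytes_per_tick;
                    uint64_t rate = (n_inbytes - last_inbytes) << 4;

                    last_inbytes = n_inbytes;
                    /* moving average: add 1/4 of the difference each tick */
                    inbps += ((int64_t)rate - (int64_t)inbps) >> 2;

                    /* unscale for display: stored value is bytes/s << 5 */
                    printf("tick %d: ~%llu bytes/s\n", tick,
                           (unsigned long long)(inbps >> 5));
            }
            return 0;
    }

The displayed value converges toward 100 MB/s over a few ticks, which is also why a scheduler reading this estimate reacts with a delay of several update periods.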
* Re: [PATCH 1/3] IPVS: add wlib & wlip schedulers 2015-01-20 23:21 ` [PATCH 1/3] " Chris Caputo @ 2015-01-22 22:06 ` Julian Anastasov 2015-01-23 4:16 ` Chris Caputo 2015-01-27 8:36 ` Julian Anastasov 0 siblings, 2 replies; 9+ messages in thread From: Julian Anastasov @ 2015-01-22 22:06 UTC (permalink / raw) To: Chris Caputo; +Cc: Wensong Zhang, Simon Horman, lvs-devel, linux-kernel Hello, On Tue, 20 Jan 2015, Chris Caputo wrote: > My application consists of incoming TCP streams being load balanced to > servers which receive the feeds. These are long lived multi-gigabyte > streams, and so I believe the estimator's 2-second timer is fine. As an > example: > > # cat /proc/net/ip_vs_stats > Total Incoming Outgoing Incoming Outgoing > Conns Packets Packets Bytes Bytes > 9AB 58B7C17 0 1237CA2C325 0 > > Conns/s Pkts/s Pkts/s Bytes/s Bytes/s > 1 387C 0 B16C4AE 0 All other schedulers react and see different picture after every new connection. The worst example is WLC where slow-start mechanism is desired because idle server can be overloaded before the load is noticed properly. Even WRR accounts every connection in its state. Your setup may expect low number of connections per second but for other kind of setups sending all connections to same server for 2 seconds looks scary. In fact, what changes is the position, so we rotate only among the least loaded servers that look equally loaded but it is one server in the common case. And as our stats are per CPU and designed for human reading, it is difficult to read them often for other purposes. We need a good idea to solve this problem, so that we can have faster feedback after every scheduling. > > May be not so useful idea: use sum of both directions > > or control it with svc->flags & IP_VS_SVC_F_SCHED_WLIB_xxx > > flags, see how "sh" scheduler supports flags. I.e. > > inbps + outbps. > > I see a user-mode option as increasing complexity. For example, > keepalived users would need to have keepalived patched to support the new > algorithm, due to flags, rather than just configuring "wlib" or "wlip" and > it just working. That is also true. > I think I'd rather see a wlob/wlop version for users that want to > load-balance based on outgoing bytes/packets, and a wlb/wlp version for > users that want them summed. ok > From: Chris Caputo <ccaputo@alt.net> > > IPVS: Change inbps and outbps to 64-bits so that estimator handles faster > flows. Also increases maximum viewable at user level from ~2.15Gbits/s to > ~34.35Gbits/s. Yep, we are limited from u32 in user space structs. I have to think how to solve this problem. 1gbit => ~1.5 million pps 10gbit => ~15 million pps 100gbit => ~150 million pps > Signed-off-by: Chris Caputo <ccaputo@alt.net> > --- > diff -uprN linux-3.19-rc5-stock/include/net/ip_vs.h linux-3.19-rc5/include/net/ip_vs.h > --- linux-3.19-rc5-stock/include/net/ip_vs.h 2015-01-18 06:02:20.000000000 +0000 > +++ linux-3.19-rc5/include/net/ip_vs.h 2015-01-20 08:01:15.548177969 +0000 > @@ -390,8 +390,8 @@ struct ip_vs_estimator { > u32 cps; > u32 inpps; > u32 outpps; > - u32 inbps; > - u32 outbps; > + u64 inbps; > + u64 outbps; Not sure, may be everything here should be u64 because we have shifted values. I'll need some days to investigate this issue... Regards -- Julian Anastasov <ja@ssi.bg> ^ permalink raw reply [flat|nested] 9+ messages in thread
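The rough figures quoted above follow from simple worst-case arithmetic, assuming minimum-size Ethernet frames at 84 bytes on the wire (including preamble and inter-frame gap); real traffic mixes will differ. A quick check:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            /* a u32 bytes/s field caps out at about 34.36 Gbit/s */
            printf("u32 bytes/s ceiling: %.2f Gbit/s\n",
                   (double)UINT32_MAX * 8 / 1e9);

            /* worst-case packet rates for minimum-size frames */
            for (double gbit = 1; gbit <= 100; gbit *= 10)
                    printf("%5.0f Gbit/s => ~%.1f Mpps\n",
                           gbit, gbit * 1e9 / (84 * 8) / 1e6);
            return 0;
    }

This reproduces the ~1.5/15/150 million pps figures and the ~34.35 Gbit/s ceiling mentioned in the patch description.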
* Re: [PATCH 1/3] IPVS: add wlib & wlip schedulers 2015-01-22 22:06 ` Julian Anastasov @ 2015-01-23 4:16 ` Chris Caputo 2015-01-27 8:36 ` Julian Anastasov 1 sibling, 0 replies; 9+ messages in thread From: Chris Caputo @ 2015-01-23 4:16 UTC (permalink / raw) To: Julian Anastasov; +Cc: Wensong Zhang, Simon Horman, lvs-devel, linux-kernel On Fri, 23 Jan 2015, Julian Anastasov wrote: > Hello, > > On Tue, 20 Jan 2015, Chris Caputo wrote: > > My application consists of incoming TCP streams being load balanced to > > servers which receive the feeds. These are long lived multi-gigabyte > > streams, and so I believe the estimator's 2-second timer is fine. As an > > example: > > > > # cat /proc/net/ip_vs_stats > > Total Incoming Outgoing Incoming Outgoing > > Conns Packets Packets Bytes Bytes > > 9AB 58B7C17 0 1237CA2C325 0 > > > > Conns/s Pkts/s Pkts/s Bytes/s Bytes/s > > 1 387C 0 B16C4AE 0 > > All other schedulers react and see different > picture after every new connection. The worst example > is WLC where slow-start mechanism is desired because > idle server can be overloaded before the load is noticed > properly. Even WRR accounts every connection in its state. > > Your setup may expect low number of connections per > second but for other kind of setups sending all connections > to same server for 2 seconds looks scary. In fact, what > changes is the position, so we rotate only among the > least loaded servers that look equally loaded but it is > one server in the common case. And as our stats are per > CPU and designed for human reading, it is difficult to > read them often for other purposes. We need a good idea > to solve this problem, so that we can have faster feedback > after every scheduling. This is exactly why my wlib/wlip code is a hybrid of wlc and rr. Last location is saved, and search is started after it. Thus when traffic is zero, round-robin occurs. When flows already exist, bursts of new connections do choose poorly based on repeated use of last estimation, but the complexity of working around that seems complex. > > > May be not so useful idea: use sum of both directions > > > or control it with svc->flags & IP_VS_SVC_F_SCHED_WLIB_xxx > > > flags, see how "sh" scheduler supports flags. I.e. > > > inbps + outbps. > > > > I see a user-mode option as increasing complexity. For example, > > keepalived users would need to have keepalived patched to support the new > > algorithm, due to flags, rather than just configuring "wlib" or "wlip" and > > it just working. > > That is also true. > > > I think I'd rather see a wlob/wlop version for users that want to > > load-balance based on outgoing bytes/packets, and a wlb/wlp version for > > users that want them summed. > > ok > > > From: Chris Caputo <ccaputo@alt.net> > > > > IPVS: Change inbps and outbps to 64-bits so that estimator handles faster > > flows. Also increases maximum viewable at user level from ~2.15Gbits/s to > > ~34.35Gbits/s. > > Yep, we are limited from u32 in user space structs. > I have to think how to solve this problem. 
> > 1gbit => ~1.5 million pps > 10gbit => ~15 million pps > 100gbit => ~150 million pps > > > Signed-off-by: Chris Caputo <ccaputo@alt.net> > > --- > > diff -uprN linux-3.19-rc5-stock/include/net/ip_vs.h linux-3.19-rc5/include/net/ip_vs.h > > --- linux-3.19-rc5-stock/include/net/ip_vs.h 2015-01-18 06:02:20.000000000 +0000 > > +++ linux-3.19-rc5/include/net/ip_vs.h 2015-01-20 08:01:15.548177969 +0000 > > @@ -390,8 +390,8 @@ struct ip_vs_estimator { > > u32 cps; > > u32 inpps; > > u32 outpps; > > - u32 inbps; > > - u32 outbps; > > + u64 inbps; > > + u64 outbps; > > Not sure, may be everything here should be u64 because > we have shifted values. I'll need some days to investigate > this issue... > > Regards > > -- > Julian Anastasov <ja@ssi.bg> Sounds good and thanks! Chris ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/3] IPVS: add wlib & wlip schedulers 2015-01-22 22:06 ` Julian Anastasov 2015-01-23 4:16 ` Chris Caputo @ 2015-01-27 8:36 ` Julian Anastasov 1 sibling, 0 replies; 9+ messages in thread From: Julian Anastasov @ 2015-01-27 8:36 UTC (permalink / raw) To: Chris Caputo; +Cc: Wensong Zhang, Simon Horman, lvs-devel, linux-kernel Hello, On Fri, 23 Jan 2015, Julian Anastasov wrote: > On Tue, 20 Jan 2015, Chris Caputo wrote: > > > My application consists of incoming TCP streams being load balanced to > > servers which receive the feeds. These are long lived multi-gigabyte > > streams, and so I believe the estimator's 2-second timer is fine. As an > > example: > > > > # cat /proc/net/ip_vs_stats > > Total Incoming Outgoing Incoming Outgoing > > Conns Packets Packets Bytes Bytes > > 9AB 58B7C17 0 1237CA2C325 0 > > > > Conns/s Pkts/s Pkts/s Bytes/s Bytes/s > > 1 387C 0 B16C4AE 0 > > Not sure, may be everything here should be u64 because > we have shifted values. I'll need some days to investigate > this issue... For now I don't see hope in using schedulers that rely on IPVS byte/packet stats, due to the slow update (2 seconds). If we reduce this period we can cause performance problems to other users. Every *-LEAST-* (eg. LC, WLC) algorithm needs actual information to take decision on every new connection. OTOH, all *-ROUND-ROBIN-* algorithms (RR, WRR) use information (weights) from user space, by this way kernel performs as expected. Currently, LC/WLC use feedback from the 3-way TCP handshake, see ip_vs_dest_conn_overhead() where the established connections have large preference. Such feedback from real servers is delayed usually with microseconds, up to milliseconds. More time if depends on clients. The proposed schedulers have round-robin function but only among least loaded servers, so it is not dominant and we suffer from slow feedback from the estimator. For load information that is not present in kernel an user space daemon is needed to determine weights to use with WRR. It can take actual stats from real server, for example, it can take into account non-IPVS traffic. As alternative, it is possible to implement some new svc method that can be called for every packet, for example, in ip_vs_in_stats(). It does not look fatal to add some fields in struct ip_vs_dest that only specific schedulers will update, for example, byte/packet counters. Of course, the spin_locks the scheduler must use will suffer on many CPUs. Such info can be also attached as allocated structure in RCU pointer dest->sched_info where data and corresponding methods can be stored. It will need careful RCU-kind of update, especially when scheduler is updated in svc. If you think such idea can work we can discuss the RCU and scheduler changes that are needed. The proposed schedulers have to implement counters, their own estimator and WRR function. Another variant can be to extend WRR with some support for automatic dynamic-weight update depending on parameters: -s wrr --sched-flags {wlip,wlib,...} or using new option --sched-param that can also provide info for wrr estimator, etc. In any case, the extended WRR scheduler will need above support to check every packet. Regards -- Julian Anastasov <ja@ssi.bg> ^ permalink raw reply [flat|nested] 9+ messages in thread
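To make the "attach per-scheduler state to each dest via an RCU pointer" idea concrete, a sketch of what such a structure and its per-packet hook might look like is shown below. Nothing like this exists in the tree: the sched_info field, the structure and the helper name are all invented for illustration, and a real implementation might well prefer per-CPU counters over atomics given the cache-contention concerns raised above.

    #include <linux/atomic.h>
    #include <linux/rcupdate.h>
    #include <net/ip_vs.h>

    /* hypothetical per-dest scheduler state, reachable via an RCU pointer */
    struct ip_vs_dest_sched_info {
            atomic64_t inbytes;     /* bumped for every packet hitting this dest */
            atomic64_t inpkts;
            struct rcu_head rcu_head;
    };

    /* hot path, e.g. called from ip_vs_in_stats(): cheap per-packet accounting */
    static inline void wlib_account_packet(struct ip_vs_dest *dest,
                                           unsigned int len)
    {
            struct ip_vs_dest_sched_info *si;

            rcu_read_lock();
            si = rcu_dereference(dest->sched_info);  /* sched_info is hypothetical */
            if (si) {
                    atomic64_add(len, &si->inbytes);
                    atomic64_inc(&si->inpkts);
            }
            rcu_read_unlock();
    }

The scheduler (or its own estimator) would then read and age these counters itself instead of waiting on the 2-second global estimator.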
* [PATCH 2/3] IPVS: add wlib & wlip schedulers 2015-01-19 23:17 ` Julian Anastasov 2015-01-20 23:21 ` [PATCH 1/3] " Chris Caputo @ 2015-01-20 23:21 ` Chris Caputo 2015-01-22 21:07 ` Julian Anastasov 2015-01-20 23:21 ` [PATCH 3/3] " Chris Caputo 2 siblings, 1 reply; 9+ messages in thread From: Chris Caputo @ 2015-01-20 23:21 UTC (permalink / raw) To: Julian Anastasov; +Cc: Wensong Zhang, Simon Horman, lvs-devel, linux-kernel On Tue, 20 Jan 2015, Julian Anastasov wrote: > > + (u64)dr * (u64)lwgt < (u64)lr * (u64)dwgt || [...] > > + (dr == lr && dwgt > lwgt)) { > > Above check is redundant. I accepted your feedback and applied it to the below, except for this item. I believe if dr and lr are zero (no traffic), we still want to choose the higher weight, thus a separate comparison is needed. Thanks, Chris From: Chris Caputo <ccaputo@alt.net> IPVS wlib (Weighted Least Incoming Byterate) and wlip (Weighted Least Incoming Packetrate) schedulers, updated for 3.19-rc5. Signed-off-by: Chris Caputo <ccaputo@alt.net> --- diff -uprN linux-3.19-rc5-stock/net/netfilter/ipvs/Kconfig linux-3.19-rc5/net/netfilter/ipvs/Kconfig --- linux-3.19-rc5-stock/net/netfilter/ipvs/Kconfig 2015-01-18 06:02:20.000000000 +0000 +++ linux-3.19-rc5/net/netfilter/ipvs/Kconfig 2015-01-20 08:08:28.883080285 +0000 @@ -240,6 +240,26 @@ config IP_VS_NQ If you want to compile it in kernel, say Y. To compile it as a module, choose M here. If unsure, say N. +config IP_VS_WLIB + tristate "weighted least incoming byterate scheduling" + ---help--- + The weighted least incoming byterate scheduling algorithm directs + network connections to the server with the least incoming byterate + normalized by the server weight. + + If you want to compile it in kernel, say Y. To compile it as a + module, choose M here. If unsure, say N. + +config IP_VS_WLIP + tristate "weighted least incoming packetrate scheduling" + ---help--- + The weighted least incoming packetrate scheduling algorithm directs + network connections to the server with the least incoming packetrate + normalized by the server weight. + + If you want to compile it in kernel, say Y. To compile it as a + module, choose M here. If unsure, say N. 
+ comment 'IPVS SH scheduler' config IP_VS_SH_TAB_BITS diff -uprN linux-3.19-rc5-stock/net/netfilter/ipvs/Makefile linux-3.19-rc5/net/netfilter/ipvs/Makefile --- linux-3.19-rc5-stock/net/netfilter/ipvs/Makefile 2015-01-18 06:02:20.000000000 +0000 +++ linux-3.19-rc5/net/netfilter/ipvs/Makefile 2015-01-20 08:08:28.883080285 +0000 @@ -33,6 +33,8 @@ obj-$(CONFIG_IP_VS_DH) += ip_vs_dh.o obj-$(CONFIG_IP_VS_SH) += ip_vs_sh.o obj-$(CONFIG_IP_VS_SED) += ip_vs_sed.o obj-$(CONFIG_IP_VS_NQ) += ip_vs_nq.o +obj-$(CONFIG_IP_VS_WLIB) += ip_vs_wlib.o +obj-$(CONFIG_IP_VS_WLIP) += ip_vs_wlip.o # IPVS application helpers obj-$(CONFIG_IP_VS_FTP) += ip_vs_ftp.o diff -uprN linux-3.19-rc5-stock/net/netfilter/ipvs/ip_vs_wlib.c linux-3.19-rc5/net/netfilter/ipvs/ip_vs_wlib.c --- linux-3.19-rc5-stock/net/netfilter/ipvs/ip_vs_wlib.c 1970-01-01 00:00:00.000000000 +0000 +++ linux-3.19-rc5/net/netfilter/ipvs/ip_vs_wlib.c 2015-01-20 08:09:00.177816054 +0000 @@ -0,0 +1,166 @@ +/* IPVS: Weighted Least Incoming Byterate Scheduling module + * + * Authors: Chris Caputo <ccaputo@alt.net> based on code by: + * + * Wensong Zhang <wensong@linuxvirtualserver.org> + * Peter Kese <peter.kese@ijs.si> + * Julian Anastasov <ja@ssi.bg> + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * Changes: + * Chris Caputo: Based code on ip_vs_wlc.c ip_vs_rr.c. + * + */ + +/* The WLIB algorithm uses the results of the estimator's inbps + * calculations to determine which real server has the lowest incoming + * byterate. + * + * Real server weight is factored into the calculation. An example way to + * use this is if you have one server that can handle 100 Mbps of input and + * another that can handle 1 Gbps you could set the weights to be 100 and 1000 + * respectively. + */ + +#define KMSG_COMPONENT "IPVS" +#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt + +#include <linux/module.h> +#include <linux/kernel.h> + +#include <net/ip_vs.h> + +static int +ip_vs_wlib_init_svc(struct ip_vs_service *svc) +{ + svc->sched_data = &svc->destinations; + return 0; +} + +static int +ip_vs_wlib_del_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest) +{ + struct list_head *p; + + spin_lock_bh(&svc->sched_lock); + p = (struct list_head *)svc->sched_data; + /* dest is already unlinked, so p->prev is not valid but + * p->next is valid, use it to reach previous entry. + */ + if (p == &dest->n_list) + svc->sched_data = p->next->prev; + spin_unlock_bh(&svc->sched_lock); + return 0; +} + +/* Weighted Least Incoming Byterate scheduling */ +static struct ip_vs_dest * +ip_vs_wlib_schedule(struct ip_vs_service *svc, const struct sk_buff *skb, + struct ip_vs_iphdr *iph) +{ + struct list_head *p; + struct ip_vs_dest *dest, *last, *least = NULL; + int pass = 0; + u64 dr, lr = -1; + u32 dwgt, lwgt = 0; + + IP_VS_DBG(6, "%s(): Scheduling...\n", __func__); + + /* We calculate the load of each dest server as follows: + * (dest inbps rate) / dest->weight + * + * The comparison of dr*lwght < lr*dwght is equivalent to that of + * dr/dwght < lr/lwght if every weight is larger than zero. + * + * A server with weight=0 is quiesced and will not receive any + * new connections. + * + * In case of inactivity, highest weight is winner. And if that still makes + * for a tie, round robin is used (which is why we remember our last + * starting location in the linked list). 
+ */ + + spin_lock_bh(&svc->sched_lock); + p = (struct list_head *)svc->sched_data; + last = dest = list_entry(p, struct ip_vs_dest, n_list); + + do { + list_for_each_entry_continue_rcu(dest, + &svc->destinations, + n_list) { + dwgt = (u32)atomic_read(&dest->weight); + if (!(dest->flags & IP_VS_DEST_F_OVERLOAD) && + dwgt > 0) { + spin_lock(&dest->stats.lock); + /* estimator's scaling doesn't matter */ + dr = dest->stats.est.inbps; + spin_unlock(&dest->stats.lock); + + if (!least || + dr * lwgt < lr * dwgt || + (!dr && !lr && dwgt > lwgt)) { + least = dest; + lr = dr; + lwgt = dwgt; + } + } + + if (dest == last) + goto stop; + } + pass++; + /* Previous dest could be unlinked, do not loop forever. + * If we stay at head there is no need for 2nd pass. + */ + } while (pass < 2 && p != &svc->destinations); + +stop: + if (least) + svc->sched_data = &least->n_list; + + spin_unlock_bh(&svc->sched_lock); + + if (least) { + IP_VS_DBG_BUF(6, + "WLIB: server %s:%u activeconns %d refcnt %d weight %d\n", + IP_VS_DBG_ADDR(least->af, &least->addr), + ntohs(least->port), + atomic_read(&least->activeconns), + atomic_read(&least->refcnt), + atomic_read(&least->weight)); + } else { + ip_vs_scheduler_err(svc, "no destination available"); + } + + return least; +} + +static struct ip_vs_scheduler ip_vs_wlib_scheduler = { + .name = "wlib", + .refcnt = ATOMIC_INIT(0), + .module = THIS_MODULE, + .n_list = LIST_HEAD_INIT(ip_vs_wlib_scheduler.n_list), + .init_service = ip_vs_wlib_init_svc, + .add_dest = NULL, + .del_dest = ip_vs_wlib_del_dest, + .schedule = ip_vs_wlib_schedule, +}; + +static int __init ip_vs_wlib_init(void) +{ + return register_ip_vs_scheduler(&ip_vs_wlib_scheduler); +} + +static void __exit ip_vs_wlib_cleanup(void) +{ + unregister_ip_vs_scheduler(&ip_vs_wlib_scheduler); + synchronize_rcu(); +} + +module_init(ip_vs_wlib_init); +module_exit(ip_vs_wlib_cleanup); +MODULE_LICENSE("GPL"); diff -uprN linux-3.19-rc5-stock/net/netfilter/ipvs/ip_vs_wlip.c linux-3.19-rc5/net/netfilter/ipvs/ip_vs_wlip.c --- linux-3.19-rc5-stock/net/netfilter/ipvs/ip_vs_wlip.c 1970-01-01 00:00:00.000000000 +0000 +++ linux-3.19-rc5/net/netfilter/ipvs/ip_vs_wlip.c 2015-01-20 08:09:07.456126624 +0000 @@ -0,0 +1,166 @@ +/* IPVS: Weighted Least Incoming Packetrate Scheduling module + * + * Authors: Chris Caputo <ccaputo@alt.net> based on code by: + * + * Wensong Zhang <wensong@linuxvirtualserver.org> + * Peter Kese <peter.kese@ijs.si> + * Julian Anastasov <ja@ssi.bg> + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * Changes: + * Chris Caputo: Based code on ip_vs_wlc.c ip_vs_rr.c. + * + */ + +/* The WLIP algorithm uses the results of the estimator's inpps + * calculations to determine which real server has the lowest incoming + * packetrate. + * + * Real server weight is factored into the calculation. An example way to + * use this is if you have one server that can handle 10 Kpps of input and + * another that can handle 100 Kpps you could set the weights to be 10 and 100 + * respectively. 
+ */ + +#define KMSG_COMPONENT "IPVS" +#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt + +#include <linux/module.h> +#include <linux/kernel.h> + +#include <net/ip_vs.h> + +static int +ip_vs_wlip_init_svc(struct ip_vs_service *svc) +{ + svc->sched_data = &svc->destinations; + return 0; +} + +static int +ip_vs_wlip_del_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest) +{ + struct list_head *p; + + spin_lock_bh(&svc->sched_lock); + p = (struct list_head *)svc->sched_data; + /* dest is already unlinked, so p->prev is not valid but + * p->next is valid, use it to reach previous entry. + */ + if (p == &dest->n_list) + svc->sched_data = p->next->prev; + spin_unlock_bh(&svc->sched_lock); + return 0; +} + +/* Weighted Least Incoming Packetrate scheduling */ +static struct ip_vs_dest * +ip_vs_wlip_schedule(struct ip_vs_service *svc, const struct sk_buff *skb, + struct ip_vs_iphdr *iph) +{ + struct list_head *p; + struct ip_vs_dest *dest, *last, *least = NULL; + int pass = 0; + u32 dr, lr = -1; + u32 dwgt, lwgt = 0; + + IP_VS_DBG(6, "%s(): Scheduling...\n", __func__); + + /* We calculate the load of each dest server as follows: + * (dest inpps rate) / dest->weight + * + * The comparison of dr*lwght < lr*dwght is equivalent to that of + * dr/dwght < lr/lwght if every weight is larger than zero. + * + * A server with weight=0 is quiesced and will not receive any + * new connections. + * + * In case of inactivity, highest weight is winner. And if that still makes + * for a tie, round robin is used (which is why we remember our last + * starting location in the linked list). + */ + + spin_lock_bh(&svc->sched_lock); + p = (struct list_head *)svc->sched_data; + last = dest = list_entry(p, struct ip_vs_dest, n_list); + + do { + list_for_each_entry_continue_rcu(dest, + &svc->destinations, + n_list) { + dwgt = (u32)atomic_read(&dest->weight); + if (!(dest->flags & IP_VS_DEST_F_OVERLOAD) && + dwgt > 0) { + spin_lock(&dest->stats.lock); + /* estimator's scaling doesn't matter */ + dr = dest->stats.est.inpps; + spin_unlock(&dest->stats.lock); + + if (!least || + (u64)dr * lwgt < (u64)lr * dwgt || + (!dr && !lr && dwgt > lwgt)) { + least = dest; + lr = dr; + lwgt = dwgt; + } + } + + if (dest == last) + goto stop; + } + pass++; + /* Previous dest could be unlinked, do not loop forever. + * If we stay at head there is no need for 2nd pass. 
+ */ + } while (pass < 2 && p != &svc->destinations); + +stop: + if (least) + svc->sched_data = &least->n_list; + + spin_unlock_bh(&svc->sched_lock); + + if (least) { + IP_VS_DBG_BUF(6, + "WLIP: server %s:%u activeconns %d refcnt %d weight %d\n", + IP_VS_DBG_ADDR(least->af, &least->addr), + ntohs(least->port), + atomic_read(&least->activeconns), + atomic_read(&least->refcnt), + atomic_read(&least->weight)); + } else { + ip_vs_scheduler_err(svc, "no destination available"); + } + + return least; +} + +static struct ip_vs_scheduler ip_vs_wlip_scheduler = { + .name = "wlip", + .refcnt = ATOMIC_INIT(0), + .module = THIS_MODULE, + .n_list = LIST_HEAD_INIT(ip_vs_wlip_scheduler.n_list), + .init_service = ip_vs_wlip_init_svc, + .add_dest = NULL, + .del_dest = ip_vs_wlip_del_dest, + .schedule = ip_vs_wlip_schedule, +}; + +static int __init ip_vs_wlip_init(void) +{ + return register_ip_vs_scheduler(&ip_vs_wlip_scheduler); +} + +static void __exit ip_vs_wlip_cleanup(void) +{ + unregister_ip_vs_scheduler(&ip_vs_wlip_scheduler); + synchronize_rcu(); +} + +module_init(ip_vs_wlip_init); +module_exit(ip_vs_wlip_cleanup); +MODULE_LICENSE("GPL"); ^ permalink raw reply [flat|nested] 9+ messages in thread
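A side effect of remembering the last position, as this version does, is that an idle service with equal weights degenerates into plain round-robin: the strict less-than comparison keeps the first candidate on ties, and the scan starts just after the previous pick. A simplified user-space illustration (hypothetical names, no RCU, fixed-size array instead of the dest list):

    #include <stdint.h>
    #include <stdio.h>

    #define NDEST 3

    static int sched_pos = NDEST - 1;       /* index of the last picked dest */

    static int pick(const uint32_t inbps[], const uint32_t weight[])
    {
            int least = -1;
            uint32_t lr = 0, lwgt = 0;

            for (int k = 1; k <= NDEST; k++) {
                    int i = (sched_pos + k) % NDEST; /* continue after last pick */

                    if (weight[i] == 0)
                            continue;
                    if (least < 0 ||
                        (uint64_t)inbps[i] * lwgt < (uint64_t)lr * weight[i] ||
                        (!inbps[i] && !lr && weight[i] > lwgt)) {
                            least = i;
                            lr = inbps[i];
                            lwgt = weight[i];
                    }
            }
            if (least >= 0)
                    sched_pos = least;
            return least;
    }

    int main(void)
    {
            uint32_t inbps[NDEST]  = { 0, 0, 0 };           /* idle service */
            uint32_t weight[NDEST] = { 100, 100, 100 };

            for (int i = 0; i < 6; i++)
                    printf("pick %d -> dest %d\n", i, pick(inbps, weight));
            return 0;       /* expect destinations 0 1 2 0 1 2 */
    }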
* Re: [PATCH 2/3] IPVS: add wlib & wlip schedulers 2015-01-20 23:21 ` [PATCH 2/3] " Chris Caputo @ 2015-01-22 21:07 ` Julian Anastasov 0 siblings, 0 replies; 9+ messages in thread From: Julian Anastasov @ 2015-01-22 21:07 UTC (permalink / raw) To: Chris Caputo; +Cc: Wensong Zhang, Simon Horman, lvs-devel, linux-kernel Hello, On Tue, 20 Jan 2015, Chris Caputo wrote: > On Tue, 20 Jan 2015, Julian Anastasov wrote: > > > + (u64)dr * (u64)lwgt < (u64)lr * (u64)dwgt || > [...] > > > + (dr == lr && dwgt > lwgt)) { > > > > Above check is redundant. > > I accepted your feedback and applied it to the below, except for this > item. I believe if dr and lr are zero (no traffic), we still want to > choose the higher weight, thus a separate comparison is needed. ok > + spin_lock_bh(&svc->sched_lock); > + p = (struct list_head *)svc->sched_data; > + last = dest = list_entry(p, struct ip_vs_dest, n_list); > + > + do { > + list_for_each_entry_continue_rcu(dest, > + &svc->destinations, > + n_list) { > + dwgt = (u32)atomic_read(&dest->weight); > + if (!(dest->flags & IP_VS_DEST_F_OVERLOAD) && > + dwgt > 0) { > + spin_lock(&dest->stats.lock); May be there is a way to avoid this spin_lock by using u64_stats_fetch_begin and corresponding u64_stats_update_begin in estimation_timer(). We can even remove this ->lock, it will be replaced by ->syncp. The benefit is for 64-bit platforms where we avoid lock here in the scheduler. Otherwise, I don't see other implementation problems in this patch and I'll check it more carefully this weekend. Regards -- Julian Anastasov <ja@ssi.bg> ^ permalink raw reply [flat|nested] 9+ messages in thread
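For reference, the u64_stats_sync pattern suggested above would look roughly like the fragments below if the rate fields were protected by a seqcount instead of the stats spinlock. This is a hypothetical adaptation, not a patch: the syncp placement is invented, and only the update/fetch pattern itself is taken from include/linux/u64_stats_sync.h. On 64-bit builds the sync ops compile away; 32-bit readers retry if a writer is mid-update.

    /* writer side, e.g. in estimation_timer() */
            u64_stats_update_begin(&s->syncp);
            e->inbps += ((s64)rate - (s64)e->inbps) >> 2;
            /* ... other rate updates ... */
            u64_stats_update_end(&s->syncp);

    /* reader side, e.g. in the scheduler instead of spin_lock(&dest->stats.lock) */
            unsigned int start;
            u64 dr;

            do {
                    start = u64_stats_fetch_begin(&dest->stats.syncp);
                    dr = dest->stats.est.inbps;
            } while (u64_stats_fetch_retry(&dest->stats.syncp, start));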
* [PATCH 3/3] IPVS: add wlib & wlip schedulers 2015-01-19 23:17 ` Julian Anastasov 2015-01-20 23:21 ` [PATCH 1/3] " Chris Caputo 2015-01-20 23:21 ` [PATCH 2/3] " Chris Caputo @ 2015-01-20 23:21 ` Chris Caputo 2 siblings, 0 replies; 9+ messages in thread From: Chris Caputo @ 2015-01-20 23:21 UTC (permalink / raw) To: Julian Anastasov; +Cc: Wensong Zhang, Simon Horman, lvs-devel, linux-kernel From: Chris Caputo <ccaputo@alt.net> IPVS wlib (Weighted Least Incoming Byterate) and wlip (Weighted Least Incoming Packetrate) scheduler docs for ipvsadm-1.27. Signed-off-by: Chris Caputo <ccaputo@alt.net> --- diff -upr ipvsadm-1.27-stock/SCHEDULERS ipvsadm-1.27/SCHEDULERS --- ipvsadm-1.27-stock/SCHEDULERS 2013-09-06 08:37:27.000000000 +0000 +++ ipvsadm-1.27/SCHEDULERS 2015-01-17 22:14:32.812597191 +0000 @@ -1 +1 @@ -rr|wrr|lc|wlc|lblc|lblcr|dh|sh|sed|nq +rr|wrr|lc|wlc|lblc|lblcr|dh|sh|sed|nq|wlib|wlip diff -upr ipvsadm-1.27-stock/ipvsadm.8 ipvsadm-1.27/ipvsadm.8 --- ipvsadm-1.27-stock/ipvsadm.8 2013-09-06 08:37:27.000000000 +0000 +++ ipvsadm-1.27/ipvsadm.8 2015-01-17 22:14:32.812597191 +0000 @@ -261,6 +261,14 @@ fixed service rate (weight) of the ith s \fBnq\fR - Never Queue: assigns an incoming job to an idle server if there is, instead of waiting for a fast one; if all the servers are busy, it adopts the Shortest Expected Delay policy to assign the job. +.sp +\fBwlib\fR - Weighted Least Incoming Byterate: directs network +connections to the real server with the least incoming byterate +normalized by the server weight. +.sp +\fBwlip\fR - Weighted Least Incoming Packetrate: directs network +connections to the real server with the least incoming packetrate +normalized by the server weight. .TP .B -p, --persistent [\fItimeout\fP] Specify that a virtual service is persistent. If this option is ^ permalink raw reply [flat|nested] 9+ messages in thread
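With the ipvsadm patch applied, configuration would look like any other scheduler; the commands below are a hypothetical example (documentation addresses, NAT forwarding, weights proportional to server capacity), not taken from the thread:

    ipvsadm -A -t 203.0.113.10:5000 -s wlib
    ipvsadm -a -t 203.0.113.10:5000 -r 10.0.0.11:5000 -m -w 100    # 100 Mbps-class server
    ipvsadm -a -t 203.0.113.10:5000 -r 10.0.0.12:5000 -m -w 1000   # 1 Gbps-class server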