LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* Re: data loss when doing ls-remote and piped to command [not found] ` <85a103f6-8b3c-2f21-cc0f-04f517c0c9a1@emlix.com> @ 2021-09-16 12:17 ` Rolf Eike Beer 2021-09-16 15:49 ` Mike Galbraith 2021-09-16 17:11 ` Linus Torvalds 0 siblings, 2 replies; 9+ messages in thread From: Rolf Eike Beer @ 2021-09-16 12:17 UTC (permalink / raw) To: git, Linus Torvalds; +Cc: Tobias Ulmer, Junio C Hamano, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1740 bytes --] Am Donnerstag, 16. September 2021, 12:12:48 CEST schrieb Tobias Ulmer: > On 16/09/2021 08:38, Rolf Eike Beer wrote: > ... > > > The redirection seems to be an important part of it. I now did: > > > > git ... 2>&1 | sha256sum > > I've tried to reproduce this since yesterday, but couldn't until now: > > 2>&1 made all the difference, took less than a minute. > > Different repo, different machine, but also running Tumbleweed > 5.14.1-1-default, git 2.33.0 > > while [ "`git --git-dir=$PWD/in/linux/.git ls-remote origin 2>&1 | tee > failed.out | sha1sum`" = "7fa299e589bacdc908395730beff542b0fc684eb -" > ]; do echo -n .; done > .......... > > failed.out has multiple lines like this: > > --8<-- > 4e77f7f1261f65cff06918bc5e66d02a418fc842 refs/tags/v3.10.18^{} > f7b8df0cc81cf82a4ac6834225bddbe46a340455a4a5d52f29d08d923ce8d232b0b497da674d > d2c refs/tags/v3.18 > b2776bf7149bddd1f4161f14f79520f17fc1d71d refs/tags/v3.18^{} > --8<-- > > > Running the same on Archlinux (5.13.13-arch1-1, 2.33.0) doesn't show the > problem. > This may well turn out not to be git, but a kernel issue. Linus, since you have been hacking around in pipe.c recently, I fear this isn't entirely impossible. Have you any idea? For easier reference, the complete thread is at: https://public-inbox.org/git/85a103f6-8b3c-2f21-cc0f-04f517c0c9a1@emlix.com/T/ Eike -- Rolf Eike Beer, emlix GmbH, http://www.emlix.com Fon +49 551 30664-0, Fax +49 551 30664-11 Gothaer Platz 3, 37083 Göttingen, Germany Sitz der Gesellschaft: Göttingen, Amtsgericht Göttingen HR B 3160 Geschäftsführung: Heike Jordan, Dr. Uwe Kracke – Ust-IdNr.: DE 205 198 055 emlix - smart embedded open source [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 313 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: data loss when doing ls-remote and piped to command 2021-09-16 12:17 ` data loss when doing ls-remote and piped to command Rolf Eike Beer @ 2021-09-16 15:49 ` Mike Galbraith 2021-09-17 6:38 ` Mike Galbraith 2021-09-16 17:11 ` Linus Torvalds 1 sibling, 1 reply; 9+ messages in thread From: Mike Galbraith @ 2021-09-16 15:49 UTC (permalink / raw) To: Rolf Eike Beer, git, Linus Torvalds Cc: Tobias Ulmer, Junio C Hamano, linux-kernel On Thu, 2021-09-16 at 14:17 +0200, Rolf Eike Beer wrote: > Am Donnerstag, 16. September 2021, 12:12:48 CEST schrieb Tobias Ulmer: > > On 16/09/2021 08:38, Rolf Eike Beer wrote: > > ... > > > > > The redirection seems to be an important part of it. I now did: > > > > > > git ... 2>&1 | sha256sum > > > > I've tried to reproduce this since yesterday, but couldn't until now: > > > > 2>&1 made all the difference, took less than a minute. > > > > Different repo, different machine, but also running Tumbleweed > > 5.14.1-1-default, git 2.33.0 > > > > while [ "`git --git-dir=$PWD/in/linux/.git ls-remote origin 2>&1 | tee > > failed.out | sha1sum`" = "7fa299e589bacdc908395730beff542b0fc684eb -" > > ]; do echo -n .; done > > .......... > > > > failed.out has multiple lines like this: > > > > --8<-- > > 4e77f7f1261f65cff06918bc5e66d02a418fc842 refs/tags/v3.10.18^{} > > f7b8df0cc81cf82a4ac6834225bddbe46a340455a4a5d52f29d08d923ce8d232b0b497da674d > > d2c refs/tags/v3.18 > > b2776bf7149bddd1f4161f14f79520f17fc1d71d refs/tags/v3.18^{} > > --8<-- > > > > > > Running the same on Archlinux (5.13.13-arch1-1, 2.33.0) doesn't show the > > problem. > > This may well turn out not to be git, but a kernel issue. > > Linus, > > since you have been hacking around in pipe.c recently, I fear this isn't > entirely impossible. Have you any idea? > > For easier reference, the complete thread is at: > > https://public-inbox.org/git/85a103f6-8b3c-2f21-cc0f-04f517c0c9a1@emlix.com/T/ > I use git-daemon (2.33) and reference clones for my local pile of kernel trees (74), so out of curiosity, modified the above ls-remote loop to fit one of them, and tried to reproduce with both master.today (ff1ffd71) and SUSE's stable branch (where Tumbleweed gets source, currently at 5.14.4). Both kernels failed to reproduce given a few minutes each (zzzz) to do so. I'm running Leap-15.3 vs Tumbleweed, but that shouldn't matter. -Mike ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: data loss when doing ls-remote and piped to command 2021-09-16 15:49 ` Mike Galbraith @ 2021-09-17 6:38 ` Mike Galbraith 0 siblings, 0 replies; 9+ messages in thread From: Mike Galbraith @ 2021-09-17 6:38 UTC (permalink / raw) To: Rolf Eike Beer, git, Linus Torvalds Cc: Tobias Ulmer, Junio C Hamano, linux-kernel On Thu, 2021-09-16 at 17:49 +0200, Mike Galbraith wrote: > Both kernels failed to reproduce... Nor did the TW kernel (now 5.14.2-1-default) reproduce, neither in my Leap-15.3 box, nor in a TW KVM set up to play server. 'course that doesn't mean there's no kernel bug lurking, means with certainty only that if there is one, the posted reproducer ain't all that wonderful. -Mike ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: data loss when doing ls-remote and piped to command 2021-09-16 12:17 ` data loss when doing ls-remote and piped to command Rolf Eike Beer 2021-09-16 15:49 ` Mike Galbraith @ 2021-09-16 17:11 ` Linus Torvalds 2021-09-16 20:42 ` Junio C Hamano 1 sibling, 1 reply; 9+ messages in thread From: Linus Torvalds @ 2021-09-16 17:11 UTC (permalink / raw) To: Rolf Eike Beer Cc: Git List Mailing, Tobias Ulmer, Junio C Hamano, Linux Kernel Mailing List On Thu, Sep 16, 2021 at 5:17 AM Rolf Eike Beer <eb@emlix.com> wrote: > > Am Donnerstag, 16. September 2021, 12:12:48 CEST schrieb Tobias Ulmer: > > > The redirection seems to be an important part of it. I now did: > > > > > > git ... 2>&1 | sha256sum > > > > I've tried to reproduce this since yesterday, but couldn't until now: > > > > 2>&1 made all the difference, took less than a minute. So if that redirection is what matters, and what causes problems, I can almost guarantee that the reason is very simple: Your git repository (or more likely your upstream) has some problem, it's getting reported on stderr, and because you mix stdout and stderr with that '2>&1', you get randomly mixed output. Then it depends on timing where the mixing happens. Or rather, it depends on various different factors, like the buffering done internally by stdio (where stdout generally will be block-buffered, while stderr is usually line-buffered, which is why you get odd mixing of the two). But timing can be an effect particularly with "git ls-remote" and friends, because you may get errors from the transport asynchronously. So the different buffering ends up causing the effect of mixing things in the middle of lines, while the timing differences due to the asynchronous nature of the remote access pipeline will likely then cause that odd mixing to be different. End result: corrupted lines, and different sha256sum every time. > > Running the same on Archlinux (5.13.13-arch1-1, 2.33.0) doesn't show the > > problem. > > This may well turn out not to be git, but a kernel issue. Much more likely that the other box just doesn't have the error situation. > since you have been hacking around in pipe.c recently, I fear this isn't > entirely impossible. Have you any idea? Almost certainly not the kernel. Kernel - and other - differences could affect timing, of course, but the whole "2>&1" really is fundamentally bogus. If you don't have any errors, then the "2>&1" doesn't matter. And if you *do* have errors, then by definition the "2>&1" will mix in the errors with the output randomly and piping them together is senseless. Either way, it's wrong. So what I'd suggest Tobias should do is git ... 2> err | sha256sum which will send the errors to the "err" file. Take a look at that file afterwards and see what is in it. Basically, '2&>1" is almost never the right thing to do, unless you explicitly don't care about the output and just want to suppress it. So "2&>1 > /dev/null" is common and natural. Of course, people also use it when they just want to eyeball the errors mixed in, so doing that ... 2&>1 | less thing isn't necessarily *wrong*, but it's somewhat dangerous and confusing. Because when you do it you do need to be very aware of the fact that the errors and output will be *mixed*. And the mixing will not necessarily be at all sensible. Finally: pipes on a low level guarantee certain atomicity constraints, so if you do low-level "write()" calls of size PIPE_BUF or less, the contents will not be interleaved randomly. HOWEVER. That's only true at that "write()" level. The moment you use <stdio> for your IO, you have buffering inside of the standard IO libraries, and if your code isn't explicitly very careful about it, using setbuf() and fflush() and friends, you'll get that random mixing. Anyway. That was a long email just to tell people it's almost certainly user error, not the kernel. Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: data loss when doing ls-remote and piped to command 2021-09-16 17:11 ` Linus Torvalds @ 2021-09-16 20:42 ` Junio C Hamano 2021-09-17 6:59 ` Rolf Eike Beer 0 siblings, 1 reply; 9+ messages in thread From: Junio C Hamano @ 2021-09-16 20:42 UTC (permalink / raw) To: Linus Torvalds Cc: Rolf Eike Beer, Git List Mailing, Tobias Ulmer, Linux Kernel Mailing List Linus Torvalds <torvalds@linux-foundation.org> writes: > On Thu, Sep 16, 2021 at 5:17 AM Rolf Eike Beer <eb@emlix.com> wrote: >> >> Am Donnerstag, 16. September 2021, 12:12:48 CEST schrieb Tobias Ulmer: >> > > The redirection seems to be an important part of it. I now did: >> > > >> > > git ... 2>&1 | sha256sum >> > >> > I've tried to reproduce this since yesterday, but couldn't until now: >> > >> > 2>&1 made all the difference, took less than a minute. > > So if that redirection is what matters, and what causes problems, I > can almost guarantee that the reason is very simple: > ... > Anyway. That was a long email just to tell people it's almost > certainly user error, not the kernel. Yes, 2>&1 will mix messages from the standard error stream at random places in the output, which explains the checksum quite well. I am not sure if it explains the initial report where ls-remote 2>&1 | less produced > 6f38b5d6cfd43dde3058a10c68baae9cf17af912 refs/tags/v5.0-rc2 > 1c7fc5cbc33980acd13ae83d0b416db002fe95601e7f97f64b59514d936 refs/tags/v5.7-rc2^{} > d0709bb6da2ab6d49b11643e98abdf79b1a2817f refs/tags/v5.7-rc3 What we see on the second line is the beginning of peeled v5.0-rc2^{} up to the "acd13" (that is, the first 19 bytes of the line), followed by the full line for peeled v5.7-rc2^{} (which begins with "ae83d"). 12407 bytes in between are missing, which is even more puzzling as it is not a nice round number. I can sort of guess that the progress display during transfer, which comes out on the standard error stream and uses terminal control sequences like "go back to the end of the line without feeding a new line", "erase to the end of the line", etc., would be contributing, but because it is piped to "less", which would make it "visible" (i.e. you do not get the raw escape but see three capital letters ESC in reverse), it does not quite explain how the display was broken. In any case, I do not think the kernel is involved, or more generally I do not think any "loss of output bytes" is happening here. It's just "| less" that failed to show a range about 12k bytes long is mystery to me ;-). ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: data loss when doing ls-remote and piped to command 2021-09-16 20:42 ` Junio C Hamano @ 2021-09-17 6:59 ` Rolf Eike Beer 2021-09-17 19:13 ` Jeff King ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Rolf Eike Beer @ 2021-09-17 6:59 UTC (permalink / raw) To: Linus Torvalds, Junio C Hamano Cc: Git List Mailing, Tobias Ulmer, Linux Kernel Mailing List [-- Attachment #1: Type: text/plain, Size: 2746 bytes --] Am Donnerstag, 16. September 2021, 22:42:22 CEST schrieb Junio C Hamano: > Linus Torvalds <torvalds@linux-foundation.org> writes: > > On Thu, Sep 16, 2021 at 5:17 AM Rolf Eike Beer <eb@emlix.com> wrote: > >> Am Donnerstag, 16. September 2021, 12:12:48 CEST schrieb Tobias Ulmer: > >> > > The redirection seems to be an important part of it. I now did: > >> > > > >> > > git ... 2>&1 | sha256sum > >> > > >> > I've tried to reproduce this since yesterday, but couldn't until now: > >> > > >> > 2>&1 made all the difference, took less than a minute. > > > > So if that redirection is what matters, and what causes problems, I > > can almost guarantee that the reason is very simple: > > ... > > Anyway. That was a long email just to tell people it's almost > > certainly user error, not the kernel. > > Yes, 2>&1 will mix messages from the standard error stream at random > places in the output, which explains the checksum quite well. If there would be any errors. The point is: if I run the command with ">/dev/ null" just to the terminals a hundred times there is never any output on stderr at all. If I pipe stderr into a file it's empty after all of this (yes, I did append, not overwrite). That the particular construct in this case is sort of nonsense is granted, I just hit it because some tool here used some very similar construct and suddenly started failing. "less" isn't the original reproducer, it was just something I started testing with to be able to easily visually inspect the output. What you need is a _fast_ git server. kernel.org or github.com seem to be too slow for this if you don't sit somewhere in their datacenter. Use something in your local network, a Xeon E5 with lot's of RAM and connected with 1GBit/s Ethernet in my case. And the reader must be "somewhat" slow. Using sha256sum works reliably for me. Using "wc -l" does not, also md5sum and sha1sum are too fast as it seems. When I run the whole thing with strace I can't see the effect, which isn't really surprising. But there is a difference between the cases where I run with redirection "2>&1": ioctl(2, TCGETS, 0x7ffd6f119b10) = -1 ENOTTY (Inappropriate ioctl for device) and without: ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0 AFAICT this is the only place where fd 2 is used at all during the whole time. Regards, Eike -- Rolf Eike Beer, emlix GmbH, http://www.emlix.com Fon +49 551 30664-0, Fax +49 551 30664-11 Gothaer Platz 3, 37083 Göttingen, Germany Sitz der Gesellschaft: Göttingen, Amtsgericht Göttingen HR B 3160 Geschäftsführung: Heike Jordan, Dr. Uwe Kracke – Ust-IdNr.: DE 205 198 055 emlix - smart embedded open source [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 313 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: data loss when doing ls-remote and piped to command 2021-09-17 6:59 ` Rolf Eike Beer @ 2021-09-17 19:13 ` Jeff King 2021-09-17 19:28 ` Linus Torvalds 2021-09-18 6:33 ` Mike Galbraith 2 siblings, 0 replies; 9+ messages in thread From: Jeff King @ 2021-09-17 19:13 UTC (permalink / raw) To: Rolf Eike Beer Cc: Linus Torvalds, Junio C Hamano, Git List Mailing, Tobias Ulmer, Linux Kernel Mailing List On Fri, Sep 17, 2021 at 08:59:07AM +0200, Rolf Eike Beer wrote: > What you need is a _fast_ git server. kernel.org or github.com seem to be too > slow for this if you don't sit somewhere in their datacenter. Use something in > your local network, a Xeon E5 with lot's of RAM and connected with 1GBit/s > Ethernet in my case. One thing that puzzled me here: is the bad output between the server and ls-remote, or between ls-remote and its output pipe? I'd guess it has to be the latter, since otherwise ls-remote itself would barf with an error message. In that case, I'd think "git ls-remote ." would give you the fastest outcome, because it's talking to upload-pack on the local box. But I'm also confused how the speed could matter, as ls-remote reads the entire input into an in-memory array, and then formats it. We do the write using printf(). Is it possible your libc's stdio may drop bytes when the pipe is full, rather than blocking? In general, I'd expect write() to block, so libc doesn't have to care at all. But might there be something in your environment putting the pipe into non-blocking mode, and we get EAGAIN or something? If so, I'd expect stdio to return the error. Maybe patching Git like this would help: diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c index f4fd823af8..5936b2b42c 100644 --- a/builtin/ls-remote.c +++ b/builtin/ls-remote.c @@ -146,7 +146,8 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix) const struct ref_array_item *ref = ref_array.items[i]; if (show_symref_target && ref->symref) printf("ref: %s\t%s\n", ref->symref, ref->refname); - printf("%s\t%s\n", oid_to_hex(&ref->objectname), ref->refname); + if (printf("%s\t%s\n", oid_to_hex(&ref->objectname), ref->refname) < 0) + die_errno("printf failed"); status = 0; /* we found something */ } > And the reader must be "somewhat" slow. Using sha256sum works reliably for me. > Using "wc -l" does not, also md5sum and sha1sum are too fast as it seems. If a slow pipe is involved, maybe: git ls-remote . | (sleep 5; cat) | sha256sum would help reproduce. Assuming ls-remote's output is bigger than your system pipe buffer (which is another interesting thing to check), then it should block for 5 seconds on write() midway through the output, which you can verify with strace. -Peff ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: data loss when doing ls-remote and piped to command 2021-09-17 6:59 ` Rolf Eike Beer 2021-09-17 19:13 ` Jeff King @ 2021-09-17 19:28 ` Linus Torvalds 2021-09-18 6:33 ` Mike Galbraith 2 siblings, 0 replies; 9+ messages in thread From: Linus Torvalds @ 2021-09-17 19:28 UTC (permalink / raw) To: Rolf Eike Beer Cc: Junio C Hamano, Git List Mailing, Tobias Ulmer, Linux Kernel Mailing List On Thu, Sep 16, 2021 at 11:59 PM Rolf Eike Beer <eb@emlix.com> wrote: > > When I run the whole thing with strace I can't see the effect, which isn't > really surprising. But there is a difference between the cases where I run > with redirection "2>&1": > > ioctl(2, TCGETS, 0x7ffd6f119b10) = -1 ENOTTY (Inappropriate ioctl for device) Ehh. That format of strace implies that you didn't use "strace -f" (which would have the PID in it). Although maybe you edited it out. I think the error output would come from the other process (ssh, or whatever process you use to run "git-upload-pack" on the other end). I still strongly doubt it's about pipes - we've had changes to them, but if they are broken we'd see a lot more breakage than some very incidental use by git. But I can easily see it being timing-dependent. And yes, sadly 'strace' can often end up hiding any timing issues because it obviously slows down the target quite a bit. Doing "strace -o tracefile -f" in a loop would be interesting if you can reproduce it (and then stop when you reproduce it, so that the final 'tracefile' is the one for the case that reproduced it). Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: data loss when doing ls-remote and piped to command 2021-09-17 6:59 ` Rolf Eike Beer 2021-09-17 19:13 ` Jeff King 2021-09-17 19:28 ` Linus Torvalds @ 2021-09-18 6:33 ` Mike Galbraith 2 siblings, 0 replies; 9+ messages in thread From: Mike Galbraith @ 2021-09-18 6:33 UTC (permalink / raw) To: Rolf Eike Beer, Linus Torvalds, Junio C Hamano Cc: Git List Mailing, Tobias Ulmer, Linux Kernel Mailing List On Fri, 2021-09-17 at 08:59 +0200, Rolf Eike Beer wrote: > > What you need is a _fast_ git server. kernel.org or github.com seem to be too > slow for this if you don't sit somewhere in their datacenter. Use something in > your local network, a Xeon E5 with lot's of RAM and connected with 1GBit/s > Ethernet in my case. Even faster: what's coming across that wire should be a constant (is?), variable is only delivery/consumption jitter. If there's really really a pipe problem lurking, you should also be able to trigger by saving the data once, and just catting it, letting interrupts etc provide jitter. Which stdout is left of '|' in a script shouldn't matter one whit to the interpreter/kernel conversation, they're all the same. That said, if I had a reproducer I was confident pointed to the kernel, I'd try to bisect.. boring as hell, but highly effective. -Mike ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2021-09-18 6:33 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <6786526.72e2EbofS7@devpool47> [not found] ` <2279155.Qy0YqsFniq@devpool47> [not found] ` <85a103f6-8b3c-2f21-cc0f-04f517c0c9a1@emlix.com> 2021-09-16 12:17 ` data loss when doing ls-remote and piped to command Rolf Eike Beer 2021-09-16 15:49 ` Mike Galbraith 2021-09-17 6:38 ` Mike Galbraith 2021-09-16 17:11 ` Linus Torvalds 2021-09-16 20:42 ` Junio C Hamano 2021-09-17 6:59 ` Rolf Eike Beer 2021-09-17 19:13 ` Jeff King 2021-09-17 19:28 ` Linus Torvalds 2021-09-18 6:33 ` Mike Galbraith
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).