Hacker News
DNA jumps between animal species, but no one knows how often (quantamagazine.org)
196 points by rbanffy 11 days ago | 127 comments

I was on one of the teams that refuted the claims of horizontal gene transfer in the original human genome paper. The bar for establishing a true case of horizontal transfer in vertebrates is high. It’s really improbable given the required sequence of events laid out in the article. It’s one thing for some DNA to get picked up by random cells in the organism (happens with viral infection all the time). Getting to the germline cells and becoming inherited is a whole other story given that vertebrates have evolved mechanisms to guard against this specific scenario.

When "vertebrates have evolved mechanisms to guard against this specific scenario", it hardly sounds "improbable."

well, not when the protection is against any form of DNA contamination and not specifically foreign DNA intrusion

the fact that random large mutations typically lead to an inviable zygote should be enough evolutionary pressure, it doesn't need to be specific protection against the entry of external DNA

The sequences we're discussing aren't really random, though. Presumably the chance of viability with such a sequence incorporated, though still low, is much higher than if it were a truly random sequence.

Are you sure?

Having one foreign sequence which has some specific features (to keep the originating organism viable) could have a chance of never being compatible with the target organism.

A completely random sequence, by definition, has some chance of being compatible.

The question is which scenario has a higher chance of success.

This is a case I could see going either way. Random mutations are probably much smaller and closer to the original, and therefore potentially more viable. Yes, it's random, but most of the time it won't have a major effect on the proteins the DNA generates. On the other hand, if we are talking about transferring segments, there's the potential of that DNA to create actively harmful proteins.

I just read the PLOS One paper. The arguments they brought forth were strong. If this had been my paper, I would have been livid if it had been rejected. However, given the fragmented and buggy state of bioinformatics tooling and databases at the time, I can easily imagine how their extraordinary claims did not cross the "beyond reasonable doubt" threshold. From a reviewer's perspective, a couple of matching disulfide bridges and a negative Southern blot alone might not have convinced me either. Glad it worked out for her in the end though.

The issue with the evidence in that paper is that they used primers to amplify the specific genes of interest. That introduces a strong assumption at the start of their analysis: specifically, that these genes appeared in the genomes by some HGT process instead of independently being duplicated internally in each genome from another gene shared among the species. Whole genome sequences were not available for these species at the time. A modern, more complete analysis would look into homologs across whole genomes and try to reject that hypothesis, which is much less extraordinary than animal germline HGT.

That's precisely why the authors published the new Cell paper https://www.cell.com/action/showPdf?pii=S0168-9525%2821%2900... with stronger evidence from whole genome sequence to support the HGT hypothesis. I'm still trying to wrap my head around Figure 2 there, so I'm on the fence.

The Trends in Genetics (not Cell) paper seems plausible. I don't study fish genetics or evolution. As I remember, fish genomes tend to have more genome-wide duplications and losses in comparison to other vertebrates. One possibility is that some fish lose AFPs because they don't need them – i.e. the observation could be caused by loss of function instead of gain of function due to HGT. I have to admit that the chance of gene losses across multiple fish lineages is pretty tiny but it is at least associated with a known mechanism.

Anyways, an interesting article.

My understanding is that inherited HGT in vertebrates is now an established mainstream position and that it was mainly the low quality of the original sequences that prevented people from refuting this point (specifically in humans). A lot of the stuff published in 2001 about human genomes was later shown to be of dubious quality, massively overstating the value of the data to make strong conclusions.

So did you know about the paper in question and if so how convinced are you of the claims/evidence in this specific case?

It's almost unbelievable how DNA seems to "just work". DNA from one organism breaking apart and slipping into another somehow leads to shared expression across unrelated species - it's absurd.

It's like taking a random byte sequence from some binary, shoving it randomly into another, and the new binary gets useful new features.

The involvement of even more complex systems like parasites makes it that much more insane to me.

> It's like taking a random byte sequence from some binary, shoving it randomly into another, and the new binary gets useful new features.

If you think about it for a moment, our genetic code is kind of designed to work that way.

You get half of your genetic code from your mom, the other half from your dad, and somehow, all of these genes "just work" together. It's kind of miraculous when you think that there are very many genes that encode how your brain works, and how your liver works, your muscles, etc. Somehow, provided the baby can be born, a mishmash of genes from two different individuals almost always works out.

Probably because they do not really encode how anything works and because, probably by necessity, the growth of organisms is a swarm intelligence that is quite self-healing.

In particular with conjoined twins, it's quite remarkable how much the systems for body development still produce something that connects the inner workings, which was obviously not their “purpose”, but the self-healing growth mechanisms that correct for errors simply lead to that.

Consider the Hensel Twins who have two mouths but their digestive system at some point merges in a way that is capable of digesting. The “tubes” of their digestive tract actually merge at one point, but they have two stomachs.

"a mishmash of genes from two different individuals almost always works out" => different individuals of the _same_ species (which btw is how a "species" is defined).

The evolution of gene mishmashing (aka sexual reproduction) is thought to be the result of an ongoing arms race between gene sequences that "try" to stay unchanged (in higher-level species) and gene sequences that "try" to "free ride" (from viruses etc.). Being able to build members of your species from a "mishmash of genes from two different individuals" has the effect of scrambling the DNA of each species member, which makes attack harder.

Organisms that do not do this and instead reproduce by cloning (e.g. parthenogenesis, or in the case of bananas, vegetative propagation) are often entirely wiped out once a pathogen figures out how to target their DNA -- hence the banana varieties we eat change over time.

ps: Similar evasion is used by some computer viruses: https://www.trendmicro.com/vinfo/us/security/definition/Poly...

> The evolution of gene mishmashing (aka sexual reproduction) is thought to be the result of an ongoing arms race between gene sequences that "try" to stay unchanged (in higher-level species) and gene sequences that "try" to "free ride" (from viruses etc.)

Sexual reproduction means your species has a very large gene pool, and individuals with new combinations of genes can be produced very quickly. That's not just an advantage against viruses. It's also very useful for adapting rapidly and competing against other species when your environment changes. New threats (and new opportunities) show up all the time, be it dwindling or changing availability of food, climate change (e.g. new ice age), new predators or new preys, and also a group of individuals migrating to a new region of the world with a different climate.

> all of these genes "just work" together

That's a little bit tautological since if the genes didn't work together they wouldn't be here after all these years, right? Fascinating nonetheless.

That seems to be the point the parent is making. Tautology is the only way we can explain the way life happens to be -- it's because it's advantageous for it to be that way.

It should also be pointed out that about two thirds of human conceptions result in early embryonic death, so evidently it is not as smooth a ride as suggested.

I'm just picking nits, but :%s/designed/evolved/g

Do very much agree it's miraculous. Biological organisms are robust to error and chance in ways no designed system comes close to matching. It's awe-inspiring.

In genetics it’s very common to say that things are designed a certain way without invoking a creator.

Yep, the designer is evolutionary pressure, not necessarily an intelligence. It's shorthand, not a religious invocation.

I didn't know this was common usage, thanks! I stand corrected

No worries, you're more right now than you were before. :)

Maybe genes are declarative, not procedural har har har

It's less so when you know about some evolutionary programming techniques, such as using Lisp with a subset of the code that defines behavior stored in a tree structure, which allows parts of that tree to be swapped in and out from other programs using the same design while still yielding an executing program. Combined with a fitness function, you can "breed" programs for a task.

As I understand it there are attributes of Lisp and attributes of the program structure (such as putting much of the logic in that tree structure with defined split points) which makes this much more feasible than otherwise.

My guess is that DNA has evolved similarly, where the ways in which it splits and the mechanism by which it is interpreted and executed help, and also that the organisms we're talking about (us and the other complex ones) have iterated to a design that's more amenable to bits being swapped. That is, large chunks of us may be more similar to a Lisp program with attributes that make it easy to swap parts than to a bunch of object bytecode with absolute and relative jumps all over.

Note: A lot of this is poorly remembered from a survey of AI class two decades ago, so it bears someone with a stronger background verifying I'm not making a complete hash of it.
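The subtree-swapping idea described above can be sketched in a few lines. This is a minimal illustration in Python rather than Lisp: nested tuples such as ("+", "x", ("*", "x", 2)) stand in for s-expressions, and all function names are made up for the example. The key property is that every subtree is itself a complete expression, so grafting one parent's subtree into another always yields a program that still evaluates. The fitness function and selection loop that real genetic programming adds on top are omitted.

```python
import random
from typing import Union

# An expression is a number, the variable "x", or a tuple (op, left, right),
# loosely mimicking Lisp s-expressions such as (+ x (* x 2)).
Expr = Union[int, str, tuple]

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def evaluate(expr: Expr, x: int) -> int:
    """Recursively evaluate an expression tree for a given x."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if expr == "x" else expr


def paths(expr: Expr, prefix=()):
    """Yield the index path of every subtree, including the root (empty path)."""
    yield prefix
    if isinstance(expr, tuple):
        for i in (1, 2):  # children sit at tuple positions 1 and 2
            yield from paths(expr[i], prefix + (i,))


def get(expr: Expr, path) -> Expr:
    """Return the subtree found by following path from the root."""
    for i in path:
        expr = expr[i]
    return expr


def replace(expr: Expr, path, new: Expr) -> Expr:
    """Return a copy of expr with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    parts = list(expr)
    parts[i] = replace(expr[i], path[1:], new)
    return tuple(parts)


def crossover(a: Expr, b: Expr, rng: random.Random) -> Expr:
    """Graft a random subtree of b onto a random point of a."""
    target = rng.choice(list(paths(a)))
    donor = get(b, rng.choice(list(paths(b))))
    return replace(a, target, donor)


mom = ("+", "x", ("*", "x", 2))   # x + 2x
dad = ("-", ("*", "x", "x"), 1)   # x*x - 1
child = crossover(mom, dad, random.Random(0))
print(child, evaluate(child, 3))  # the child is always a valid, runnable expression
```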

Well, survivorship bias. We only see the results where it worked, we don't see the enormous number of attempts where it didn't.

This is crazy from the perspective of "old fashioned binaries", but less so in the context of neural networks. You can do all sorts of splicing and dicing of the bits in neural networks (their weights) and end up with useful networks. Dropout, for example, specifically trains a network to be resilient to having swaths of the network removed, and makes individual features in the network resilient to having a random selection of other features present or not-present. If I remember right, the original dropout paper even analogizes this to how genes have evolved to be resilient to this type of random pairing.
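For the curious, dropout itself is only a few lines. Below is a minimal sketch in plain Python (no ML framework assumed; the function name is illustrative) of the common "inverted" variant, which rescales the surviving units during training so nothing needs to change at inference time. The original paper instead scaled the weights at test time, which is another way of keeping expected activations matched.

```python
import random


def dropout(activations: list[float], p: float, rng: random.Random) -> list[float]:
    """Inverted dropout: zero each unit with probability p during training and
    scale the survivors by 1/(1-p) so the expected activation is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    # Each unit independently survives with probability `keep`.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# At inference time dropout is simply switched off (the identity function).
print(dropout([1.0] * 8, 0.5, random.Random(42)))
```

Because a different random subset is silenced on every training step, no unit can rely on any particular partner being present, which is the property the comment compares to genes.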

Don't try to think about genomics in programming terms. At best you'll only confuse yourself; at worst, others also. Both a computer program and a genome encode information, but that's about where the similarities end.

I can only use the models that I have, accurate or otherwise. I hope I'm not confusing others, I'm not pretending to be an expert in DNA.

> Both a computer program and a genome encode information, but that's about where the similarities end.

FWIW this is a very significant similarity to me.

One might with equal merit say that, because I know English orthography, I can also read Linear A.

I think a working analogy here would be that source code (dna) gets compiled (physics/chemistry) into an executable (living organism with even more dna) which gets executed by the os (phys/chem again) to produce changes on some data (organism interacts with the environment), and on and on..

The details are of course endless and they aren’t interchangeable between the two fields, but the analogy is still there..

Sure, but is it meaningful? What predictions does it enable that are sufficiently borne out by reality to make it seem likely that less easily testable predictions on the same basis may likewise prove sound?

I mean, I can as well say that chalk and cheese are alike in that both have mass, occupy space, and leave a streak behind when you rub them on something. It is a true statement, but what does it help me predict about either?

I think the only real takeaways on the coding/biology comparison are applying base-level informatic systems ideas to explain some biological developments, and in reverse looking at biological system mechanics as inspiration for designed systems.

I don't think the 'chemistry is an OPERATING SYSTEM' level of handwaving is sufficient to glean insights, but understanding general systems-level interactome patterns of how proteins interact does help provide knowledge about how natural and designed systems can self-regulate, how they fail, how they can be structured, etc.

Sure, at that level it makes sense. The trouble seems to be that in order to know that that's the level at which it makes sense, you need to know considerably more about informatics than is the default among programmers who like to indulge in this kind of speculation. Kind of a Dunning-Kruger problem, maybe; there certainly was a time when I likewise didn't know what I didn't know.

I'm not sure what it is you think I'm trying to say, but much of your point seems to be "don't talk about things you don't understand", which I have no interest in abiding. I like talking about things I don't understand, and I've enjoyed the posts from you and others on the topic, even if I'll only ever be a layman.

Talking about things you don't understand is no problem by me! What I'm trying to point to here is the hazard of making assumptions about something one doesn't understand, and then trying to reason about the thing based on those assumptions.

Oh sure, yeah I mean it's all in good fun. I wouldn't try to actually establish any serious thought other than "this is wild".

>Don't try to think about genomics in programming terms.

At some point, some Newton person will figure it out. It always happens.

As for now, it might be interesting to understand why exactly the analogy between genomics and programming fails. It might bring interesting insights into both fields.

So why not try to think about?

Because the only way to imagine that a useful comparison can be made between these fields is to be profoundly ignorant of at least one of them.

I'm profoundly ignorant in neither (PhD in biophysics, software engineer for 20 years). Genomics and programming analogies are cool, but the most important point is this: the understanding that molecular structures can encode information in a replicable way, together with the discovery that entropy applies to data storage and transmission, demonstrates that information is a universal concept, that the genome is a data storage system, and that the enzymes that operate on it are processing information in a computational way. To me that's a pretty useful comparison.

Rather than assert the negative, can you state some positive facts about one or the other that makes this point clear?

Software changes over spans of minutes to decades; genomes change over spans of millions of years. Software is written; genomes are not. The complexity of software is constrained by programmers' ability to comprehend it; the complexity of genomes is not. The environment in which software functions is determined by humans; the environment in which genomes function is not.

Those are trivial surface level differences relative to the central idea of encoding, storing, replicating, editing digital information, which interfaces with other digital and analog systems.

Not that there's much point to saying so, since you appear to be here for no other reason than to assert that my argument is false because you would prefer it be so, but here's another: software is digital; genomes are not.

FWIW all of these differences still feel extremely surface level. I'm no expert but I certainly am, so far, aware of everything you've said with regards to how they differ - I'm kinda hoping for more, given the strong assertion you made that one can not relate the two without being fundamentally ignorant of either topic.

I also think it's somewhat ironic that you're accusing them of only being here to say "you're wrong" but that's what you've done in this thread? I only bring this up because I think we're all after the same thing here - to understand an incredibly interesting topic.

I suspect most of us are really here to learn and discuss. You seem like you have a background in the area, I'm sure we would all benefit from learning about the differences.

If it's the case that the similarity is that DNA and code both encode information, and the differences are based on how they do so, it's hard to see why you think they can't be related at all. You've been relating the two.

If I've given the impression that the difference is merely a question of varying encodings, then I have to agree my arguments have thus far been lacking.

The idea that a genome as expressed in nucleic acid is purely, and only, an informational medium, is fundamentally in error. It does encode information in the sequence of base pairs, this is true. But it is also a physical structure in its own right, and properties of that structure incidental to the encoded information have what recently looks to be at least as important a role in the process of transcription as the sequence itself.

There are, for example, some sequences which will cause a ribosome to transcribe the surrounding genes differently or with varying frequency, due to the physical interaction between the molecules involved. (I recently discussed this here in the context of recent research on causes of eye color; it should not be too far back in my comment history.) We also see, for example, that both viral and eukaryotic DNA can be and often are transcribed in ways that produce different proteins from the same sequence, again as a result of physical constraints affecting the interaction with the ribosome. This is one reason why "junk DNA" is a bit of a misnomer, and why we more recently see the term fall out of use in favor of "noncoding DNA" - these regions carry no information in their own right, but nonetheless can strongly affect the outcome of transcription because transcription is not only an informatic process. This isn't true of software; there is no general case in which two programs varying only in nonsyntactic ways will be evaluated differently under otherwise identical conditions - we create programming languages as we do in part to ensure that won't happen, and it's also part of the reason why we use transistors instead of vacuum tubes or relays: in order to engineer that kind of variance as much as we can out of existence. What is therefore an accidental property in software is an essential one in gene expression, and cannot be overlooked without reaching an inaccurate conception of how the latter process works.
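The purely informational part of that point can be shown in a few lines: reading the same bases from a different starting offset yields a completely different run of codons. In the made-up sequence below, shifting by one base even lands on TAA, a stop codon. What a sketch like this cannot capture is the physical side the comment is stressing, i.e. what actually causes the translation machinery to shift frame in the first place.

```python
def codons(seq: str, frame: int) -> list[str]:
    """Split seq into complete triplets starting at offset `frame` (0, 1 or 2).
    Leftover bases at the end that don't fill a triplet are dropped."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


seq = "ATGGCCATTGTAATG"  # toy sequence, not from any real gene
for frame in range(3):
    print(frame, codons(seq, frame))
```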

That's just one example, and it's true that processes like these can be modeled in software to variously imperfect degrees of fidelity and that information-theoretical models can be useful in understanding some aspects of how they work. But that's not the same thing as them working similarly enough that understanding one very well suffices to reason about the other. I definitely can see how it's easy to assume otherwise! It's an assumption I shared, before my own yearlong exposure to the field at a sufficient level of detail to start to understand what I hadn't understood about it before, and considerable reading and study thereafter.

Unfortunately, I was there to provide engineering support to people doing that work, not to do it myself, and the knowledge I've derived from that experience apparently does not extend so far as producing a concise and positive statement of the fundamental difference between the two fields of study - I spent considerably more time teaching informaticists how to program, formally and otherwise, than I spent learning about bioinformatics. That leaves me able to recommend little beyond seeking out similar experience of your own, which I do recommend if the depth of your interest suffices, although I do also have to say working in academia as a nonacademic has very little else to recommend it.

I know there are some folks on HN with formal knowledge and training greatly exceeding my own, and some of whom have probably also had experience teaching the basics in an accessible way. Perhaps one of them might give a more useful answer here than I've been able to.

>some sequences which will cause a ribosome to transcribe the surrounding genes differently

Not to be a negative nancy here, but if we're being precise, ribosomes do not transcribe. They translate.

Under the fairly reductive central dogma of biology:
DNA -> RNA (transcription)
RNA -> protein (translation)
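The two steps can be sketched as string transformations, which is exactly the reductive view: this ignores promoters, splicing, cellular location and rates entirely. The codon table is the standard genetic code, packed into one 64-character string in UCAG order; treating the input as the coding strand is an assumption noted in the code, and the function names are illustrative.

```python
BASES = "UCAG"
# Standard genetic code, indexed by (first, second, third) base in UCAG order.
# "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """DNA -> mRNA. Assumes the input is the coding (sense) strand, so the
    mRNA has the same sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein, reading triplets from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate(transcribe("ATGTTTAAATAA")))  # prints MFK
```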

Transcription and translation are separate mechanics that don't occur in the same area of the cell, and both use very different complexes to mediate the rates of each in different physical environments.

I don't disagree with any of the substantive points being made, but I think the proper terminology only adds to your argument so I found it strange that it was left out.

It's one of the drawbacks of being an autodidact; I pretty much always have to check to be sure I'm not confusing these two similar terms, and I didn't stop to check this time. Thanks for the correction.

Thanks, this was much more interesting to read, and educational for someone with a software background, which I think kind of goes to show that discussing analogs is actually a reasonable way to approach the unknown :)

Again I agree with you, because I had a similar experience. But, again, my conclusion is different than yours.

You write that we should not talk about biochemistry as computation, as far as I understand. Instead I'd say that we have not studied enough how nature does computation without programmers or even human friendly semantics.

It is still computation, involving space and physics: too complex to simulate efficiently (for now), but not so large that the emergent behaviour becomes simple, as it does for a gas.

ribosomes don't transcribe genes.

Genomes are absolutely digital. GATC is no different from 1 and 0. It's just using a different base (pun intended).
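The "different base" point made concrete: with four possible symbols, each nucleotide fits in exactly two bits. The A=00, C=01, G=10, T=11 mapping below is an arbitrary choice for illustration, and it captures only the sequence, not methylation, folding or packing.

```python
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = "ACGT"  # DECODE[code] inverts ENCODE


def pack(seq: str) -> int:
    """Pack a DNA string into an int, 2 bits per base, first base most significant."""
    value = 0
    for base in seq:
        value = (value << 2) | ENCODE[base]
    return value


def unpack(value: int, length: int) -> str:
    """Inverse of pack. The length is required because leading 'A's (00) would
    otherwise be indistinguishable from nothing."""
    bases = []
    for _ in range(length):
        bases.append(DECODE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


print(pack("GATC"), unpack(pack("GATC"), 4))  # prints: 141 GATC
```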

Files on disks have end of file markers, just like the start and stop sequences in DNA. Operating systems have cron jobs (themselves digital) that control when other programs execute.
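The start/stop analogy can be sketched as the most naive form of gene finding: scan for ATG, then walk forward three bases at a time until an in-frame stop codon (TAA, TAG or TGA). One difference from an end-of-file marker shows up immediately: a stop triplet only counts if it falls in the current reading frame. This is an illustrative sketch only; real gene finders also scan the reverse complement and use far more signal than start and stop codons.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str) -> list[tuple[int, int]]:
    """Return half-open (start, end) spans from each ATG to the first in-frame
    stop codon (stop included), on the given strand only."""
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Step through codons in the same frame as this ATG.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                orfs.append((start, i + 3))
                break
    return orfs


print(find_orfs("CCATGAAATAGGG"))  # [(2, 11)]
```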

You mean "DNA sequences are digital" in that base pairs map to a sequence of enumerations.

However, genomes aren't digital. They're 3D structures with a ton of attributes that are not trivially representable digitally.

In the same way software code is digital but the hard drives that hold them are not?

Genomes are much more than just their sequence. Their spatial organisation, their methylation, their folding, their packing etc. have no equivalents in a filesystem.

You're talking about a digital<->analog interface. Take a digitally encoded audio file, read it out and turn it into sound waves using a digital analog converter, play it out on physical speakers, record it back with a microphone, use that information to control a robotic arm with a magnet that will swipe over the physical medium... etc. They are absolutely analogous.

> software is digital; genomes are not

False by definition: Digital data is "information represented as a string of discrete symbols each of which can take only one of a finite number of values"


I agree with you here but I get to a happy conclusion. The (self- or culturally imposed) constraint on computation to be semantically meaningful for humans does not apply for genomes. But this is already useful, because it means we at least have a hint about where to dig more in programming.

There is Theory of Computation and there is Theory of Programming. Your arguments apply to TOP but not to TOC.


This all seems like minor differences.

Plenty of software is neither written nor comprehensible, I can assure you of that.

Like, I don't think you're necessarily wrong, but pointing out the literal differences between the two topics doesn't explain to me why the analogy is wrong, and therefore doesn't support your argument.

It's like saying "I'm nothing like my mother; I don't even have long hair"

I think the environment is the confounding factor rather than programmer working life-span.

An OS is just so much simpler than dynamically constrained energetic replicators in an always and everywhere collapsing wave function.

I love this. It's a little black and white, but the comparison is as between video game worlds and the real world. Only enough to fool the willing eye.

I use a variation of this form as 'people whose science and religion conflict don't know enough about either one'.

As a bioinformatician, I cannot wait to use this quote on someone.

As an enthusiastically former staff engineer at a bioinformatics institute, I'm happy to have been of help! Please feel free to do so without attribution; if nothing else, it'd be a shame at this late date to have my opinions of the caste system in academia disturbed by the novel experience of receiving credit for my contributions to the work of people with letters after their names. :D

Department chairs > PIs >~ profs > visiting profs > assistant profs >~ visiting asst profs > postdocs > grad students > employees > undergrads > high-school interns

Dedicated grant-writing staff are gold, literally and figuratively.

(I worked at a biomedical informatics shop.)

A program has to run those sequences mostly in order. Rather than swapping around blobs of binary it's more like each gene being its own small program, and things working is much less surprising in that context.

What I'm very interested in ATM (just now, after reading this topic) is how the process of evolution really works. Not the selection so much, but the actual mutations.

The last I ever learned about it, and perhaps the common belief, is that random-ish gene mutations account for it. 4 billion years doesn't seem like enough time to account for all that unless changes are heavily weighted towards doing something somewhat useful. Like there is a system at play.. Lego blocks vs bits. IDK.

If you think evolution is just selection and mutations, I can see how you'd think it's not enough (even though selection is usually a very strong force that "locks good mutations in place", which can create a compounding effect in chances of an organism's fitness, so a long time of mutations + locking good ones in place should be almost enough to convince you if you do the math). It's been some time since I studied this so I'm just going to write whatever comes to mind.

Maybe a key part of what's missing in your understanding is not so much "micro" genetics but biogeography and population genetics. You'd also want to check simulations and comparisons with real-world data on some models to see how a population evolves to see that it "really works". It's important to understand that there are different models, and for each one there's a set of "forces" and important parameters.

The bigger picture is, it's the whole interplay between mutation, selection, genetic drift, gene flow; things like differences in population size over time and space, migrations, isolation and reconnection, etc. that makes it work. You might also want to take a look into genetic/functional/morphological modularity. I've just skimmed these articles, but they seem relevant:



There's much more but my memory is murky. Ideally you would want to take a course or read a textbook on evolution. A few popsci books are ok (Dawkins, E. O. Wilson).

TLDR: What the other commenter said -- 4 billion years is a very, very, very long time.
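The "locking good mutations in place" argument is what Richard Dawkins illustrated with the "weasel" program in The Blind Watchmaker. Below is a minimal re-creation; the parameters (100 offspring per generation, 5% per-letter mutation rate, parent kept in the pool) are illustrative choices, not necessarily Dawkins's. Blind random guessing would face 27^28 (roughly 10^40) possible 28-character strings, while cumulative selection reaches the target in a modest number of generations. The usual caveat applies: a fixed target phrase is a teaching device, and real selection has no target.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "  # 27 symbols
TARGET = "METHINKS IT IS LIKE A WEASEL"   # 28 characters


def score(candidate: str) -> int:
    """Number of positions that already match the target."""
    return sum(c == t for c, t in zip(candidate, TARGET))


def mutate(parent: str, rate: float, rng: random.Random) -> str:
    """Copy the parent, replacing each letter with a random one at the given rate."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in parent)


def weasel(rng: random.Random, offspring: int = 100, rate: float = 0.05,
           max_generations: int = 10_000) -> int:
    """Return how many generations cumulative selection needs to hit TARGET."""
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    for generation in range(1, max_generations + 1):
        # Keep the parent in the pool so the best score can never go down;
        # max() returns the first maximum, so ties keep the parent.
        brood = [parent] + [mutate(parent, rate, rng) for _ in range(offspring)]
        parent = max(brood, key=score)
        if parent == TARGET:
            return generation
    raise RuntimeError("did not converge")


print(weasel(random.Random(1)))
```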

4 billion years is a very, very, very long time.

Yeah, but 4 billion is nothing when talking about a measly 256-bit space, much less a 3-million-base DNA strand.
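The arithmetic behind this objection, under a deliberately generous (and hypothetical) assumption of a billion blind trials per second for 4 billion years. It shows why blind enumeration of a whole space is hopeless; it says nothing about a search that keeps partial progress, which is the point about selection made earlier in the thread.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
seconds = 4e9 * SECONDS_PER_YEAR        # about 1.26e17 s in 4 billion years
trials = seconds * 1e9                  # hypothetical: a billion trials per second

log10_256bit = 256 * math.log10(2)      # about 77.1, i.e. 2**256 is roughly 1.2e77
log10_dna = 3_000_000 * math.log10(4)   # about 1.8 million decimal digits for 4**3,000,000

print(f"trials         ~ 10^{math.log10(trials):.1f}")
print(f"256-bit space  ~ 10^{log10_256bit:.1f}")
print(f"3 Mb sequences ~ 10^{log10_dna:,.0f}")
```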

The closest programming analogy for a new gene is probably dropping a new listener/sender on a message bus. It can send messages independently in response to messages that were already on the bus before it arrived. If there's a little bit of a shared language (which there is here, since the bus is chemistry itself), that can lead to new behaviors of the system without necessarily breaking anything.
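A toy version of that analogy, assuming an in-process publish/subscribe bus (the Bus class is hypothetical, written for illustration): a component added later subscribes to a message that was already flowing and adds its own response, without any change to the existing handler.

```python
from collections import defaultdict
from typing import Callable


class Bus:
    """A toy in-process publish/subscribe bus that records every message."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Callable[[str], None]]] = defaultdict(list)
        self.log: list[tuple[str, str]] = []

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self.handlers[topic].append(handler)

    def publish(self, topic: str, payload: str) -> None:
        self.log.append((topic, payload))
        for handler in list(self.handlers[topic]):
            handler(payload)


bus = Bus()
# Existing behaviour: something reacts to "cold" by publishing "shiver".
bus.subscribe("cold", lambda p: bus.publish("shiver", p))
# A newly arrived component listens to the same, already existing message and
# adds a new response; the original subscriber is untouched.
bus.subscribe("cold", lambda p: bus.publish("antifreeze", p))
bus.publish("cold", "seawater")
print(bus.log)
```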

I think the binary example is incredibly poor and makes understanding this much harder.

Genes code for proteins (and promoters, etc.) and wind up in a chemical soup in flux. They're going to bounce around and do things.

Their presence will be more akin to new kinds of cars or trucks entering a highway, and they'll have different impacts to traffic (kinetics, thermodynamics).

As someone with no business even commenting on this topic, I'll ask anyway: is it possible that a virus could do this? My (again, naive) understanding of CRISPR is that it uses a virus to inject DNA fragments into an organism's cells in a way that they become usable. Is there any chance that a naturally occurring virus injected this sequence into the fish in such a way that they both incorporate it into their offspring, at which point natural selection takes over?

In which direction? Host-to-virus and virus-to-host (germline) HGT.

Virus-to-virus HGT: happens all of the time.

Retrovirus-to-host: endogenous retroviruses are ~8% of the human genome.

Host-to-virus: I don't know.

The other issue would be the size of the payload.

It seems like a big stretch, but so is life.

I doubt any of this is realistically possible except in externally fertilized species, where either something weird happened between gametes of different species or a retrovirus infected the gametes. Hybridization may also be an explanation.

I guess host-to-virus and virus-to-host is what I meant with the question. Virus evolves in the herring incorporating the AFP gene. Then infects the smelt or gametes and injects the AFP gene. A small number of smelt incorporate the gene and then nature selects for their offspring.

> But it is very surprising, even weird, that both fish do so with the same AFP gene

Maybe that constellation of a gene is the "obvious solution" and both fish will likely develop it by chance? Why assume the genes jump over ...

It isn't an assumption; the null hypothesis here is that that doesn't happen. Genes coding for cryoprotective proteins have indeed, as you suggest, evolved independently among various species. The resulting genes, despite all producing proteins similar enough to do the necessary job, are "radically different" and "highly diverse." [1]

What's different in this case is that, in three otherwise very distantly related species of fish, we find their antifreeze proteins are coded for by the same genes:

> But, the isolated occurrence of three very similar type II AFPs in three distantly related species (herring, smelt and sea raven) cannot be explained by this mechanism. These globular, lectin-like AFPs have a unique disulfide-bonding pattern, and share up to 85% identity in their amino acid sequences, with regions of even higher identity in their genes. A thorough search of current databases failed to find a homolog in any other species with greater than 40% amino acid sequence identity. [1]

In light of the fact that all other genes known to code for these proteins are very distinct both from this one and from one another, that three species should have a near-identical sequence coding for a near-identical protein suggests rather strongly that this version of the gene arose in one species and was then acquired by the other two, i.e., that horizontal gene transfer has occurred among these vertebrates.

[1] https://pubmed.ncbi.nlm.nih.gov/18612417/
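For readers outside the field: the "percent identity" figures quoted above (up to 85% shared amino acids, versus no other homolog above 40%) are, in their simplest form, the fraction of aligned positions at which two sequences carry the same residue. A minimal Python sketch, assuming the sequences have already been aligned (alignment itself, and exactly how gaps and the denominator are handled, vary between tools); the sequences in the example are made-up toy strings, not the real type II AFPs.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two already-aligned sequences of equal length.

    Simplification: a gap character ('-') never counts as a match, and the
    denominator is the full alignment length. Real tools differ on both.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Hypothetical toy peptides that differ at a single position.
print(percent_identity("MKAWLLCAVA", "MKAWLLSAVA"))  # 90.0
```

Note that identity alone does not say *why* two sequences are similar; that is exactly what the rest of the thread argues about.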

Also, the article mentions that the introns (the noncoding "junk" DNA between the stretches that encode the actual protein) are 95% similar.

> that three species should have a near-identical sequence coding for a near-identical protein suggests rather strongly that this version of the gene arose in one species and was then acquired by the other two

We'd strongly expect the amino acid sequence to be similar both under "convergent evolution" (each case evolved independently under the same selective pressure) and under "lateral transfer" (one case evolved and then shared DNA across species), so this wouldn't typically distinguish those two cases.

The sibling answer about structure of introns and exons is a more convincing answer, in my opinion. I don't think we would expect to see that in convergent evolution, but we would in a copy-paste job.

On what basis do you hold any such expectation? The paper explicitly contrasts its subject with several examples of convergent evolution producing functionally equivalent, but proteomically and genomically highly distinct, outcomes - which is typical of convergent evolution in general.

That said, I agree that the similarity of adjacent noncoding sequence is also a strong indicator that convergent evolution isn't causative here.

> On what basis do you hold any such expectation?... The paper explicitly contrasts its subject with several examples of convergent evolution producing functionally equivalent, but proteomically and genomically highly distinct, outcomes

On the basis that the protein is the function here (an antifreeze protein). There might be only one good solution, or one best local maximum, for this problem at the protein level, so we would expect natural selection to converge on that one solution. The results of two runs would then not be nearly as different as they are in cases where natural selection is optimizing for a system process.

Obligatory coding comparison:

If I asked two programmers to code a webshop, I would expect the underlying code to look substantially different - if the code looked the same, I'd take it as evidence of copying.

If I asked two programmers to code "If A then B", I would expect the underlying code to look substantially the same, whether or not they copied.

A specific antifreeze protein is the second case: both the code and the outcome. It's not part of a system which would have more freedom of variation in its solutions.

Preventing crystallization of water is the function. And again, on what basis do you so presume? A trivial literature review would have sufficed to reveal that there is a whole, mostly very nonhomologous, class of these proteins, not just the one. [1] It is precisely for this reason that the near identity observed in the proteins used by these three unrelated fish species is surprising.

As I have already noted this morning, it is at best pointless to attempt to reason out genomics based on first principles drawn from computing. Thank you for taking the time to demonstrate the kind of error that invariably results!

[1] https://en.m.wikipedia.org/wiki/Antifreeze_protein

Even with all this 'trivial literature review', there still remains the possibility three fish might have randomly walked [or non-randomly walked] into the same solution with the same local maximum, which couldn't be distinguished from lateral transfer just by looking at the protein structure.

"A doesn't always happen this way" isn't evidence, at all, for B happening. Your logic is faulty.

Thank you for appreciating my sense of humour. As someone who has worked in a genomics lab, I think coding analogies are perfectly fine. The analogy is not in error.

Happily, the paper does not only do that! Too, there are several comments peripheral to this thread which discuss the paper's findings outside the proteome.

Far be it from me to suggest that anyone in a Hacker News thread has failed to do even the most basic of reading in a field outside their own, but I will say that the paper is linked in one of my earlier comments, should you perhaps like to renew your acquaintance with its contents.

> Happily, the paper does not only do that!

Yes, happily! As I was saying in my first comment: I didn't agree that this part of the paper's abstract was relevant evidence, or with your take on it; but I agreed with the paper in other respects.

Yes, and your disagreement appears to proceed from an attempt to reason purely from first principles, with no sign of apprehending either the clear evidence that in no other case has convergent evolution on proteins which prevent water from crystallizing into ice produced anything like such genomic or proteomic similarity as in the case under discussion, or the infinitesimal probability of that happening by coincidence.

I'm not averse to the idea that I may be wrong on any of those points, but thus far I'm not seeing anything substantive to suspect I am likely to be so. These are just assertions that you're making, and while your reasoning itself is not unsound, the premises from which it follows as yet lack anything resembling substantiation, which is sorely needed given that those premises so contradict all available evidence.

...and, in response to your prior edit, this is coming from someone who has also worked in a genomics lab. Even if I hadn't, what point to claiming authority on that basis?

> no sign of apprehending

I apprehended it perfectly well; I'm still in disagreement, since my argument is unaffected.

> so contradict all available evidence

It doesn't, and that's what you have missed. What I said is logically harmonious with all available evidence.

By observing three fish with the same solution for antifreeze, we know that three fish have the same solution for antifreeze. This immediately contradicts any claim that all unrelated species have different solutions for antifreeze, which makes them worthy of study. It's a "black swan".

As such, whatever mechanism has caused this has not been seen to work this way elsewhere. Therefore, saying "this mechanism is not seen to work this way elsewhere" is not remarkable as evidence.

It's now a neutral statement which matches our expectation, and can't therefore be evidence against the mechanism. It's certainly not evidence for another mechanism.

I could just as well say "I have only observed horizontal transfer in N other cases, and this is not one of those N cases, therefore it is not horizontal transfer". That would be wrong, but has equal logical merit as your claim.

All of which still ignores how wildly unlikely it is that such a high degree of similarity occurs by chance.

The paper doesn't claim causality either, but only argues, in my view pretty convincingly, that lateral gene transfer is a likelier explanation for the observed similarity than any other including convergent evolution. You haven't argued otherwise, but only that convergent evolution in this case is not implausible - which is true, but answers no claim that anyone is actually making.

There's no point in that that I can see, so if you want to keep on doing it, I'm afraid you'll need to do so in the absence of an interlocutor, or at least of an interlocutor who is me.
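How unlikely a chance match is can be made concrete with a deliberately crude toy model: treat two unrelated stretches of noncoding DNA as n aligned sites, each matching independently with probability p (1/4 if bases were uniformly random), and ask for the probability of at least k matches, which is a binomial tail. This is a sketch under those assumptions only: real genomes are not uniform or independent, and the model says nothing about selection, which is what the convergent-evolution side invokes; it is only meant for stretches where selection is presumed weak. The length and match count in the example are hypothetical.

```python
from fractions import Fraction
from math import comb, log10


def prob_at_least_k_matches(n: int, k: int, p: Fraction = Fraction(1, 4)) -> Fraction:
    """P(X >= k) for X ~ Binomial(n, p), computed exactly with fractions.

    Toy assumption: each of n aligned sites matches independently with
    probability p (1/4 for uniformly random DNA bases).
    """
    return sum(
        (comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)),
        Fraction(0),
    )


def log10_fraction(x: Fraction) -> float:
    # Take logs of numerator and denominator separately so that tiny
    # probabilities don't underflow to 0.0 when converted to a float.
    return log10(x.numerator) - log10(x.denominator)


# Hypothetical example: 95 or more identical bases out of 100 aligned sites.
prob = prob_at_least_k_matches(100, 95)
print(f"about 10^{log10_fraction(prob):.1f}")
```

Exact fractions are used because the probabilities involved are far too small to survive naive floating-point multiplication.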

> All of which still ignores how wildly unlikely it is that such a high degree of similarity occurs by chance.

It is wildly unlikely that I should exist through the process of evolution, to waste my afternoon on this argument, and yet: here I am :) Have a nice day.

> Even if I hadn't, what point to claiming authority on that basis?

Oh, this was a direct response to the fact that you repeatedly implied that I was ignorant and hadn't done basic reading in the field. You were wrong about that as well.

Someone disagreeing with you is not always a sign of ignorance.

It's not that you disagree with me that leads me to that surmise, but how. You have a good one too!

There are a few reasons convergent evolution is really unlikely here.

1) This particular gene isn't the obvious solution - there are many, highly diverse antifreeze proteins, not to mention other mechanisms of freeze resistance (glycerol production, for example).

2) Even if it were, the genetic code is redundant, meaning that there are often several 3-base codons that code for a given amino acid. So even if the exact amino acid sequence were what mattered, it would be unlikely for both species to use exactly the same codons to obtain that sequence.

3) The similarity extends beyond the coding region. It includes stretches of DNA in between and flanking the gene's code itself. These are stretches of DNA that normally mutate at a much higher rate than the coding region itself, and they aren't under the selective pressure of making a working protein, so there's no real evolutionary explanation for how they'd end up so similar.
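Point 2 can be made concrete. Under the standard genetic code, the number of distinct DNA coding sequences that translate to a given peptide is the product of each residue's codon count (Leu, Ser and Arg have six codons each; Met and Trp have one). A minimal Python sketch; the peptide in the example is a made-up toy string, and the count ignores codon usage bias (organisms don't use synonymous codons equally often), so it somewhat overstates the effective freedom.

```python
from math import prod

# Number of codons in the standard genetic code that encode each amino acid
# (one-letter codes). These 61 sense codons plus 3 stop codons make 64.
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2,
    "G": 4, "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2,
    "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def synonymous_encodings(peptide: str) -> int:
    """Count the distinct coding sequences that yield exactly this peptide."""
    return prod(CODONS_PER_AMINO_ACID[aa] for aa in peptide)


# Hypothetical six-residue peptide: 1 * 2 * 4 * 1 * 6 * 6
print(synonymous_encodings("MKAWLL"))  # 288
```

Because the count multiplies per residue, it grows exponentially with length, which is why identical codon choices across a whole gene are hard to attribute to convergence.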

Do you think these scientists are stupid?

Part of science is questioning other people's hypotheses, results, and analyses. This is how science works.

Yes, but speculation about the most obvious questions from someone who hasn't done any work to investigate whether it's already been addressed doesn't progress science either.

Here's the 2nd paragraph from the linked article (which is already a source someone created to help non-experts understand the main ideas):

> It isn’t surprising, then, that herrings and smelts, two groups of fish that commonly roam the northernmost reaches of the Atlantic and Pacific Oceans, both make AFPs. But it is very surprising, even weird, that both fish do so with the same AFP gene — particularly since their ancestors diverged more than 250 million years ago and the gene is absent from all the other fish species related to them.

edit: I am more sympathetic to this behavior when the topic is more politically contentious, since it may be unreasonably difficult for a layman to know the biases of the authors and the source may indeed be trying to slide something under the rug. But here we're talking about fish genetics. There's no culture war or red vs. blue divide here (I hope!)

I can’t see the scientists being harmed by a layperson’s curious engagement. Every scientist was once a curious layperson.

I didn't say it harms scientists. I said this example doesn't progress science. Curious laypersons are well and good. I would recommend they start by reading the linked Quanta article :)

Not my field but another seemingly plausible explanation, at least to me, exists. The common ancestor did have this gene and most other descendants lost the gene because it wasn’t needed and was selected against.

It is, but this is pretty much the first thing the article asks, and the first thing anyone would ask. The comment is almost acting like the scientists haven’t considered that hypothesis

Do you think this comment was helpful?


I was amazed when I learnt that viruses could move DNA from host to host. But then it makes perfect sense. It must make evolution so much more efficient.

It made me wonder whether viruses (or similar participants) would be vital to complex life evolving on other planets?

You don’t need to look that far: Syncytin, a protein involved in placental development, is derived from an ancient retrovirus[1]. In other words, a virus made mammals possible.

[1]: https://whyy.org/segments/the-placenta-went-viral-and-protom...

EDIT: If I recall correctly, endogenous retroviruses are involved in brain development as well.

Not having access to the journal article, I wonder how much of this can be explained by retroviruses inserting their DNA, which is well established to be inheritable (e.g. when the retrovirus managed to infect sperm or egg in humans).

Retroviruses could be a mechanism for the DNA jump (though we'd have to ask how they picked up a portion of a host's DNA), or they could be an alternative mechanism: if all three fish species had been infected by the same retrovirus, that would explain why the surrounding 'junk' DNA is identical without requiring a speculative 'DNA jump'.

> Graham thinks that these sequences are “definitive proof” that a small chunk of a herring chromosome made its way into a smelt’s. “If anybody wants to dispute this,” she said, “you know, I don’t see how they possibly could.”

I think the problem here is that she is presenting something unfalsifiable, and therefore problematic. It would then be on her and her team (or someone else who cares enough) to show that it is possible: devise an experiment (a very clever one, I'm sure) that proves DNA can be passed on this way.

What's the other explanation for the sequence, including the junk around, to be almost identical in those two species?

Chance. Do we see it in other species? There’s a lot of life - shouldn’t we see this at least another time?

Hybridization in the distant past? Or would that have left an accumulation of mutations, unless the segment were well conserved?

I'm no professional biologist like some of you guys, but the coincidence of marsupialism across so many otherwise quite different species has long led me to presume that marsupial features were conferred across many species by a retrovirus... I actually thought that was the consensus. Mightn't this be a similar case?

Some info on antifreeze proteins from a structural biology point of view: https://pdb101.rcsb.org/motm/120

I know too little about biology, but given the sheer number of species in the world, does this one single instance really prove this? How likely is it to see this between any two (unrelated) species in the world?

https://www.pnas.org/content/111/18/6672 - an example from the plant world. I suspect (no evidence!) it's one of those things where the more you look the more you find - in plants, at least.

It’s very unlikely, but not impossible.

Here’s a plant-to-insect example, discussed a few months ago: https://news.ycombinator.com/item?id=26600298

The article cites several instances.

Germline horizontal transfer? Maaaaaybe a faint possibility because of how most fish procreate, but it's never gonna happen in critters that have direct sexual contact.

Viruses and bacteria have been doing CRISPR-like stuff since the dawn of time. There's a whole microbiome that could affect human embryos.

The alternative hypothesis is that the phylogenetic tree is wrong. Maybe those two fish actually share a recent common ancestor.

That explains why Chinese animals and humans have similar eyes. I thought it must have been some bacteria that caused the change.

"When herring are exuberantly spawning, the surrounding water can turn milky with the amount of sperm they release."

You must cut down the mightiest tree in the forest with...

How do we know that the other lineages didn't all lose the gene through convergent evolution?

Mr. Hands was on to something

This is why we can't nuke mosquitos. They are invaluable in horizontal gene transfer, aid our evolution, and help our immune systems evolve to handle many pathogens. People don't know this and want to nuke mosquitos because itchy. But it will ruin many things.

? I don't see the function of mosquitos except as niche opportunists that increase the universe's entropy faster by breeding and killing other critters. They may be incidentally useful as food, but that biomass could be filled by insects less harmful to higher lifeforms.

Speaking of mosquitos: with all of the intense rain, southeast Texas is more akin to the Deep South, with enormous quantities of flying insects, e.g., moths, beetles, and tons of mosquitos. I set up the largest bug zapper I could find for one night, and it killed about 2 lbs / 1 kg of insects, a pile so large that it clogged the zapper and left the table it was on completely covered in carcasses. IOW: the area needs more birds, if only the cats would stop killing them and the previous lack of flying insects would stop starving them.

The horror.... You massacred them. Please stop killing the poor little bugs... :'(

I guess you're going to volunteer to feed them the human blood they need to reproduce, then?

Willingly make my donation nightly.

The mosquito "needle" is highly evolved. Less painful than a thumb blood sample. Don't push them off prematurely and you get less itch, IMHO. Don't scratch and you get less itch, too.
