The fact that large random mutations typically lead to an inviable zygote should be enough evolutionary pressure; there doesn't need to be specific protection against the entry of external DNA.
A foreign sequence that has specific features (the ones that kept its originating organism viable) might have no chance of ever being compatible with the target organism.
A completely random sequence, by definition, has some chance of being compatible.
The question is which scenario has a higher chance of success.
That's precisely why the authors published the new Cell paper https://www.cell.com/action/showPdf?pii=S0168-9525%2821%2900... with stronger evidence from whole-genome sequencing to support the HGT hypothesis. I'm still trying to wrap my head around Figure 2 there, so I'm on the fence.
Anyway, an interesting article.
It's like taking a random byte sequence from some binary, shoving it randomly into another, and the new binary gets useful new features.
The involvement of even more complex systems like parasites makes it that much more insane to me.
If you think about it for a moment, our genetic code is kind of designed to work that way.
You get half of your genetic code from your mom and the other half from your dad, and somehow all of these genes "just work" together. It's kind of miraculous when you consider how many genes encode how your brain works, how your liver works, your muscles, etc. Somehow, provided the baby can be born, a mishmash of genes from two different individuals almost always works out.
In particular with conjoined twins, it's quite remarkable how much the systems for body development still produce something that connects the inner workings. That was obviously not their “purpose”; the self-healing growth mechanisms that correct for errors simply lead to it.
Consider the Hensel twins, who have two mouths but whose digestive system merges at some point in a way that still works. The “tubes” of their digestive tract actually merge at one point, but they have two stomachs.
The evolution of organisms that mishmash genes (aka sexual reproduction) is thought to be the result of an ongoing arms race between gene sequences that "try" to stay unchanged (in higher-level species) and gene sequences that "try" to "free ride" (from viruses, etc.). Being able to build members of your species from a "mishmash of genes from two different individuals" scrambles the DNA of each species member, which makes attack harder.
Organisms that do not do this and instead reproduce clonally (e.g., via parthenogenesis) are often entirely wiped out once a pathogen figures out how to target their DNA; hence the banana varieties we eat change over time.
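To make the "mishmash" a bit more concrete, here is a toy model of how each parent's contribution is itself a recombinant. It assumes a single chromosome drawn as a string of letter "genes" and one crossover point chosen by hand; real meiosis involves many chromosomes and variable numbers and positions of crossovers. The function name `crossover` is purely illustrative.

```python
def crossover(homolog_a: str, homolog_b: str, point: int) -> tuple[str, str]:
    """Single-point crossover between two homologous chromosomes (toy model).

    Returns the two recombinant chromatids: A's prefix joined to B's suffix,
    and B's prefix joined to A's suffix.
    """
    if len(homolog_a) != len(homolog_b):
        raise ValueError("homologs must be the same length in this toy model")
    if not 0 <= point <= len(homolog_a):
        raise ValueError("crossover point out of range")
    return (homolog_a[:point] + homolog_b[point:],
            homolog_b[:point] + homolog_a[point:])


# Each parent carries two homologs (one from each of *their* parents).
mom_a, mom_b = "ABCDEFGH", "abcdefgh"
dad_a, dad_b = "PQRSTUVW", "pqrstuvw"

egg, _ = crossover(mom_a, mom_b, 3)    # one recombinant goes into the egg
_, sperm = crossover(dad_a, dad_b, 5)  # one recombinant goes into the sperm

child = (egg, sperm)  # diploid: one set from mom, one from dad
print(child)  # ('ABCdefgh', 'pqrstUVW')
```

Neither parental set is passed on intact: every gamete is a new combination, which is the scrambling the thread is talking about.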
PS: Some computer viruses use similar evasion: https://www.trendmicro.com/vinfo/us/security/definition/Poly...
Sexual reproduction means your species has a very large gene pool, and individuals with new combinations of genes can be produced very quickly. That's not just an advantage against viruses. It's also very useful for adapting rapidly and competing against other species when your environment changes. New threats (and new opportunities) show up all the time, be it dwindling or changing availability of food, climate change (e.g., a new ice age), new predators or new prey, or a group of individuals migrating to a new region of the world with a different climate.
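A toy illustration of the polymorphism idea, assuming the simplest possible scheme (a one-byte XOR key); real polymorphic engines are far more elaborate, and nothing here is taken from the linked Trend Micro page. The point is only that the stored bytes change while the behaviour they decode to stays the same, much as recombination changes the "signature" a pathogen would have to match.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


payload = b"same behaviour"
variant_1 = xor_bytes(payload, 0x21)
variant_2 = xor_bytes(payload, 0x5A)

# The stored bytes differ, so a fixed byte signature for one misses the other...
print(variant_1 != variant_2)  # True
# ...but each decodes back to the identical payload.
print(xor_bytes(variant_1, 0x21) == xor_bytes(variant_2, 0x5A) == payload)  # True
```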
That's a little bit tautological since if the genes didn't work together they wouldn't be here after all these years, right? Fascinating nonetheless.
I very much agree it's miraculous. Biological organisms are robust to error and chance in ways no designed system comes close to matching. It's awe-inspiring.
As I understand it, there are attributes of Lisp and attributes of the program structure (such as putting much of the logic in that tree structure with defined split points) that make this much more feasible than it would otherwise be.
My guess is that DNA has evolved similarly: the ways in which it splits and the mechanisms by which it is interpreted and executed help, and the organisms we're talking about (us and the other complex ones) have iterated to a design that's more amenable to bits being swapped. That is, large chunks of us may be more similar to a Lisp program with attributes that make it easy to swap parts than to a bunch of object bytecode with absolute and relative jumps all over.
Note: A lot of this is poorly remembered from a survey of AI class two decades ago, so someone with a stronger background should verify that I'm not making a complete hash of it.
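The "tree structure with defined split points" idea can be sketched as subtree crossover from tree-based genetic programming, which we assume is what the comment has in mind. Expressions here are nested tuples standing in for Lisp s-expressions; all names (`evaluate`, `subtree_crossover`, the `path` convention) are hypothetical. Any subtree can be swapped for any other and the result is still a well-formed program, which is not true of splicing arbitrary bytes into compiled code.

```python
import operator

# An expression is a number, the symbol "x", or a tuple (op, left, right).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))


def get_subtree(tree, path):
    """Follow a path of child indices (1 = left, 2 = right; 0 is the operator)."""
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_subtree(parts[path[0]], path[1:], new)
    return tuple(parts)


def subtree_crossover(parent_a, path_a, parent_b, path_b):
    """Graft parent_b's subtree at path_b into parent_a at path_a."""
    return replace_subtree(parent_a, path_a, get_subtree(parent_b, path_b))


a = ("+", ("*", "x", "x"), 1)   # x*x + 1
b = ("-", ("*", 3, "x"), 2)     # 3*x - 2
child = subtree_crossover(a, (2,), b, (1,))  # swap a's "1" for b's "3*x"
print(child)               # ('+', ('*', 'x', 'x'), ('*', 3, 'x'))
print(evaluate(child, 2))  # 10
```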
> Both a computer program and a genome encode information, but that's about where the similarities end.
FWIW this is a very significant similarity to me.
The details are of course endless and they aren’t interchangeable between the two fields, but the analogy is still there.
I mean, I can as well say that chalk and cheese are alike in that both have mass, occupy space, and leave a streak behind when you rub them on something. It is a true statement, but what does it help me predict about either?
I don't think the 'chemistry is an OPERATING SYSTEM' level of handwaving is sufficient to glean insights, but understanding general systems-level interactome patterns of how proteins interact does help provide knowledge about how natural and designed systems can self-regulate, how they fail, how they can be structured, etc.
At some point, some Newton-like person will figure it out. It always happens.
As for now, it might be interesting to understand why exactly the analogy between genomics and programming fails. It might bring interesting insights into both fields.
So why not try to think about it?
I also think it's somewhat ironic that you're accusing them of only being here to say "you're wrong" when that's what you've done in this thread. I only bring this up because I think we're all after the same thing here: to understand an incredibly interesting topic.
I suspect most of us are really here to learn and discuss. You seem like you have a background in the area, I'm sure we would all benefit from learning about the differences.
If the similarity is that DNA and code both encode information, and the differences lie in how they do so, it's hard to see why you think they can't be related at all. You've been relating the two yourself.
The idea that a genome as expressed in nucleic acid is purely, and only, an informational medium is fundamentally in error. It does encode information in the sequence of base pairs, this is true. But it is also a physical structure in its own right, and properties of that structure, incidental to the encoded information, have recently been looking to play at least as important a role in the process of transcription as the sequence itself.
There are, for example, some sequences which will cause a ribosome to transcribe the surrounding genes differently or with varying frequency, due to the physical interaction between the molecules involved. (I recently discussed this here in the context of recent research on causes of eye color; it should not be too far back in my comment history.) We also see, for example, that both viral and eukaryotic DNA can be and often are transcribed in ways that produce different proteins from the same sequence, again as a result of physical constraints affecting the interaction with the ribosome. This is one reason why "junk DNA" is a bit of a misnomer, and why we more recently see the term fall out of use in favor of "noncoding DNA" - these regions carry no information in their own right, but nonetheless can strongly affect the outcome of transcription because transcription is not only an informatic process. This isn't true of software; there is no general case in which two programs varying only in nonsyntactic ways will be evaluated differently under otherwise identical conditions - we create programming languages as we do in part to ensure that won't happen, and it's also part of the reason why we use transistors instead of vacuum tubes or relays: in order to engineer that kind of variance as much as we can out of existence. What is therefore an accidental property in software is an essential one in gene expression, and cannot be overlooked without reaching an inaccurate conception of how the latter process works.
That's just one example, and it's true that processes like these can be modeled in software to variously imperfect degrees of fidelity and that information-theoretical models can be useful in understanding some aspects of how they work. But that's not the same thing as them working similarly enough that understanding one very well suffices to reason about the other. I definitely can see how it's easy to assume otherwise! It's an assumption I shared, before my own yearlong exposure to the field at a sufficient level of detail to start to understand what I hadn't understood about it before, and considerable reading and study thereafter.
Unfortunately, I was there to provide engineering support to people doing that work, not to do it myself, and the knowledge I've derived from that experience apparently does not extend so far as producing a concise and positive statement of the fundamental difference between the two fields of study. I spent considerably more time teaching informaticists how to program, formally and otherwise, than I spent learning about bioinformatics. That leaves me able to recommend little beyond seeking out similar experience of your own, which I do recommend if the depth of your interest suffices, although I do also have to say working in academia as a nonacademic has very little else to recommend it.
I know there are some folks on HN with formal knowledge and training greatly exceeding my own, and some of whom have probably also had experience teaching the basics in an accessible way. Perhaps one of them might give a more useful answer here than I've been able to.
Not to be a negative nancy here, but if we're being precise, ribosomes do not transcribe. They translate.
Under the fairly reductive central dogma of biology:
DNA -> RNA (Transcription)
RNA -> Protein (Translation)
Transcription and translation are separate processes that, in eukaryotes, don't occur in the same area of the cell (transcription happens in the nucleus, translation in the cytoplasm), and each uses very different complexes to mediate its rate in a different physical environment.
I don't disagree with any of the substantive points being made, but I think the proper terminology only adds to your argument so I found it strange that it was left out.
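The two steps of the reductive central dogma can be sketched as two separate functions. This is a toy under stated assumptions: the input is taken to be the coding strand (so transcription is just T to U), the codon table is only a handful of entries from the standard genetic code (unlisted codons raise `KeyError`), and splicing, UTRs, and all the physical effects discussed above are ignored.

```python
# A small subset of the standard genetic code (mRNA codon -> one-letter amino acid).
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "UUC": "F", "GGC": "G", "GCU": "A",
    "AAA": "K", "UGG": "W",
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_strand_dna: str) -> str:
    """DNA -> mRNA: the mRNA matches the coding strand with T replaced by U."""
    return coding_strand_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein: read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("CCATGTTTGGCAAATGGTAAGC")
print(mrna)             # CCAUGUUUGGCAAAUGGUAAGC
print(translate(mrna))  # MFGKW
```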
As far as I understand, you're saying we should not talk about biochemistry as computation. I'd say instead that we have not studied enough how nature does computation without programmers or even human-friendly semantics.
It's still computation, involving space and physics: too complex to simulate efficiently (for now), but not big enough for the emergent behaviour to be simple, as it is for a gas.
Files on disks have end of file markers, just like the start and stop sequences in DNA. Operating systems have cron jobs (themselves digital) that control when other programs execute.
However, genomes aren't digital. They're 3D structures with a ton of attributes that are not trivially representable digitally.
False by definition: Digital data is "information represented as a string of discrete symbols each of which can take only one of a finite number of values"
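That definition can be made concrete for the sequence layer: bases are symbols from a four-letter alphabet, so a DNA string packs losslessly into 2 bits per base. The particular mapping (A=00, C=01, G=10, T=11) is an arbitrary assumption, and note that this captures only the base sequence, not the 3D structure or chemical modifications raised in the previous comment.

```python
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack(seq: str) -> int:
    """Pack a DNA string into one integer, 2 bits per base (first base highest)."""
    value = 0
    for base in seq:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack(value: int, length: int) -> str:
    """Inverse of pack(); the length is needed because leading 'A's are zero bits."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


packed = pack("GATTACA")
print(bin(packed))        # 0b10001111000100
print(unpack(packed, 7))  # GATTACA
```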
There is Theory of Computation and there is Theory of Programming. Your arguments apply to TOP but not to TOC.
Plenty of software is neither written nor comprehensible, I can assure you of that.
Like, I don't think you're necessarily wrong, but pointing out the literal differences between the two topics doesn't explain to me why the analogy is wrong, and therefore doesn't support your argument.
It's like saying "I'm nothing like my mother; I don't even have long hair"
An OS is just so much simpler than dynamically constrained energetic replicators in an always and everywhere collapsing wave function.
I use a variation of this form: 'people whose science and religion conflict don't know enough about either one'.
Dedicated grant-writing staff are gold, literally and figuratively.
(I worked at a biomedical informatics shop.)
The last I ever learned about it, and perhaps the common belief, is that random-ish gene mutations account for it. 4 billion years doesn't seem like enough time to account for all that unless changes are heavily weighted towards doing something somewhat useful. Like there is a system at play: Lego blocks vs. bits. IDK.
Maybe a key part of what's missing in your understanding is not so much "micro" genetics but biogeography and population genetics. You'd also want to check simulations and comparisons with real-world data on some models to see how a population evolves to see that it "really works". It's important to understand that there are different models, and for each one there's a set of "forces" and important parameters.
The bigger picture is, it's the whole interplay between mutation, selection, genetic drift, gene flow; things like differences in population size over time and space, migrations, isolation and reconnection, etc. that makes it work. You might also want to take a look into genetic/functional/morphological modularity. I've just skimmed these articles, but they seem relevant:
There's much more but my memory is murky. Ideally you would want to take a course or read a textbook on evolution. A few popsci books are ok (Dawkins, E. O. Wilson).
TLDR: What the other commenter said -- 4 billion years is a very, very, very long time.
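As a back-of-the-envelope companion to the suggestion to look at simulations, here is the simplest deterministic selection model, under loudly stated assumptions: haploid, effectively infinite population, a constant 1% fitness advantage, no drift, no mutation. With mean fitness 1 + s·p, the update p' = p(1+s)/(1+s·p) multiplies the odds p/(1-p) by (1+s) each generation, so a small advantage compounds quickly.

```python
import math


def generations_to_frequency(p0: float, target: float, s: float) -> int:
    """Generations for an allele with relative fitness 1+s to go from p0 to target.

    Deterministic haploid selection: p' = p(1+s) / (1 + s*p).
    """
    p, n = p0, 0
    while p < target:
        p = p * (1 + s) / (1 + s * p)
        n += 1
    return n


n = generations_to_frequency(0.001, 0.99, 0.01)
print(n)  # 1156

# Closed form: the odds grow by (1+s) per generation.
closed_form = math.log((0.99 / 0.01) / (0.001 / 0.999)) / math.log(1.01)
print(round(closed_form, 1))  # 1155.9
```

Even at, say, one generation per year (an assumption; many organisms are much faster), going from 1 in 1,000 to 99% takes on the order of a thousand years, a blink on a 4-billion-year timescale.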
Genes code for proteins (and promoters, etc.) and wind up in a chemical soup in flux. They're going to bounce around and do things.
Their presence will be more akin to new kinds of cars or trucks entering a highway, and they'll have different impacts to traffic (kinetics, thermodynamics).
Virus-to-virus HGT: happens all of the time.
Retrovirus-to-host: endogenous retroviruses are ~8% of the human genome.
Host-to-virus: I don't know.
The other issue would be the size of the payload.
It seems like a big stretch, but so is life.
I doubt any of this is realistically possible except in externally fertilized species, where either something weird happened between gametes of different species or a retrovirus infected the gametes. Hybridization may also be an explanation.
Maybe that constellation of a gene is the "obvious solution" and both fish would likely develop it by chance? Why assume the genes jump over ...
What's different in this case is that, in three otherwise very distantly related species of fish, we find their antifreeze proteins are coded for by the same genes:
> But, the isolated occurrence of three very similar type II AFPs in three distantly related species (herring, smelt and sea raven) cannot be explained by this mechanism. These globular, lectin-like AFPs have a unique disulfide-bonding pattern, and share up to 85% identity in their amino acid sequences, with regions of even higher identity in their genes. A thorough search of current databases failed to find a homolog in any other species with greater than 40% amino acid sequence identity. 
In light of the fact that all other genes known to code for these proteins are very distinct both from this one and from one another, that three species should have a near-identical sequence coding for a near-identical protein suggests rather strongly that this version of the gene arose in one species and was then acquired by the other two, i.e., that horizontal gene transfer has occurred among these vertebrates.
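The "up to 85% identity" and "greater than 40%" figures in the quote are percent-identity scores. A minimal sketch of that metric, assuming the two sequences have already been aligned to equal length (real comparisons run an aligner such as BLAST first, and gap-handling conventions vary; here gap columns are simply skipped). The example sequences are made up, not the actual AFPs.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, non-gap positions where the residues are identical.

    Assumes pre-aligned sequences of equal length; '-' marks a gap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


print(percent_identity("MKVLAAGIFW", "MKVLSAGIYW"))  # 80.0
```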
We'd strongly expect the amino acid sequence to be similar both by "convergent evolution" (each case evolved independently with the same motivation) and "lateral transfer" (one case evolved and then shared DNA across species), so this wouldn't typically distinguish those two cases.
The sibling answer about structure of introns and exons is a more convincing answer, in my opinion. I don't think we would expect to see that in convergent evolution, but we would in a copy-paste job.
That said, I agree that the similarity of adjacent noncoding sequence is also a strong indicator that convergent evolution isn't causative here.
On the basis that the protein (an antifreeze protein) is the function here, there might be only one good solution, or one best local maximum, for this problem at the protein level. So we would expect natural selection to converge on that one solution, and the results of two runs would not be nearly as different as they are in cases where natural selection is optimizing a system-level process.
Obligatory coding comparison:
If I asked two programmers to code a webshop, I would expect the underlying code to look substantially different - if the code looked the same, I'd take it as evidence of copying.
If I asked two programmers to code "If A then B", I would expect the underlying code to look substantially the same, whether or not they copied.
A specific antifreeze protein is the second case: both the code and the outcome. It's not part of a system which would have more freedom of variation in its solutions.
As I have already noted this morning, it is at best pointless to attempt to reason out genomics based on first principles drawn from computing. Thank you for taking the time to demonstrate the kind of error that invariably results!
"A doesn't always happen this way" isn't evidence, at all, for B happening. Your logic is faulty.
Thank you for appreciating my sense of humour. As someone who has worked in a genomics lab, I think coding analogies are perfectly fine. The analogy is not in error.
Far be it from me to suggest that anyone in a Hacker News thread has failed to do even the most basic of reading in a field outside their own, but I will say that the paper is linked in one of my earlier comments, should you perhaps like to renew your acquaintance with its contents.
Yes, happily! As I said in my first comment, I didn't agree that this part of the paper's abstract was relevant evidence, or with your take on it; but I agreed with the paper in other respects.
I'm not averse to the idea that I may be wrong on any of those points, but thus far I'm not seeing anything substantive to suspect I am likely to be so. These are just assertions that you're making, and while your reasoning itself is not unsound, the premises from which it follows as yet lack anything resembling substantiation, which is sorely needed given that those premises so contradict all available evidence.
...and, in response to your prior edit, this is coming from someone who has also worked in a genomics lab. Even if I hadn't, what point is there in claiming authority on that basis?
I apprehended it perfectly well; I'm still in disagreement, since my argument is unaffected.
> so contradict all available evidence
It doesn't, and that's what you have missed. What I said is logically harmonious with all available evidence.
By observing three fish with the same solution for antifreeze, we know that three fish have the same solution for antifreeze. This immediately contradicts any claim that all unrelated species have different solutions for antifreeze, which makes them worthy of study. It's a "black swan".
As such, whatever mechanism has caused this has not been seen to work this way elsewhere. Therefore, saying "this mechanism is not seen to work this way elsewhere" is not remarkable as evidence.
It's now a neutral statement which matches our expectation, and can't therefore be evidence against the mechanism. It's certainly not evidence for another mechanism.
I could just as well say "I have only observed horizontal transfer in N other cases, and this is not one of those N cases, therefore it is not horizontal transfer". That would be wrong, but has equal logical merit as your claim.
The paper doesn't claim causality either, but only argues, in my view pretty convincingly, that lateral gene transfer is a likelier explanation for the observed similarity than any other including convergent evolution. You haven't argued otherwise, but only that convergent evolution in this case is not implausible - which is true, but answers no claim that anyone is actually making.
There's no point in that that I can see, so if you want to keep on doing it, I'm afraid you'll need to do so in the absence of an interlocutor, or at least of an interlocutor who is me.
It is wildly unlikely that I should exist through the process of evolution, to waste my afternoon on this argument, and yet: here I am :) Have a nice day.
Oh, this was a direct response to the fact that you repeatedly implied that I was ignorant and hadn't done basic reading in the field. You were wrong about that as well.
Someone disagreeing with you is not always a sign of ignorance.
1) This particular gene isn't the obvious solution - there are many, highly diverse antifreeze proteins, not to mention other mechanisms of freeze resistance (glycerol production, for example).
2) Even if it were, the genetic code is redundant, meaning that there are often several 3-base codons that code for a given amino acid. So even if the exact amino acid sequence were what mattered, using the exact same codons to obtain that sequence would be unlikely.
3) The similarity extends beyond the coding region. It includes stretches of DNA in between and flanking the gene's code itself. These are stretches of DNA that normally mutate at a much higher rate than the coding region itself, and they aren't under the selective pressure of making a working protein, so there's no real evolutionary explanation for how they'd end up so similar.
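Point 2 can be quantified with the standard genetic code's codon counts (61 sense codons spread over 20 amino acids). The sketch below counts how many DNA sequences encode a given protein; the 10-residue peptide is made up for illustration.

```python
from math import prod

# Number of sense codons for each amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def synonymous_encodings(protein: str) -> int:
    """How many distinct coding sequences translate to this amino acid string."""
    return prod(CODONS_PER_AA[aa] for aa in protein)


print(synonymous_encodings("MKVLAAGIFW"))  # 18432
```

If synonymous codons were picked independently and uniformly (they are not; codon usage is biased, so treat this as a rough illustration only), two lineages would land on the same encoding of even this short peptide with probability 1/18432, and the odds shrink multiplicatively with protein length.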
Here's the 2nd paragraph from the linked article (which is already a source someone created to help non-experts understand the main ideas):
> It isn’t surprising, then, that herrings and smelts, two groups of fish that commonly roam the northernmost reaches of the Atlantic and Pacific Oceans, both make AFPs. But it is very surprising, even weird, that both fish do so with the same AFP gene — particularly since their ancestors diverged more than 250 million years ago and the gene is absent from all the other fish species related to them.
edit: I am more sympathetic to this behavior when the topic is more politically contentious, since it may be unreasonably difficult for a layman to know the biases of the authors and the source may indeed be trying to slide something under the rug. But here we're talking about fish genetics. There's no culture war or red vs. blue divide here (I hope!)
It made me wonder whether viruses (or similar participants) would be vital to complex life evolving on other planets?
EDIT: If I recall correctly, endogenous retroviruses are involved in brain development as well.
Retroviruses could be a mechanism for the DNA jump (though we'd have to ask how they got a portion of a host's DNA), or they could be an alternative mechanism that would explain why the surrounding 'junk' DNA is identical without requiring a speculative 'DNA jump': all three fish species could have been infected by the same retrovirus.
I think the problem here is that she is presenting something that is unfalsifiable and therefore problematic. It would then be on her and her team (or someone else who cares enough) to show that it is possible: devise an experiment (a very clever one, I'm sure) that demonstrates DNA can be passed on this way.
Here’s a plant-to-insect example, discussed a few months ago: https://news.ycombinator.com/item?id=26600298
Speaking of mosquitoes, with all of the intense rain, southeast Texas is more akin to the Deep South, with enormous quantities of flying insects, i.e., moths, beetles, and tons of mosquitoes. I set up the largest bug zapper I could find for one night, and it killed about 2 lb (1 kg) of insects, a pile so large it clogged the zapper and left the table it was on completely covered in carcasses. IOW: the area needs more birds, if the cats and the previous lack of flying insects would stop killing them.
The mosquito "needle" is highly evolved. Less painful than a thumb blood sample. Don't push them off prematurely and you get less itch, IMHO. Don't scratch and you get less itch, too.