Reader Eliot Morrison, a protein biochemist, has been looking for the longest English word found in the human proteome — the full set of proteins that can be expressed by the human body. Proteins are chains composed of amino acids, and the most common 20 are represented by the letters A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, and Y. “These amino acids have different chemical properties,” Eliot writes, “and the sequence influences how the whole chain folds in three dimensions, which in turn determines the structural and functional properties of the protein.”
The longest English word he’s found is TARGETEER, at nine letters, in the uncharacterized protein C12orf42. The whole sequence of C12orf42 is:
MSTVICMKQR EEEFLLTIRP FANRMQKSPC YIPIVSSATL WDRSTPSAKH IPCYERTSVP CSRFINHMKN FSESPKFRSL HFLNFPVFPE RTQNSMACKR LLHTCQYIVP RCSVSTVSFD EESYEEFRSS PAPSSETDEA PLIFTARGET EERARGAPKQ AWNSSFLEQL VKKPNWAHSV NPVHLEAQGI HISRHTRPKG QPLSSPKKNS GSAARPSTAI GLCRRSQTPG ALQSTGPSNT ELEPEERMAV PAGAQAHPDD IQSRLLGASG NPVGKGAVAM APEMLPKHPH TPRDRRPQAD TSLHGNLAGA PLPLLAGAST HFPSKRLIKV CSSAPPRPTR RFHTVCSQAL SRPVVNAHLH
And there are more: “There are also a number of eight-letter words found: ASPARKLE (UniProt code: Q86UW7), DATELESS (Q9ULP0-3), GALAGALA (Q86VD7), GRISETTE (Q969Y0), MISSPEAK (Q8WXH0), REELRALL (Q96FL8), RELASTER (Q8IVB5), REVERSAL (Q5TZA2), and SLAVERER (Q2TAC2).” I wonder if there’s a sentence in us somewhere.
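For readers who want to try a search like this themselves, here is a minimal Python sketch. It is not Eliot's method, which the letter doesn't describe; it simply treats a protein sequence as a string and looks for each candidate word inside it. The names `find_words`, `AMINO_LETTERS`, and `CANDIDATE_WORDS` are invented for this illustration, and the short word list is a stand-in for a real dictionary file. Because only 20 letters are available, any word containing B, J, O, U, X, or Z can be skipped before searching.

```python
# Sequence of C12orf42 as quoted above; the spaces are only visual grouping.
C12ORF42 = """
MSTVICMKQR EEEFLLTIRP FANRMQKSPC YIPIVSSATL WDRSTPSAKH IPCYERTSVP
CSRFINHMKN FSESPKFRSL HFLNFPVFPE RTQNSMACKR LLHTCQYIVP RCSVSTVSFD
EESYEEFRSS PAPSSETDEA PLIFTARGET EERARGAPKQ AWNSSFLEQL VKKPNWAHSV
NPVHLEAQGI HISRHTRPKG QPLSSPKKNS GSAARPSTAI GLCRRSQTPG ALQSTGPSNT
ELEPEERMAV PAGAQAHPDD IQSRLLGASG NPVGKGAVAM APEMLPKHPH TPRDRRPQAD
TSLHGNLAGA PLPLLAGAST HFPSKRLIKV CSSAPPRPTR RFHTVCSQAL SRPVVNAHLH
"""

# The 20 one-letter amino acid codes listed in the post.
AMINO_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical stand-in for a real dictionary; swap in any word list.
CANDIDATE_WORDS = ["TARGETEER", "TARGET", "REVERSAL", "PROTEIN", "SENTENCE"]


def find_words(sequence, words, min_length=6):
    """Return (word, 1-based position) pairs for words found in the sequence."""
    # Remove the spaces and line breaks so words spanning two blocks are found.
    seq = "".join(sequence.split()).upper()
    hits = []
    for word in words:
        w = word.upper()
        # Skip short words and words that use letters outside the 20 codes.
        if len(w) < min_length or not set(w) <= AMINO_LETTERS:
            continue
        pos = seq.find(w)  # first occurrence only; -1 means not present
        if pos != -1:
            hits.append((w, pos + 1))
    # Longest words first.
    return sorted(hits, key=lambda hit: -len(hit[0]))


for word, position in find_words(C12ORF42, CANDIDATE_WORDS):
    print(word, position)
```

Scanning the whole proteome would mean repeating this over every sequence, and for a large dictionary it would probably be faster to check each substring of the protein against a set of words rather than searching for each word in turn.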