Advances in Protein Molecular and Structural Biology Methods
Ebook, 2,210 pages, 22 hours


About this ebook

Advances in Protein Molecular and Structural Biology Methods offers a complete overview of the latest tools and methods applicable to the study of proteins at the molecular and structural level. The book begins with sections exploring tools to optimize recombinant protein expression and biophysical techniques such as fluorescence spectroscopy, NMR, mass spectrometry, cryo-electron microscopy, and X-ray crystallography. It then moves towards computational approaches, considering structural bioinformatics, molecular dynamics simulations, and deep machine learning technologies. The book also covers methods applied to intrinsically disordered proteins (IDPs), followed by chapters on protein interaction networks, protein function, and protein design and engineering.

It provides researchers with an extensive toolkit of methods and techniques to draw from when conducting their own experimental work, taking them from foundational concepts to practical application.

  • Presents a thorough overview of the latest and emerging methods and technologies for protein study
  • Explores biophysical techniques, including nuclear magnetic resonance, X-ray crystallography, and cryo-electron microscopy
  • Includes computational and machine learning methods
  • Features a section dedicated to tools and techniques specific to studying intrinsically disordered proteins
Language: English
Release date: Jan 14, 2022
ISBN: 9780323902656

    Book preview

    Advances in Protein Molecular and Structural Biology Methods - Timir Tripathi

    Chapter 1: Strategies to improve the expression and solubility of recombinant proteins in E. coli

    Niharika Nag⁎; Heena Khan⁎; Timir Tripathi
    Molecular and Structural Biophysics Laboratory, Department of Biochemistry, North-Eastern Hill University, Shillong, India

    * Equal contributors.

    Abstract

    With the development of recombinant DNA technology in the 1980s, the heterologous expression of proteins has emerged as a valuable tool for researchers and pharmaceutical scientists, as it is generally challenging to obtain satisfactory yields from natural sources. Several prokaryotic and eukaryotic systems, including bacteria, yeast, insect, and mammalian platforms, have been developed to produce native-like proteins in laboratory-scale and industrial-scale settings. E. coli has been the most widely used bacterium for the production of recombinant proteins due to its low cost, well-established cellular biochemistry and genetics, rapid growth, and good productivity, and it has become the most popular expression platform. However, optimizing the protocols for adequate expression and solubility remains a major challenge with this system. In this chapter, we discuss the approaches and methodologies to optimize protein expression and solubility in E. coli and examine the troubleshooting practices for obtaining large quantities of soluble and stable recombinant proteins.

    Keywords

    Recombinant protein expression; E. coli; Plasmid; Inclusion bodies; Tags; Host cell; Vector

    1: Introduction

    The production of recombinant proteins in microbial systems has revolutionized research in biochemistry and bioprocessing. These proteins are encoded by recombinant DNA that is cloned in an expression vector. Recombinant proteins have applications in drug discovery and development, vaccines, and diagnostic reagents.¹ In a basic science laboratory, these proteins are used to produce enzymes for the structural determination of drug targets, which are then exploited by structural and computational biologists for structure-based drug discovery. The rapid expression and large-scale purification of recombinant proteins allow for their biochemical and biophysical characterization and use in industrial set-ups for the development of biopharmaceutics. The commonly used hosts for producing recombinant proteins are bacteria, which account for about 30% of biopharmaceuticals in the market. Currently, around 100 monoclonal antibodies, 25 hormones, 16 clotting factors, 10 enzymes, and vaccines have been approved for protein-based therapeutic applications.²,³ Recent advances made in the various areas of recombinant DNA technologies and bioprocessing are being utilized to develop effective protocols and processes for producing recombinant proteins.

    The production of recombinant protein in microbial systems has been a common approach, with the bacterial system being among the most widely used. Despite the significant developments achieved by prokaryotic and eukaryotic expression systems, Escherichia coli remains the most popular recombinant protein expression system and is widely used due to its various advantages compared to other hosts. The culture conditions for the growth of E. coli are inexpensive, and its genetics is well known, which allows variants and strains to be constructed using molecular tools. The major recombinant protein production goals, such as obtaining stable, full-length heterologous globular and membrane proteins, can be achieved in E. coli. Features such as simple and fast growth kinetics, easily achieved high-density cell culture, and easy transformation with exogenous DNA make the E. coli cell an attractive recombinant protein production machinery. However, it should be noted that in several cases, the prokaryotic expression system poses severe restrictions for successful heterologous protein production due to a lack of posttranslational machinery and the consequent lack of protein glycosylation, disulfide bond formation, phosphorylation, or proteolytic processing, leading to low expression, insolubility, or nonfunctional nature of the expressed protein. During heterologous expression, the rapid bacterial expression of protein often gives rise to misfolded or unfolded proteins, as correct folding requires adequate time and sometimes other components such as molecular chaperones. The highly reducing conditions in the cytosol and the lack of posttranslational modifications lead to insoluble protein expression/inclusion bodies, inactivity of the protein, and low protein yield.⁴,⁵ Fortunately, new developments and progress in experimental protocols are continuously being made to overcome these problems. These include attaching specific tags for expression and solubility, using mutated/modified strains, etc.⁴,⁶ In addition, E. coli has recently been modified using glycoengineering approaches to produce recombinant proteins with posttranslational modifications.⁷,⁸ In this chapter, we discuss recent advances and protocols that can be used to overcome the solubility and expression issues of proteins during their expression in E. coli. We also provide a troubleshooting guide that will come in handy while dealing with difficult-to-express proteins.

    2: Before starting with protein expression

    The gene sequence (DNA or cDNA) to be expressed first needs to be cloned into an appropriate cloning vector. The corresponding construct should then be subcloned into an appropriate expression vector having the transcriptional and translational machinery to express the target protein. The choice of expression vector is based on a combination of promoters, regulatory sequences, origin of replication, Shine–Dalgarno box, multiple cloning sites, replicons, selection markers, and fusion protein removal strategies. The most commonly used vectors are the pET (Novagen, USA), pQE (Qiagen, USA), and pUC series (Takara Bio, Japan) vectors. The most widely used promoters are under lac control, usually the T7 or T5 promoter. The T7 promoter is based on the T7 bacteriophage system, promoting high levels of transcription, and the T5 promoter is based on the T5 bacteriophage early promoter and the lac operon. Both T5 and T7 promoters are induced by IPTG. Other promoters can also be used, such as the araBAD promoter, which is induced by l-arabinose. The pET series of plasmid vectors usually have the T7 promoter system, whereas the pQE plasmids use the T5 promoter system. Common promoters used in the production of recombinant proteins are listed in Table 1.

    Table 1

    Tags are often used to aid the expression, purification, and solubility of recombinant proteins. Solubility tags are mostly small, stable, and highly soluble proteins, which enhance the solubility of both the final protein and folding intermediates. The common solubility tags include maltose-binding protein (MBP), small ubiquitin-related modifier (SUMO), glutathione-S-transferase (GST), N-utilization substance (NusA), thioredoxin (TrxA), and the Fh8 tag. MBP promotes the solubility of the protein of interest through its intrinsic chaperone activity.⁹,¹⁰ SUMO is a small protein that is used as an N-terminal solubility enhancer due to its chaperoning effects. GST is an N-terminal solubility tag¹¹ that also acts as an affinity fusion partner for the purification of the target protein,¹² although it is a relatively poorer solubility tag than the others.¹³,¹⁴ NusA confers solubility and stability to the target protein by slowing down the translation at the transcriptional pauses, thus providing more time for the protein to fold.¹⁵,¹⁶ TrxA improves the solubility of the target protein¹⁷ and helps avoid the formation of inclusion bodies by using its intrinsic oxidoreductase activity, which reduces disulfide bonds. The Fh8 tag is a small protein derived from the parasite Fasciola hepatica that helps improve the solubility of the protein of interest.¹⁸

    Affinity purification tags can be divided into two types, viz. peptides and proteins that bind to a small ligand immobilized on a solid phase. Affinity tags are chosen based on the size of the protein and also the cost of the purification. A list of commonly used tags and associated purification matrices is shown in Table 2. One of the most widely used purification tags is the polyhistidine (6×His) affinity tag, or His-tag. The tag consists of a number (commonly six) of consecutive histidine residues that can coordinate with transition metals like Ni²⁺ or Co²⁺ through the histidine imidazole ring. The metals are immobilized on matrix beads or a resin for immobilized metal affinity chromatography (IMAC), such as nitrilotriacetic acid agarose or carboxymethylaspartate agarose. In general, the presence of His-tags does not affect the structure and function of the protein. However, larger protein-based tags are often removed after purification since they may potentially interfere with the structure and function of the protein. This can be achieved by enzymatic cleavage, where site-specific proteases are used, or by chemical cleavage, for example with formic acid, which is cheaper but also less specific. In rare cases, the His-tags are also cleaved and removed if they impair the protein structure and function.

    Table 2

    The BL21(DE3) strain of E. coli and its derivatives are the preferred hosts used to express recombinant proteins. BL21 cells are deficient in the OmpT protease, which can cleave T7 RNA polymerase, and in the Lon protease, which degrades foreign proteins.¹⁹ Though BL21(DE3) and its derivatives are most commonly used for protein expression, derivatives of the K-12 lineage are also used. The M15 strain of E. coli carries the M15 deletion mutation, which lies in the lacZ gene.²⁰ This strain includes the pREP4 plasmid, which provides resistance to kanamycin and constitutively expresses the lac repressor protein. The repressor keeps the T5 promoter repressed until expression is induced with IPTG. This strain is primarily used for the expression of toxic proteins, and it cannot be infected by phages. Table 3 shows various E. coli strains used for the heterologous expression of recombinant proteins. A schematic diagram depicting the overall process of recombinant protein expression in E. coli is shown in Fig. 1.

    Table 3

    Fig. 1 Schematic diagram showing the overall process of heterologous protein expression in E. coli.

    3: Materials required

    (1) Cells: plasmid vectors (e.g., pET-23, pET-28, etc.), recombinant E. coli expression cells (e.g., CodonPlus, C41, BL21, etc.).

    (2) Chemicals:

    Media for culture growth: Luria Bertani broth, Luria Bertani agar, Terrific broth, 2×YT broth, etc.

    Antibiotics: 100 μg/mL ampicillin, 50 μg/mL kanamycin, etc., depending upon the antibiotic resistance offered by the vector containing the gene insert.

    Buffers:

    Tris buffer: 1 M stock concentration; pH depending on the pI of the protein. Working concentration: 50 mM tris containing 300 mM NaCl.

    Phosphate buffer: 1 M stock concentration; pH depending on the pI of the protein. Working concentration: 20–50 mM phosphate and 300 mM NaCl.

    TGS buffer: 10× stock concentration; 1× working concentration. The 10× stock contains:

    0.25 M tris

    1.92 M glycine

    1% SDS

    Dialysis buffer:

    20 mM tris/phosphate buffer

    150 mM NaCl

    Isopropyl β-d-1-thiogalactopyranoside (IPTG): 1 M stock concentration; 1 mM working concentration.

    Protease inhibitor cocktail: As per the manufacturer’s suggestion.

    Ni-NTA agarose matrix in a column

    Imidazole: 3 M stock concentration.

    Glycerol

    EDTA: 0.5 M stock concentration.

    (3) Instruments: Weighing balance, autoclave, sterile laminar hood, shaker incubator, centrifuge, sonicator, vortex mixer, magnetic stirrer, SDS-PAGE running unit, UV–Vis spectrophotometer.

    (4) Glassware and other equipment: Petri dishes, vials, conical flasks, beakers, measuring cylinders, pipettes, pipette tips, centrifuge tubes, 0.2 and 0.45 μm polyvinylidene difluoride (PVDF) membrane filters, magnetic beads, etc.
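
    The stock and working concentrations listed above are related by the usual dilution rule C1 × V1 = C2 × V2. The following is a minimal sketch of that arithmetic in Python; the function name and the final volumes are hypothetical and chosen only for illustration, while the stock and working concentrations are those given in the list (the 300 mM imidazole value is the elution concentration used later in the protocol).

```python
# Sketch: volume of stock needed for a working solution (C1 * V1 = C2 * V2).
# Concentrations come from the materials list; the final volumes are
# hypothetical examples, not values prescribed by the chapter.

def stock_volume_ml(stock_conc, working_conc, final_volume_ml):
    """Return the mL of stock to dilute up to final_volume_ml.

    stock_conc and working_conc must be in the same units (here mM).
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    if working_conc > stock_conc:
        raise ValueError("working concentration exceeds stock concentration")
    return working_conc * final_volume_ml / stock_conc


# (name, stock in mM, working in mM, hypothetical final volume in mL)
recipes = [
    ("Tris", 1000.0, 50.0, 1000.0),
    ("Phosphate", 1000.0, 20.0, 1000.0),
    ("IPTG", 1000.0, 1.0, 500.0),
    ("Imidazole", 3000.0, 300.0, 50.0),
]

for name, stock, working, volume in recipes:
    needed = stock_volume_ml(stock, working, volume)
    print(f"{name}: {needed:g} mL of stock for {volume:g} mL final volume")
```

    The same function can be reused for any stock in the list as long as both concentrations are expressed in the same unit.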

    4: Standard protocol for recombinant protein expression in E. coli

    Step 1: The gene of interest, after being cloned in a suitable plasmid vector (pET-23, pET-28), is transformed into competent E. coli cells (BL21, CodonPlus, C41, etc.). The cells are then plated on an LB agar plate containing a suitable antibiotic and grown overnight at 37°C.

    Step 2: The next day, a few transformed colonies are selected from the plate and inoculated into 5 mL LB media containing appropriate antibiotics and grown overnight at 37°C and 180 rpm. This is the primary bacterial cell culture.

    Step 3: Next morning, 5 μL inoculum from the primary cell culture is inoculated into 5 mL LB media containing antibiotic(s) and grown at 37°C at 180 rpm until the optical density at 600 nm (OD600) of the culture reaches 0.5–0.6. At this point, the culture is placed in a fridge or cold room to stall growth for 15 min. After the culture is cooled, it is induced by 1 mM IPTG and grown again at 37°C at 180 rpm for 4 h. One of the culture vials is kept un-induced as a control to monitor the expression of the protein.

    Step 4: The cells are then harvested by centrifugation at 8000 rpm for 10 min. The pellet is collected, and SDS-PAGE samples are prepared by resuspending the pellet into the buffer (50 mM tris and 300 mM NaCl), 20 μL of 10% SDS, and 20 μL of 5× protein loading dye. It is then boiled for 15–20 min at 100°C.

    Step 5: The sample is then loaded on an SDS gel of appropriate acrylamide percentage depending upon the size of the protein of interest, and the gel is run in 1× TGS buffer along with the protein size marker/protein ladder.

    4.1: For protein solubilization

    Step 6: To check the solubility of the expressed protein, 100 mL of LB media containing the relevant antibiotic is inoculated with 100 μL of the primary culture and grown with continuous shaking at 180 rpm at 37°C until the OD600 reaches 0.5–0.6. As earlier, on reaching the desired OD, the protein expression is induced with 1 mM IPTG. The culture is grown again at 37°C at 180 rpm for 4 h.

    Step 7: The cells are harvested at 8000 rpm for 10 min, and the pellet is collected. The pellet is resuspended in a buffer containing 50 mM tris/phosphate and 300 mM NaCl along with protease inhibitors. The cells are then lysed by sonication at pulse rest cycles of 30 s (on/off) at an amplitude of 50%.

    Step 8: The lysed cells are centrifuged at 12000 rpm for 30 min, after which samples of both the supernatant and the pellet are made for SDS-PAGE, as mentioned in step 4. The sample is run on an SDS gel and analyzed for the solubility of the protein. A soluble protein should be present in the supernatant sample.
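
    The centrifugation steps above are specified in rpm, which depends on the rotor. As a hedged aid for transferring the protocol between centrifuges, the sketch below applies the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm² (r in cm). The 10 cm rotor radius is a hypothetical value, not one stated in the protocol, and should be replaced with the radius of the rotor actually used.

```python
# Sketch: convert between rotor speed (rpm) and relative centrifugal force (x g).
# RCF = 1.118e-5 * radius_cm * rpm**2 is the standard conversion.
# ASSUMPTION: the 10 cm radius below is hypothetical; the protocol gives no rotor.

RCF_CONSTANT = 1.118e-5


def rpm_to_rcf(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rcf_to_rpm(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given RCF at a given radius."""
    return (rcf / (RCF_CONSTANT * radius_cm)) ** 0.5


if __name__ == "__main__":
    radius_cm = 10.0  # hypothetical rotor radius
    for rpm in (8000, 12000):  # speeds used for harvesting and for clearing lysate
        print(f"{rpm} rpm at r = {radius_cm} cm is about "
              f"{rpm_to_rcf(rpm, radius_cm):.0f} x g")
```

    Reporting the force in × g alongside rpm makes the harvesting and clarification steps reproducible on a different rotor.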

    4.2: For protein purification

    Step 9: After expression and solubilization, the next step is purification. 500 mL of LB media is inoculated with 5 mL of primary inoculum and grown with continuous shaking at 180 rpm and 37°C until the OD600 reaches 0.5–0.6. As the culture reaches the required OD, it is cooled and then induced with 1 mM IPTG and grown at 37°C at 180 rpm for 4 h.

    Step 10: The cells are harvested and lysed, as mentioned in step 7.

    Step 11: The lysate is then centrifuged at 12000 rpm for 30 min, and the supernatant is collected. The supernatant is passed through a Ni-NTA matrix that is charged with Ni²⁺ and equilibrated with 10–12 matrix volumes of equilibration buffer (50 mM tris/phosphate and 300 mM NaCl). Before use, all the buffers are filtered using a 0.2 μm PVDF filter, and the supernatant is filtered using a 0.45 μm PVDF filter.

    Step 12: Increasing concentrations of imidazole in the buffer (50 mM tris/phosphate and 300 mM NaCl) are passed through the column, and the purified protein is eluted with 300 mM imidazole. Each fraction, starting from supernatant that was passed through the column till the purified protein, is collected, and samples are prepared as mentioned in step 4 and analyzed by SDS-PAGE.

    Step 13: The purified protein is poured into a dialysis bag and dialyzed in the dialysis buffer at 4°C for 4 h. The buffer is then refreshed, and the protein is dialyzed again overnight at 4°C. The concentration of the protein is then determined using the Bradford assay.

    Step 14: The oligomerization status of the protein is evaluated on a size exclusion chromatography column using an ÄKTA FPLC.
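
    Step 13 determines the protein concentration with the Bradford assay, which in practice means reading the unknown against a standard curve. The sketch below shows one common way to do this, a least-squares line through standards; the BSA standards, absorbance readings, and dilution factor are hypothetical illustration values, not data from this chapter.

```python
# Sketch: estimate protein concentration from a Bradford standard curve.
# ASSUMPTION: all standards, A595 readings, and the dilution factor below
# are made-up illustration values.

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(a595, slope, intercept, dilution_factor=1.0):
    """Concentration of the undiluted sample, in the units of the standards."""
    return (a595 - intercept) / slope * dilution_factor


if __name__ == "__main__":
    bsa_mg_per_ml = [0.0, 0.25, 0.5, 0.75, 1.0]    # hypothetical standards
    a595_readings = [0.00, 0.15, 0.29, 0.45, 0.60]  # hypothetical readings
    slope, intercept = fit_line(bsa_mg_per_ml, a595_readings)
    sample = concentration(0.36, slope, intercept, dilution_factor=10.0)
    print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
    print(f"estimated sample concentration: {sample:.2f} mg/mL")
```

    Readings that fall outside the range of the standards should be re-measured at a different dilution rather than extrapolated.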

    5: Troubleshooting strategies

    5.1: Handling protein expression and solubility issues

    5.1.1: DNA sequence and codon bias issues

    The low expression level of the recombinant protein could be due to certain DNA sequences that encode elements that interfere with either transcription or translation; these sequences should be checked and modified. Plasmid storage in a strain that expresses the protein may also cause host stress and possible mutations, resulting in low protein expression. It is best to store the plasmid in a strain that cannot express the target protein. Codon bias may be another reason for the low expression of the protein. Optimizing codon usage by introducing silent mutations through site-directed mutagenesis may be a solution to overcome codon bias.
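
    A quick way to judge whether codon bias is a plausible cause of low expression is to scan the coding sequence for codons that are rare in E. coli. The sketch below is illustrative only: the rare-codon set (AGG, AGA, AUA, CUA, CCC, GGA) is an assumption based on codons commonly described as rare in E. coli, the function name is hypothetical, and the example sequence is made up.

```python
# Sketch: flag codons that are commonly described as rare in E. coli.
# ASSUMPTION: this rare-codon set is illustrative, not taken from the chapter;
# adjust it to the codon usage table and host strain actually used.

RARE_CODONS = {"AGG", "AGA", "ATA", "CTA", "CCC", "GGA"}


def rare_codon_report(cds):
    """Return ([(codon_number, codon), ...], fraction_rare) for an in-frame CDS."""
    seq = cds.upper().replace("U", "T")
    if not seq or len(seq) % 3 != 0:
        raise ValueError("sequence must be non-empty and a multiple of 3 long")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Codon numbers are 1-based so they match residue numbering.
    hits = [(n, c) for n, c in enumerate(codons, start=1) if c in RARE_CODONS]
    return hits, len(hits) / len(codons)


if __name__ == "__main__":
    example = "ATGAGGAGACTGCCCTAA"  # made-up sequence for illustration
    hits, fraction = rare_codon_report(example)
    for number, codon in hits:
        print(f"codon {number}: {codon}")
    print(f"rare codons: {fraction:.0%} of the sequence")
```

    A high fraction, or clusters of rare codons near the start of the gene, would suggest trying silent mutations or a host strain that supplies the corresponding tRNAs.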

    5.1.2: Varying host strain

    Different strains of E. coli can be used for expressing different heterologous proteins. E. coli BL21, its mutants, and modified variants are used to address different concerns. The BL21-RP, Rosetta, and BL21-RIL strains encode tRNAs for rare codons. The Rosetta-gami strain can be used for expressing a protein that has disulfide bonds, which enhances the solubility of the protein. Other strains like BL21-AI and pLysS can be used for the expression of toxic proteins. Table 3 lists E. coli strains commonly used to express heterologous proteins, their characteristics, and applications. Another reason for the low expression of protein could be the use of old (older than 3–4 weeks) transformed bacterial cells. The use of freshly transformed cells could help solve the issue.

    5.1.3: Isopropyl β-d-1-thiogalactopyranoside (IPTG)

    IPTG is one of the most commonly used protein expression-inducing agents for T7 and T5 promoter systems. It is a nonmetabolizable structural analog of allolactose. An E. coli strain carrying the T7 RNA polymerase gene, on induction with IPTG, overexpresses the protein of interest, producing a high yield of the protein. However, the concentration of IPTG as an inducer needs to be standardized according to the protein of interest to be expressed. The optimal concentration is standardized to get maximum expression by balancing the decreasing yield of recombinant cells following induction with the increasing cellular level of the target protein. The most commonly preferred concentration of IPTG usually lies between 0.5 and 1 mM. Higher concentrations of IPTG might lead to cell toxicity, known as metabolic load, and low yield of expressed protein.²¹–²³ Sometimes, 0.5–1 mM IPTG leads to very high expression of recombinant proteins that form insoluble aggregates and pellet out in the insoluble fraction. In such cases, the IPTG concentration is gradually lowered to 0.1–0.2 mM, and the exact IPTG concentration is optimized, keeping a balance between protein expression and solubility.
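
    When titrating the inducer as described above, it helps to tabulate the stock volume needed for each test concentration. The sketch below assumes the 1 M IPTG stock from the materials list; the 100 mL culture volume and the particular test concentrations are hypothetical choices within the 0.1–1 mM range discussed in the text, and the small volume added by the stock itself is ignored.

```python
# Sketch: microlitres of IPTG stock needed for a titration series.
# ASSUMPTIONS: 1 M stock (from the materials list); the culture volume and the
# set of test concentrations are illustrative, and the volume contributed by
# the added stock is neglected.

def iptg_volume_ul(final_mm, culture_ml, stock_molar=1.0):
    """Return uL of stock giving final_mm (mM) in culture_ml (mL) of culture.

    mM * mL gives micromoles, and a 1 M stock holds 1 micromole per microlitre.
    """
    if stock_molar <= 0:
        raise ValueError("stock concentration must be positive")
    return final_mm * culture_ml / stock_molar


if __name__ == "__main__":
    culture_ml = 100.0  # hypothetical test culture volume
    for final_mm in (0.1, 0.2, 0.5, 1.0):
        volume = iptg_volume_ul(final_mm, culture_ml)
        print(f"{final_mm} mM IPTG in {culture_ml:g} mL: add {volume:g} uL of stock")
```

    Running parallel cultures at each concentration and comparing the soluble and insoluble fractions by SDS-PAGE then shows where expression and solubility are best balanced.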

    5.1.4: Temperature

    The normal growth temperature of E. coli is generally 37°C. This temperature might not always be advantageous for the growth of recombinant E. coli cells harboring the gene of interest to be expressed. In several cases, there could be a significant reduction in the protein solubility at 37°C, and the expressed proteins can form inclusion bodies. Lower temperatures (anywhere between 15 and 30°C) have been shown to improve the solubility of the protein. Additionally, at lower temperatures, the proteolytic degradation of protease-sensitive proteins can also be avoided as most proteases are relatively less active at low temperatures.²⁴,²⁵ By lowering the temperature, the cellular processes also slow down, resulting in reduced transcription and translation rates. Slowing the protein expression by lowering the temperature helps reduce protein aggregation that might form due to hydrophobic interactions favored at high temperatures. However, protein expression at lower temperatures of 15–30°C needs a longer expression time (anywhere between 8 and 16 h) to obtain a good protein yield.⁴,²⁶

    5.1.5: Culture media

    Luria Bertani (LB) broth is the most commonly used standard complex medium for the culture of E. coli cells. It is composed of tryptone, yeast extract, and sodium chloride, which provide all the nutrients required for the growth of E. coli cells. Other media such as minimal M9 medium, which lacks amino acids, can also be used; it is employed for selective labeling of proteins, such as isotope labeling for NMR studies of the protein. Terrific broth (TB), a rich medium, can also be used, as it aids good expression and enhances the solubility and yield of the protein.²⁷ 2×YT broth is a nutrient-rich microbial medium that contains amino acids, peptides, and water-soluble vitamins in a low-salt formulation. It provides nitrogen and other growth factors required to grow recombinant E. coli strains infected with the M13 bacteriophage without exhausting the host cells. These standard media can be supplemented with glycerol, glucose, or other chemicals such as ethanol to enhance protein expression and solubility. Supplying a divalent cation source such as MgSO4 at millimolar concentrations, or increasing the yeast extract or peptone content, can also result in better cell densities.

    5.1.6: Buffer additives

    In some cases, the general lysis buffer may lead to protein aggregation or the formation of inclusion bodies. Adding additives such as glycerol, arginine, CHAPS, or MgCl2 to the buffer can provide stability to the proteins being expressed and increase their solubility, giving a better yield. The pH of the buffer also plays an important role, and thus the pH range should be optimized according to the pI of the protein for better solubility. The solubility of proteins can also be enhanced by adding salts during the induction of the protein or to the cell lysis buffer.²⁸ Osmolytes protect the protein from denaturation by thermally stabilizing the protein. The most commonly used osmolytes are glycine, proline, and trehalose, which increase the solubility and reduce the formation of inclusion bodies. Arginine can also be used as a protein stabilizer, as it can stabilize the protein and prevent its aggregation.²⁹ Other metal ion salts such as K2PO4 and CuCl2 have also been shown to increase the stability of proteins against aggregation.

    5.1.7: Glycerol

    Glycerol is supplemented in the growth media to enhance the stability and solubility of the protein. The increase in solubility by an osmolyte such as glycerol results in an increase in the activity of some proteins such as GST. Glycerol acts as an amphiphilic interface, interacting between large patches of contiguous hydrophobic surfaces and the polar solvent. It induces compactness in the native proteins. Glycerol-induced protein compaction mainly originates from electrostatic interactions that influence the orientation of glycerol molecules at the protein surface such that glycerol is further excluded. It has been shown that glycerol inhibits protein unfolding and hinders the formation of aggregation intermediates.³⁰ The polyol co-solvent glycerol has a high viscosity, and thus the addition of glycerol to the media slows microbial growth, thereby requiring a longer culture time.

    5.1.8: Detergents

    Detergents are also used to solubilize proteins, in particular, membrane proteins.³¹ Ionic detergents such as sodium dodecyl sulfate act as good solubilizing agents but are mostly denaturing. Therefore, it is advisable to use nonionic detergents such as Triton X-100 or zwitterionic detergents such as CHAPS/sulfobetaine in the appropriate pH range. Determining the appropriate detergent concentration relative to its critical micelle concentration (CMC) is an important criterion when using detergents. Inappropriate concentrations might lead to denaturation, hinder protein binding in IMAC during purification, and interfere with further studies of the protein, such as crystallization. Different concentrations and types of detergent are required for different proteins of interest.

    5.1.9: Ethanol

    A simple method that helps increase the expression of poorly expressed or nonexpressed recombinant proteins is the addition of ethanol.³² It has been shown that the addition of ~3% ethanol (v/v) to the media enhances recombinant protein expression in E. coli (Fig. 2).³³–³⁵ The addition of ethanol mimics the heat-shock response in the cells, which also helps in increasing the solubility of proteins.³⁶,³⁷ Ethanol may also help stabilize the native state of the proteins, promoting their solubility.³⁸–⁴⁰

    Fig. 2 Comparison of proteins expressed in the absence and presence of 3% ethanol. (A) MEX67 protein. (B) RPB5 protein. In both panels, Lane 1 represents the molecular weight marker; Lane 2 represents un-induced control of the respective protein; Lane 3 represents induced protein in the presence of 3% ethanol (+), while Lane 4 represents induced protein in the absence of 3% ethanol (−). The samples were separated by 12% SDS-PAGE and stained with Coomassie brilliant blue (CBB).³²

    5.1.10: Co-expression of chaperones

    Co-expression of molecular chaperones along with the required protein increases the expression and solubility of the protein.¹⁶ In heterologous protein expression, the protein folds at a slower rate and may require protein chaperones as folding catalysts to reduce aggregate formation and facilitate proper folding of the protein of interest. Generally, the expression of small proteins does not require chaperone co-expression, but larger proteins may do so. Co-expression of the protein with chaperones such as GroEL and GroES yields a high expression level and increased production of the soluble target protein. Vectors have been developed that contain E. coli chaperones and allow chaperones to be co-expressed with the protein of interest.⁴¹ Molecular chaperones hinder protein aggregation by interacting with the hydrophobic regions of the unfolded protein and promoting proper folding. Additionally, chaperones are also used as protein folding agents in vitro.⁴²

    5.1.11: Folding modulators or fusion partner proteins

    Fusion partners are proteins or peptides that are genetically fused with the protein of interest to improve protein folding, solubility, and yield. Several tags are available that help increase the solubility of the protein. The maltose-binding protein (MBP) tag helps with the expression, purification, and especially the solubility of a protein of interest.¹⁴,⁴³ It helps promote the folding of the target protein¹⁰,⁴⁴ and is more efficient at the N-terminal of the gene construct than the C-terminal.⁴⁵ The small protein thioredoxin (Trx) is another preferred fusion choice for increasing the solubility of target proteins, particularly if the protein contains disulfide bonds. The Trx tag can be inserted at both the N- and C-terminals but is more effective when at the N-terminal of the protein of interest.¹⁴,⁴⁶ A disadvantage of this tag is that it has to be expressed in combination with a small affinity tag to aid in purification. To make the target protein more stable and soluble, the N-utilization substance (NusA) tag can also be used, which provides more time for protein folding by slowing down translation at the transcriptional pauses.¹⁵,¹⁶ NusA requires an additional affinity tag for the purification of the protein. The small ubiquitin-related modifier (SUMO) protein is also used to increase the solubility of the target protein, possibly through its chaperone effects that promote the proper folding of the protein of interest.⁴⁷ An advantage of using the SUMO protein is the availability of its own specific protease (Ulp), which works by recognizing the tertiary structure of SUMO instead of an amino acid sequence. The glutathione-S-transferase (GST) tag, which is used as an affinity tag for purification, can also be used to promote protein solubility.¹² This tag is more efficient at the N-terminal, and it protects the target protein from degradation and stabilizes it in the soluble fraction.⁴⁸ However, the tag is comparatively poor at enhancing solubility⁴⁹,⁵⁰ and can also cause aggregation and precipitation.⁴⁸ The Fh8 tag is another tag that can be used for enhancing solubility¹⁸ and aiding purification.⁵¹ Fusion partners are also used to produce toxic proteins, such as antimicrobial peptides (AMPs) produced with cellulose-binding modules as the fusion partner.⁵²,⁵³ To determine which tag gives the highest yield of the recombinant protein, multiple fusion tags usually need to be tested.⁵⁴ Often, a combination of tags is used to aid both the expression and purification of the protein.⁵⁵,⁵⁶ The tags can have differing effects when inserted at either the N- or C-terminal, but in general, the N-terminal is preferred as that provides a context for efficient initiation of translation and also for efficient tag removal, as most endoproteases cleave at or close to the C-terminal of their recognition sites.⁵⁷

    5.2: Handling of inclusion bodies

    Expression of heterologous protein in E. coli often results in the formation of insoluble aggregates or inclusion bodies. Inclusion bodies can be avoided by adjusting the expression conditions so that solubility is achieved, even if the protein yield is compromised. For larger amounts, a decrease in growth temperature helps in several cases. Another method is to grow the culture to a higher cell density before IPTG induction, and after induction, the expression period is kept to a minimum. The IPTG concentration can also be reduced by 90%–95%. In some cases, the inclusion bodies may form because a particular strain does not tolerate the protein. Thus, a change in the host strain may help increase the expression of the target protein before it forms inclusion bodies. One may also add metal ions to the culture, as many proteins require metal cofactors for their solubility.

    Solubilizing the inclusion bodies and refolding the protein into a biologically active state is another method. Here, after cell lysis and centrifugation at low speed, the pellet is solubilized using denaturants, detergents, buffers of extreme pH, etc. Refolding of the protein is then done using slow dialysis or gel filtration chromatography. Solubilizing and refolding inclusion bodies are the crucial stages in recovering active protein from the inclusion bodies. However, it should be noted that 100% refolding of a protein is not possible in most cases. Common chaotropic agents such as guanidine hydrochloride and urea are used for solubilizing inclusion bodies. Both the denaturants induce protein unfolding in a concentration-dependent manner. High pH buffers along with urea are used for solubilization. Modulation of pH during protein expression affects the formation of inclusion bodies. Proteins attain a more denatured structure when exposed to extreme pH values.

    After solubilization, the refolding of the protein can be achieved by dialysis.⁵⁸ The denatured protein is dialyzed step-wise, using a lower denaturant concentration in each step. Refolding is induced as the denaturant concentration in the refolding buffer decreases. The refolded protein can then be purified by gel filtration or IMAC.⁵⁹ It may be noted that in many cases, including a refolding step makes the recovery of the protein poor.
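
    As a rough illustration of the step-wise dialysis described above, the following Python sketch lists the denaturant concentration of each successive dialysis buffer. The linear step-down, the function name `dialysis_schedule`, and the example values (8 M starting denaturant, four steps) are assumptions made purely for illustration and are not taken from this chapter; practical schedules are protein-specific and must be determined empirically.

```python
def dialysis_schedule(start_molar: float, steps: int) -> list[float]:
    """Denaturant concentration (M) of each dialysis buffer, ending at 0 M.

    Assumes equal (linear) decrements between steps; this is an
    illustrative choice, not a recommended protocol.
    """
    if start_molar <= 0 or steps < 1:
        raise ValueError("start_molar must be positive and steps at least 1")
    return [start_molar * (steps - i) / steps for i in range(1, steps + 1)]


if __name__ == "__main__":
    # Hypothetical example: protein solubilized in 8 M denaturant, four buffer changes.
    for step, conc in enumerate(dialysis_schedule(8.0, 4), start=1):
        print(f"Step {step}: dialyze against {conc:.1f} M denaturant")
```

    A linear schedule is only one option; smaller decrements could be used over the concentration range in which a given protein tends to aggregate.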

    5.3: Handling protein leakage

    Promoters that are not tightly regulated show some degree of protein expression before an inducer is added. Such leaky expression can be due to the negative control of the lac promoter. If the leaked protein is toxic to the cell, it may result in poor culture growth. Promoters like araBAD have lower background expression levels since they rely on positive control.⁶⁰,⁶¹ To overcome leaky expression by T7 RNA polymerase, compatible pLysS and pLysE plasmids encoding bacteriophage T7 lysozyme can be used. T7 lysozyme interacts with T7 RNA polymerase and inhibits basal-level transcription of the gene of interest. Another approach is the insertion of a lac operator (lacO) downstream of the T7 promoter, creating a T7/lac hybrid promoter.⁴,⁵ To reduce the leaky expression of toxic protein and maintain plasmid stability, incubation should be carried out at lower temperatures (23–30°C).

    5.4: Handling of toxic proteins

    The slow growth of a strain carrying the target gene may be due to the expression of a toxic protein. To counteract protein toxicity, it is important to suppress protein leakage in the absence of an inducer. Cells carrying the plasmid can be outgrown by cells with mutant plasmids that do not express the toxic protein, so it is advisable to keep the number of generations before induction to a minimum. In the case of unstable expression constructs, colonies from fresh plates should be used to inoculate small starter cultures, which are grown for only about 3 h until mid-log phase is reached; overnight starter cultures must be avoided. The small starter cultures can then be diluted about 20- to 50-fold in fresh warm media and grown until an OD600 of ~0.5 is achieved. This is then followed by induction.
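
    The dilution step above is simple arithmetic, and a small Python sketch can make the trade-off explicit: a larger fold dilution needs less starter culture but more doublings before induction. The function names, the 1 L culture volume, and the starter OD600 of 0.5 are hypothetical values chosen for illustration, and the calculation assumes that OD600 scales linearly with cell density over this range.

```python
import math


def starter_volume_ml(final_volume_ml: float, fold_dilution: float) -> float:
    """Starter culture volume (mL) needed for a given fold dilution into the final volume."""
    if fold_dilution <= 1:
        raise ValueError("fold_dilution must be greater than 1")
    return final_volume_ml / fold_dilution


def generations_to_target(start_od: float, target_od: float) -> float:
    """Number of doublings needed to grow from start_od to target_od."""
    if start_od <= 0 or target_od < start_od:
        raise ValueError("require 0 < start_od <= target_od")
    return math.log2(target_od / start_od)


if __name__ == "__main__":
    starter_od = 0.5  # hypothetical OD600 of a mid-log starter culture
    for fold in (20, 50):
        volume = starter_volume_ml(1000.0, fold)  # hypothetical 1 L expression culture
        doublings = generations_to_target(starter_od / fold, 0.5)
        print(f"{fold}-fold: add {volume:.0f} mL starter, ~{doublings:.1f} doublings to OD600 0.5")
```

    Under these assumptions, a 20-fold dilution requires fewer doublings before induction than a 50-fold dilution, which is in line with the advice to minimize the number of generations when the construct is unstable.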

    A protein of interest with hydrophobic regions also causes toxicity due to association and/or incorporation into membrane systems. Therefore, the sequences encoding signal peptides or transmembrane domains are best removed from DNA inserts before they are cloned, unless they are of interest. To express proteins that contain signal peptides for the periplasmic space, the growth temperature must be lowered (to 20–25°C) before induction.

    The modified E. coli strains C41(DE3) and C43(DE3) have shown promising results in managing protein toxicity issues during expression. These strains reduce or prevent cell lysis during recombinant protein expression. Due to mutations in the −10 region, the lacUV5 promoter is reverted to a weaker, wild-type-like one, making expression better tolerated by the cell.⁵ For highly toxic proteins, the pQE-80L series of expression vectors can be used in the M15[pREP4] strain of E. coli, which strongly suppresses any expression before induction.

    5.5: Handling of unstable proteins

    Unstable protein expression leads to protein aggregation and inactive proteins. The expression host strain can be changed, wherein instead of BL21(DE3), its variant BL21star (DE3) can be used as it has improved mRNA stability and protein expression ability. Protein quality is also better when induced at a lower temperature, grown for a shorter period of time, in a host strain deficient in one or more proteases. Smaller proteins tend to be unstable in E. coli and can be expressed with a fusion partner such as dihydrofolate reductase (DHFR) protein. In addition, various stabilizing additives like CTAB, K2PO4, glycine, glycerol, l-arginine, etc., can be added to the lysis buffer to increase the stability of the expressed protein.

    5.6: Posttranslational modifications

    Proper disulfide bond formation, which is required for the correct folding, stability, and biological function of proteins, requires the redox environment and foldases found in the periplasmic space of E. coli.⁶² To ensure this, signal peptides can be incorporated at the N-terminus of the protein of interest to translocate the heterologous recombinant protein to the periplasm or the extracellular environment of the cells.⁶³ This occurs through the secretory (Sec)-dependent pathway of E. coli or the signal recognition particle (SRP)-dependent translocation machinery. Recently, E. coli has also been modified through host cell engineering to enable the simple glycosylation of recombinant proteins.⁷ This was enabled by transferring the N-glycosylation system of Campylobacter jejuni to E. coli. C. jejuni is the first prokaryote in which N-linked protein glycosylation was observed. By incorporating the pgl pathway, a gene cluster first identified in C. jejuni, close to 25 mg/mL of bacterial glycoproteins have been expressed in E. coli.⁶⁴ These strategies can be employed in the expression of N-glycosylated recombinant proteins in a bacterial system.

    6: Conclusion and future perspectives

    The demand for recombinant proteins for application in both research and development and commercial markets is increasing every day. Protein expression in E. coli is the preferred choice, as this cost-effective microbial factory is simple and easy to handle. Large-scale protein expression trials showed that more than 50% of bacterial proteins and more than 15% of nonbacterial proteins can be expressed in the soluble form in the E. coli system.⁶⁵ Despite the popularity of the system, limitations exist and challenges are encountered, with adequate expression and solubility being the major issues. Maximizing the production of heterologous proteins for commercial application is an art. With time, researchers have learned and understood the critical factors influencing heterologous expression in E. coli. Here, we discussed approaches to resolve the commonly encountered issues to produce recombinant proteins with better solubility, stability, and higher yield. Many of the solutions offered require multiple optimizations and are used in combination to determine the best protocol for a particular protein. Our experience suggests that most optimizations are lab-specific, are not always reported in research articles, and remain in students’ notebooks. Collective sharing of simple protocol modifications will help rationalize our approach to process optimization of recombinant protein production.

    References

    1 Singh D.B., Tripathi T. Frontiers in Protein Structure, Function, and Dynamics. Singapore: Springer Nature; 2020.

    2 Mullard A. FDA approves 100th monoclonal antibody product. Nat Rev Drug Discov. 2021;20:491–495.

    3 Walsh G. Biopharmaceutical benchmarks 2018. Nat Biotechnol. 2018;36(12):1136–1145.

    4 Francis D.M., Page R. Strategies to optimize protein expression in E. coli. Curr Protoc Protein Sci. 2010;61(1):5.24.1–5.24.29.

    5 Rosano G.L., Ceccarelli E.A. Recombinant protein expression in Escherichia coli: advances and challenges. Front Microbiol. 2014;5(172).

    6 Sørensen H.P., Mortensen K.K. Soluble expression of recombinant proteins in the cytoplasm of Escherichia coli. Microb Cell Fact. 2005;4(1):1.

    7 Valderrama-Rincon J.D., Fisher A.C., Merritt J.H., et al. An engineered eukaryotic protein glycosylation pathway in Escherichia coli. Nat Chem Biol. 2012;8(5):434–436.

    8 Gupta S.K., Shukla P. Advanced technologies for improved expression of recombinant proteins in bacteria: perspectives and applications. Crit Rev Biotechnol. 2016;36(6):1089–1098.

    9 Bach H., Mazor Y., Shaky S., et al. Escherichia coli maltose-binding protein as a molecular chaperone for recombinant intracellular cytoplasmic single-chain antibodies. J Mol Biol. 2001;312(1):79–93.

    10 Kapust R.B., Waugh D.S. Escherichia coli maltose-binding protein is uncommonly effective at promoting the solubility of polypeptides to which it is fused. Protein Sci. 1999;8(8):1668–1674.

    11 Malhotra A. Chapter 16 Tagging for protein expression. In: Burgess R.R., Deutscher M.P., eds. Methods in Enzymology. Academic Press; 2009:239–258.

    12 Smith D.B., Johnson K.S. Single-step purification of polypeptides expressed in Escherichia coli as fusions with glutathione S-transferase. Gene. 1988;67(1):31–40.

    13 Hammarström M., Hellgren N., van den Berg S., Berglund H., Härd T. Rapid screening for improved solubility of small human proteins produced as fusion proteins in Escherichia coli. Protein Sci. 2002;11(2):313–321.

    14 Dyson M.R., Shadbolt S.P., Vincent K.J., Perera R.L., McCafferty J. Production of soluble mammalian proteins in Escherichia coli: identification of protein features that correlate with successful expression. BMC Biotechnol. 2004;4(1):32.

    15 Davis G.D., Elisee C., Newham D.M., Harrison R.G. New fusion protein systems designed to give soluble expression in Escherichia coli. Biotechnol Bioeng. 1999;65(4):382–388.

    16 De Marco V., Stier G., Blandin S., de Marco A. The solubility and stability of recombinant proteins are increased by their fusion to NusA. Biochem Biophys Res Commun. 2004;322(3):766–771.

    17 Yasukawa T., Kanei-Ishii C., Maekawa T., Fujimoto J., Yamamoto T., Ishii S. Increase of solubility of foreign proteins in Escherichia coli by coproduction of the bacterial Thioredoxin. J Biol Chem. 1995;270(43):25328–25331.

    18 Costa S.J., Almeida A., Castro A., Domingues L., Besir H. The novel Fh8 and H fusion partners for soluble protein expression in Escherichia coli: a comparison with the traditional gene fusion technology. Appl Microbiol Biotechnol. 2013;97(15):6779–6791.

    19 Gottesman S. Proteases and their targets in Escherichia coli. Annu Rev Genet. 1996;30(1):465–506.

    20 Villarejo M.R., Zabin I. Beta-galactosidase from termination and deletion mutant strains. J Bacteriol. 1974;120(1):466–474.

    21 Dvorak P., Chrast L., Nikel P.I., et al. Exacerbation of substrate toxicity by IPTG in Escherichia coli BL21(DE3) carrying a synthetic metabolic pathway. Microb Cell Fact. 2015;14:201.

    22 Briand L., Marcion G., Kriznik A., et al. A self-inducible heterologous protein expression system in Escherichia coli. Sci Rep. 2016;6(1):33037.

    23 Donovan R.S., Robinson C.W., Glick B.R. Review: optimizing inducer and culture conditions for expression of foreign proteins under the control of the lac promoter. J Ind Microbiol. 1996;16(3):145–154.

    24 Baneyx F., Ayling A., Palumbo T., Thomas D., Georgiou G. Optimization of growth conditions for the production of proteolytically-sensitive proteins in the periplasmic space of Escherichia coli. Appl Microbiol Biotechnol. 1991;36(1):14–20.

    25 Chesshyre J.A., Hipkiss A.R. Low temperatures stabilize interferon α-2 against proteolysis in Methylophilus methylotrophus and Escherichia coli. Appl Microbiol Biotechnol. 1989;31(2):158–162.

    26 Gopal G.J., Kumar A. Strategies for the production of recombinant protein in Escherichia coli. Protein J. 2013;32(6):419–425.

    27 Sahdev S., Khattar S.K., Saini K.S. Production of active eukaryotic proteins through bacterial expression systems: a review of the existing biotechnology strategies. Mol Cell Biochem. 2008;307(1–2):249–264.

    28 Leibly D.J., Nguyen T.N., Kao L.T., Hewitt S.N., Barrett L.K., Van Voorhis W.C. Stabilizing additives added during cell lysis aid in the solubilization of recombinant proteins. PLoS One. 2012;7(12):e52482.

    29 Tsumoto K., Umetsu M., Kumagai I., Ejima D., Philo J.S., Arakawa T. Role of arginine in protein refolding, solubilization, and purification. Biotechnol Prog. 2004;20(5):1301–1308.

    30 Vagenende V., Yap M.G.S., Trout B.L. Mechanisms of protein stabilization and prevention of protein aggregation by glycerol. Biochemistry. 2009;48(46):11084–11096.

    31 Prince C.C., Jia Z. Detergent quantification in membrane protein samples and its application to crystallization experiments. Amino Acids. 2013;45(6):1293–1302.

    32 Chhetri G., Kalita P., Tripathi T. An efficient protocol to enhance recombinant protein expression using ethanol in Escherichia coli. MethodsX. 2015;2:385–391.

    33 Chhetri G., Pandey T., Chinta R., Kumar A., Tripathi T. An improved method for high-level soluble expression and purification of recombinant amyloid-beta peptide for in vitro studies. Protein Expr Purif. 2015;114:71–76.

    34 Chhetri G., Pandey T., Kumar B., Akhtar M.S., Tripathi T. Recombinant expression, purification and preliminary characterization of the mRNA export factor MEX67 of Saccharomyces cerevisiae. Protein Expr Purif. 2015;107:56–61.

    35 Chhetri G., Ghosh A., Chinta R., Akhtar S., Tripathi T. Cloning, soluble expression, and purification of the RNA polymerase II subunit RPB5 from Saccharomyces cerevisiae. Bioengineered. 2015;6(1):62–66.

    36 Thomas J.G., Baneyx F. Protein misfolding and inclusion body formation in recombinant Escherichia coli cells overexpressing heat-shock proteins. J Biol Chem. 1996;271(19):11141–11147.

    37 Kusano K., Waterman M.R., Sakaguchi M., Omura T., Kagawa N. Protein synthesis inhibitors and ethanol selectively enhance heterologous expression of P450s and related proteins in Escherichia coli. Arch Biochem Biophys. 1999;367(1):129–136.

    38 Samuel D., Ganesh G., Yang P.-W., et al. Proline inhibits aggregation during protein refolding. Protein Sci. 2000;9(2):344–352.

    39 Blackwell J.R., Horgan R. A novel strategy for production of a highly expressed recombinant protein in an active form. FEBS Lett. 1991;295(1–3):10–12.

    40 Bolen D.W., Baskakov I.V. The osmophobic effect: natural selection of a thermodynamic force in protein folding. J Mol Biol. 2001;310(5):955–963.

    41 Kyratsous C.A., Silverstein S.J., DeLong C.R., Panagiotidis C.A. Chaperone-fusion expression plasmid vectors for improved solubility of recombinant proteins in Escherichia coli. Gene. 2009;440(1–2):9–15.

    42 Wingfield P.T. Overview of the purification of recombinant proteins. Curr Protoc Protein Sci. 2015;80(1):6.1.1–6.1.35.

    43 Niiranen L., Espelid S., Karlsen C.R., et al. Comparative expression study to increase the solubility of cold adapted Vibrio proteins in Escherichia coli. Protein Expr Purif. 2007;52(1):210–218.

    44 Nallamsetty S., Waugh D.S. Solubility-enhancing proteins MBP and NusA play a passive role in the folding of their fusion partners. Protein Expr Purif. 2006;45(1):175–182.

    45 Sachdev D., Chirgwin J.M. [20] Fusions to maltose-binding protein: control of folding and solubility in protein purification. In: Methods in Enzymology. Academic Press; 2000:312–321.

    46 Terpe K. Overview of tag protein fusions: from molecular and biochemical fundamentals to commercial systems. Appl Microbiol Biotechnol. 2003;60(5):523–533.

    47 Costa S., Almeida A., Castro A., Domingues L. Fusion tags for protein solubility, purification and immunogenicity in Escherichia coli: the novel Fh8 system. Front Microbiol. 2014;5(63).

    48 Kaplan W., Erhardt J., Sluis-Cremer N., Dirr H., Hüsler P., Klump H. Conformational stability of pGEX-expressed Schistosoma japonicum glutathione S-transferase: a detoxification enzyme and fusion-protein affinity tag. Protein Sci. 1997;6(2):399–406.

    49 Esposito D., Chatterjee D.K. Enhancement of soluble protein expression through the use of fusion tags. Curr Opin Biotechnol. 2006;17(4):353–358.

    50 Brown B.L., Hadley M., Page R. Heterologous high-level E. coli expression, purification and biophysical characterization of the spine-associated RapGAP (SPAR) PDZ domain. Protein Expr Purif. 2008;62(1):9–14.

    51 Costa S.J., Coelho E., Franco L., Almeida A., Castro A., Domingues L. The Fh8 tag: a fusion partner for simple and cost-effective protein purification in Escherichia coli. Protein Expr Purif. 2013;92(2):163–170.

    52 Guerreiro C.I.P.D., Fontes C.M.G.A., Gama M., Domingues L. Escherichia coli expression and purification of four antimicrobial peptides fused to a family 3 carbohydrate-binding module (CBM) from Clostridium thermocellum. Protein Expr Purif. 2008;59(1):161–168.

    53 Ramos R., Moreira S., Rodrigues A., Gama M., Domingues L. Recombinant expression and purification of the antimicrobial peptide magainin-2. Biotechnol Prog. 2013;29(1):17–22.

    54 Peti W., Page R. Strategies to maximize heterologous protein expression in Escherichia coli with minimal cost. Protein Expr Purif. 2007;51(1):1–10.

    55 Nilsson B., Moks T., Jansson B., et al. A synthetic IgG-binding domain based on staphylococcal protein A. Protein Eng. 1987;1(2):107–113.

    56 Routzahn K.M., Waugh D.S. Differential effects of supplementary affinity tags on the solubility of MBP fusion proteins. J Struct Funct Genomics. 2002;2(2):83–92.

    57 Malhotra A. Tagging for protein expression. Methods Enzymol. 2009;463:239–258.

    58 Singh A., Upadhyay V., Upadhyay A.K., Singh S.M., Panda A.K. Protein recovery from inclusion bodies of Escherichia coli using mild solubilization process. Microb Cell Fact. 2015;14(1):41.

    59 Singh A., Upadhyay V., Panda A.K. Solubilization and refolding of inclusion body proteins. Methods Mol Biol. 2015;1258:283–291.

    60 Siegele D.A., Hu J.C. Gene expression from plasmids containing the araBAD promoter at subsaturating inducer concentrations represents mixed populations. Proc Natl Acad Sci U.S.A. 1997;94(15):8168.

    61 Guzman L.M., Belin D., Carson M.J., Beckwith J. Tight regulation, modulation, and high-level expression by vectors containing the arabinose PBAD promoter. J Bacteriol. 1995;177(14):4121–4130.

    62 Merdanovic M., Clausen T., Kaiser M., Huber R., Ehrmann M. Protein quality control in the bacterial periplasm. Annu Rev Microbiol. 2011;65:149–168.

    63 de Marco A. Strategies for successful recombinant expression of disulfide bond-dependent proteins in Escherichia coli. Microb Cell Fact. 2009;8(1):26.

    64 Ihssen J., Kowarik M., Dilettoso S., Tanner C., Wacker M., Thöny-Meyer L. Production of glycoprotein vaccines in Escherichia coli. Microb Cell Fact. 2010;9:61.

    65 Braun P., LaBaer J. High throughput protein production for functional proteomics. Trends Biotechnol. 2003;21(9):383–388.

    Chapter 2: Advances in heterologous protein expression strategies in yeast and insect systems

    Meenakshi Singh(a); Smita Gupta(b); Arun Kumar Rawat(c); Sudhir Kumar Singh(b)

    a Department of Medicinal Chemistry, Institute of Medical Sciences, Banaras Hindu University, Varanasi, India

    b Department of Microbiology, Institute of Medical Sciences, Banaras Hindu University, Varanasi, India

    c Biochemistry Department, Institute of Science, Banaras Hindu University, Varanasi, India

    Abstract

    The production of recombinant proteins has very high commercial and therapeutic value, especially with the increasing need for immune therapeutics and vaccines. However, none of the expression hosts can guarantee high yields of recombinant products. Eukaryotic expression systems based on yeast and insect cell lines, with easily accessible genetic tools, rapid growth, high cell density, and simple and inexpensive culture media, offer a better alternative to bacterial and mammalian cell expression systems. Moreover, because their cellular and metabolic processes are conserved with those of humans and other mammals, they can be harnessed to achieve a greater yield of heterologous protein with proper post-translational modifications and better protein quality. In this chapter, we discuss the established and emerging synthetic biology tools for engineered strain development of yeast and advances made towards the insect cell lines and baculovirus expression vectors that have been successfully used to express difficult-to-express proteins over the last couple of decades.

    Keywords

    Yeast expression platform (YEP); Saccharomyces cerevisiae; Pichia pastoris; Baculovirus; Insect cell line; Baculovirus Expression Vector System (BEVS); Glycosylation; Secretory proteins; Heterologous proteins; Recombinant protein expression

    Acknowledgment

    MS is grateful to the Institute of Eminence, Banaras Hindu University, Varanasi, India, for a seed grant. SG and AKR would like to thank the University Grants Commission (UGC), New Delhi, India, for providing Dr. D. S. Kothari Postdoctoral Fellowship. SKS is grateful to the Institute of Eminence, Banaras Hindu University, Varanasi, India, for providing the Malaviya Postdoctoral Fellowship.

    1: Introduction

    Recombinant protein production in different expression platforms such as bacteria, algae, yeast, insect, and mammalian cell lines uses many molecular tools and protocols, a plethora of expression plasmids, and numerous engineered strains and strategies.¹ The production of heterologous (HL) proteins is, on the one hand, useful for fundamental research on protein biochemistry and, on the other hand, essential for commercial production, therapeutic applications, and vaccines for human health.² In the recent past, continuous progress has been made in HL protein production. Optimum protein expression is typically achieved by modulating several parameters, such as gene copy number, host/strain selection, codon optimization, promoter selection, inducer optimization, co-expression of folding factors, and secretion capacity. Following the correct expression and production protocol for a specific application is vital. Several parameters such as protein solubility, protein yield, protein quality/functionality, purification speed, cost-effectiveness, scalability, achievable cell density/biomass, doubling time, and localization of proteins are often crucial factors to consider when choosing a host/expression system.³ In addition, properties of the target protein, such as membrane association, solubility propensity arising from its amino acid composition, single- or multidomain architecture, and size (molecular weight), are also important in selecting an expression system. This chapter will shed light on the recent advances and strategies in recombinant protein expression and purification in yeast and insect cell lines that have been successful for expressing difficult-to-express proteins over the last couple of decades.

    2: Heterologous protein expression strategies in yeast systems

    2.1: Introduction

    For the recombinant production of a specific protein, the most suitable expression system should first be identified and then optimized at both the genetic and fermentation levels to maximize the yield and minimize endogenous impurities and other toxic proteins. A yeast expression platform (YEP) consists of different yeast strains used to produce large amounts of proteins for research or industrial applications.⁴ Different yeasts, despite their shared eukaryotic nature, differ in their productivity and in their capabilities to secrete, process, and modify proteins. Owing to their simpler eukaryotic organization, YEPs offer relatively easy genetic manipulation and reach higher cell densities on inexpensive media, which ultimately leads to higher recombinant protein production. One of the main advantages of using yeasts over bacteria is that they can produce glycosylated and other complex modified proteins that are identical or very similar to native mammalian and plant proteins. The advantages of using YEP for HL protein production are listed in Table 1.

    Table 1

    S. cerevisiae was the first eukaryotic organism to be completely sequenced,⁵ and it subsequently became the most powerful genetic model for eukaryotic studies and evolutionary genetics. The first YEP for recombinant protein production was based on the baker’s yeast Saccharomyces cerevisiae due to the extensive research on its cellular and molecular biology.⁴ Because yeasts can grow on different carbon sources and not just on glucose, a variety of other YEPs have been studied and developed for recombinant protein production using genetic engineering tools. YEPs are considered non-pathogenic and non-toxigenic, and their products are generally recognized as safe (GRAS) by the Food and Drug Administration (FDA, White Oak, MD, USA).⁶ YEPs have an exceptional ability for the secretory expression of HL proteins and the production of complex eukaryotic proteins that require post-translational modifications (PTMs) for proper folding. The common YEPs used for recombinant protein expression mainly include the non-methylotrophic yeast Saccharomyces cerevisiae and non-conventional yeasts such as the methylotrophs Pichia pastoris (now known as Komagataella phaffii) and Hansenula polymorpha, along with Kluyveromyces lactis, Yarrowia lipolytica, and Arxula adeninivorans. Over the last decade, non-conventional yeasts have gained popularity due to several enhanced physiological properties,⁷ such as attaining higher biomass yields in fermentation processes (and thus greater protein yield) because they are Crabtree-negative and favor respiration over fermentation, utilizing a broad range of carbon sources,⁸ and showing a lesser extent of hypermannosylation than S. cerevisiae.⁹ Several strategies have been employed to achieve a higher yield of HL protein using YEPs. An overview of these strategies is given in Fig. 1, and they are discussed in the following sections.

    Fig. 1 Protein expression strategies using YEP.

    2.2: Synthetic gene optimization

    Different organisms preferentially use distinct subsets of synonymous codons for gene expression, a property known as codon usage bias. To express HL gene sequences in a YEP, codon optimization is performed to match the host codon bias for effective transcription and translation and thereby attain a higher yield. Optimized codon usage with appropriate codon combinations and specific codons improves mRNA stability, modulates ribosome speed, and enables proper protein folding.¹⁰ The expression of human papillomavirus type 58 L1 in S. cerevisiae was enhanced by codon optimization for large-scale vaccine production.¹¹ Another strategy involves using codon-pair context (CC) bias in the HL gene, where the specific preference of codons for a given host is incorporated to improve protein expression.¹² A recent study showed that protein expression could also be optimized by generating multiple variants of the same gene using a probabilistic algorithm.¹³ Several other approaches suggest that amending the translational initiation site and the structural conformation of the mRNA leads to increased recombinant protein expression using YEP.¹⁴ Additionally, balancing the A+T/G+C content also plays an important role in codon optimization and translational efficiency for HL protein overexpression; e.g., the expression of human glucocerebrosidase and diphtheria toxin was increased in P. pastoris by deleting AT-rich regions that foster early transcription termination.¹⁵
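
    The simplest form of codon optimization discussed above, replacing every residue with a single host-preferred codon and then checking base composition, can be sketched in a few lines of Python. The codon table below is a small hypothetical subset included only for illustration; a real workflow would load measured codon usage frequencies for the chosen yeast host and would typically also consider codon-pair context and mRNA structure. The function names and the example peptide are likewise assumptions.

```python
from collections import Counter

# Hypothetical preferred-codon table (illustrative subset only); a real
# workflow would use measured codon usage data for the chosen host.
PREFERRED_CODON = {
    "M": "ATG", "K": "AAG", "L": "TTG", "S": "TCT", "E": "GAA",
    "G": "GGT", "A": "GCT", "*": "TAA",
}


def back_translate(protein: str) -> str:
    """Return a DNA sequence that uses the preferred codon for each residue."""
    return "".join(PREFERRED_CODON[residue] for residue in protein)


def gc_content(dna: str) -> float:
    """Fraction of G and C bases in the sequence."""
    counts = Counter(dna.upper())
    return (counts["G"] + counts["C"]) / len(dna)


def longest_at_run(dna: str) -> int:
    """Length of the longest uninterrupted stretch of A/T bases."""
    longest = current = 0
    for base in dna.upper():
        current = current + 1 if base in "AT" else 0
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    gene = back_translate("MKLSEGA*")  # hypothetical peptide ending in a stop codon
    print(gene)
    print(f"GC content: {gc_content(gene):.3f}")
    print(f"Longest A/T run: {longest_at_run(gene)} bases")
```

    Reporting the GC content and the longest A/T run alongside the back-translated sequence is one way to flag AT-rich stretches of the kind associated with early transcription termination; any threshold applied to these values would need to be chosen for the specific host.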

    2.3: Expression optimization by controlling gene copy number

    Two distinct strategies have been used for HL gene expression using YEP: self-replicating plasmid-based expression or integration of the gene into the yeast genomic DNA. The expression of HL gene constructs for the production of recombinant proteins in yeast is facilitated by three types of vectors, i.e., episomal plasmids (yEP), centromeric plasmids (yCP), and integration plasmids (yIP).¹⁶ The yEPs have a high copy number (5–30 copies) and are based on the 2μ origin of replication. The yCPs contain both an autonomously replicating sequence (ARS) and a yeast centromeric sequence (CEN) and have very low copy numbers (1–2 copies). Although the high copy number of yEPs enables strong gene expression, their segregation instability in subsequent generations reduces the final protein yield in large-scale industrial set-ups; hence, integration of the desired gene into the host genome is preferred over plasmid-based expression.¹⁷ The yIPs are used to integrate the gene expression cassette into the genomic DNA to maintain stable expression. Integration also removes the need to maintain selective pressure. A study suggests that multiple integrations are required to increase the expression of the desired gene.¹⁸ However, random integration of the expression cassette can alter gene expression by interrupting regulatory elements and disrupting open reading frames. Hence, to avoid unwanted side effects, the expression cassette is integrated either into a gene locus that is not required for yeast growth or into non-coding regions. A study involving the integration of Aspergillus niger eng1 into the HO locus of the S. cerevisiae chromosome led to higher secretory production of endo-1,4-β-glucanase.¹⁹ Similarly, higher expression of HL proteins was observed by integrating the genes into different yeast chromosomal loci such as the sigma element,²⁰ delta terminal repeat,²¹ and yeast ribosomal repeat.²² Multiple copies of the expression cassette are generally required for increased expression of the HL protein; however, increasing the copy number is not always helpful.²³ For example, increasing the gene copy number of human serum albumin led to saturation of protein yield in P. pastoris and H. polymorpha.²⁴,²⁵ Hence, controllable multiple-integration vectors exploiting defective selection and an auxotrophic marker have been used in some studies to tune the copy number of the gene for optimum expression in H. polymorpha, Y. lipolytica, and S. cerevisiae.²⁶,²⁷
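
    The impact of segregation instability on plasmid-based expression can be illustrated with a deliberately simple model. The Python sketch below assumes that each plasmid-bearing cell loses the plasmid with a fixed probability per generation and that plasmid-free and plasmid-bearing cells grow at the same rate; both are simplifying assumptions, and the function name and the loss probabilities are hypothetical values chosen for illustration rather than measured data.

```python
def plasmid_bearing_fraction(loss_per_generation: float, generations: int) -> float:
    """Fraction of cells still carrying the plasmid after a number of generations.

    Simplified model: constant loss probability per generation and equal
    growth rates for plasmid-free and plasmid-bearing cells.
    """
    if not 0.0 <= loss_per_generation <= 1.0:
        raise ValueError("loss_per_generation must be between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - loss_per_generation) ** generations


if __name__ == "__main__":
    # Hypothetical loss probabilities, compared over 20 generations without selection.
    for loss in (0.01, 0.05):
        fraction = plasmid_bearing_fraction(loss, 20)
        print(f"loss {loss:.0%}/generation: {fraction:.1%} of cells retain the plasmid")
```

    Even under this idealized model, the producing fraction shrinks with every generation, whereas a genomically integrated cassette is not subject to this kind of loss, which is consistent with the preference for integration at large scale.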

    2.4: Optimization of promoters

    The choice of promoter is a critical step for the overexpression of HL protein, as efficient transcription leads to optimum gene expression. Many well-characterized inducible or constitutive YEP promoters are reported to achieve overproduction of recombinant proteins²⁸ (Table 2). For optimum protein expression, different types of promoters have been employed based on experimental evidence. Constitutive promoters provide a constant level of expression, while inducible promoters control gene expression levels in the presence of an inducer molecule, and induction can be modulated by the concentration of the inducer.²⁹ Several studies suggest that using a strong promoter in some cases, specifically for secretory proteins, leads to improper folding inside the ER, resulting in a lower yield of the HL protein due to its aggregation propensity.³⁰ Therefore, to attain a good yield of recombinant protein from YEPs, proper promoter selection is important and depends on familiarity with the yeast host, its metabolic characteristics, HL protein toxicity, and the demands of protein folding and protein size.

    Table 2

    In recent years, considerable effort has gone into engineering constitutive promoters that cover a wide range of transcriptional activities. Using randomized oligonucleotides, random mutagenesis, and error-prone PCR, the range of constitutive promoters in S. cerevisiae and P. pastoris has been expanded to fine-tune gene expression for HL therapeutic proteins.³¹–³³ Inducible promoters offer the advantage of controlling the expression level of the desired HL protein. In S. cerevisiae, the galactose-induced GAL1 and GAL10 promoters are frequently used, while in P. pastoris, the methanol-induced AOX1 promoter is most commonly used for secretory protein production. Recently, synthetic and hybrid inducible promoters with novel characteristics have been reported. In the yeast Y. lipolytica, two hybrid promoters, UAS1B8/16-TEF and UAS1Bn-Leum, have been designed by incorporating an upstream activating sequence (UAS) upstream of the endogenous promoter.³⁴ In P. pastoris, methanol is used to induce the AOX1 promoter, but because methanol is toxic to the cells and flammable, it raises safety concerns when recombinant protein production is scaled up to the industrial level. Several synthetic promoters with increased strength and altered, methanol-free regulation have therefore been developed.³⁵ Recently, synthetic AOX1 promoter variants have been developed in P. pastoris in which the expression level is based on a repression/de-repression mechanism governed by alternative carbon sources, i.e., glycerol and glucose.³⁶–³⁸

    2.5: Engineering yeast secretion pathway

    YEPs are used to express proteins of animal origin because the yeast secretion pathway is similar to that of higher eukaryotes.³⁹ Moreover, YEPs are advantageous for scaled-up industrial protein production because yeast secretes few endogenous proteins, which allows an efficient and cost-effective downstream purification strategy. The pathway of secretory proteins, starting from the ER, passing through the Golgi apparatus, and ending at the PM, is shown in Fig. 2. Several limitations of YEPs, such as improper protein folding, proteolytic degradation, hyperglycosylation, and inefficient secretion, create the need to engineer different steps of the protein secretory pathway to improve the quality and yield of HL proteins.⁴⁰ Several advanced strategies developed for optimum HL protein production in YEPs are described below.

    Fig. 2 Diagrammatic representation of the yeast secretory pathway. (Adapted from Thak, EJ, Yoo, SJ, Moon, HY, Kang, HA. Yeast synthetic biology for designed cell factories producing secretory recombinant proteins. FEMS Yeast Res 2020;20(2):foaa009.)

    2.5.1: Polypeptide targeting and translocation

    In the yeast secretory pathway, a secretory signal peptide sequence (SP) directs translocation of the polypeptide into the ER-Golgi pathway. Translocation follows one of two mechanisms: post-translational or co-translational. In post-translational translocation, several cytosolic chaperones act on the hydrophobic binding sites and export the partially folded polypeptide to the ER membrane. In co-translational translocation, the transmembrane and hydrophobic polypeptide sequence is translated and then bound by the signal-recognition particle (SRP), which targets it to the SRP receptor at the ER. The
