Updating the sequence-based classification of glycosyl hydrolases
As a measure of the distance between the codon usage of a gene (), we calculated the Hamming distance, following a procedure similar to the one used for predicting protein structural classes (Chou and Zhang 1995).
Each gene therefore corresponds to a vector or a point in the 61-D space whose coordinates are the relative frequencies of use of the 61 codons.
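The mapping from a gene to a point in the 61-D space, and the distance between two such points, can be illustrated with a minimal sketch. It rests on assumptions not stated in the text: the standard genetic code (three stop codons, hence 61 sense codons), relative frequency taken as each codon's count divided by the total number of sense codons in the gene, and the Hamming distance taken as the sum of absolute coordinate differences between two frequency vectors. The function names are hypothetical and are not taken from the original analysis.

```python
from itertools import product

# Assumption: standard genetic code, so 64 - 3 stop codons = 61 sense codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = [
    "".join(bases)
    for bases in product("ACGT", repeat=3)
    if "".join(bases) not in STOP_CODONS
]


def codon_frequencies(cds):
    """Return the 61 relative codon frequencies of an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    counts = dict.fromkeys(SENSE_CODONS, 0)
    # Read the sequence codon by codon; stop codons, ambiguous codons,
    # and a trailing partial codon are ignored.
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no sense codons")
    return [counts[codon] / total for codon in SENSE_CODONS]


def hamming_distance(u, v):
    """Sum of absolute coordinate differences (assumed definition)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


# Toy sequences for illustration only, not real genes.
gene_a = codon_frequencies("ATGAAAGCTTAA")
gene_b = codon_frequencies("ATGAAGGCATAA")
distance = hamming_distance(gene_a, gene_b)
```

Under these assumptions the distance is 0 for genes with identical codon usage and reaches its maximum of 2 when two genes share no codons at all.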
The host provides heat, moisture, and food, while the microorganisms contribute protein as microbial biomass and by-products of digestion such as volatile fatty acids that the animal uses (Madigan, Martinko, and Parker 1997).
Anatomically, the most complex specialization is found among the ruminants, such as cattle or sheep, with elaborate, multicompartmentalized stomachs specialized for a herbivorous diet.
For analytical comparisons, we used gene sequences from strains that were homologous to some of the sequences from strain S85. We imported the protein sequences and identified the catalytic domains from the information available in the Swiss-Prot database or from local alignments against crystallized sequences. For sequences with two GH family 11 catalytic domains, we used both domains.
The reiterated sequence conserved in rumen fungal cellulases differs from clostridial dockerins in size and amino acid sequence, and there is no detectable sequence similarity between them (Beguin and Lemaire 1996). No known GH gene from rumen fungi has any introns, and sequences of these genes are very similar to those of GH bacterial genes.