1D 1H NMR is a common technique applied to metabolomic studies, being well suited to untargeted analysis of complex biofluids. It has been successfully applied to the classification and diagnosis of a number of diseases including [ref].
There are a number of important steps that must be applied prior 1D 1H NMR data to statistical analysis, all gearded towards ensuring that the variation observed in the data is representative of real biological variation and not a artifact of sample handling, metabolite extraction or NMR acquisition. These steps are described briefly below and approached in turn in the following method. It is important to familiarise yourself with both the purpose of these steps, and any associated caveats or limits, so that you can correctly apply them to your own data.
To process raw NMR data we will be making use of of the
nmrglue library. In
addition we will make use of features from the
libraries, which you should already have installed.
To make sure you can run the following command on the terminal:
pip install numpy scipy nmrglue matplotlib
The demo data for this example is available for download.
Download and unzip the file into your data folder.
Loading Bruker FID spectra
Create a new Jupyter notebook using the Python 3 kernel, and in the first cell
enter and run the following. This will import all the neccessary libraries, as
well as using the
%matplotlib magic to display output figures in the notebook.
%matplotlib inline import matplotlib.pyplot as plt import nmrglue as ng import numpy as np import scipy as sp import os plt.style.use('ggplot')
Bruker-format data is processed using a digital filter before saving as FID, this
filter must be removed for subsequent analysis to work. To open a given spectra,
pass the parent folder of the
dic, data = ng.fileio.bruker.read('87') data = ng.bruker.remove_digital_filter(dic, data)
We can plot the spectra using
fig = plt.figure(figsize=(12,4)) ax = fig.add_subplot(1,1,1) ax.plot(np.arange(0, data.shape), data) # <1>
- We don't have the x-axis ppm scale yet, so we generate a linear
scale of the same size as the data. Here
.shapeis the length of the fid along the first (and only) axis of the 1D spectra.
Fourier transform (FT)
A fourier transform is a signal transformation that decomposes a signal into it's constituent frequencies. In the case of NMR data, this decomposition is to a series of peaks, that represent the resonance of chemical subgroups. These peaks contain both our identification (frequency) and quantification (amplitude) information.
nmrglue package provides a fast fourier transform (FFT) for this purpose
data_fft = ng.proc_base.fft(data) fig = plt.figure(figsize=(12,4)) ax = fig.add_subplot(1,1,1) ax.plot(np.arange(0, data_fft.shape), data_fft)
You'll notice that the spectra is out of phase, i.e. the peaks are not symmetrical and well defined. We will fix this shortly, but first lets load a complete set of spectra so we can view them all at once.
Batch loading + FFT for multiple spectra
root_folder = '.' zero_fill_size = 32768 data =  # <1> for folder in os.listdir(root_folder): dic, fid = ng.fileio.bruker.read(os.path.join(root_folder,folder)) fid = ng.bruker.remove_digital_filter(dic, fid) fid = ng.proc_base.zf_size(fid, zero_fill_size) # <2> fid = ng.proc_base.rev(fid) # <3> fid = ng.proc_base.fft(fid) data.append(fid) data = np.array(data)
- We build as a list, then transform to an array for speed.
- Zero-filling to a power of 2 speeds up FFT.
- Reverse the data to match common representation (for convenience only).
The data is output in a 2D array, with samples along the first axis.
data.shape (17, 32768)
We want to be able to plot multiple spectra on the same plot, so we'll write a simple function to do this.
def plotspectra(ppms, data, start=None, stop=None): if start: # <1> ixs = list(ppms).index(start) ppms = ppms[ixs:] data = data[:,ixs:] if stop: ixs = list(ppms).index(stop) ppms = ppms[:ixs] data = data[:,:ixs] fig = plt.figure(figsize=(12, 4)) ax = fig.add_subplot(1,1,1) for n in range(data.shape): ax.plot(ppms, data[n,:]) ax.set_xlabel('ppm') ax.invert_xaxis() return fig
- This block allows us to select regions of the spectra to view by filtering on the supplied ppms.
We can use this function to look at all spectra following fourier transform:
ppms = np.arange(0, data.shape) fig = plotspectra(ppms, data)
We can take a close up view of the TMSP peak, which is currently on the left hand side of the spectra in the region 28000-30000.
ppms = np.arange(0, data.shape) fig = plotspectra(ppms, data, start=28000, stop=30000)
Clearly the spectra are all out of phase and poorly aligned. We will fix the phasing problem first, but lets start off by getting the correct ppm values for the spectra.
Calculating ppm values
The method for calculating ppm values is rather complicted for Bruker format files. However, the following will give the correct output:
offset = (float(dic['acqus']['SW']) / 2) - (float(dic['acqus']['O1']) / float(dic['acqus']['BF1'])) start = float(dic['acqus']['SW']) - offset end = -offset step = float(dic['acqus']['SW']) / zero_fill_size ppms = np.arange(start, end, -step)[:zero_fill_size]
We can now plot the spectra with the correct ppms.
fig = plotspectra(ppms, data)
The next step is to phase correct all the spectra. The package
a few automated algorithms that do a reasonable job with most normal, good quality
spectra. Thankfully, that's what we have here.
for n in range(0, data.shape): data[n, :] = ng.proc_autophase.autops(data[n,:], 'acme') fig = plotspectra(ppms, data)