full_dia package
Subpackages
- full_dia.alphatims package
  - Submodules
  - full_dia.alphatims.bruker module
    - PrecursorFloatError
    - TimsTOF
      - TimsTOF.accumulation_times, TimsTOF.acquisition_mode, TimsTOF.as_dataframe(), TimsTOF.bin_intensities(), TimsTOF.calculate_global_calibrated_mz_values(), TimsTOF.calibrated_mz_max_value, TimsTOF.calibrated_mz_min_value, TimsTOF.calibrated_mz_values, TimsTOF.convert_from_indices(), TimsTOF.convert_to_indices(), TimsTOF.cycle, TimsTOF.dia_mz_cycle, TimsTOF.dia_precursor_cycle, TimsTOF.directory, TimsTOF.estimate_strike_count(), TimsTOF.fragment_frames, TimsTOF.frame_max_index, TimsTOF.frames, TimsTOF.index_precursors(), TimsTOF.intensity_corrections, TimsTOF.intensity_max_value, TimsTOF.intensity_min_value, TimsTOF.intensity_values, TimsTOF.is_compressed, TimsTOF.max_accumulation_time, TimsTOF.meta_data, TimsTOF.mobility_max_value, TimsTOF.mobility_min_value, TimsTOF.mobility_values, TimsTOF.mz_max_value, TimsTOF.mz_min_value, TimsTOF.mz_values, TimsTOF.precursor_indices, TimsTOF.precursor_max_index, TimsTOF.precursors, TimsTOF.push_indptr, TimsTOF.quad_indptr, TimsTOF.quad_mz_max_value, TimsTOF.quad_mz_min_value, TimsTOF.quad_mz_values, TimsTOF.raw_quad_indptr, TimsTOF.rt_max_value, TimsTOF.rt_values, TimsTOF.sample_name, TimsTOF.save_as_hdf(), TimsTOF.save_as_spectra(), TimsTOF.scan_max_index, TimsTOF.set_cycle(), TimsTOF.tof_indices, TimsTOF.tof_max_index, TimsTOF.use_calibrated_mz_values_as_default(), TimsTOF.version, TimsTOF.zeroth_frame
    - add_intensity_to_bin(), calculate_dia_cycle_mask(), centroid_spectra(), convert_slice_key_to_float_array(), convert_slice_key_to_int_array(), filter_indices(), filter_spectra_by_abundant_peaks(), filter_tof_to_csr(), get_dia_push_indices(), indptr_lookup(), init_bruker_dll(), open_bruker_d_folder(), parse_decompressed_bruker_binary_type1(), parse_decompressed_bruker_binary_type2(), parse_keys(), process_frame(), read_bruker_binary(), read_bruker_sql(), save_as_mgf(), save_as_spectra(), set_precursor(), trim_spectra(), valid_precursor_index(), valid_quad_mz_values()
  - full_dia.alphatims.utils module
  - Module contents
Submodules
full_dia.assemble module
- full_dia.assemble.assemble_pep_to_pg(df_input, q_cut_infer, run_or_global)
Assemble peptides into protein groups.
- Parameters:
df_input (pd.DataFrame) – Must have columns: protein_id and simple_seq/strip_seq
q_cut_infer (float) – Q-value cutoff for selecting the peptides used in assembly.
run_or_global ({'run', 'global'}) – Assemble on run or global level.
- Returns:
df – Copy of df_input with a new column: protein_group.
- Return type:
pd.DataFrame
- full_dia.assemble.assemble_pep_to_pg_core(graph)
Run the IDPicker algorithm on the peptide-protein bipartite graph.
- Parameters:
graph (nx.Graph) – The bipartite graph of proteins and peptides to be assigned.
- Returns:
- protein_v (list)
The proteins after assignment.
- peptide_v (list of list)
The peptides after assignment.
- Return type:
tuple
full_dia.calib module
- full_dia.calib.cal_turning_point(y_data, y_pred)
Determine the tolerance corresponding to the elbow point from the tolerance–coverage curve. See https://stackoverflow.com/questions/2018178/finding-the-best-trade-off-point-on-a-curve
- Return type:
float
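The elbow heuristic described in the linked StackOverflow thread draws a straight line (the chord) between the first and last points of the curve and picks the point farthest from it. The NumPy sketch below illustrates that heuristic only; `find_elbow` and its `(x, y)` inputs are hypothetical and do not reproduce the signature or internals of cal_turning_point.

```python
import numpy as np


def find_elbow(x, y):
    """Index of the point farthest from the chord joining the first and
    last points of the curve (max-distance elbow heuristic)."""
    pts = np.column_stack([np.asarray(x, float), np.asarray(y, float)])
    start, end = pts[0], pts[-1]
    chord_unit = (end - start) / np.linalg.norm(end - start)
    rel = pts - start
    # Perpendicular distance to the chord = |2-D cross product| with the unit chord
    dist = np.abs(rel[:, 0] * chord_unit[1] - rel[:, 1] * chord_unit[0])
    return int(np.argmax(dist))


# Toy tolerance-coverage curve: coverage saturates as the tolerance grows
tol = np.linspace(0.0, 1.0, 11)
coverage = 1.0 - np.exp(-8.0 * tol)
elbow_tol = tol[find_elbow(tol, coverage)]
```

With this toy curve the elbow falls at tol = 0.3, where coverage stops rising steeply.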
- full_dia.calib.calib_im(df_tol, df_lib)
Fit the mapping from iIM to measured IM on df_tol, then update the predicted IM in df_lib.
- Parameters:
df_tol (pd.DataFrame) – Columns: ‘score_deep’, ‘pred_iim’, ‘pred_im’, ‘measure_im’.
df_lib (pd.DataFrame) – Columns: ‘pred_iim’.
- Returns:
- df_tol (pd.DataFrame)
df_tol with ‘pred_im’, restricted to rows whose bias_im is less than the tolerance.
- df_lib (pd.DataFrame)
df_lib with ‘pred_im’ updated.
- Return type:
tuple
- full_dia.calib.calib_mz(df_seed, ms)
Fit an m/z calibration and update the measured m/z values.
- Parameters:
df_seed (pd.DataFrame) – Columns: ‘score_deep’, ‘measure_pr_mz’, ‘pr_mz’.
ms (tims.Tims) – Holds the raw measured m/z values that are updated.
- Returns:
df_seed – No new columns are added to df_seed.
- Return type:
pd.DataFrame
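Mass errors of this kind are conventionally expressed in parts per million (ppm). As a minimal sketch, assuming a single global offset is enough (calib_mz may instead fit an m/z-dependent curve), the median ppm bias between measure_pr_mz and pr_mz on confident seeds can be estimated and then removed from the raw m/z values. `ppm_error` and `correct_by_median_ppm` are hypothetical names.

```python
import numpy as np


def ppm_error(measured_mz, theoretical_mz):
    """Relative m/z deviation in parts per million."""
    measured_mz = np.asarray(measured_mz, float)
    theoretical_mz = np.asarray(theoretical_mz, float)
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def correct_by_median_ppm(raw_mz, measured_mz, theoretical_mz):
    """Estimate the median ppm bias on seeds and remove it from raw m/z values.

    A measured value m = t * (1 + b * 1e-6) is mapped back to t by dividing
    by (1 + b * 1e-6).
    """
    bias_ppm = np.median(ppm_error(measured_mz, theoretical_mz))
    return np.asarray(raw_mz, float) / (1.0 + bias_ppm * 1e-6)
```

The median keeps the estimate robust to a few badly matched seeds.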
- full_dia.calib.calib_rt(df_seed, df_lib)
Fit the mapping from iRT to measured RT on df_seed, then add the predicted RT to df_lib.
- Parameters:
df_seed (pd.DataFrame) – Columns: ‘simple_seq’, ‘locus’, ‘score_deep’, ‘pred_irt’, ‘measure_rt’.
df_lib (pd.DataFrame) – Columns: ‘pred_irt’.
- Returns:
- df (pd.DataFrame)
df_seed with ‘pred_rt’, restricted to rows whose bias_rt is less than the tolerance.
- df_lib (pd.DataFrame)
df_lib with a new column ‘pred_rt’.
- Return type:
tuple
- full_dia.calib.fit_by_lowess(x, y, frac)
Perform a LOWESS fit of y on x with smoothing fraction frac.
- Return type:
tuple
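calib_rt fits the iRT-to-RT relationship on confident seeds and then maps every library entry through it, with fit_by_lowess performing the smoothing step. The sketch below is a minimal local-linear LOWESS with tricube weights and no robustness iterations, followed by linear interpolation to transfer the fit to library values. It is written from scratch for illustration; `lowess_fit` and `calibrate` are hypothetical names, and Full-DIA's own fit may differ in bandwidth handling and outlier robustness.

```python
import numpy as np


def lowess_fit(x, y, frac=0.3):
    """Minimal local-linear LOWESS (tricube weights, no robustness
    iterations). Returns x sorted ascending and the fitted y values."""
    order = np.argsort(x)
    x = np.asarray(x, float)[order]
    y = np.asarray(y, float)[order]
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))  # neighbours used per local fit
    design = np.column_stack([np.ones(n), x])
    y_fit = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = max(np.sort(dist)[k - 1], 1e-12)  # bandwidth: k-th nearest distance
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, 1.0) ** 3  # tricube weights
        sw = np.sqrt(w)
        # Weighted least squares for y = a + b * x around x[i]
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        y_fit[i] = coef[0] + coef[1] * x[i]
    return x, y_fit


def calibrate(x_seed, y_seed, x_lib, frac=0.3):
    """Fit on seed pairs (e.g. pred_irt -> measure_rt), then map library values."""
    xs, ys = lowess_fit(x_seed, y_seed, frac)
    return np.interp(x_lib, xs, ys)
```

np.interp clamps values outside the seed range to the end points of the fit, so library entries beyond the calibrated range receive the boundary prediction.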
- full_dia.calib.plot_fit_im(y_measure, y_pred_before, y_pred_after, xout, yout, bias_old, bias, fname)
Plotting helper for development.
- full_dia.calib.plot_fit_mz(x1, y1, x2, y2, x_fit, y_fit, bias_old, bias, fname)
Plotting helper for development.
- full_dia.calib.plot_fit_rt(x, y, x1, y1, x11, y11, x_fit, y_fit, tol_rt, bias, fname)
Plotting helper for development.
- full_dia.calib.polish_ends(x_screen2, y_screen2, tol_bins)
From Calib-RT algorithm: https://doi.org/10.1093/bioinformatics/btae417
- Return type:
tuple
- full_dia.calib.screen_by_graph(x_screen1, y_screen1)
From Calib-RT algorithm: https://doi.org/10.1093/bioinformatics/btae417
- Return type:
tuple
- full_dia.calib.screen_by_hist(x_data, y_data, bins)
From Calib-RT algorithm: https://doi.org/10.1093/bioinformatics/btae417
- Return type:
tuple
full_dia.cfg module
full_dia.cross module
- full_dia.cross.drop_batches_mismatch(df)
Remove decoy peptides that are identical to target peptides, as well as duplicated decoy peptides.
- Return type:
DataFrame
- full_dia.cross.drop_runs_mismatch(df)
Remove decoy peptides that are identical to target peptides, as well as duplicated decoy and target peptides.
- Return type:
DataFrame
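The decoy clean-up rule above can be sketched on plain records. This is an illustration only: the record keys `simple_seq` and `decoy` are assumed to mirror the DataFrame columns, `drop_decoy_mismatch` is a hypothetical name, and `dedup_targets=True` approximates the extra step of drop_runs_mismatch. The first occurrence of each sequence is kept.

```python
def drop_decoy_mismatch(rows, dedup_targets=False):
    """Drop decoys whose sequence equals a target sequence, drop repeated
    decoys and, optionally, repeated targets. First occurrence wins."""
    target_seqs = {r["simple_seq"] for r in rows if not r["decoy"]}
    seen_decoys, seen_targets = set(), set()
    kept = []
    for r in rows:
        seq = r["simple_seq"]
        if r["decoy"]:
            # A decoy identical to any target, or already seen, is discarded
            if seq in target_seqs or seq in seen_decoys:
                continue
            seen_decoys.add(seq)
        elif dedup_targets:
            if seq in seen_targets:
                continue
            seen_targets.add(seq)
        kept.append(r)
    return kept
```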
- full_dia.cross.group_by_species(species, ratio_cut=0.05)
Determine the species whose ratio exceeds ratio_cut.
- Return type:
list
- full_dia.cross.perform_global(lib, top_k_fg, top_k_pr, multi_ws)
Compute q_pr_global, assemble peptides into protein groups, compute q_pg_global, and quantify the protein groups.
- Parameters:
lib (library.Library) – Provide the peptide-to-protein mapping.
top_k_fg (int) – Number of fragment ions used to compute peptide quantification values.
top_k_pr (int) – Number of peptides used to compute protein quantification values.
multi_ws (list) – Paths of multiple .d files.
- Returns:
df_global (pd.DataFrame) – Columns:
- [pr_id, decoy, cscore_pr_run_x]
- [cscore_pr_global_first, q_pr_global_first]
- [proteotypic, protein_id, protein_name, protein_group]
- [cscore_pg_global_first, q_pg_global_first]
- [quant_pr_0, quant_pr_1, …, quant_pr_N]
- [quant_pg_0, quant_pg_1, …, quant_pg_N]
- Return type:
DataFrame
- full_dia.cross.quant_pr_autoencoder(df_global, top_k_fg)
Quantify peptides by summing fragment-ion intensities smoothed by an autoencoder.
- Parameters:
df_global (pd.DataFrame) – Provide the quantification values and SA values of fragment ions.
top_k_fg (int) – Number of fragment ions summed into a peptide quantification value.
- Returns:
df_global – Add columns: quant_pr_raw, quant_pr_deep, quant_pr_mix
- Return type:
pd.DataFrame
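The top_k_fg and top_k_pr parameters suggest a top-k summation at both levels: the k strongest fragment ions form a peptide value, and the k strongest peptides form a protein value. A minimal NumPy sketch, assuming "strongest" means highest intensity (Full-DIA may rank ions by another criterion, such as SA or the autoencoder output); `top_k_sum` is a hypothetical helper.

```python
import numpy as np


def top_k_sum(values, k):
    """Sum the k largest values in each row of a 2-D array.
    Rows with fewer than k columns simply sum everything."""
    values = np.asarray(values, float)
    if k < 1:
        raise ValueError("k must be >= 1")
    k = min(k, values.shape[1])
    # np.sort is ascending, so the last k columns hold the largest values
    return np.sort(values, axis=1)[:, -k:].sum(axis=1)
```

For example, fragment rows `[[10, 50, 30, 0], [5, 5, 5, 5]]` with k = 2 give peptide values `[80, 10]`.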
full_dia.dataloader module
full_dia.decoy module
- full_dia.decoy.convert_seq_to_mass(simple_seq)
A fast method to convert stripped sequences into a mass list.
- Parameters:
simple_seq (pd.Series) – Each element is a stripped sequence.
- Returns:
- mass (list)
The mass list of the stripped sequences.
- seq_len_cumsum (np.array)
The cumulative sequence lengths of simple_seq.
- Return type:
tuple
- full_dia.decoy.gpu_cal_fg_mz(n, fg_num, mass_v, seq_len_cumsum_v, fg_type_m, fg_len_m, fg_charge_m, result_fg_mz)
Calculate the fragment ion m/z values of decoys. Each CUDA thread handles one ion of one precursor.
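convert_seq_to_mass flattens sequences into one mass list plus cumulative lengths, a layout that lets each GPU thread in gpu_cal_fg_mz locate its precursor's residues. The CPU sketch below illustrates the same idea together with the standard b/y fragment formulas, using monoisotopic masses of unmodified residues. The leading 0 in `offsets`, the helper names and the restriction to b/y ions are assumptions for illustration, not the package's layout.

```python
import numpy as np

# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def flatten_masses(seqs):
    """Concatenate per-residue masses; masses[offsets[i]:offsets[i + 1]] is seq i."""
    masses = np.array([RESIDUE_MASS[aa] for s in seqs for aa in s])
    offsets = np.concatenate([[0], np.cumsum([len(s) for s in seqs])])
    return masses, offsets


def fragment_mz(residue_masses, ion_type, length, charge):
    """m/z of a b or y ion spanning `length` residues at the given charge."""
    if not 1 <= length <= len(residue_masses):
        raise ValueError("length out of range")
    if ion_type == "b":
        neutral = residue_masses[:length].sum()  # N-terminal residues
    elif ion_type == "y":
        neutral = residue_masses[-length:].sum() + WATER  # C-terminal residues
    else:
        raise ValueError("ion_type must be 'b' or 'y'")
    return (neutral + charge * PROTON) / charge
```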
- full_dia.decoy.make_decoys(df_target, fg_num, method, value=1)
Generate decoys with modified fragment m/z values only.
- Parameters:
df_target (pd.DataFrame) – Columns: simple_seq, pr_id.
fg_num (int) – Number of fragment ions per precursor, 12 by default.
method (str) – “reverse” | “mutate” | “shift”.
value (int) – Decoy is 1; shadow is 2.
- Returns:
df_decoy – Copy of df_target with modified fragment m/z values only.
- Return type:
pd.DataFrame
full_dia.deepmall module
- full_dia.deepmall.extract_mall(df_batch, map_gpu_ms1, map_gpu_ms2, tol_im, tol_ppm)
Extract the mall of the top-12 fragment ions from the MS data.
- Parameters:
df_batch (pd.DataFrame) – Provide the pr info.
map_gpu_ms1 (dict) – Provide the MS1 data.
map_gpu_ms2 (dict) – Provide the MS2 data.
tol_im (float) – Tolerance of ion mobility.
tol_ppm (float) – Mass tolerance in ppm.
- Returns:
Mall – Contains the fragment-ion information: pred_heights, xics, ppms, bias_ims, fg_type, SA, areas, snr
- Return type:
torch.Tensor
- full_dia.deepmall.scoring_mall(model_mall, df_input, map_gpu_ms1, map_gpu_ms2, tol_im, tol_ppm)
Extract and score the Malls for elution groups.
- Parameters:
model_mall (torch.nn.Module) – The trained DeepMall model.
df_input (pd.DataFrame) – Provide the pr info.
map_gpu_ms1 (dict) – Provide the MS1 data.
map_gpu_ms2 (dict) – Provide the MS2 data.
tol_im (float) – Tolerance of ion mobility.
tol_ppm (float) – Mass tolerance in ppm.
- Returns:
- pred (np.ndarray)
The scores from DeepMall.
- feature (np.ndarray)
The features from DeepMall.
- Return type:
tuple
full_dia.deepmap module
- full_dia.deepmap.extract_maps(df_batch, idx_start_m, locus_num, cycle_num, map_im_size, map_gpu_ms1, map_gpu_ms2, tol_ppm, tol_im_map, im_gap, neutron_num)
Extract maps for multiple elution groups of a precursor.
- Parameters:
df_batch (pd.DataFrame) – Provide pr info.
idx_start_m (np.ndarray) – Cycle start indices.
locus_num (int) – Number of loci to extract per precursor.
cycle_num (int) – Number of cycles to extract per precursor. Default 13.
map_im_size (float) – Size of the ion-mobility axis of a map. Default 50.
map_gpu_ms1 (dict) – Provide the MS1 data.
map_gpu_ms2 (dict) – Provide the MS2 data.
tol_ppm (float) – Mass tolerance in ppm.
tol_im_map (float) – Ion-mobility tolerance.
im_gap (float) – Bin width of the ion-mobility axis in a map.
neutron_num (int) – Number of neutrons (isotope offset) to use.
- Returns:
Maps – The extracted maps.
- Return type:
torch.Tensor
- full_dia.deepmap.extract_scoring_big(model_center, model_big, df_input, map_gpu_ms1, map_gpu_ms2, cycle_num, map_im_gap, map_im_dim, ppm_tolerance, im_tolerance)
Extract and score maps using DeepProfile-14 and DeepProfile-56.
- Parameters:
model_center (torch.nn.Module) – The DeepProfile-14 model.
model_big (torch.nn.Module) – The DeepProfile-56 model.
df_input (pd.DataFrame) – Provide the pr info.
map_gpu_ms1 (dict) – Provide the MS1 data.
map_gpu_ms2 (dict) – Provide the MS2 data.
cycle_num (int) – Number of cycles to extract per precursor. Default 13.
map_im_gap (float) – The bin width of im in a map.
map_im_dim (float) – The dimension of im in a map.
ppm_tolerance (float) – The ppm tolerance.
im_tolerance (float) – The mobility tolerance.
- Returns:
- pred_v (list[np.ndarray])
The deep scores for [14-left, 14-center, 14-1H, 14-2H, 56-total].
- feature_v (list[np.ndarray])
The deep features for [14-left, 14-center, 14-1H, 14-2H, 56-total].
- Return type:
tuple
- full_dia.deepmap.find_first_index(scan_mz, query_left, query_right)
Find the first index whose m/z lies within the query window.
- Parameters:
scan_mz (cuda.array) – MS data of a cycle, sorted by ascending m/z.
query_left (float) – Lower bound of the window (target m/z minus the ppm tolerance).
query_right (float) – Upper bound of the window (target m/z plus the ppm tolerance).
- Returns:
best_j – The index of the first m/z that matches the query.
- Return type:
int
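Because scan_mz is sorted, the first in-window index can be found by binary search on the lower bound. The sketch below uses Python's `bisect` module on a plain list; `ppm_window` and `first_index_in_window` are hypothetical, and the -1 return for a miss is an assumed convention rather than the kernel's documented behavior.

```python
from bisect import bisect_left


def ppm_window(mz, tol_ppm):
    """Lower and upper m/z bounds for a symmetric ppm tolerance."""
    delta = mz * tol_ppm * 1e-6
    return mz - delta, mz + delta


def first_index_in_window(scan_mz, query_left, query_right):
    """Index of the first value of the ascending list scan_mz that lies in
    [query_left, query_right], or -1 if none does (assumed sentinel)."""
    j = bisect_left(scan_mz, query_left)  # first value >= query_left
    if j < len(scan_mz) and scan_mz[j] <= query_right:
        return j
    return -1
```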
- full_dia.deepmap.gpu_bin_map(n, cycle_num, idx_start_v, ms1_scan_seek_idx, ms1_scan_im, ms1_scan_mz, ms1_scan_height, ms2_scan_seek_idx, ms2_scan_im, ms2_scan_mz, ms2_scan_height, query_mz_m, ppm_tolerance, query_im_v, im_tolerance, im_gap, result_maps, ms1_ion_num)
Each CUDA thread generates a map (cycle + mobility + intensity) of an elution group. When multiple signals fall into the same bin, retain only the one with the highest intensity.
- full_dia.deepmap.gpu_bin_maps(n, locus_num, cycle_num, idx_start_m, ms1_scan_seek_idx, ms1_scan_im, ms1_scan_mz, ms1_scan_height, ms2_scan_seek_idx, ms2_scan_im, ms2_scan_mz, ms2_scan_height, query_mz_m, tol_ppm, query_im_v, tol_im_map, im_gap, result_maps, ms1_ion_num)
The maps have dimension [n_pr, n_locus, n_ion, n_cycle, n_im_bin]. Each CUDA thread generates the maps for multiple elution groups of a precursor. When multiple signals fall into the same bin, only the one with the highest intensity is retained.
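The max-per-bin rule of gpu_bin_map and gpu_bin_maps can be reproduced on the CPU with NumPy's unbuffered `np.maximum.at`. This sketch builds a single [n_cycle, n_im_bin] map; the IM binning origin `im_left` and the name `bin_max` are assumptions, and the zero-initialized map presumes nonnegative intensities.

```python
import numpy as np


def bin_max(cycle_idx, im, intensity, n_cycle, im_left, im_gap, n_im_bin):
    """Bin signals into an [n_cycle, n_im_bin] map, keeping only the
    highest intensity when several signals fall into the same bin."""
    cycle_idx = np.asarray(cycle_idx, dtype=np.int64)
    im_bin = np.floor((np.asarray(im, float) - im_left) / im_gap).astype(np.int64)
    intensity = np.asarray(intensity, float)
    # Discard signals that fall outside the map
    ok = (cycle_idx >= 0) & (cycle_idx < n_cycle) & (im_bin >= 0) & (im_bin < n_im_bin)
    out = np.zeros((n_cycle, n_im_bin))
    # Unbuffered in-place maximum: repeated (cycle, bin) pairs keep the max
    np.maximum.at(out, (cycle_idx[ok], im_bin[ok]), intensity[ok])
    return out
```

A plain fancy-indexed assignment (`out[idx] = values`) would keep an arbitrary one of the duplicates, which is why the unbuffered ufunc form is used.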
- full_dia.deepmap.load_models(dir_center=None, dir_big=None)
Load DeepProfile-14 and DeepProfile-56 models.
- full_dia.deepmap.scoring_maps(model, df_input, map_gpu_ms1, map_gpu_ms2, cycle_num, map_im_gap, map_im_dim, ppm_tolerance, im_tolerance, neutron_num, return_feature=True)
Extract and score the Profile-14 maps.
- Parameters:
model (torch.nn.Module) – The DeepProfile-14 model.
df_input (pd.DataFrame) – Provide the pr info.
map_gpu_ms1 (dict) – Provide the MS1 data.
map_gpu_ms2 (dict) – Provide the MS2 data.
cycle_num (int) – Number of cycles to extract per precursor. Default 13.
map_im_gap (float) – The bin width of im in a map.
map_im_dim (float) – The dimension of im in a map.
ppm_tolerance (float) – Mass tolerance in ppm.
im_tolerance (float) – Ion-mobility tolerance.
neutron_num (int) – Number of neutrons (isotope offset) to use.
return_feature (bool, default=True) – Whether to also return the features.
- Returns:
- pred (torch.Tensor)
The scores by DeepProfile-14.
- features (np.ndarray)
The features by DeepProfile-14.
- Return type:
tuple
full_dia.fdr module
- full_dia.fdr.adjust_rubbish_q(df, batch_num)
If fewer than 5000 IDs are found, rubbish_q_cut is set to 0.75 to retain more IDs.
- Return type:
None
- full_dia.fdr.cal_q_pg(df_input_raw, q_pr_cut, run_or_global)
Calculate the q values of pgs based on the assigned peptides.
- Parameters:
df_input_raw (pd.DataFrame) – Provide columns: “strip_seq”, “cscore_pr”, “decoy”, “q_pr”.
q_pr_cut (float) – Q-value cutoff that selects the peptides used to calculate the cscore of a protein group.
run_or_global (str) – Specify the calculation on “run” or “global” level.
- Returns:
df_res – Add new columns: “cscore_pg”, “q_pg”
- Return type:
pd.DataFrame
- full_dia.fdr.cal_q_pr_batch(df, batch_size, n_model, model_trained=None, scaler=None)
Calculate the q values of peptides in a batch.
- Parameters:
df (pd.DataFrame) – Provide columns: “pr_id”, “score_”, “decoy”.
batch_size (int) – The batch size for training an MLP.
n_model (int) – The number of models to ensemble.
model_trained (list[MLPClassifier], default=None) – The trained ensemble models or None.
scaler (StandardScaler, default=None) – The trained scaler or None.
- Returns:
- df (pd.DataFrame)
Add new columns: “cscore_pr_run”, “group_rank”, “q_pr_run”
- model_trained (list[MLPClassifier])
The trained ensemble models.
- scaler (StandardScaler)
The trained scaler.
- Return type:
tuple
- full_dia.fdr.cal_q_pr_combined(df, batch_size, n_model)
Calculate the q values of peptides across the batches using DIA-NN’s model.
- Parameters:
df (pd.DataFrame) – Provide columns: “pr_id”, “score_”, “decoy”.
batch_size (int) – The batch size for training an MLP.
n_model (int) – The number of models to ensemble.
- Returns:
df – Add new columns: “cscore_pr_run”, “q_pr_run”
- Return type:
pd.DataFrame
- full_dia.fdr.cal_q_pr_core(df, run_or_global)
Core function that calculates peptide q values using DIA-NN’s model.
- Parameters:
df (pd.DataFrame) – Provide “cscore” values of peptides.
run_or_global (str) – Specify the calculation on “run” or “global” level.
- Returns:
df – Add a new column: ‘q_pr’.
- Return type:
pd.DataFrame
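The q-value logic these functions build on is the target-decoy approach: sort by cscore, estimate the FDR at each threshold as decoys over targets, then take the running minimum from the low-score end so q values never decrease as scores fall. The sketch below shows this generic procedure only; it does not reproduce DIA-NN's model or the exact estimator in cal_q_pr_core, it does not group tied scores, and `target_decoy_q_values` is a hypothetical name.

```python
import numpy as np


def target_decoy_q_values(cscore, decoy):
    """Generic target-decoy q values (FDR = #decoys / #targets above threshold)."""
    cscore = np.asarray(cscore, dtype=float)
    decoy = np.asarray(decoy, dtype=bool)
    order = np.argsort(-cscore, kind="stable")  # best score first
    d_sorted = decoy[order]
    n_decoy = np.cumsum(d_sorted)
    n_target = np.cumsum(~d_sorted)
    fdr = n_decoy / np.maximum(n_target, 1)
    # q value: smallest FDR at this or any lower score threshold
    q_sorted = np.minimum.accumulate(fdr[::-1])[::-1]
    q = np.empty_like(q_sorted)
    q[order] = q_sorted  # back to the input order
    return q
```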
full_dia.fxic module
- full_dia.fxic.cal_coelution_by_gaussion(xics, window_points, valids_num)
Compute coelution SA scores with a sliding-window method.
- Parameters:
xics (numba.cuda.devicearray.DeviceNDArray) – The extracted xics on GPU device.
window_points (int) – Fixed to 7 cycles when computing the SA scores.
valids_num (int) – The number of fragment ions + 2 (pr and unfragmented pr).
- Returns:
- scores (torch.Tensor)
The mean SA coelution scores for peak groups.
- scores_raw (torch.Tensor)
The raw SA coelution scores for peak groups.
- Return type:
tuple
- full_dia.fxic.cal_measure_im(locus_ims, locus_sas, good_cut=0.5)
Calculate measure_im for each locus as the SA-weighted mean ion mobility.
- Parameters:
locus_ims (np.ndarray) – Ion mobility values for locus. Dimension: [n_locus, n_ion]
locus_sas (np.ndarray) – SA scores for locus. Dimension: [n_locus, n_ion]
good_cut (float, default=0.5) – Only ions whose SA passes the good_cut threshold are considered.
- Returns:
locus_im – The weighted mean ion mobility values. Dimension: [n_locus]
- Return type:
np.ndarray
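The SA weighting in cal_measure_im can be sketched as a masked weighted mean. Assumptions: the threshold is strict (SA > good_cut), and loci without any qualifying ion return NaN; the package's handling of these cases is not documented here. `sa_weighted_im` is a hypothetical name.

```python
import numpy as np


def sa_weighted_im(locus_ims, locus_sas, good_cut=0.5):
    """SA-weighted mean ion mobility per locus, using only ions whose SA
    exceeds good_cut; loci with no such ion get NaN."""
    ims = np.asarray(locus_ims, float)
    sas = np.asarray(locus_sas, float)
    w = np.where(sas > good_cut, sas, 0.0)  # [n_locus, n_ion]
    w_sum = w.sum(axis=1)
    out = np.full(ims.shape[0], np.nan)
    # Divide only where some weight exists; other entries stay NaN
    np.divide((w * ims).sum(axis=1), w_sum, out=out, where=w_sum > 0)
    return out
```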
- full_dia.fxic.concat_nonzero_locus(locus, scores_sa, scores_sa_m)
After loci are screened by SA, sa_input contains many zeros. This function selects the nonzero entries and concatenates them into vectors.
- Parameters:
locus (np.ndarray) – The loci of the extracted XICs. Dimension: [n_pep, n_locus]
scores_sa (torch.Tensor) – The SA locus scores of the extracted XICs. Dimension: [n_pep, n_locus]
scores_sa_m (torch.Tensor) – The SA ion scores of the extracted XICs. Dimension: [n_pep, n_ion, n_locus]
- Returns:
- locus_v (np.ndarray)
The candidate loci remaining after screening, as one vector.
- locus_num (np.ndarray)
Number of loci retained for each peptide after screening.
- locus_sa_v (np.ndarray)
The SA locus scores of the candidate loci.
- locus_sas (np.ndarray)
The SA ion scores of the candidate loci.
- Return type:
tuple
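The selection described above amounts to boolean-mask indexing in row-major order. A NumPy sketch follows (the package works on torch tensors; `concat_nonzero` is a hypothetical name), with locus_num recording how many entries of the flattened vectors belong to each peptide.

```python
import numpy as np


def concat_nonzero(locus, scores_sa, scores_sa_m):
    """Flatten the loci whose locus-level SA is nonzero.

    locus, scores_sa: [n_pep, n_locus]; scores_sa_m: [n_pep, n_ion, n_locus].
    """
    locus = np.asarray(locus)
    scores_sa = np.asarray(scores_sa)
    mask = scores_sa != 0                  # [n_pep, n_locus]
    locus_v = locus[mask]                  # row-major: peptide by peptide
    locus_num = mask.sum(axis=1)           # kept loci per peptide
    locus_sa_v = scores_sa[mask]
    # Reorder to [n_pep, n_locus, n_ion] so the same mask selects ion scores
    locus_sas = np.transpose(np.asarray(scores_sa_m), (0, 2, 1))[mask]  # [n_kept, n_ion]
    return locus_v, locus_num, locus_sa_v, locus_sas
```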
- full_dia.fxic.estimate_xic_boundary(xics, sa_gausion_m)
Estimate the boundary of an elution group in cycles.
- Parameters:
xics (torch.Tensor) – Dimension: [n_pep, n_ion, 13]
sa_gausion_m (torch.Tensor) – Dimension: [n_pep, n_ion]
- Returns:
- left_idx_1d (np.ndarray)
The start index of each locus.
- right_idx_1d (np.ndarray)
The end index of each locus.
- Return type:
tuple
- full_dia.fxic.extract_xics(df, map_gpu_ms1, map_gpu_ms2, ppm_tolerance, im_tolerance, rt_tolerance=None, cycle_num=None, scope='center', only_xic=False, by_pred=True)
Extract XICs from centroided MS data.
- Parameters:
df (pd.DataFrame) – Provide the locus and peptide information.
map_gpu_ms1 (dict) – MS1 data.
map_gpu_ms2 (dict) – MS2 data.
ppm_tolerance (float) – Mass tolerance in ppm.
im_tolerance (float) – Ion-mobility tolerance.
rt_tolerance (float, default=None) – RT tolerance. If None, the whole gradient is used.
cycle_num (int, default=None) – Number of cycles per XIC. If None, all cycles are used.
scope (str, default="center") – Which ions to extract. “center”: pr, pr_unfrag and fragment ions. “big”: the “center” ions plus their isotope ions. “top6”: the top-6 fragment ions.
only_xic (bool, default=False) – Whether to return only the XICs, without the IM and m/z values.
by_pred (bool, default=True) – Whether to use pred_im (True) or measure_im (False) when extracting XICs.
- Returns:
- cycles_idx (np.ndarray)
Each XIC time point corresponds to a cycle index.
- rts (np.ndarray)
Each XIC signal point corresponds to a retention time.
- ims (np.ndarray)
Each XIC signal point corresponds to an ion mobility.
- mzs (np.ndarray)
Each XIC signal point corresponds to a measured m/z value.
- xics (numba.cuda.devicearray.DeviceNDArray)
The extracted XICs on the GPU device.
- Return type:
tuple
- full_dia.fxic.find_maximum(scan_im, scan_mz, scan_height, query_left, query_right, query_im_left, query_im_right)
Find the maximum intensity within the m/z and IM tolerance windows of a query in centroided data.
- full_dia.fxic.gpu_cal_sa(v)
Calculate the SA between v and the Gaussian vector [0.0044, 0.054, 0.242, 0.399, 0.242, 0.054, 0.0044].
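This reference does not spell out how SA is computed. Assuming it is the normalized spectral contrast angle, 1 - 2·arccos(cos θ)/π, a definition common in proteomics, the sliding-window coelution score against the 7-point Gaussian of gpu_cal_sa can be sketched as follows. Zero padding at the XIC ends and the helper names are assumptions.

```python
import numpy as np

GAUSS_7 = np.array([0.0044, 0.054, 0.242, 0.399, 0.242, 0.054, 0.0044])


def spectral_angle(a, b):
    """Normalized spectral contrast angle in [0, 1]; 0 if either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    cos = np.clip(np.dot(a, b) / (na * nb), -1.0, 1.0)
    return 1.0 - 2.0 * np.arccos(cos) / np.pi


def sliding_sa(xic, kernel=GAUSS_7):
    """SA between each window of the XIC (centered on every cycle) and the kernel."""
    half = len(kernel) // 2
    padded = np.pad(np.asarray(xic, float), half)  # zero padding at both ends
    return np.array([spectral_angle(padded[i:i + len(kernel)], kernel)
                     for i in range(len(xic))])
```

A peak that matches the Gaussian shape scores close to 1 at its apex cycle.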
- full_dia.fxic.gpu_extract_xics(n, cycle_nums, idx_start_v, ms1_scan_seek_idx, ms1_scan_im, ms1_scan_mz, ms1_scan_height, ms2_scan_seek_idx, ms2_scan_im, ms2_scan_mz, ms2_scan_height, query_mz_m, ppm_tolerance, query_im_v, im_tolerance, ms1_ion_num, result_im, result_mz, result_xic, only_xic)
Extract XICs from MS data for target ions. Each CUDA thread handles one ion and builds its XIC (profile).
- full_dia.fxic.gpu_sa_gausion_core(block_num, xics, scores, window_points, valids_num)
Calculate the SA for each locus using shared memory.
- full_dia.fxic.gpu_simple_smooth(input_xics)
Smooth the XICs (n_pep * n_ion * n_cycle) extracted from raw MS data.
- full_dia.fxic.gpu_simple_smooth_core(n, input_xics, output)
Core of gpu_simple_smooth; smooths with a weighted mean.
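gpu_simple_smooth_core is described only as a weighted mean. A NumPy sketch of such a smoother along the cycle axis follows; the weights (0.25, 0.5, 0.25), the edge replication at the ends and the name `smooth_xics` are hypothetical choices, not values taken from the package.

```python
import numpy as np


def smooth_xics(xics, weights=(0.25, 0.5, 0.25)):
    """Weighted moving average along the last (cycle) axis of an
    [n_pep, n_ion, n_cycle] array; edges reuse the nearest value."""
    xics = np.asarray(xics, float)
    w = np.asarray(weights, float)
    half = len(w) // 2
    pad = [(0, 0)] * (xics.ndim - 1) + [(half, half)]
    padded = np.pad(xics, pad, mode="edge")
    n = xics.shape[-1]
    # Sum of shifted copies, each scaled by its kernel weight
    return sum(w[j] * padded[..., j:j + n] for j in range(len(w))) / w.sum()
```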
- full_dia.fxic.reserve_sa_maximum(x)
Keep x[i] only if it is a local maximum, i.e. x[i] > x[i-1] and x[i] > x[i+1]; otherwise set it to 0.
- Parameters:
x (torch.Tensor) – SA raw values with dimension: [n_pep, n_cycle]
- Returns:
x – SA values after suppression with dimension: [n_pep, n_cycle]
- Return type:
torch.Tensor
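The suppression in reserve_sa_maximum compares each point with its two neighbors. In this NumPy sketch the ends of each row are compared with -inf, so a boundary point is kept if it exceeds its single neighbor; that edge rule and the name `keep_local_maxima` are assumptions.

```python
import numpy as np


def keep_local_maxima(x):
    """Zero every entry of an [n_pep, n_cycle] array that is not strictly
    greater than both neighbours; the row ends are compared with -inf."""
    x = np.asarray(x, float)
    padded = np.pad(x, ((0, 0), (1, 1)), constant_values=-np.inf)
    is_max = (x > padded[:, :-2]) & (x > padded[:, 2:])
    return np.where(is_max, x, 0.0)
```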
- full_dia.fxic.screen_locus_by_deep(df_batch, locus_num, top_deep_q)
Screen the loci of a precursor by deep scores.
- Parameters:
df_batch (pd.DataFrame) – Provide columns “pr_id”, “seek_score_deep” and “seek_score_sa_x_deep”, with n_pep * n_locus rows.
top_deep_q (float) – Threshold on the ratio deep_x / deep_max.
- Returns:
df_batch – Fewer rows after screening.
- Return type:
pd.DataFrame
- full_dia.fxic.screen_locus_by_sa(scores_sa, top_sa_cut)
Screen the loci of a precursor, keeping those that satisfy the local-maximum, quantile1 and quantile2 criteria.
- Parameters:
scores_sa (np.ndarray) – Scores of the loci.
top_sa_cut (float) – Quantile threshold on the SA scores.
- Returns:
scores_sa – Scores in which rejected points have been set to zero.
- Return type:
np.ndarray
full_dia.library module
- class full_dia.library.Library(dir_lib)
Bases: object
Reader class for the spectral library.
- assign_fg_mz(df)
Assign fragment m/z values by looking up the precursor index in the raw df_pr.
- Parameters:
df (pd.DataFrame) – Provide the “pr_index” column.
- Returns:
df – Add the m/z value columns of fragment ions.
- Return type:
pd.DataFrame
- assign_proteins(df)
Assign proteins by looking up the precursor index in the raw df_map.
- Parameters:
df (pd.DataFrame) – Provide the “pr_index” column.
- Returns:
df – Add new columns: “protein_id”, “protein_name”, “proteotypic”
- Return type:
pd.DataFrame
- check_lib(df)
Check the spectral library: column names, modifications, charges, losses, proteins and lengths.
- Return type:
None
- construct_dfs(df)
Construct the df_pr and df_map from DIA-NN’s .parquet library.
- Parameters:
df (pd.DataFrame) – The raw DIA-NN’s .parquet file.
- Returns:
- df_prpd.DataFrame
Each row corresponds to a precursor and its fragment information.
- df_mappd.DataFrame
Each row represents the protein information corresponding to the peptide in the same row of df_pr.
- Return type:
tuple
full_dia.log module
- class full_dia.log.Logger
Bases: object
Manages a singleton-style logger instance and provides a class method to (re)configure file and console handlers with a custom formatter.
- logger = <Logger Full-DIA (DEBUG)>
- classmethod set_logger(dir_out, is_time_name=False)
Configure file and console logging handlers.
- Parameters:
dir_out (pathlib.Path) – Output directory where the log file will be written.
is_time_name (bool, default=False) – Whether to use a timestamp-based log file name. If False, a fixed name report.log.txt is used.
- Return type:
None
full_dia.main module
full_dia.models module
- class full_dia.models.DeepMall(input_dim, feature_dim)
Bases: Module
Scores intensity similarity using several kinds of weights.
- class full_dia.models.DeepMap(map_channels, nn_in_features=220)
Bases: Module
Also called DeepProfile in the paper.
- forward(maps, batch_valid_num)
- Parameters:
maps (torch.Tensor) – Dimension: [n_locus, n_ion, n_cycle, n_im_bin]
batch_valid_num (torch.Tensor) – Number of real ions in each map. Dimension: [n_locus]
- Returns:
- feature_map (torch.Tensor)
The last feature layer.
- result (torch.Tensor)
The inference result.
- Return type:
tuple
full_dia.polish module
Calculate how many ions in fg_mz_1 are matched to fg_mz_2.
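The sentence above does not name its function, so the sketch below uses a hypothetical `count_matched_ions` to illustrate ppm-based matching between two fragment m/z lists. The 20 ppm default (borrowed from the tol_ppm defaults in this module) and treating zeros as padding are assumptions.

```python
import numpy as np


def count_matched_ions(fg_mz_1, fg_mz_2, tol_ppm=20.0):
    """Count ions of fg_mz_1 lying within tol_ppm of any ion of fg_mz_2.
    Zero entries are treated as padding and never match."""
    a = np.asarray(fg_mz_1, float)
    b = np.asarray(fg_mz_2, float)
    a, b = a[a > 0], b[b > 0]
    if a.size == 0 or b.size == 0:
        return 0
    # Pairwise relative deviation in ppm, shape [len(a), len(b)]
    ppm = np.abs(a[:, None] - b[None, :]) / a[:, None] * 1e6
    return int((ppm <= tol_ppm).any(axis=1).sum())
```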
- full_dia.polish.make_interference_areas_zero(df_input, tol_locus=3, tol_im=0.03, tol_ppm=20)
In global analysis, if a fragment ion may have been produced by a more confident precursor, its area and SA score are set to zero.
- Parameters:
df_input (pd.DataFrame) – Provide the identification info on the run level.
tol_locus (int, default = 3) – Two peptides are competitors if their locus difference falls within this tolerance.
tol_im (float, default = 0.03) – Two peptides are competitors if their IM difference falls within this tolerance.
tol_ppm (float, default = 20) – Two fragment ions are competitors if their m/z difference in ppm falls within this tolerance.
- Returns:
df – The intensities and SA values of fragment ions that lose the competition are set to zero.
- Return type:
pd.DataFrame
- full_dia.polish.make_interference_areas_zero_core(swath_id_v, measure_locus_v, measure_im_v, fg_mz_m, area_m, sa_m, tol_locus, tol_im, tol_ppm, other_idx)
The big fish eats the small fish: if a fragment ion is shared with a more confident precursor, its SA and fg_mz are set to zero.
- full_dia.polish.polish_prs(df_input, tol_im=0.03, tol_ppm=20, tol_sa_ratio=0.75, tol_share_num=5)
As individual DIA signals can be shared among multiple peptides, an additional post-processing step is required to refine the results.
- Parameters:
df_input (pd.DataFrame) – Provide the pr identification results.
tol_im (float, default = 0.03) – Two peptides are competitors if their IM difference falls within this tolerance.
tol_ppm (float, default = 20) – Two peptides are competitors if their m/z difference in ppm falls within this tolerance.
tol_sa_ratio (float, default = 0.75) – If all fragment ions of a peptide have SA values above the threshold and are more likely to originate from more confident peptides, the peptide is removed.
tol_share_num (int, default=5) – If all fragment ions of a peptide have match counts above the threshold and are more likely to originate from more confident peptides, the peptide is removed.
- Returns:
df – The polished df.
- Return type:
pd.DataFrame
full_dia.quant module
- full_dia.quant.grid_xic_best(df_batch, ms1_centroid, ms2_centroid)
The fragment-ion profile with the highest SA to the other profiles is selected as the best profile. Different tolerance combinations are then traversed, and the XICs with the highest SA to the best profile are kept.
- Parameters:
df_batch (pd.DataFrame) – Provide the precursor information.
ms1_centroid (dict) – The MS1 data.
ms2_centroid (dict) – The MS2 data.
- Returns:
- areas (np.ndarray)
Areas computed from the best profiles.
- sas (np.ndarray)
The corresponding SA scores.
- Return type:
tuple
- full_dia.quant.interference_correction(xics, best_profile)
Correct interference in the profiles using DIA-NN’s method.
- Return type:
Tensor
- full_dia.quant.interp_xics(x, rts_input, target_dim)
Interpolate the XICs along the cycle axis to target_dim points, and compute the RTs of the new time points.
- Return type:
tuple
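The resampling in interp_xics can be sketched for a single XIC with linear interpolation on the cycle-index axis, applying the same interpolation to the RTs. Linear interpolation, the 1-D shape and the name `interp_xic` are assumptions; the package works on batched tensors.

```python
import numpy as np


def interp_xic(xic, rts, target_dim):
    """Linearly resample a 1-D XIC and its RT axis to target_dim points."""
    xic = np.asarray(xic, float)
    rts = np.asarray(rts, float)
    pos_old = np.arange(len(xic))                       # original cycle positions
    pos_new = np.linspace(0, len(xic) - 1, target_dim)  # evenly spaced new positions
    return np.interp(pos_new, pos_old, xic), np.interp(pos_new, pos_old, rts)
```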
- full_dia.quant.mask_tensor(xic, left, right)
Set the edge regions of the XIC to zero.
- Parameters:
xic (torch.Tensor) – Dimension: [n_xic, n_ion, n_cycle]
left (torch.Tensor) – Indicate the left region of the XIC.
right (torch.Tensor) – Indicate the right region of the XIC.
- Returns:
xic – The edge regions have been set to zero.
- Return type:
torch.Tensor
- full_dia.quant.quant_center_ions(df_input, ms)
Quantify fragment ions with a novel XIC extraction method.
- Parameters:
df_input (pd.DataFrame) – Provide the identification information of precursors.
ms (tims.Tims) – Provide the MS data.
- Returns:
df – Add new columns: “score_ion_quant” and “score_ion_sa”.
- Return type:
pd.DataFrame
- full_dia.quant.select_best_profile(x_profile)
Select as the best profile the one with the highest SA to the other profiles.
- Return type:
Tensor
- full_dia.quant.select_other_profiles(x_profile, best_profile)
Select the profile from different tolerance conditions that has the highest SA with the best profile.
- Parameters:
x_profile (torch.Tensor) – XIC profiles using different tolerances. Dimension: [n_pep, tol, n_ion, n_cycle]
best_profile (torch.Tensor) – The best profile. Dimension: [n_pep]
- Returns:
- best_x (torch.Tensor)
The selected profiles, each having the highest SA with the best profile.
- sas (torch.Tensor)
The SA scores of profiles.
- Return type:
tuple
full_dia.refine module
- full_dia.refine.construct_train_data(df_top, ms)
Construct map and mall data. Positive samples: the [Apex, Apex + 1, Apex - 1] loci of target peak groups. Negative samples: the top-3 loci (by SA score) of target peak groups.
- Parameters:
df_top (pd.DataFrame) – Provide the identification information.
ms (tims.Tims) – MS data.
- Returns:
- maps_center (np.ndarray)
Map data for the monoisotopic ions. Dimension: [n_sample, 14, n_cycle, n_im_bin].
- maps_big (np.ndarray)
Map data for the monoisotopic and isotope ions. Dimension: [n_sample, 56, n_cycle, n_im_bin].
- malls (np.ndarray)
Mall data for calculating intensity similarity.
- center_ion_nums (np.ndarray)
Number of valid ions in each sample.
- labels (np.ndarray)
Positive or negative.
- Return type:
tuple
- full_dia.refine.eval_one_epoch(trainloader, model)
Return the accuracy of the model on the validation set.
- Return type:
float
- full_dia.refine.make_dataset_mall(malls, valid_num, labels, train_ratio=0.9)
Make a PyTorch dataset for the Mall data and split it into training and validation sets.
- Parameters:
malls (np.ndarray) – The mall data.
valid_num (np.ndarray) – Valid ion num of each mall.
labels (np.ndarray) – The labels.
train_ratio (float, default=0.9) – Fraction of samples assigned to the training set.
- Returns:
- train (torch.utils.data.Dataset)
- eval (torch.utils.data.Dataset)
- The Mall feature dimension.
- Return type:
tuple
- full_dia.refine.make_dataset_maps(maps, valid_num, labels, train_ratio, maps_type)
Make a PyTorch dataset for the map data and split it into training and validation sets.
- Parameters:
maps (np.ndarray) – The map/profile data.
valid_num (np.ndarray) – Valid ion num of each map.
labels (np.ndarray) – The labels.
train_ratio (float) – Fraction of samples assigned to the training set.
maps_type (str) – “Profile-14” for the 14 monoisotopic ions (pr, pr_unfrag and 12 fragment ions); “Profile-56” for the monoisotopic plus isotope ions (14 * 4).
- Returns:
- train (torch.utils.data.Dataset)
- eval (torch.utils.data.Dataset)
- Return type:
tuple
- full_dia.refine.refine_models(df_top, ms, model_center, model_big)
Fine-tune the DeepProfile models and train DeepMall using the first-round identification results.
- Parameters:
df_top (pd.DataFrame) – Provide the identification result of peptides.
ms (tims.Tims) – MS data.
model_center (torch.nn.Module) – DeepProfile-14 for the 14 monoisotopic ions.
model_big (torch.nn.Module) – DeepProfile-56 for the monoisotopic and isotope ions.
- Returns:
The fine-tuned model_center and model_big, and the trained model_mall.
- Return type:
tuple
- full_dia.refine.retrain_model_map(model_maps, maps, valid_nums, labels, maps_type, epochs)
Fine-tune the model and return the model with optimal performance.
- Parameters:
model_maps (torch.nn.Module) – The pretrained DeepProfile model.
maps (np.ndarray) – Run-specific profile/map data for fine-tuning.
valid_nums (np.ndarray) – Valid ion num of each train sample.
labels (np.ndarray) – The labels of train samples.
maps_type (str) – “Profile-14” for the 14 monoisotopic ions (pr, pr_unfrag and 12 fragment ions); “Profile-56” for the monoisotopic plus isotope ions (14 * 4).
epochs (int) – Maximum number of epochs.
- Returns:
model_best – The model with optimal performance.
- Return type:
torch.nn.Module
- full_dia.refine.train_model_mall(malls, valid_num, labels, epochs)
Train the DeepMall model from scratch on the training set and return the model with optimal performance.
- Parameters:
malls (np.ndarray) – The mall data.
valid_num (np.ndarray) – Valid ion num of each train sample.
labels (np.ndarray) – The labels of train samples.
epochs (int) – Maximum number of epochs.
- Returns:
model_best – The model with optimal performance.
- Return type:
torch.nn.Module
full_dia.scoring module
- full_dia.scoring.numba_scoring_putatives(groups, sa_v, center_v, big_v)
Use Numba to accelerate the computation of the maximum and sum scores across different candidate peak groups for the same precursor.
- full_dia.scoring.score_locus(df_target, ms, model_center, model_big)[source]¶
Calculate function-based and learning-based scores for PSMs.
- Parameters:
df_target (pd.DataFrame) – Provide the PSM information.
ms (tims.Tims) – MS data.
model_center (torch.nn.Module) – DeepProfile-14
model_big (torch.nn.Module) – DeepProfile-56
- Returns:
df – Scores have been appended to the DataFrame in columns prefixed with “score_”.
- Return type:
pd.DataFrame
- full_dia.scoring.scoring_by_cross(df_batch, is_update=False)[source]¶
- Return type:
DataFrame
- Compute combinations of existing scores as additional scores:
- Before refine phase (is_update: False):
sa_center - sa_left
deep_center - deep_left
sa_center * deep_center
sa_center * deep_big
- After refine phase (is_update: True):
deep_center - deep_left
sa_center * deep_center
sa_center * deep_big
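The combinations listed above can be sketched with pandas as follows. The input column names (sa_center, sa_left, deep_center, deep_left, deep_big) and the output names are hypothetical; only the "score_" prefix and the two combination sets come from the description.

```python
import pandas as pd


def add_cross_scores(df: pd.DataFrame, is_update: bool = False) -> pd.DataFrame:
    """Append score combinations; all column names are illustrative assumptions."""
    out = df.copy()
    if not is_update:
        # Only computed before the refine phase.
        out["score_sa_center_minus_left"] = out["sa_center"] - out["sa_left"]
    out["score_deep_center_minus_left"] = out["deep_center"] - out["deep_left"]
    out["score_sa_x_deep_center"] = out["sa_center"] * out["deep_center"]
    out["score_sa_x_deep_big"] = out["sa_center"] * out["deep_big"]
    return out
```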
- full_dia.scoring.scoring_by_deep_layer(df_batch, features_deep_v, x)[source]¶
Append the feature-layer scores of DeepProfile to df_batch.
- Parameters:
df_batch (pd.DataFrame) – The DataFrame to which the scores are appended.
features_deep_v (list) – The feature layers scores of DeepProfile.
x (str) – Source of the scores: “pre” for the pretrained models, “refine_p1” for the refined models with 0.5 * ppm, or “refine_p2” for the refined models with 0.25 * ppm.
- Returns:
df – The DataFrame with the DeepProfile feature-layer scores appended.
- Return type:
pd.DataFrame
- full_dia.scoring.scoring_by_deep_prob(df_batch, scores_deep_v, x)[source]¶
Append the inference scores of DeepProfile to df_batch.
- Parameters:
df_batch (pd.DataFrame) – The DataFrame to which the scores are appended.
scores_deep_v (list) – The inference scores of DeepProfile.
x (str) – Source of the scores: “pre” for the pretrained models, “refine” for the refined models, “refine_p1” for the refined models with 0.5 * ppm, or “refine_p2” for the refined models with 0.25 * ppm.
- Returns:
df – The DataFrame with the DeepProfile inference scores appended.
- Return type:
pd.DataFrame
- full_dia.scoring.scoring_center_im(df_batch, ims_input)[source]¶
- Return type:
DataFrame
- Calculate mobility-related scores from the center-cycle MS/MS:
imbias of each of the 14 ions
mean
mean weighted by sa
mean of the top 6, weighted by sa
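The three aggregates above (plain mean, sa-weighted mean and sa-weighted mean of the top 6) can be sketched in NumPy as below. The sketch assumes the per-ion values and sa scores arrive as (n_psm, 14) arrays and that "top 6" means the six ions with the highest sa; both are assumptions. The same aggregation pattern would apply to the ppm and SNR scores.

```python
import numpy as np


def summarize_ion_values(values, sa, top_k: int = 6):
    """Return (mean, sa-weighted mean, sa-weighted mean of top_k ions) per row."""
    values = np.asarray(values, dtype=float)
    sa = np.asarray(sa, dtype=float)
    plain_mean = values.mean(axis=1)
    # Weighted mean; the floor avoids dividing by zero when all sa are 0.
    w = sa / np.maximum(sa.sum(axis=1, keepdims=True), 1e-12)
    weighted_mean = (values * w).sum(axis=1)
    # Indices of the top_k ions by sa (assumed meaning of "top 6").
    idx = np.argsort(-sa, axis=1)[:, :top_k]
    top_values = np.take_along_axis(values, idx, axis=1)
    top_sa = np.take_along_axis(sa, idx, axis=1)
    top_w = top_sa / np.maximum(top_sa.sum(axis=1, keepdims=True), 1e-12)
    top_weighted_mean = (top_values * top_w).sum(axis=1)
    return plain_mean, weighted_mean, top_weighted_mean
```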
- full_dia.scoring.scoring_center_mz(df_batch, mzs_input)[source]¶
- Return type:
DataFrame
- Calculate ppm-related scores from the center-cycle MS/MS:
ppm of each of the 14 ions
mean
mean weighted by sa
mean of the top 6, weighted by sa
- full_dia.scoring.scoring_center_snr(df_batch, xics)[source]¶
- Return type:
DataFrame
- Calculate SNR-related scores from the center-cycle MS/MS. The signal is the apex intensity and the noise is the median of the profile.
SNR of each of the 14 ions
mean
mean weighted by sa
mean of the top 6, weighted by sa
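As described, the SNR divides the apex intensity by the median of the profile. A NumPy sketch, assuming xics holds one elution profile per ion along the last axis; the noise_floor guard against a zero median is an added assumption, not documented behavior.

```python
import numpy as np


def ion_snr(xics: np.ndarray, noise_floor: float = 1e-6) -> np.ndarray:
    """SNR per ion: apex intensity / median of the profile (last axis = RT points)."""
    xics = np.asarray(xics, dtype=float)
    signal = xics.max(axis=-1)        # apex intensity
    noise = np.median(xics, axis=-1)  # median of the profile
    # Assumed guard: avoid division by zero for mostly empty profiles.
    return signal / np.maximum(noise, noise_floor)
```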
- full_dia.scoring.scoring_main_elution(df_batch, xics, x)[source]¶
- Return type:
DataFrame
- Calculate the following elution scores for the monoisotopic types specified by x:
x: [‘center’, ‘center_p1’, ‘center_p2’]
1. sa of each of the 14 ions
2. mean value of the 14 ions
3. mean value of the top 6
4. mean value w/o norm of the remaining ions
5. sum of the top 1/2/3 b ions
- full_dia.scoring.scoring_meta(df)[source]¶
- Return type:
DataFrame
- Calculate scores from peptide meta information:
mz (scoring_center_mz)
charge (one-hot encoding over 1, 2, 3, 4)
sequence length
fg_num
library fragment ion intensities
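The charge feature above is a one-hot encoding over charges 1 to 4. A NumPy sketch; how full_dia treats charges outside 1 to 4 is not documented here, and this sketch simply leaves them as all-zero rows.

```python
import numpy as np


def one_hot_charge(charges, levels=(1, 2, 3, 4)) -> np.ndarray:
    """One column per charge level; charges outside `levels` get an all-zero row."""
    charges = np.asarray(charges)
    return (charges[:, None] == np.asarray(levels)[None, :]).astype(np.float32)
```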
- full_dia.scoring.scoring_other_elution(df_batch, xics, x)[source]¶
- Return type:
DataFrame
- Calculate the following elution scores for the isotope types specified by x:
x: [‘left’, ‘1H’, ‘2H’]
1. sa of each of the 14 ions
2. mean value of the 14 ions
3. mean value of the top 6
4. mean value w/o norm of the remaining ions
- full_dia.scoring.scoring_putatives(df)[source]¶
- Return type:
DataFrame
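- Calculate competition-related scores, since a precursor (pr) can have multiple candidate elution groups (score_max and score_sum are taken over the candidates of the same precursor):
score_i - score_max
np.log(score_i / score_sum)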
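Both competition scores compare each candidate elution group with the other candidates of the same precursor. A pandas sketch, assuming a precursor ID column pr_id and strictly positive scores (the log is otherwise undefined); the column names are hypothetical.

```python
import numpy as np
import pandas as pd


def add_competition_scores(df: pd.DataFrame, score_col: str = "score",
                           group_col: str = "pr_id") -> pd.DataFrame:
    """Append score_i - score_max and log(score_i / score_sum) per precursor."""
    out = df.copy()
    grouped = out.groupby(group_col)[score_col]
    # Max and sum over the candidate elution groups of the same precursor.
    score_max = grouped.transform("max")
    score_sum = grouped.transform("sum")
    out[f"{score_col}_minus_max"] = out[score_col] - score_max
    out[f"{score_col}_log_share"] = np.log(out[score_col] / score_sum)
    return out
```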
- full_dia.scoring.scoring_xic_intensity(df_batch, xics, rts)[source]¶
Calculate the intensity-related scores. Only the top-6 intensities are considered.
apex intensities: ms2_relative, ms2_total, ms1/ms2, similarity
profile areas: ms2_relative, ms2_total, ms1/ms2, similarity
- Return type:
DataFrame
- full_dia.scoring.update_scores(df, ms, model_center, model_big, model_mall)[source]¶
- Return type:
DataFrame
- Calculate scores using the refined DeepProfile and the trained DeepMall.
DeepProfile: refined deep prob scores
DeepProfile: cross scores with refined deep prob scores
DeepProfile: refined deep prob and layer scores with 0.5 * ppm
DeepProfile: refined deep prob and layer scores with 0.25 * ppm
DeepMall: deep prob and layer scores
full_dia.search module¶
- full_dia.search.cal_recall_seek_locus(df_lib, ms, model, tol_rt, top_sa_cut, top_deep_cut)[source]¶
For development use.
- full_dia.search.search_core(lib)[source]¶
- Return type:
None
- Search at the run level:
Seek seeds for calibration.
Seek candidate elution groups (locus) for each precursor.
Score the elution groups.
Calculate the FDR at the run level.
Save all target precursor results and high-quality decoy precursor results.
- full_dia.search.seek_locus(df_target, ms, model_center, top_sa_q, top_deep_q)[source]¶
Seek candidate elution groups (locus) in two steps: 1) screen with SA scores, 2) screen with deep scores.
- Parameters:
df_target (pd.DataFrame) – Provide the precursor information.
ms (tims.Tims) – MS data.
model_center (torch.nn.Module) – DeepProfile-14, used to score the elution consistency of monoisotopic ions.
top_sa_q (float) – First screen: candidate loci should have good SA scores relative to the best elution group.
top_deep_q (float) – Second screen: candidate loci should have good deep scores relative to the best elution group.
- Returns:
df – Each row is a candidate elution group.
- Return type:
pd.DataFrame
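One possible reading of the two screens is a relative threshold against each precursor's best candidate. The sketch below assumes that reading: keep candidates whose score is at least top_sa_q (then top_deep_q) times the precursor's best, with non-negative scores. The rule and the column names pr_id, sa and deep are assumptions, not the documented implementation.

```python
import pandas as pd


def screen_candidates(df: pd.DataFrame, top_sa_q: float, top_deep_q: float,
                      group_col: str = "pr_id") -> pd.DataFrame:
    """Two-stage screen relative to each precursor's best candidate (assumed rule)."""
    # Stage 1: SA relative to the best SA of the same precursor.
    best_sa = df.groupby(group_col)["sa"].transform("max")
    stage1 = df[df["sa"] >= top_sa_q * best_sa]
    # Stage 2: deep score relative to the best candidate that survived stage 1.
    best_deep = stage1.groupby(group_col)["deep"].transform("max")
    return stage1[stage1["deep"] >= top_deep_q * best_deep]
```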
- full_dia.search.seek_seed(df_target, ms, model_center)[source]¶
Seek the best elution group for each pr using SA scoring, then score that elution group with model_center. These elution groups may contain many false positives, but they can still serve as seeds for calibration.
- Parameters:
df_target (pd.DataFrame) – The precursors to identify, taken from the library.
ms (tims.Tims) – MS data.
model_center (torch.nn.Module) – DeepProfile will score the coelution consistency of the elution group.
- Returns:
df – One precursor will have one elution group.
- Return type:
pd.DataFrame
- full_dia.search.select_required_and_all_targets(df)[source]¶
Select high-quality target and decoy peptides for FDR calculation, and select all target peptides for saving, which avoids a second extraction in the global analysis.
- Parameters:
df (pd.DataFrame) – The identification result for one batch.
- Returns:
- df_main (pd.DataFrame)
High-quality target and decoy peptides.
- df_other (pd.DataFrame)
All target peptides.
- Return type:
tuple
- full_dia.search.update_tolerance(df_lib, ms, model_center, model_big, sample_ratio)[source]¶
Update the tolerance based on identifications of a subset of target peptides.
- Parameters:
df_lib (pd.DataFrame) – The raw library.
ms (tims.Tims) – MS data.
model_center (torch.nn.Module) – DeepProfile-14, used to score the elution consistency of monoisotopes.
model_big (torch.nn.Module) – DeepProfile-56, used to score the elution consistency of monoisotopes + isotopes.
sample_ratio (float) – Sampling ratio for the library subset, which speeds up the update.
- Returns:
None. The global tolerances cfg.tol_rt, cfg.tol_im_xic and cfg.tol_ppm are updated in place.
- Return type:
None
full_dia.tims module¶
- class full_dia.tims.Tims(dir_d)[source]¶
Bases: object
Reads and centroids diaPASEF profile data.
- construct_data_by_quadrupole(window_id)[source]¶
Construct profile and centroided data for the specified window_id.
- Parameters:
window_id (int) – 0 refers to MS1, others refer to different quadrupole windows.
- Returns:
- all_rt (np.ndarray)
The rt values of all cycles.
- cycle_valid_lens (np.ndarray)
The number of profile ions per cycle.
- all_push (np.ndarray)
The 1/k0 values of profile ions.
- all_tof (np.ndarray)
The m/z values of profile ions.
- all_height (np.ndarray)
The intensities of profile ions.
- cycle_valid_lens2 (np.ndarray)
The number of centroided ions per cycle.
- all_push2 (np.ndarray)
The 1/k0 values of centroided ions.
- all_tof2 (np.ndarray)
The m/z values of centroided ions.
- all_height2 (np.ndarray)
The intensities of centroided ions.
- Return type:
tuple
- copy_map_to_gpu(swath_id, centroid)[source]¶
Copy profile or centroided MS data to the GPU.
- Parameters:
swath_id (int) – The SWATH or quadrupole window ID.
centroid (bool) – Whether to copy the centroided MS data (True) or the profile MS data (False).
- Returns:
The MS1 chunk and the MS2 data.
- Return type:
list
- property frame_nums¶
- get_centroid_tol_push()[source]¶
Calculate how many pushes should be considered as neighbors when centroiding.
- Return type:
int
- get_dia_quadrupole()[source]¶
Return the exact boundaries of the quadrupole partitioning, e.g. [200, 250, 300, 350 … 1150, 1200].
- Return type:
ndarray
- full_dia.tims.numba_index_by_bool(idx, ims, mzs, heights)[source]¶
Extract values using boolean indexing in Numba.
- full_dia.tims.numba_paral_centroid(all_tof, all_push, all_height, tol_tof_sum, tol_tof_suppression, tol_push, cumlen)[source]¶
- Centroid the profile MS data using DIA-NN’s method:
Sum the intensity values within a window (m/z and 1/K0).
Remove an aggregated point if a higher-intensity aggregated point exists in its neighborhood.
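The two centroiding steps can be illustrated with a brute-force NumPy sketch that mirrors the parameter names above (TOF and push indices with tol_tof_sum, tol_tof_suppression and tol_push). It builds dense pairwise masks, so memory grows quadratically with the number of points; it only illustrates the logic and is not the parallel Numba implementation. Keeping both points of an exact tie is an assumption.

```python
import numpy as np


def centroid_sketch(tof, push, height, tol_tof_sum, tol_tof_suppression, tol_push):
    """Sum intensities within a TOF/push window, then keep local maxima only."""
    tof = np.asarray(tof, dtype=np.int64)
    push = np.asarray(push, dtype=np.int64)
    height = np.asarray(height, dtype=np.float64)
    d_tof = np.abs(tof[:, None] - tof[None, :])
    near_push = np.abs(push[:, None] - push[None, :]) <= tol_push
    # Step 1: aggregate the intensities of all neighbors (including self).
    sum_mask = near_push & (d_tof <= tol_tof_sum)
    summed = sum_mask.astype(np.float64) @ height
    # Step 2: drop a point if a neighbor has a strictly higher aggregated intensity.
    sup_mask = near_push & (d_tof <= tol_tof_suppression)
    neighbor_best = np.where(sup_mask, summed[None, :], -np.inf).max(axis=1)
    keep = summed >= neighbor_best
    return tof[keep], push[keep], summed[keep]
```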
full_dia.utils module¶
- full_dia.utils.cal_acc_recall(path_ws, df_input, diann_q_pr=None, diann_q_pro=None, diann_q_pg=None, alpha_q_pr=None, alpha_q_pro=None, alpha_q_pg=None)[source]¶
For development use.
- full_dia.utils.cal_sa_by_np(x, y)[source]¶
Calculate the SA. Both inputs must be two-dimensional.
- Return type:
ndarray
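Assuming SA here is the normalized spectral contrast angle commonly used in DIA scoring, SA = 1 - 2 * arccos(cos_sim) / pi computed row by row, a NumPy sketch could look like this. The formula is an assumption; the docstring above only states that the inputs must be two-dimensional.

```python
import numpy as np


def spectral_angle_rows(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Row-wise normalized spectral angle in [0, 1] (assumed definition of SA)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or y.ndim != 2:
        raise ValueError("x and y must be two-dimensional")
    # L2-normalize each row; the floor avoids dividing all-zero rows by zero.
    x_norm = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    y_norm = y / np.maximum(np.linalg.norm(y, axis=1, keepdims=True), 1e-12)
    # Clip to guard arccos against rounding slightly outside [-1, 1].
    cos_sim = np.clip((x_norm * y_norm).sum(axis=1), -1.0, 1.0)
    return 1.0 - 2.0 * np.arccos(cos_sim) / np.pi
```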
- full_dia.utils.check_run_info(args)[source]¶
Print run info: version, platform, time, cpu, memory, gpu, cmd.
- Return type:
None
- full_dia.utils.clean_and_save(df_main, df_other, ws_single)[source]¶
Combine, clean and save the identification results of the high-quality decoy peptides and all target peptides.
- Parameters:
df_main (pd.DataFrame) – High-quality decoy and target peptides.
df_other (pd.DataFrame) – All target peptides. Saving these avoids re-extraction in the global analysis.
ws_single (Path) – The output path.
- Return type:
None
- full_dia.utils.convert_cols_to_diann(df, ws_single)[source]¶
Convert local column names to DIA-NN’s column names.
- Return type:
DataFrame
- full_dia.utils.convert_numba_to_tensor(x)[source]¶
Convert a Numba CUDA array to a PyTorch CUDA tensor with the help of CuPy.
- Return type:
Tensor
- full_dia.utils.create_cuda_zeros(shape, dtype=torch.float32)[source]¶
Create a zero-filled Numba CUDA array with the help of PyTorch.
- Return type:
DeviceNDArray
- full_dia.utils.init_gpu_params(gpu_id)[source]¶
- Return type:
None
- Initialize GPU parameters for the given GPU ID:
for PyTorch
for numba.cuda
Empirically adjust the batch size for GPU code.
- full_dia.utils.init_multi_ws(ws_global, out_name)[source]¶
Initialize the paths of the .d files and the output folder.
- Return type:
None
- full_dia.utils.init_single_ws(ws_i, total, ws_single)[source]¶
Initialize the output path of a single .d file.
- Return type:
None
- full_dia.utils.move_all_zeros_end(a)[source]¶
Move all zero elements in the matrix to the end of each row. Based on http://stackoverflow.com/a/42859463/3293881
- Return type:
ndarray
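A vectorized NumPy sketch of the same idea uses boolean masks so that nonzero values keep their relative order within each row. This version returns a new array rather than modifying the input, which may differ from the actual helper.

```python
import numpy as np


def move_zeros_to_row_end(a: np.ndarray) -> np.ndarray:
    """Return a copy of 2-D `a` with each row's nonzeros first (order kept), zeros last."""
    nonzero = a != 0
    # Number of nonzero entries in each row.
    counts = nonzero.sum(axis=1, keepdims=True)
    # Target mask: the first counts[i] positions of row i receive the nonzeros.
    target = np.arange(a.shape[1]) < counts
    out = np.zeros_like(a)
    # Boolean indexing reads and writes in row-major order, so per-row order is preserved.
    out[target] = a[nonzero]
    return out
```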
- full_dia.utils.print_ids(df, q_cut, pr_or_pg, run_or_global)[source]¶
Print the number of IDs (pr or pg) at the run or global level.
- Return type:
None
- full_dia.utils.read_from_pq(ws_single, cols=None)[source]¶
Read a .parquet file, optionally restricted to the specified columns.
- Return type:
DataFrame