full_dia package

Subpackages

Submodules

full_dia.assemble module

full_dia.assemble.assemble_pep_to_pg(df_input, q_cut_infer, run_or_global)[source]

Assemble peptides (peps) into protein groups (pgs).

Parameters:
  • df_input (pd.DataFrame) – Must have columns: protein_id and simple_seq/strip_seq

  • q_cut_infer (float) – Q-value cutoff used to select peptides for assembly.

  • run_or_global ({'run', 'global'}) – Assemble on run or global level.

Returns:

df – Copy of df_input with a new column: protein_group.

Return type:

pd.DataFrame

full_dia.assemble.assemble_pep_to_pg_core(graph)[source]

Perform the IDPicker algorithm on the peptide-protein bipartite graph.

Parameters:

graph (nx.Graph) – The bipartite graph of protein and peptide that needs assignment.

Returns:

protein_v (list) – The proteins after assignment.

peptide_v (list of list) – The peptides after assignment.

Return type:

tuple
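The parsimony idea behind this step can be sketched as follows. This is a simplified illustration, not the package's implementation: it assumes the graph is given as a plain dict mapping each protein to its peptide set (the real function takes an nx.Graph), and the helper name greedy_protein_groups is hypothetical.

```python
def greedy_protein_groups(protein_to_peps):
    """Hypothetical helper: group proteins, then pick a greedy minimal cover."""
    # Step 1: proteins with identical peptide sets are indistinguishable,
    # so collapse them into one group.
    groups = {}
    for protein, peps in protein_to_peps.items():
        groups.setdefault(frozenset(peps), []).append(protein)

    # Sort candidates by member names so that ties are broken deterministically.
    candidates = sorted(groups.items(), key=lambda kv: sorted(kv[1]))

    # Step 2: repeatedly pick the group that explains the most
    # still-unexplained peptides until every peptide is covered.
    remaining = set().union(*groups.keys())
    protein_v, peptide_v = [], []
    while remaining:
        peps, members = max(candidates, key=lambda kv: len(kv[0] & remaining))
        covered = peps & remaining
        protein_v.append(";".join(sorted(members)))
        peptide_v.append(sorted(covered))
        remaining -= covered
    return protein_v, peptide_v


graph = {"P1": {"a", "b"}, "P2": {"a", "b"}, "P3": {"b", "c"}, "P4": {"c"}}
protein_v, peptide_v = greedy_protein_groups(graph)
```

Here P1 and P2 are merged because they share exactly the same peptides, and P4 is dropped because P3 already explains its only peptide.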

full_dia.assemble.profile(func)[source]

full_dia.calib module

full_dia.calib.cal_im_recall(ws, df_lib, tol_im)[source]

For developing.

full_dia.calib.cal_rt_im_recall(ws, df_lib, tol_rt, tol_im)[source]

For developing.

full_dia.calib.cal_rt_recall(ws, df_lib, tol_rt)[source]

For developing.

full_dia.calib.cal_turning_point(y_data, y_pred)[source]

Determine the tolerance corresponding to the elbow point from the tolerance–coverage curve. See https://stackoverflow.com/questions/2018178/finding-the-best-trade-off-point-on-a-curve

Return type:

float
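One common way to locate such an elbow, and the one discussed in the linked Stack Overflow thread, is to take the point farthest from the straight line joining the first and last points of the curve. The sketch below illustrates that idea only; the helper name elbow_index and its (xs, ys) inputs are assumptions and do not mirror the signature of cal_turning_point.

```python
import math


def elbow_index(xs, ys):
    """Hypothetical helper: index of the point farthest from the first-to-last chord."""
    x0, y0 = xs[0], ys[0]
    x1, y1 = xs[-1], ys[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = math.hypot(dx, dy)
    # Perpendicular distance of every point from the line through the endpoints.
    dists = [abs(dy * (x - x0) - dx * (y - y0)) / norm for x, y in zip(xs, ys)]
    return max(range(len(dists)), key=dists.__getitem__)


tolerances = [1, 2, 3, 4, 5]
coverage = [0.1, 0.7, 0.85, 0.9, 0.92]
best = elbow_index(tolerances, coverage)
```

The distance depends on the axis scales, so in practice both axes are usually normalized to a comparable range first.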

full_dia.calib.calib_im(df_tol, df_lib)[source]

Fit IM from iIM to real IM based on df_tol, then update the IM in df_lib.

Parameters:
  • df_tol (pd.DataFrame) – Columns: ‘score_deep’, ‘pred_iim’, ‘pred_im’, ‘measure_im’.

  • df_lib (DataFrame) – Columns: ‘pred_iim’.

Returns:

df_tol (pd.DataFrame) – df_tol with ‘pred_im’; its bias_im values will be less than the tolerance.

df_lib (pd.DataFrame) – Updated ‘pred_im’.

Return type:

tuple

full_dia.calib.calib_mz(df_seed, ms)[source]

Fit m/z and update the measured m/z values.

Parameters:
  • df_seed (pd.DataFrame) – Columns: ‘score_deep’, ‘measure_pr_mz’, ‘pr_mz’.

  • ms (tims.Tims) – Save the raw measured m/z values.

Returns:

df_seed – Returned as is; nothing new is added to df_seed.

Return type:

pd.DataFrame

full_dia.calib.calib_rt(df_seed, df_lib)[source]

Fit RT from iRT to real RT based on df_seed, then update the RT in df_lib.

Parameters:
  • df_seed (pd.DataFrame) – Columns: ‘simple_seq’, ‘locus’, ‘score_deep’, ‘pred_irt’, ‘measure_rt’.

  • df_lib (DataFrame) – Columns: ‘pred_irt’.

Returns:

df (pd.DataFrame) – df_seed with ‘pred_rt’; its bias_rt values will be less than the tolerance.

df_lib (pd.DataFrame) – df_lib with a new column ‘pred_rt’.

Return type:

tuple

full_dia.calib.fit_by_lowess(x, y, frac)[source]

Perform a LOWESS fit on the x and y arrays using the given frac value.

Return type:

tuple

full_dia.calib.plot_fit_im(y_measure, y_pred_before, y_pred_after, xout, yout, bias_old, bias, fname)[source]

For developing.

full_dia.calib.plot_fit_mz(x1, y1, x2, y2, x_fit, y_fit, bias_old, bias, fname)[source]

For developing.

full_dia.calib.plot_fit_rt(x, y, x1, y1, x11, y11, x_fit, y_fit, tol_rt, bias, fname)[source]

For developing.

full_dia.calib.polish_ends(x_screen2, y_screen2, tol_bins)[source]

From Calib-RT algorithm: https://doi.org/10.1093/bioinformatics/btae417

Return type:

tuple

full_dia.calib.profile(func)[source]

full_dia.calib.screen_by_graph(x_screen1, y_screen1)[source]

From Calib-RT algorithm: https://doi.org/10.1093/bioinformatics/btae417

Return type:

tuple

full_dia.calib.screen_by_hist(x_data, y_data, bins)[source]

From Calib-RT algorithm: https://doi.org/10.1093/bioinformatics/btae417

Return type:

tuple

full_dia.cfg module

full_dia.cfg.flatten_yaml(cfg_dict)[source]

Remove the top level of keys (the first domain) from a YAML config.

Return type:

dict
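A minimal sketch of this kind of flattening, assuming the config is a dict of sections whose values are themselves dicts. The helper name flatten_first_level and the keys in the example are hypothetical and are not taken from default.yaml.

```python
def flatten_first_level(cfg_dict):
    """Hypothetical helper: drop section names and merge their entries."""
    flat = {}
    for section, params in cfg_dict.items():
        if isinstance(params, dict):
            flat.update(params)       # drop the section name, keep its entries
        else:
            flat[section] = params    # already a top-level value, keep as is
    return flat


cfg = {"calib": {"tol_rt": 5.0}, "fdr": {"n_model": 3}, "seed": 1}
flat = flatten_first_level(cfg)
```

If two sections define the same key, the later one wins in this sketch; how the package handles such collisions is not stated here.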

full_dia.cfg.load_default()[source]

Load the default.yaml file in the cfg folder.

full_dia.cfg.update_from_yaml(yaml_path)[source]

Update params from a yaml file provided by ‘-cfg_develop’ param.

full_dia.cross module

full_dia.cross.drop_batches_mismatch(df)[source]

Delete the decoy peptides if they are the same as target peptides. Also delete the duplicated decoy peptides.

Return type:

DataFrame

full_dia.cross.drop_runs_mismatch(df)[source]

Delete the decoy peptides if they are the same as target peptides. Also delete the duplicated decoy/target peptides.

Return type:

DataFrame

full_dia.cross.group_by_species(species, ratio_cut=0.05)[source]

Determine the species whose ratio exceeds ratio_cut.

Return type:

list

full_dia.cross.perform_global(lib, top_k_fg, top_k_pr, multi_ws)[source]

Compute the q_pr_global; Assemble pep to pg; Compute the q_pg_global; Compute the pg quantification.

Parameters:
  • lib (library.Library) – Provide the pep and protein map info.

  • top_k_fg (int) – How many frag ions will be used to compute pep quantification values.

  • top_k_pr (int) – How many peps will be used to compute protein quantification values.

  • multi_ws (list) – Paths of multiple .d files

Returns:

df_global – Columns: [pr_id, decoy, cscore_pr_run_x] [cscore_pr_global_first, q_pr_global_first] [proteotypic, protein_id, protein_name, protein_group] [cscore_pg_global_first, q_pg_global_first] [quant_pr_0, quant_pr_1, …, quant_pr_N] [quant_pg_0, quant_pg_1, …, quant_pg_N]

Return type:

DataFrame

full_dia.cross.profile(func)[source]

full_dia.cross.quant_pr_autoencoder(df_global, top_k_fg)[source]

Quantify peptides by summing fragment-ion intensities smoothed by an autoencoder.

Parameters:
  • df_global (pd.DataFrame) – Provide the quantification values and SA values of fragment ions.

  • top_k_fg (int) – How many fragment ions to sum to a pep quantification value.

Returns:

df_global – Add columns: quant_pr_raw, quant_pr_deep, quant_pr_mix

Return type:

pd.DataFrame

full_dia.cross.save_report_result(df_global, multi_ws)[source]

Combine the df_run and df_global to save the final result in a report.parquet file.

Return type:

None

full_dia.dataloader module

class full_dia.dataloader.MallDataset(*args: Any, **kwargs: Any)[source]

Bases: Dataset

class full_dia.dataloader.MapDataset(*args: Any, **kwargs: Any)[source]

Bases: Dataset

full_dia.dataloader.profile(func)[source]

full_dia.decoy module

full_dia.decoy.cal_fg_mz_iso(df)[source]

Append the isotope m/z values to df.

Return type:

DataFrame

full_dia.decoy.convert_seq_to_mass(simple_seq)[source]

A fast method to convert simple sequence to mass list.

Parameters:

simple_seq (pd.Series) – Each element is a stripped sequence.

Returns:

mass (list) – The mass list from the stripped sequences.

seq_len_cumsum (np.array) – The cumulative length from simple_seq.

Return type:

tuple
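The two return values can be pictured with the following sketch, which uses plain Python lists in place of pd.Series and np.array. The helper name seqs_to_masses is hypothetical, the residue table is a small subset of standard monoisotopic residue masses included only for illustration, and whether the real cumulative vector starts with a leading zero is not specified here.

```python
from itertools import accumulate

# Monoisotopic residue masses (Da); a subset for illustration only.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "K": 128.09496,
    "E": 129.04259,
}


def seqs_to_masses(seqs):
    """Hypothetical helper: flat residue-mass list plus cumulative lengths."""
    # One flat list of residue masses for all sequences, in order.
    mass = [RESIDUE_MASS[aa] for seq in seqs for aa in seq]
    # Cumulative lengths mark where each sequence ends in the flat list.
    seq_len_cumsum = list(accumulate(len(seq) for seq in seqs))
    return mass, seq_len_cumsum


mass, seq_len_cumsum = seqs_to_masses(["GA", "K"])
```

Storing all sequences in one flat vector with end offsets is a layout that suits GPU kernels, since each thread can find its own slice from the offsets alone.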

full_dia.decoy.gpu_cal_fg_mz(n, fg_num, mass_v, seq_len_cumsum_v, fg_type_m, fg_len_m, fg_charge_m, result_fg_mz)[source]

Calculate the fragment ion m/z values of decoys. Each thread handles one ion of one precursor (pr).

full_dia.decoy.make_decoys(df_target, fg_num, method, value=1)[source]

Generate decoys with modified fragment m/z values only.

Parameters:
  • df_target (pd.DataFrame) – Columns: simple_seq, pr_id.

  • fg_num (int) – 12 by default.

  • method (str) – “reverse” | “mutate” | “shift”.

  • value (int) – Decoy is 1; shadow is 2.

Returns:

df_decoy – Copy of df_target with modified fragment m/z values only.

Return type:

pd.DataFrame

full_dia.decoy.profile(func)[source]

full_dia.decoy.sum_gpu(array)[source]

full_dia.deepmall module

full_dia.deepmall.extract_mall(df_batch, map_gpu_ms1, map_gpu_ms2, tol_im, tol_ppm)[source]

Extract the mall of the top-12 fragment ions from the MS data.

Parameters:
  • df_batch (pd.DataFrame) – Provide the pr info.

  • map_gpu_ms1 (dict) – Provide the MS1 data.

  • map_gpu_ms2 (dict) – Provide the MS2 data.

  • tol_im (float) – Tolerance of ion mobility.

  • tol_ppm (float) – Tolerance of ppm.

Returns:

Mall – Contains the fragment ion info: pred_heights, xics, ppms, bias_ims, fg_type, SA, areas, snr

Return type:

torch.Tensor

full_dia.deepmall.profile(func)[source]

full_dia.deepmall.scoring_mall(model_mall, df_input, map_gpu_ms1, map_gpu_ms2, tol_im, tol_ppm)[source]

Extract and score the Malls for elution groups.

Parameters:
  • model_mall (torch.nn.Module) – The trained DeepMall model.

  • df_input (pd.DataFrame) – Provide the pr info.

  • map_gpu_ms1 (dict) – Provide the MS1 data.

  • map_gpu_ms2 (dict) – Provide the MS2 data.

  • tol_im (float) – Tolerance of ion mobility.

  • tol_ppm (float) – Tolerance of ppm.

Returns:

pred (np.ndarray) – The scores by DeepMall.

feature (np.ndarray) – The features by DeepMall.

Return type:

tuple

full_dia.deepmap module

full_dia.deepmap.extract_maps(df_batch, idx_start_m, locus_num, cycle_num, map_im_size, map_gpu_ms1, map_gpu_ms2, tol_ppm, tol_im_map, im_gap, neutron_num)[source]

Extract maps for multiple elution groups of a pr.

Parameters:
  • df_batch (pd.DataFrame) – Provide pr info.

  • idx_start_m (np.ndarray) – Cycle start index.

  • locus_num (int) – How many loci to extract for a pr.

  • cycle_num (int) – How many cycles to extract for a pr. Default 13.

  • map_im_size (float) – Default 50.

  • map_gpu_ms1 (dict) – Provide the MS1 data.

  • map_gpu_ms2 (dict) – Provide the MS2 data.

  • tol_ppm (float) – The tolerance of ppm.

  • tol_im_map (float) – The tolerance of im.

  • im_gap (float) – The bin width of im in a map.

  • neutron_num (int) – Specify the neutron num.

Returns:

Maps – The extracted maps.

Return type:

torch.Tensor

full_dia.deepmap.extract_scoring_big(model_center, model_big, df_input, map_gpu_ms1, map_gpu_ms2, cycle_num, map_im_gap, map_im_dim, ppm_tolerance, im_tolerance)[source]

Extract and score maps using DeepProfile-14 and DeepProfile-56.

Parameters:
  • model_center (torch.nn.Module) – The DeepProfile-14 model.

  • model_big (torch.nn.Module) – The DeepProfile-56 model.

  • df_input (pd.DataFrame) – Provide the pr info.

  • map_gpu_ms1 (dict) – Provide the MS1 data.

  • map_gpu_ms2 (dict) – Provide the MS2 data.

  • cycle_num (int) – How many cycles to extract for a pr. Default 13.

  • map_im_gap (float) – The bin width of im in a map.

  • map_im_dim (float) – The dimension of im in a map.

  • ppm_tolerance (float) – The ppm tolerance.

  • im_tolerance (float) – The mobility tolerance.

Returns:

pred_v (list[np.ndarray]) – The deep scores for [14-left, 14-center, 14-1H, 14-2H, 56-total]

feature_v (list[np.ndarray]) – The deep features for [14-left, 14-center, 14-1H, 14-2H, 56-total]

Return type:

tuple

full_dia.deepmap.find_first_index(scan_mz, query_left, query_right)[source]

Find the first index that matches the query value.

Parameters:
  • scan_mz (cuda.array) – MS data of a cycle with m/z ascending order.

  • query_left (float) – The target m/z value with - ppm.

  • query_right (float) – The target m/z value with + ppm.

Returns:

best_j – The index of the first m/z that matches the query.

Return type:

int
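Because the m/z values are in ascending order, the lookup can be done with a binary search. The sketch below shows the idea on a plain Python list rather than a CUDA array. The helper names ppm_window and first_index_in_window are hypothetical, and returning -1 when nothing matches is an assumption of this sketch.

```python
from bisect import bisect_left


def ppm_window(mz, tol_ppm):
    """Return the (left, right) m/z bounds for a ppm tolerance."""
    delta = mz * tol_ppm * 1e-6
    return mz - delta, mz + delta


def first_index_in_window(scan_mz, query_left, query_right):
    """Hypothetical helper: first index with query_left <= m/z <= query_right."""
    # bisect_left gives the first position whose value is >= query_left.
    j = bisect_left(scan_mz, query_left)
    if j < len(scan_mz) and scan_mz[j] <= query_right:
        return j
    return -1  # assumption: sentinel for "no match"


scan = [100.0, 200.0, 200.002, 300.0]
left, right = ppm_window(200.001, 20)
hit = first_index_in_window(scan, left, right)
miss = first_index_in_window(scan, *ppm_window(250.0, 20))
```

From the first matching index, a caller can scan forward while the m/z stays below query_right to visit every signal in the window.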

full_dia.deepmap.gpu_bin_map(n, cycle_num, idx_start_v, ms1_scan_seek_idx, ms1_scan_im, ms1_scan_mz, ms1_scan_height, ms2_scan_seek_idx, ms2_scan_im, ms2_scan_mz, ms2_scan_height, query_mz_m, ppm_tolerance, query_im_v, im_tolerance, im_gap, result_maps, ms1_ion_num)[source]

Each CUDA thread generates a map (cycle + mobility + intensity) of an elution group. When multiple signals fall into the same bin, retain only the one with the highest intensity.

full_dia.deepmap.gpu_bin_maps(n, locus_num, cycle_num, idx_start_m, ms1_scan_seek_idx, ms1_scan_im, ms1_scan_mz, ms1_scan_height, ms2_scan_seek_idx, ms2_scan_im, ms2_scan_mz, ms2_scan_height, query_mz_m, tol_ppm, query_im_v, tol_im_map, im_gap, result_maps, ms1_ion_num)[source]

maps: [n_pr, n_locus, n_ion, n_cycle, n_im_bin]. Each thread generates the maps for multiple elution groups of a pr. When multiple signals fall into the same bin, retain only the one with the highest intensity.
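The "keep the highest intensity per bin" rule used by both kernels can be illustrated on the CPU for a single ion in a single cycle. This is a sketch only: the helper name bin_by_max and the im_start parameter (the lower edge of the first mobility bin) are assumptions, and the real kernels work on GPU arrays.

```python
import math


def bin_by_max(ims, heights, im_start, im_gap, n_bins):
    """Hypothetical helper: bin signals by ion mobility, keeping the max per bin."""
    bins = [0.0] * n_bins
    for im, height in zip(ims, heights):
        # floor (not int) so values below im_start map to a negative index.
        idx = math.floor((im - im_start) / im_gap)
        if 0 <= idx < n_bins:
            # Several signals in one bin: retain only the most intense one.
            bins[idx] = max(bins[idx], height)
    return bins


binned = bin_by_max(
    ims=[1.1, 1.2, 1.6, 2.4, 0.8],
    heights=[10, 30, 5, 7, 99],
    im_start=1.0,
    im_gap=0.5,
    n_bins=3,
)
```

The signal at mobility 0.8 lies below the first bin and is ignored, while the two signals in the first bin collapse to the larger value.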

full_dia.deepmap.load_model_big(dir_model, n_channel)[source]

Load DeepProfile-56 model.

full_dia.deepmap.load_model_center(dir_model, n_channel)[source]

Load DeepProfile-14 model.

full_dia.deepmap.load_models(dir_center=None, dir_big=None)[source]

Load DeepProfile-14 and DeepProfile-56 models.

full_dia.deepmap.profile(func)[source]

full_dia.deepmap.scoring_maps(model, df_input, map_gpu_ms1, map_gpu_ms2, cycle_num, map_im_gap, map_im_dim, ppm_tolerance, im_tolerance, neutron_num, return_feature=True)[source]

Extract and score the Profile-14 maps.

Parameters:
  • model (torch.nn.Module) – DeepProfile-14

  • df_input (pd.DataFrame) – Provide the pr info.

  • map_gpu_ms1 (dict) – Provide the MS1 data.

  • map_gpu_ms2 (dict) – Provide the MS2 data.

  • cycle_num (int) – How many cycles to extract for a pr. Default 13.

  • map_im_gap (float) – The bin width of im in a map.

  • map_im_dim (float) – The dimension of im in a map.

  • ppm_tolerance (float) – The tolerance of ppm.

  • im_tolerance (float) – The tolerance of im.

  • neutron_num (int) – Specify the neutron num.

  • return_feature (bool, default=True) – Whether to return feature or not.

Returns:

pred (torch.Tensor) – The scores by DeepProfile-14.

features (np.ndarray) – The features by DeepProfile-14.

Return type:

tuple

full_dia.fdr module

full_dia.fdr.adjust_rubbish_q(df, batch_num)[source]

If the number of identifications is less than 5000, rubbish_q_cut can be set to 0.75 to retain more identifications.

Return type:

None

full_dia.fdr.cal_q_pg(df_input_raw, q_pr_cut, run_or_global)[source]

Calculate the q values of pgs based on the assigned peptides.

Parameters:
  • df_input_raw (pd.DataFrame) – Provide columns: “strip_seq”, “cscore_pr”, “decoy”, “q_pr”.

  • q_pr_cut (float) – The q value cut to select which peps will be used to calculate the cscore of a pg.

  • run_or_global (str) – Specify the calculation on “run” or “global” level.

Returns:

df_res – Add new columns: “cscore_pg”, “q_pg”

Return type:

pd.DataFrame

full_dia.fdr.cal_q_pr_batch(df, batch_size, n_model, model_trained=None, scaler=None)[source]

Calculate the q values of peptides in a batch.

Parameters:
  • df (pd.DataFrame) – Provide columns: “pr_id”, “score_”, “decoy”.

  • batch_size (int) – The batch size for training an MLP.

  • n_model (int) – The number of models to ensemble.

  • model_trained (list[MLPClassifier], default=None) – The trained ensemble models or None.

  • scaler (StandardScaler, default=None) – The trained scaler or None.

Returns:

df (pd.DataFrame) – Add new columns: “cscore_pr_run”, “group_rank”, “q_pr_run”

model_trained (list[MLPClassifier]) – The trained ensemble models.

scaler (StandardScaler) – The trained scaler.

Return type:

tuple

full_dia.fdr.cal_q_pr_combined(df, batch_size, n_model)[source]

Calculate the q values of peptides across the batches using DIA-NN’s model.

Parameters:
  • df (pd.DataFrame) – Provide columns: “pr_id”, “score_”, “decoy”.

  • batch_size (int) – The batch size for training an MLP.

  • n_model (int) – The number of models to ensemble.

Returns:

df – Add new columns: “cscore_pr_run”, “q_pr_run”

Return type:

pd.DataFrame

full_dia.fdr.cal_q_pr_core(df, run_or_global)[source]

The core function to calculate the q values of peps using DIA-NN’s model.

Parameters:
  • df (pd.DataFrame) – Provide “cscore” values of peptides.

  • run_or_global (str) – Specify the calculation on “run” or “global” level.

Returns:

df – Add a new column: ‘q_pr’.

Return type:

pd.DataFrame
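For orientation, a generic target-decoy q-value estimate looks like the sketch below. It is a textbook formulation, not a reproduction of DIA-NN's model or of this function; the helper name target_decoy_q is hypothetical, and plain lists stand in for the DataFrame columns.

```python
def target_decoy_q(scores, decoys):
    """Hypothetical helper: q-value per entry from scores and decoy flags."""
    # Rank all entries from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # FDR at each rank: decoys seen so far divided by targets seen so far.
    n_target = n_decoy = 0
    fdr = []
    for i in order:
        if decoys[i]:
            n_decoy += 1
        else:
            n_target += 1
        fdr.append(n_decoy / max(n_target, 1))

    # q-value: the smallest FDR at this rank or any worse rank.
    q = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdr[rank])
        q[order[rank]] = running_min
    return q


q_values = target_decoy_q([0.9, 0.8, 0.7, 0.6], [0, 0, 1, 0])
```

The backward running minimum makes the q-values monotonic in the score, so a better-scoring entry never receives a larger q-value than a worse one.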

full_dia.fdr.profile(func)[source]

full_dia.fxic module

full_dia.fxic.cal_coelution_by_gaussion(xics, window_points, valids_num)[source]

Compute the coelution SA scores with a sliding-window method.

Parameters:
  • xics (numba.cuda.devicearray.DeviceNDArray) – The extracted xics on GPU device.

  • window_points (int) – Fixed to 7 cycles when computing the SA scores.

  • valids_num (int) – The number of fragment ions + 2 (pr and unfragmented pr)

Returns:

scores (torch.Tensor) – The mean SA coelution scores for peak groups.

scores_raw (torch.Tensor) – The raw SA coelution scores for peak groups.

Return type:

tuple

full_dia.fxic.cal_measure_im(locus_ims, locus_sas, good_cut=0.5)[source]

Calculate the measure_im for each locus, weighting with the sa values.

Parameters:
  • locus_ims (np.ndarray) – Ion mobility values for locus. Dimension: [n_locus, n_ion]

  • locus_sas (np.ndarray) – SA scores for locus. Dimension: [n_locus, n_ion]

  • good_cut (float, default=0.5) – Only ions passing the good_cut threshold are considered.

Returns:

locus_im – The weighted mean ion mobility values. Dimension: [n_locus]

Return type:

np.ndarray
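For a single locus, an SA-weighted mean with a quality threshold can be sketched as below. The helper name weighted_locus_im is hypothetical; treating "passing" as SA strictly greater than good_cut and returning 0.0 when no ion qualifies are both assumptions of this sketch.

```python
def weighted_locus_im(ims, sas, good_cut=0.5):
    """Hypothetical helper: SA-weighted mean ion mobility for one locus."""
    num = den = 0.0
    for im, sa in zip(ims, sas):
        if sa > good_cut:      # assumption: only ions above the cut contribute
            num += sa * im
            den += sa
    return num / den if den > 0 else 0.0  # assumption: 0.0 when nothing passes


locus_im = weighted_locus_im([1.0, 1.2, 0.5], [0.8, 0.8, 0.1])
```

The third ion has a low SA and is excluded, so the result is the mean of the first two mobilities.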

full_dia.fxic.concat_nonzero_locus(locus, scores_sa, scores_sa_m)[source]

After screening loci by SA, sa_input contains many zeros. Select the nonzero values and concatenate them into vectors.

Parameters:
  • locus (np.ndarray) – The locus of extracted xics. Dimension: [n_pep, n_locus]

  • scores_sa (torch.Tensor) – The SA locus scores of extracted xics. Dimension: [n_pep, n_locus]

  • scores_sa_m (torch.Tensor) – The SA ion scores of extracted xics. Dimension: [n_pep, n_ion, n_locus]

Returns:

locus_v (np.ndarray) – The candidate loci after screening, as a vector.

locus_num (np.ndarray) – How many loci are retained after screening for each peptide.

locus_sa_v (np.ndarray) – The SA locus scores of the candidate loci.

locus_sas (np.ndarray) – The SA ion scores of the candidate loci.

Return type:

tuple

full_dia.fxic.estimate_xic_boundary(xics, sa_gausion_m)[source]

Estimate the boundary of an elution group in cycles.

Parameters:
  • xics (torch.Tensor) – Dimension: [n_pep, n_ion, 13]

  • sa_gausion_m (torch.Tensor) – Dimension: [n_pep, n_ion]

Returns:

left_idx_1d (np.ndarray) – The start index for locus.

right_idx_1d (np.ndarray) – The end index for locus.

Return type:

tuple

full_dia.fxic.extract_xics(df, map_gpu_ms1, map_gpu_ms2, ppm_tolerance, im_tolerance, rt_tolerance=None, cycle_num=None, scope='center', only_xic=False, by_pred=True)[source]

Extract XICs from centroid MS data.

Parameters:
  • df (pd.DataFrame) – Provide the info of locus and peptide.

  • map_gpu_ms1 (dict) – MS1 data.

  • map_gpu_ms2 (dict) – MS2 data.

  • ppm_tolerance (float) – The tolerance of ppm.

  • im_tolerance (float) – The tolerance of mobility.

  • rt_tolerance (float, default=None) – The tolerance of rt. If None, from gradient start to end.

  • cycle_num (int, default=None) – The cycle num of xics. If None, from cycle start to end.

  • scope (str, default="center") – Determine which ions to extract. “center”: pr, pr_unfrag, fragment ions. “big”: “center” and the corresponding isotope ions. “top6”: top-6 fragment ions.

  • only_xic (bool, default=False) – Whether to return only the xics, without the im and m/z values.

  • by_pred (bool, default=True) – Use measure_im or pred_im when extracting xics.

Returns:

cycles_idx (np.ndarray) – Each XIC time point corresponds to a cycle index.

rts (np.ndarray) – Each XIC signal point corresponds to a retention time.

ims (np.ndarray) – Each XIC signal point corresponds to an ion mobility.

mzs (np.ndarray) – Each XIC signal point corresponds to a measured m/z value.

xics (numba.cuda.devicearray.DeviceNDArray) – The extracted xics on GPU device.

Return type:

tuple

full_dia.fxic.find_maximum(scan_im, scan_mz, scan_height, query_left, query_right, query_im_left, query_im_right)[source]

Find the maximum intensity value within the query tolerance in centroided data.

full_dia.fxic.gpu_cal_sa(v)[source]

Calculate the sa between V and Gaussian Vector: [0.0044, 0.054, 0.242, 0.399, 0.242, 0.054, 0.0044]
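As an illustration, the comparison against this 7-point Gaussian vector can be written on the CPU as follows. The sketch assumes that SA denotes the normalized spectral angle, 1 - 2*arccos(cosine similarity)/pi, which is a common definition in proteomics; the exact formula used by the kernel is not stated here, and the helper name spectral_angle is hypothetical.

```python
import math

# The 7-point Gaussian reference vector quoted above.
GAUSS = [0.0044, 0.054, 0.242, 0.399, 0.242, 0.054, 0.0044]


def spectral_angle(v, ref=GAUSS):
    """Hypothetical helper: normalized spectral angle between v and ref."""
    dot = sum(a * b for a, b in zip(v, ref))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in ref))
    if norm == 0:
        return 0.0  # assumption: an all-zero window scores 0
    # Clamp to [-1, 1] to guard against rounding before acos.
    cos = max(-1.0, min(1.0, dot / norm))
    return 1.0 - 2.0 * math.acos(cos) / math.pi


sa_perfect = spectral_angle([2 * g for g in GAUSS])  # same shape, other scale
sa_flat = spectral_angle([1.0] * 7)                  # no peak shape at all
```

The score depends only on the shape of the window, not on its absolute intensity, which is why a scaled copy of the reference still scores close to 1.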

full_dia.fxic.gpu_extract_xics(n, cycle_nums, idx_start_v, ms1_scan_seek_idx, ms1_scan_im, ms1_scan_mz, ms1_scan_height, ms2_scan_seek_idx, ms2_scan_im, ms2_scan_mz, ms2_scan_height, query_mz_m, ppm_tolerance, query_im_v, im_tolerance, ms1_ion_num, result_im, result_mz, result_xic, only_xic)[source]

Extract xics from MS data for target ions. Each thread handles one ion and produces one xic (profile).

full_dia.fxic.gpu_sa_gausion_core(block_num, xics, scores, window_points, valids_num)[source]

Use shared memory to calculate the SA for each locus.

full_dia.fxic.gpu_simple_smooth(input_xics)[source]

Smooth the xics (n_pep * n_ion * n_cycle) extracted from raw MS data.

full_dia.fxic.gpu_simple_smooth_core(n, input_xics, output)[source]

The core of gpu_simple_smooth using a weighted mean method.

full_dia.fxic.grid_xic_best(df_batch, ms1_centroid, ms2_centroid)[source]

For developing.

full_dia.fxic.profile(func)[source]

full_dia.fxic.reserve_sa_maximum(x)[source]

If x[i] > x[i-1] and x[i] > x[i+1], x[i] is a local maximum and is kept. Otherwise, it is set to 0.

Parameters:

x (torch.Tensor) – SA raw values with dimension: [n_pep, n_cycle]

Returns:

x – SA values after suppression with dimension: [n_pep, n_cycle]

Return type:

torch.Tensor
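The suppression rule can be sketched for one row of SA values with plain Python lists standing in for the tensor. The helper name keep_local_maxima is hypothetical, and zeroing the first and last positions (which have only one neighbour) is an assumption of this sketch.

```python
def keep_local_maxima(row):
    """Hypothetical helper: keep strict local maxima, set everything else to 0."""
    out = [0.0] * len(row)
    for i in range(1, len(row) - 1):
        # Strictly greater than both neighbours: keep the value.
        if row[i] > row[i - 1] and row[i] > row[i + 1]:
            out[i] = row[i]
    return out


kept = keep_local_maxima([0.1, 0.5, 0.3, 0.3, 0.9, 0.2])
```

Plateaus such as the repeated 0.3 are removed because the comparison is strict on both sides.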

full_dia.fxic.screen_locus_by_deep(df_batch, locus_num, top_deep_q)[source]

Screen the loci of a pr by deep scores.

Parameters:
  • df_batch (pd.DataFrame) – Provide columns: “pr_id”, “seek_score_deep”, “seek_score_sa_x_deep”. Has n_pep * n_locus rows.

  • top_deep_q (float) – Threshold for deep_x / deep_max

Returns:

df_batch – Fewer rows after screening.

Return type:

pd.DataFrame

full_dia.fxic.screen_locus_by_sa(scores_sa, top_sa_cut)[source]

Screen the multiple loci of a pr, keeping those that satisfy: local maximum, quantile1, quantile2.

Parameters:
  • scores_sa (np.ndarray) – Scores of locus.

  • top_sa_cut (float) – Quantile threshold on sa level

Returns:

scores_sa – Bad points have been set to zero.

Return type:

np.ndarray

full_dia.fxic.update_sa_by_grid(df, ms)[source]

For developing.

full_dia.library module

class full_dia.library.Library(dir_lib)[source]

Bases: object

Reader class of the spectral library.

assign_fg_mz(df)[source]

Assign fg mz values based on the precursor index from raw df_pr.

Parameters:

df (pd.DataFrame) – Provide the “pr_index” column.

Returns:

df – Add the m/z value columns of fragment ions.

Return type:

pd.DataFrame

assign_proteins(df)[source]

Assign proteins based on the precursor index from raw df_map.

Parameters:

df (pd.DataFrame) – Provide the “pr_index” column.

Returns:

df – Add new columns: “protein_id”, “protein_name”, “proteotypic”

Return type:

pd.DataFrame

check_lib(df)[source]

Check the spectral library: column names, modifications, charges, loss, proteins, length.

Return type:

None

construct_dfs(df)[source]

Construct the df_pr and df_map from DIA-NN’s .parquet library.

Parameters:

df (pd.DataFrame) – The raw DIA-NN’s .parquet file.

Returns:

df_prpd.DataFrame

Each row corresponds to a precursor and its fragment information.

df_mappd.DataFrame

Each row represents the protein information corresponding to the peptide in the same row of df_pr.

Return type:

tuple

polish_lib_by_swath(swath, ws_diann=None)[source]

Remove prs whose m/z values are not in the range of SWATH settings.

Parameters:
  • swath (np.ndarray) – The SWATH settings.

  • ws_diann (Path, default=None) – For developing.

Returns:

df_lib – The polished library.

Return type:

pd.DataFrame

full_dia.library.profile(func)[source]

full_dia.log module

class full_dia.log.Logger[source]

Bases: object

This class manages a singleton-style logger instance and provides a class method to (re)configure file and console handlers with a custom formatter.

classmethod get_logger()[source]

logger = <Logger Full-DIA (DEBUG)>

classmethod set_logger(dir_out, is_time_name=False)[source]

Configure file and console logging handlers.

Parameters:
  • dir_out (pathlib.Path) – Output directory where the log file will be written.

  • is_time_name (bool, default=False) – Whether to use a timestamp-based log file name. If False, a fixed name report.log.txt is used.

Return type:

None
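The pattern described for this class, one shared logger whose handlers can be replaced on demand, can be sketched with the standard logging module. The class name SketchLogger, the logger name, the format string and the timestamp file-name pattern are all assumptions of this sketch; only the fixed name report.log.txt comes from the description above.

```python
import logging
import time
from pathlib import Path


class SketchLogger:
    """Hypothetical stand-in showing a class-level logger with swappable handlers."""

    logger = logging.getLogger("sketch")
    logger.setLevel(logging.DEBUG)

    @classmethod
    def get_logger(cls):
        return cls.logger

    @classmethod
    def set_logger(cls, dir_out, is_time_name=False):
        # Remove old handlers so repeated calls do not duplicate output.
        for handler in list(cls.logger.handlers):
            cls.logger.removeHandler(handler)
            handler.close()

        if is_time_name:
            name = time.strftime("%Y%m%d_%H%M%S") + ".log.txt"  # assumed pattern
        else:
            name = "report.log.txt"

        formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        file_handler = logging.FileHandler(Path(dir_out) / name)
        console_handler = logging.StreamHandler()
        for handler in (file_handler, console_handler):
            handler.setFormatter(formatter)
            cls.logger.addHandler(handler)
```

Calling set_logger a second time replaces the file and console handlers instead of stacking new ones, so each message is still written once per destination.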

class full_dia.log.MyFormatter(*args, **kwargs)[source]

Bases: Formatter

format(record)[source]

Format the specified log record.

Parameters:

record (logging.LogRecord) – The log record to be formatted.

Returns:

The formatted log message string.

Return type:

str

full_dia.main module

full_dia.main.bootstrap(args)[source]

Initialize tasks.

Return type:

None

full_dia.main.main()[source]

full_dia.main.profile(func)[source]

full_dia.models module

class full_dia.models.DeepMall(input_dim, feature_dim)[source]

Bases: Module

Scores the intensity similarity using several kinds of weights.

forward(batch_mall, batch_valid_num)[source]

Returns:

feature (torch.Tensor) – The last feature layer.

result (torch.Tensor) – The inference result.

Return type:

tuple

class full_dia.models.DeepMap(map_channels, nn_in_features=220)[source]

Bases: Module

In the paper, it is also called DeepProfile.

forward(maps, batch_valid_num)[source]

Parameters:
  • maps (torch.Tensor) – Dimension: [n_locus, n_ion, n_cycle, n_im_bin]

  • batch_valid_num (torch.Tensor) – How many real ions for maps. Dimension: [n_locus]

Returns:

feature_map (torch.Tensor) – The last feature layer.

result (torch.Tensor) – The inference result.

Return type:

tuple

class full_dia.models.DeepQuant(n_run, n_ion)[source]

Bases: Module

An encoder-decoder model to optimize the intensity matrix.

forward(x_area1, x_area2, x_sa)[source]

full_dia.models.profile(func)[source]

full_dia.polish module

full_dia.polish.is_fg_share(fg_mz_1, fg_mz_2, tol_ppm)[source]

Calculate how many ions in fg_mz_1 are matched to fg_mz_2.
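A CPU sketch of such a ppm-based match count is shown below. The helper name count_shared is hypothetical, and two details are assumptions: the ppm error is taken relative to the m/z in fg_mz_1, and non-positive m/z values are treated as padding and skipped.

```python
def count_shared(fg_mz_1, fg_mz_2, tol_ppm):
    """Hypothetical helper: how many ions of fg_mz_1 have a partner in fg_mz_2."""
    n_shared = 0
    for mz1 in fg_mz_1:
        if mz1 <= 0:
            continue  # assumption: zeros are padding, not real ions
        # An ion counts once if any ion of the other list lies within tol_ppm.
        if any(abs(mz1 - mz2) / mz1 * 1e6 <= tol_ppm for mz2 in fg_mz_2):
            n_shared += 1
    return n_shared


shared = count_shared([500.0, 600.0, 700.0], [500.005, 650.0, 700.1], tol_ppm=20)
```

Only the first ion matches: 500.0 and 500.005 differ by about 10 ppm, while 700.0 and 700.1 differ by roughly 143 ppm.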

full_dia.polish.make_interference_areas_zero(df_input, tol_locus=3, tol_im=0.03, tol_ppm=20)[source]

In global analysis, if a fragment ion may have been produced by a more confident pr, its area and SA score are set to zero.

Parameters:
  • df_input (pd.DataFrame) – Provide the identification info on the run level.

  • tol_locus (int, default = 3) – If the locus difference between two peps falls within this tolerance, they are competitors.

  • tol_im (float, default = 0.03) – If the im difference between two peps falls within this tolerance, they are competitors.

  • tol_ppm (float, default = 20) – If the ppm difference between two fragment ions falls within this tolerance, they are competitors.

Returns:

df – The intensities and SA values of fragment ions that lose the competition are set to zero.

Return type:

pd.DataFrame

full_dia.polish.make_interference_areas_zero_core(swath_id_v, measure_locus_v, measure_im_v, fg_mz_m, area_m, sa_m, tol_locus, tol_im, tol_ppm, other_idx)[source]

The big fish eats the small fish: if a fg ion is shared with a more confident pr, the sa and fg_mz are set to zero.

full_dia.polish.polish_prs(df_input, tol_im=0.03, tol_ppm=20, tol_sa_ratio=0.75, tol_share_num=5)[source]

As individual DIA signals can be shared among multiple peptides, an additional post-processing step is required to refine the results.

Parameters:
  • df_input (pd.DataFrame) – Provide the pr identification results.

  • tol_im (float, default = 0.03) – If the im difference between two peps falls within this tolerance, they are competitors.

  • tol_ppm (float, default = 20) – If the ppm difference between two peps falls within this tolerance, they are competitors.

  • tol_sa_ratio (float, default = 0.75) – If all fragment ions of a peptide have SA values above the threshold and are more likely to originate from more confident peptides, the peptide is removed.

  • tol_share_num (int, default=5) – If all fragment ions of a peptide have matched number above the threshold and are more likely to originate from more confident peptides, the peptide is removed.

Returns:

df – The polished df.

Return type:

pd.DataFrame

full_dia.polish.polish_prs_core(swath_id_v, measure_locus_v, measure_im_v, fg_mz_m, tol_locus, tol_im, tol_ppm, sa_m)[source]

The big fish eats the small fish: if a fg ion is shared with a more confident pr, the sa and fg_mz are set to zero.

full_dia.polish.profile(func)[source]

full_dia.quant module

full_dia.quant.grid_xic_best(df_batch, ms1_centroid, ms2_centroid)[source]

The profile with the highest SA among other fragment ion profiles is selected as the best profile. Different tolerance combinations are then traversed to extract XICs corresponding to the highest SA with the best profile.

Parameters:
  • df_batch (pd.DataFrame) – Provide the precursor information.

  • ms1_centroid (dict) – The MS1 data.

  • ms2_centroid (dict) – The MS2 data.

Returns:

areas (np.ndarray) – Areas by best profiles.

sas (np.ndarray) – The corresponding SA scores.

Return type:

tuple

full_dia.quant.interference_correction(xics, best_profile)[source]

DIA-NN’s method to correct the interference of profiles.

Return type:

Tensor

full_dia.quant.interp_xics(x, rts_input, target_dim)[source]

Interpolate XIC along the cycle to target dimension. Also update the rts of new time points.

Return type:

tuple
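The idea of resampling an XIC and its retention times onto a denser grid can be sketched with linear interpolation on a single trace. The interpolation method used by interp_xics is not stated here, so linear interpolation is an assumption, as is the helper name interp_linear; the sketch needs at least two input points and a target_dim of at least 2.

```python
def interp_linear(values, rts, target_dim):
    """Hypothetical helper: resample one XIC and its RTs to target_dim points."""
    n = len(values)
    new_values, new_rts = [], []
    for k in range(target_dim):
        # Fractional position of the new point on the old index grid.
        pos = k * (n - 1) / (target_dim - 1)
        lo = min(int(pos), n - 2)   # left neighbour, clamped for the last point
        w = pos - lo                # weight of the right neighbour
        new_values.append((1 - w) * values[lo] + w * values[lo + 1])
        new_rts.append((1 - w) * rts[lo] + w * rts[lo + 1])
    return new_values, new_rts


xic, rts = interp_linear([0.0, 10.0, 0.0], [1.0, 2.0, 3.0], target_dim=5)
```

The retention times are interpolated with the same weights as the intensities, so every new point keeps a consistent time stamp.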

full_dia.quant.mask_tensor(xic, left, right)[source]

Set the edge regions of the XIC to zero.

Parameters:
  • xic (torch.Tensor) – Dimension: [n_xic, n_ion, n_cycle]

  • left (torch.Tensor) – Indicate the left region of the XIC.

  • right (torch.Tensor) – Indicate the right region of the XIC.

Returns:

xic – The edge regions have been set to zero.

Return type:

torch.Tensor

full_dia.quant.profile(func)[source]

full_dia.quant.quant_center_ions(df_input, ms)[source]

A novel xic extraction method to quantify fragment ions.

Parameters:
  • df_input (pd.DataFrame) – Provide the identification information of precursors.

  • ms (tims.Tims) – Provide the MS data.

Returns:

df – Add new columns: “score_ion_quant” and “score_ion_sa”.

Return type:

pd.DataFrame

full_dia.quant.select_best_profile(x_profile)[source]

Select the best profile if it has the highest SA among other profiles.

Return type:

Tensor

full_dia.quant.select_other_profiles(x_profile, best_profile)[source]

Select the profile from different tolerance conditions that has the highest SA with the best profile.

Parameters:
  • x_profile (torch.Tensor) – XIC profiles using different tolerances. Dimension: [n_pep, tol, n_ion, n_cycle]

  • best_profile (torch.Tensor) – The best profile. Dimension: [n_pep]

Returns:

best_x (torch.Tensor) – Each profile has the highest SA with the best profile.

sas (torch.Tensor) – The SA scores of profiles.

Return type:

tuple

full_dia.refine module

full_dia.refine.construct_train_data(df_top, ms)[source]

Construct maps and mall data. Positive samples: [Apex, Apex + 1, Apex - 1] locus from target peak groups. Negative samples: top-3 (by SA score) locus from target peak groups.

Parameters:
  • df_top (pd.DataFrame) – Provide the identification information.

  • ms (tims.Tims) – MS data.

Returns:

maps_center (np.ndarray) – The maps data for monoisotope ions. Dimension: [n_sample, 14, n_cycle, n_im_bin].

maps_big (np.ndarray) – The maps data for monoisotope + isotope ions. Dimension: [n_sample, 56, n_cycle, n_im_bin].

malls (np.ndarray) – The mall data for the calculation of intensity similarity.

center_ion_nums (np.ndarray) – Valid ions num for each sample.

labels (np.ndarray) – Positive or negative.

Return type:

tuple

full_dia.refine.eval_one_epoch(trainloader, model)[source]

Return the accuracy of the model on the validation set.

Return type:

float

full_dia.refine.make_dataset_mall(malls, valid_num, labels, train_ratio=0.9)[source]

Make pytorch dataset and split it into train and validation sets for Mall data.

Parameters:
  • malls (np.ndarray) – The mall data.

  • valid_num (np.ndarray) – Valid ion num of each mall.

  • labels (np.ndarray) – The labels.

  • train_ratio (float, default=0.9) – The ratio between train set and validation set.

Returns:

train (torch.utils.data.Dataset), eval (torch.utils.data.Dataset), and the Mall’s feature dimension.

Return type:

tuple

full_dia.refine.make_dataset_maps(maps, valid_num, labels, train_ratio, maps_type)[source]

Make pytorch dataset and split it into train and validation sets for Map data.

Parameters:
  • maps (np.ndarray) – The map/profile data.

  • valid_num (np.ndarray) – Valid ion num of each map.

  • labels (np.ndarray) – The labels.

  • train_ratio (float) – The ratio between train set and validation set.

  • maps_type (str) – “Profile-14”: for 14 monoisotope ions (pr, pr_unfrag, 12 fragment ions); “Profile-56”: for monoisotope + isotope ions (14 * 4).

Returns:

train (torch.utils.data.Dataset) and eval (torch.utils.data.Dataset).

Return type:

tuple

full_dia.refine.my_collate(items)[source]

The collate callback function for the PyTorch DataLoader.

full_dia.refine.profile(func)[source]

full_dia.refine.refine_models(df_top, ms, model_center, model_big)[source]

Refine/Train models using the first round identification result.

Parameters:
  • df_top (pd.DataFrame) – Provide the identification result of peptides.

  • ms (tims.Tims) – MS data.

  • model_center (torch.nn.Module) – DeepProfile-14 for 14 monoisotope ions.

  • model_big (torch.nn.Module) – DeepProfile-56 for monoisotope + isotope ions.

Returns:

The fine-tuned model_center, model_big and the trained model_mall.

Return type:

tuple

full_dia.refine.retrain_model_map(model_maps, maps, valid_nums, labels, maps_type, epochs)[source]

Fine-tune the model and return the model with optimal performance.

Parameters:
  • model_maps (torch.nn.Module) – The pretrained DeepProfile model.

  • maps (np.ndarray) – Run-specific profile/map data for fine-tuning.

  • valid_nums (np.ndarray) – Number of valid ions in each train sample.

  • labels (np.ndarray) – The labels of train samples.

  • maps_type (str) – “Profile-14”: for 14 monoisotope ions (pr, pr_unfrag, 12 fragment ions); “Profile-56”: for monoisotope + isotope ions (14 * 4).

  • epochs (int) – Maximum number of epochs.

Returns:

model_best – The model with optimal performance.

Return type:

torch.nn.Module
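
The idea of returning the model with optimal performance can be sketched as a loop that keeps a snapshot of the best-scoring state. This is only an illustration under assumptions: fit_keep_best, train_step and evaluate are hypothetical names, validation accuracy is assumed to be the selection criterion, and a plain dict stands in for a torch.nn.Module.

```python
import copy


def fit_keep_best(model, train_step, evaluate, epochs):
    """Train for up to `epochs` epochs and return the best snapshot seen."""
    best_acc = float("-inf")
    best_model = copy.deepcopy(model)
    for _ in range(epochs):
        train_step(model)      # one pass over the training data
        acc = evaluate(model)  # score on the validation data
        if acc > best_acc:     # keep a copy only when the score improves
            best_acc = acc
            best_model = copy.deepcopy(model)
    return best_model, best_acc


# Toy stand-ins: the "model" is a dict and accuracy peaks when w == 3.
toy_model = {"w": 0}


def toy_step(m):
    m["w"] += 1


def toy_eval(m):
    return 1.0 - abs(m["w"] - 3) / 10.0


best_model, best_acc = fit_keep_best(toy_model, toy_step, toy_eval, epochs=5)
```

Copying the model at each improvement means later, worse epochs cannot overwrite the best state.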

full_dia.refine.train_model_mall(malls, valid_num, labels, epochs)[source]

Train the model DeepMall from scratch on the training set and return the model with optimal performance.

Parameters:
  • malls (np.ndarray) – The mall data.

  • valid_num (np.ndarray) – Number of valid ions in each train sample.

  • labels (np.ndarray) – The labels of train samples.

  • epochs (int) – Maximum number of epochs.

Returns:

model_best – The model with optimal performance.

Return type:

torch.nn.Module

full_dia.refine.train_one_epoch(trainloader, model, optimizer, loss_fn)[source]

Train the model on the training set and return the loss.

Return type:

float

full_dia.scoring module

full_dia.scoring.numba_scoring_putatives(groups, sa_v, center_v, big_v)[source]

Use Numba to accelerate the computation of the maximum and sum scores across different candidate peak groups for the same precursor.

full_dia.scoring.profile(func)[source]

full_dia.scoring.score_locus(df_target, ms, model_center, model_big)[source]

Calculate function-based and learning-based scores for PSMs.

Parameters:
  • df_target (pd.DataFrame) – Provide the PSM information.

  • ms (tims.Tims) – MS data.

  • model_center (torch.nn.Module) – DeepProfile-14

  • model_big (torch.nn.Module) – DeepProfile-56

Returns:

df – Scores have been appended to the DataFrame in columns prefixed with “score_”.

Return type:

pd.DataFrame

full_dia.scoring.scoring_by_cross(df_batch, is_update=False)[source]
Return type:

DataFrame

Compute score combinations as additional scores:
Before refine phase (is_update: False):
  1. sa_center - sa_left

  2. deep_center - deep_left

  3. sa_center * deep_center

  4. sa_center * deep_big

After refine phase (is_update: True):
  1. deep_center - deep_left

  2. sa_center * deep_center

  3. sa_center * deep_big
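
The two sets of combinations listed above can be sketched for a single row as follows. The function cross_scores and the output key names are hypothetical; the documented function operates on a DataFrame and its actual column names are not shown here.

```python
def cross_scores(s, is_update=False):
    """Combine base scores of one candidate into cross scores."""
    out = {}
    if not is_update:
        # The SA difference is only listed for the phase before refinement.
        out["sa_center-sa_left"] = s["sa_center"] - s["sa_left"]
    out["deep_center-deep_left"] = s["deep_center"] - s["deep_left"]
    out["sa_center*deep_center"] = s["sa_center"] * s["deep_center"]
    out["sa_center*deep_big"] = s["sa_center"] * s["deep_big"]
    return out


example = {
    "sa_center": 0.5,
    "sa_left": 0.25,
    "deep_center": 0.75,
    "deep_left": 0.5,
    "deep_big": 0.5,
}
before = cross_scores(example)
after = cross_scores(example, is_update=True)
```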

full_dia.scoring.scoring_by_deep_layer(df_batch, features_deep_v, x)[source]

Append the feature layers scores of DeepProfile to df.

Parameters:
  • df_batch (pd.DataFrame) – The DataFrame to which the scores are appended.

  • features_deep_v (list) – The feature layers scores of DeepProfile.

  • x (str) – “pre”: scores are from the pretrained models; “refine_p1”: scores are from the refinement models with 0.5 * ppm; “refine_p2”: scores are from the refinement models with 0.25 * ppm.

Returns:

df – The feature layers scores of DeepProfile have been appended.

Return type:

pd.DataFrame

full_dia.scoring.scoring_by_deep_prob(df_batch, scores_deep_v, x)[source]

Append the inference scores of DeepProfile to df.

Parameters:
  • df_batch (pd.DataFrame) – The DataFrame to which the scores are appended.

  • scores_deep_v (list) – The inference scores of DeepProfile.

  • x (str) – “pre”: scores are from the pretrained models; “refine”: scores are from the refinement models; “refine_p1”: scores are from the refinement models with 0.5 * ppm; “refine_p2”: scores are from the refinement models with 0.25 * ppm.

Returns:

df – The inference scores of DeepProfile have been appended.

Return type:

pd.DataFrame

full_dia.scoring.scoring_center_im(df_batch, ims_input)[source]
Return type:

DataFrame

Calculate mobility related scores with the center cycle MS/MS.
  1. imbias for 14 ions

  2. mean

  3. mean weighted by sa

  4. mean of top-6 weighted by sa

full_dia.scoring.scoring_center_mz(df_batch, mzs_input)[source]
Return type:

DataFrame

Calculate ppm related scores with the center cycle MS/MS.
  1. ppm for 14 ions

  2. mean

  3. mean weighted by sa

  4. mean of top-6 weighted by sa
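
The ppm aggregates listed above can be illustrated on plain lists. This is a sketch under assumptions: ppm is taken as the signed relative m/z error times 1e6, "top-6" is taken to mean the six ions with the highest sa weights, and all three helper names are hypothetical.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def weighted_mean(values, weights):
    """Mean of values weighted by weights (0.0 if all weights are zero)."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total if total > 0 else 0.0


def top_k_weighted_mean(values, weights, k=6):
    """Weighted mean restricted to the k entries with the largest weights."""
    ranked = sorted(zip(weights, values), reverse=True)[:k]
    return weighted_mean([v for _, v in ranked], [w for w, _ in ranked])


ppm = ppm_error(500.005, 500.0)
mean_w = weighted_mean([10.0, 20.0], [1.0, 3.0])
mean_top1 = top_k_weighted_mean([10.0, 20.0], [1.0, 3.0], k=1)
```

The same weighting scheme would apply to the mobility scores of scoring_center_im, with the ion mobility bias in place of the ppm error.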

full_dia.scoring.scoring_center_snr(df_batch, xics)[source]
Return type:

DataFrame

Calculate SNR related scores with the center cycle MS/MS. Signal is the apex intensity, noise is the median of the profile.

  1. snrs for 14 ions

  2. mean

  3. mean weighted by sa

  4. mean of top-6 weighted by sa
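
A minimal sketch of the per-ion SNR defined above (apex intensity divided by the median of the profile). The helper snr is hypothetical; placing the apex at the middle cycle by default and returning 0 for a zero noise estimate are assumptions made for this example.

```python
from statistics import median


def snr(profile, apex_idx=None):
    """Signal-to-noise ratio of one ion profile."""
    if apex_idx is None:
        apex_idx = len(profile) // 2  # assumption: apex sits at the center cycle
    noise = median(profile)
    # Assumption: report 0.0 when the noise estimate is zero to avoid dividing by zero.
    return profile[apex_idx] / noise if noise > 0 else 0.0


snr_value = snr([1.0, 2.0, 8.0, 2.0, 1.0])
```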

full_dia.scoring.scoring_main_elution(df_batch, xics, x)[source]
Return type:

DataFrame

Calculate the following elution scores based on the monoisotope types specified by x:

x: [‘center’, ‘center_p1’, ‘center_p2’]
  1. The sa for each of the 14 ions

  2. mean value of 14 ions

  3. mean value of top-6

  4. mean value w/o norm of remaining ions

  5. sum of top1/2/3 b ions

full_dia.scoring.scoring_meta(df)[source]
Return type:

DataFrame

Calculate peptide meta information related scores:
  1. mz (scoring_center_mz)

  2. charge (one-hot encoding using 1, 2, 3, 4)

  3. sequence length

  4. fg_num

  5. library fragment ions intensities
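
As an illustration of the one-hot charge encoding in item 2, a small sketch follows. The helper one_hot_charge is hypothetical, and mapping any charge outside 1 to 4 to an all-zero vector is an assumption.

```python
def one_hot_charge(charge, charges=(1, 2, 3, 4)):
    """Encode a precursor charge as a 0/1 vector over the allowed charges."""
    return [1 if charge == c else 0 for c in charges]


encoded = one_hot_charge(2)
```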

full_dia.scoring.scoring_other_elution(df_batch, xics, x)[source]
Return type:

DataFrame

Calculate the following elution scores based on the isotope types specified by x:

x: [‘left’, ‘1H’, ‘2H’]
  1. sa for each of the 14 ions

  2. mean value of 14 ions

  3. mean value of top-6

  4. mean value w/o norm of remaining ions

full_dia.scoring.scoring_putatives(df)[source]
Return type:

DataFrame

Calculate competition-related scores, since a precursor (pr) can have multiple candidate elution groups:
  1. score-i - score-max

  2. np.log(score-i/score.sum)
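
The two competition scores above can be sketched for the candidates of one precursor. The helper putative_scores is hypothetical, and the sketch assumes all scores are strictly positive so that the logarithm is defined.

```python
import math


def putative_scores(scores):
    """Return (score_i - score_max) and log(score_i / sum) for each candidate."""
    s_max = max(scores)
    s_sum = sum(scores)
    diff_to_max = [s - s_max for s in scores]
    log_share = [math.log(s / s_sum) for s in scores]
    return diff_to_max, log_share


diff_to_max, log_share = putative_scores([0.5, 0.25, 0.25])
```

The best candidate gets a difference of 0, and a candidate that dominates its competitors gets a log share close to 0.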

full_dia.scoring.scoring_rt(df_batch)[source]

Calculate RT related scores.

Return type:

DataFrame

full_dia.scoring.scoring_xic_intensity(df_batch, xics, rts)[source]

Calculate the intensity related scores. Only the top-6 intensities are considered.
  • apex intensities: ms2_relative, ms2_total, ms1/ms2, similarity

  • profile areas: ms2_relative, ms2_total, ms1/ms2, similarity

Return type:

DataFrame

full_dia.scoring.update_scores(df, ms, model_center, model_big, model_mall)[source]
Return type:

DataFrame

Calculate scores using the refined DeepProfile and the trained DeepMall.
  1. DeepProfile: refined deep prob scores

  2. DeepProfile: cross scores with refined deep prob scores

  3. DeepProfile: refined deep prob and layer scores with 0.5 * ppm

  4. DeepProfile: refined deep prob and layer scores with 0.25 * ppm

  5. DeepMall: deep prob and layer scores

full_dia.search module

full_dia.search.cal_recall_seek_locus(df_lib, ms, model, tol_rt, top_sa_cut, top_deep_cut)[source]

For developing.

full_dia.search.cal_recall_seek_seed(df_lib, ms, model_center)[source]

For developing.

full_dia.search.profile(func)[source]

full_dia.search.search_core(lib)[source]
Return type:

None

Search on run level:
  1. Seek seeds for calibration.

  2. Seek candidate elution groups (locus) for each precursor.

  3. Score the elution groups.

  4. Calculate the FDR on run level.

  5. Save all target precursor results and high-quality decoy precursor results.
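
Step 4 is not detailed here. As a hedged illustration only, the sketch below shows the common target-decoy way of turning scores into q-values; it is not confirmed to be the procedure full_dia uses, and q_values is a hypothetical helper.

```python
def q_values(scores, is_decoy):
    """Target-decoy q-values: FDR = #decoys / #targets above each threshold."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = []
    n_target = n_decoy = 0
    for i in order:
        if is_decoy[i]:
            n_decoy += 1
        else:
            n_target += 1
        fdrs.append(n_decoy / max(n_target, 1))
    # The q-value is the smallest FDR at this or any more permissive threshold.
    q = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q[order[rank]] = running_min
    return q


q = q_values([0.9, 0.8, 0.7, 0.6], [False, False, True, False])
```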

full_dia.search.seek_locus(df_target, ms, model_center, top_sa_q, top_deep_q)[source]

Seek candidate elution groups (locus) by: 1) screening with sa; 2) screening with deep.

Parameters:
  • df_target (pd.DataFrame) – Provide the precursor information.

  • ms (tims.Tims) – MS data.

  • model_center (torch.nn.Module) – DeepProfile-14, used to score the elution consistency of monoisotope ions.

  • top_sa_q (float) – First, candidate locus should have good SA scores compared to the best elution group.

  • top_deep_q (float) – Second, candidate locus should have good deep scores compared to the best elution group.

Returns:

df – Each row is a candidate elution group.

Return type:

pd.DataFrame

full_dia.search.seek_seed(df_target, ms, model_center)[source]

Seek the best elution group for each pr using SA scoring methods. Then, model_center scores the elution group. Obviously, these elution groups may contain many false positives, but they can be used as seeds for calibration.

Parameters:
  • df_target (pd.DataFrame) – The identification object from library.

  • ms (tims.Tims) – MS data.

  • model_center (torch.nn.Module) – DeepProfile will score the coelution consistency of the elution group.

Returns:

df – One precursor will have one elution group.

Return type:

pd.DataFrame

full_dia.search.select_required_and_all_targets(df)[source]

Select good target and decoy peps for FDR calculation. Select all targets to save, which avoids a second extraction in the global analysis.

Parameters:

df (pd.DataFrame) – The identification result for one batch.

Returns:

df_main : pd.DataFrame

Good target and decoy peps.

df_other : pd.DataFrame

All target peps.

Return type:

tuple

full_dia.search.update_tolerance(df_lib, ms, model_center, model_big, sample_ratio)[source]

Update the tolerance based on identifications of a subset of target peptides.

Parameters:
  • df_lib (pd.DataFrame) – The raw library.

  • ms (tims.Tims) – MS data.

  • model_center (torch.nn.Module) – DeepProfile-14, used to score the elution consistency of monoisotopes.

  • model_big (torch.nn.Module) – DeepProfile-56, used to score the elution consistency of monoisotopes + isotopes.

  • sample_ratio (float) – Fraction of the library to sample in order to speed up the update.

Returns:

None. The global tolerance values cfg.tol_rt, cfg.tol_im_xic and cfg.tol_ppm are updated.

Return type:

None

full_dia.tims module

class full_dia.tims.Tims(dir_d)[source]

Bases: object

Read and centroid the profile data for diaPASEF.

construct_data_by_quadrupole(window_id)[source]

Construct profile and centroid data with specified window_id.

Parameters:

window_id (int) – 0 refers to MS1, others refer to different quadrupole windows.

Returns:

all_rt : np.ndarray

The rt values of all cycles.

cycle_valid_lens : np.ndarray

The number of profile ions per cycle.

all_push : np.ndarray

The 1/k0 values of profile ions.

all_tof : np.ndarray

The m/z values of profile ions.

all_height : np.ndarray

The intensities of profile ions.

cycle_valid_lens2 : np.ndarray

The number of centroided ions per cycle.

all_push2 : np.ndarray

The 1/k0 values of centroided ions.

all_tof2 : np.ndarray

The m/z values of centroided ions.

all_height2 : np.ndarray

The intensities of centroided ions.

Return type:

tuple

copy_map_to_gpu(swath_id, centroid)[source]

Copy profile or centroided MS data to GPU.

Parameters:
  • swath_id (int) – Specify the SWATH or quadrupole ID.

  • centroid (bool) – Whether to copy the centroided data or the profile data.

Returns:

The MS1 chunk and MS2 data.

Return type:

list

property frame_nums

get_centroid_tol_push()[source]

Calculate how many pushes should be considered as neighbors when centroiding.

Return type:

int

get_cycle_time()[source]

Return the cycle time.

Return type:

float

get_device_name()[source]

Get the device name, e.g. timsTOF Ultra.

Return type:

str

get_dia_quadrupole()[source]

Exact boundaries of the quadrupole partitioning. Returns an array like: [200, 250, 300, 350 … 1150, 1200]

Return type:

ndarray
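
Given a sorted boundary array like the one returned here, the window that contains a precursor m/z can be found with a binary search. This is a sketch with assumptions: find_window is a hypothetical helper, windows are treated as half-open intervals [low, high), and the returned index is 0-based rather than a full_dia swath_id.

```python
from bisect import bisect_right


def find_window(boundaries, mz):
    """Return the 0-based index of the window containing mz, or -1 if outside."""
    if mz < boundaries[0] or mz >= boundaries[-1]:
        return -1
    return bisect_right(boundaries, mz) - 1


bounds = [200, 250, 300, 350]
window_idx = find_window(bounds, 275.0)
```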

get_dia_windows()[source]

Exact boundaries of the window (m/z + 1/K0) partitioning.

Returns:

df : pd.DataFrame

Each row represents one window range: (im_low, im_high, q_low, q_high)

frames_num_per_cycle : int

The number of frames per cycle.

Return type:

tuple

get_im_gap()[source]

Calculate the 1/k0 value of a single push.

Return type:

float

get_rt_range()[source]

Return the minimum and maximum of RTs.

Return type:

tuple[float, float]

get_scan_rts()[source]

Get the RT for each cycle or frame.

Return type:

ndarray

plot_dia_windows()[source]

For developing.

split_ms1_to_chunks(ms1_map)[source]

MS1 data can be split by swath_id to save memory. The start and end of each chunk are extended by 3 Da to cover the isotopes of precursors.

Parameters:

ms1_map (tuple) – The unsplit MS1 map.

Returns:

d_ms1_maps – The key is the swath_id, and the value is the MS1 chunk data.

Return type:

dict
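
The chunking with a 3 Da margin can be sketched on a plain list of m/z values. The helper split_by_windows and the windows mapping are hypothetical; the documented method works on the MS1 map tuple, whose layout is not described here.

```python
def split_by_windows(mzs, windows, margin=3.0):
    """Map each swath_id to the m/z values inside its window plus a margin."""
    chunks = {}
    for swath_id, (low, high) in windows.items():
        # Extend both ends so isotope peaks just outside the window are kept.
        chunks[swath_id] = [m for m in mzs if low - margin <= m <= high + margin]
    return chunks


chunks = split_by_windows(
    [395.0, 398.0, 420.0, 452.0, 460.0],
    {1: (400.0, 450.0), 2: (450.0, 500.0)},
)
```

Because of the margin, a peak near a window edge (452.0 here) appears in both neighbouring chunks.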

full_dia.tims.load_ms(ws)[source]

Wrapper function for loading diaPASEF data.

Return type:

Tims

full_dia.tims.numba_index_by_bool(idx, ims, mzs, heights)[source]

Value extraction using boolean indexing in Numba

full_dia.tims.numba_paral_centroid(all_tof, all_push, all_height, tol_tof_sum, tol_tof_suppression, tol_push, cumlen)[source]

Centroid the profile MS data using DIA-NN’s method:
  1. Summarize intensity values within a window range (m/z + 1/K0).

  2. Remove an aggregated point if a higher-intensity aggregated point exists in its neighborhood.
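
The two steps above can be sketched in one dimension (TOF index only, ignoring the 1/K0 axis). This is a simplified illustration, not the Numba implementation: centroid_1d is a hypothetical helper, it runs in quadratic time, and keeping both points when their summed intensities tie is an assumption.

```python
def centroid_1d(tofs, heights, tol_sum, tol_suppress):
    """Sum intensities in a window, then keep only local maxima of the sums."""
    # Step 1: for every point, sum the intensities of all points within tol_sum.
    summed = [
        sum(h for t, h in zip(tofs, heights) if abs(t - tof) <= tol_sum)
        for tof in tofs
    ]
    # Step 2: drop a point if a neighbour within tol_suppress has a higher sum.
    kept = []
    for i, tof in enumerate(tofs):
        dominated = any(
            j != i and abs(tofs[j] - tof) <= tol_suppress and summed[j] > summed[i]
            for j in range(len(tofs))
        )
        if not dominated:
            kept.append((tof, summed[i]))
    return kept


centroids = centroid_1d([1000, 1001, 1002, 2000], [1, 5, 2, 3], tol_sum=1, tol_suppress=2)
```

The three adjacent points collapse into one centroid at the strongest position, while the isolated point is kept unchanged.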

full_dia.tims.numba_paral_repeat(x, y)[source]

Repeat elements of vectors in Numba

full_dia.tims.numba_paral_sort(all_tof, all_push, all_height, cumlen)[source]

Sort vectors based on the m/z ascending order in Numba

full_dia.tims.numba_paral_sum(select_id, cumlen)[source]

Calculate sum values in Numba

full_dia.tims.profile(func)[source]

full_dia.utils module

full_dia.utils.cal_acc_recall(path_ws, df_input, diann_q_pr=None, diann_q_pro=None, diann_q_pg=None, alpha_q_pr=None, alpha_q_pro=None, alpha_q_pg=None)[source]

For developing.

full_dia.utils.cal_group_rank(x, group_size_cumsum)[source]

Calculate group ranks in parallel.

full_dia.utils.cal_sa_by_np(x, y)[source]

Calculate the SA. Both inputs must be two-dimensional.

Return type:

ndarray
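
SA is not defined in this section. Assuming it denotes the normalized spectral angle that is widely used in proteomics, a sketch for one pair of vectors looks like this. The helper spectral_angle is hypothetical and works on 1-D lists, whereas the documented function takes two-dimensional arrays.

```python
import math


def spectral_angle(x, y):
    """Normalized spectral angle: 1 for identical direction, 0 for orthogonal."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an empty profile scores 0
    cos = max(-1.0, min(1.0, dot / (norm_x * norm_y)))  # guard against rounding
    return 1.0 - 2.0 * math.acos(cos) / math.pi


sa_same = spectral_angle([3.0, 4.0], [3.0, 4.0])
sa_orth = spectral_angle([1.0, 0.0], [0.0, 1.0])
```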

full_dia.utils.check_run_info(args)[source]

Print run info: version, platform, time, cpu, memory, gpu, cmd.

Return type:

None

full_dia.utils.clean_and_save(df_main, df_other, ws_single)[source]

Combine, clean and save the identification information of the high-quality decoy peptides and all target peptides.

Parameters:
  • df_main (pd.DataFrame) – High-quality decoy and target peptides.

  • df_other (pd.DataFrame) – All target peptides. This avoids re-extraction in the global analysis.

  • ws_single (Path) – The path to save file.

Return type:

None

full_dia.utils.convert_cols_to_diann(df, ws_single)[source]

Convert local column names to DIA-NN’s column names.

Return type:

DataFrame

full_dia.utils.convert_numba_to_tensor(x)[source]

Convert numba cuda array to torch cuda array with the help of cupy.

Return type:

Tensor

full_dia.utils.create_cuda_zeros(shape, dtype=torch.float32)[source]

Create the Numba CUDA zero array with the help of pytorch.

Return type:

DeviceNDArray

full_dia.utils.cross_cos(x)[source]

For developing.

full_dia.utils.get_args()[source]

Parse command line arguments.

Return type:

Namespace

full_dia.utils.get_diann_info(path_ws)[source]

For developing.

full_dia.utils.init_gpu_params(gpu_id)[source]
Return type:

None

Initialize GPU params according to the GPU ID:
  1. for pytorch

  2. for numba.cuda

  3. Empirically adjust the batch size for GPU code.

full_dia.utils.init_multi_ws(ws_global, out_name)[source]

Initialize the paths of .d files and the output folder.

Return type:

None

full_dia.utils.init_single_ws(ws_i, total, ws_single)[source]

Initialize the output path of single .d file.

Return type:

None

full_dia.utils.interp_xics(xics, rts, target_dim)[source]

For developing.

full_dia.utils.move_all_zeros_end(a)[source]

Move all zero elements in the matrix to the end of rows. Based on http://stackoverflow.com/a/42859463/3293881

Return type:

ndarray
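
The row-wise behaviour described above can be illustrated with plain lists. The helper move_zeros_to_row_end is hypothetical, and keeping the non-zero values in their original order is an assumption; the documented function operates on a NumPy matrix.

```python
def move_zeros_to_row_end(matrix):
    """Shift the non-zero values of each row to the front and pad with zeros."""
    result = []
    for row in matrix:
        nonzero = [v for v in row if v != 0]
        result.append(nonzero + [0] * (len(row) - len(nonzero)))
    return result


moved = move_zeros_to_row_end([[0, 1, 0, 2], [3, 0, 0, 4]])
```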

full_dia.utils.print_ids(df, q_cut, pr_or_pg, run_or_global)[source]

Print the number of ids (pr or pg) in the run/global level.

Return type:

None

full_dia.utils.profile(func)[source]

full_dia.utils.read_from_pq(ws_single, cols=None)[source]

Read .parquet file with specific columns.

Return type:

DataFrame

full_dia.utils.release_gpu_scans(*map_gpus)[source]

Release GPU-resident data and clear GPU memory.

Return type:

None

full_dia.utils.save_as_pkl(df, fname)[source]

For developing.

Module contents