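I almost mistook this for some kind of giveaway. Only after chatting about it did I work out that it is the complete code and data behind a single-cell paper, copied over from Code Ocean. The article link is: https://elifesciences.org/articles/43803/figures. For more about Code Ocean, see: "Code Ocean: everything you want to replicate is here"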
All of the analyzed real datasets are publicly available and the relevant GEO accession codes are included in the manuscript. All of the simulated and real data can be accessed through Code Ocean at the following URL: https://doi.org/10.24433/CO.9044782e-cb96-4733-8a4f-bf42c21399e6. cNMF code is available on Github https://github.com/dylkot/cNMF/ (copy archived at https://github.com/elifesciences-publications/cNMF).
Link: https://pan.baidu.com/s/1MObwTPXjDbPb3EU1alv-Wg Password: ou19
After downloading, I got this: D:\BaiduNetdiskDownload\9044782e-cb96-4733-8a4f-bf42c21399e6_v1.0.zip: The archive is corrupt
Checksum error in Part1_Simulations\deloc_0.50\Seed_27157\LDA_Spectra_IndvRefitZscore.npz. The file is corrupt.
Checksum error in Part1_Simulations\deloc_0.50\Seed_9925\ICA_Spectra_IndvRefitTPM.npz. The file is corrupt.
Checksum error in Part1_Simulations\deloc_0.75\Seed_20008\cLDA\cnmf_tmp\cLDA.spectra.k_14.merged.df.npz. The file is corrupt.
Checksum error in Part1_Simulations\deloc_0.75\Seed_31986\NMF_Spectra_IndvRefitZscore.npz. The file is corrupt.
Checksum error in Part1_Simulations\deloc_1.00\Seed_18839\LDA_Spectra_IndvRefitTPM.npz. The file is corrupt.
Checksum error in Part1_Simulations\deloc_1.00\Seed_29975\cLDA\cnmf_tmp\cLDA.spectra.k_14.merged.df.npz. The file is corrupt.
Checksum error in Part1_Simulations\deloc_1.00\Seed_9485\cICA\cnmf_tmp\cICA.spectra.k_14.merged.df.npz. The file is corrupt.
Checksum error in Part2_Organoids\Organoid_cNMF\cnmf_tmp\Organoid_cNMF.spectra.k_31.merged.df.npz. The file is corrupt.
Checksum error in Part3_VisualCortex\HrvatinEtAl_cNMF\cnmf_tmp\HrvatinEtAl_cNMF.spectra.k_28.merged.df.npz. The file is corrupt.
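If you want to check a downloaded archive yourself before unpacking 19G of data, checksum errors like the ones above can be listed with Python's standard `zipfile` module. This is only a minimal sketch, not what the unzip tool above does internally: the function name `find_corrupt_members` is made up for illustration, and it assumes the download is an ordinary zip file that opens at all (a badly truncated archive may fail before any member is read).

```python
import zipfile
import zlib


def find_corrupt_members(archive_path):
    """Return names of archive members that fail to decompress or fail the CRC-32 check."""
    corrupt = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            try:
                with zf.open(info) as fh:
                    # Read in 1 MiB chunks so large .npz members never sit fully in memory.
                    # zipfile compares the CRC-32 once the member has been read to the end.
                    while fh.read(1 << 20):
                        pass
            except (zipfile.BadZipFile, zlib.error, EOFError):
                corrupt.append(info.filename)
    return corrupt


# Usage (hypothetical local path):
# for name in find_corrupt_members("9044782e-cb96-4733-8a4f-bf42c21399e6_v1.0.zip"):
#     print(name)
```

`ZipFile.testzip()` does a similar check but stops at the first bad member, so the loop above is handy when you want the full list of files to download again.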
$ du -h -d 2 data/
4.4G data/Part1_Simulations/deloc_0.50
4.4G data/Part1_Simulations/deloc_0.75
5.0G data/Part1_Simulations/deloc_1.00
350M data/Part1_Simulations/deloc_1.00_halfDoublets
5.0K data/Part1_Simulations/runtime_evaluation
14G data/Part1_Simulations
3.5M data/Part2_Organoids/genesets
1.4G data/Part2_Organoids/Organoid_cNMF
2.0G data/Part2_Organoids
86M data/Part3_VisualCortex/genesets
1.5G data/Part3_VisualCortex/HrvatinEtAl_cNMF
1.3G data/Part3_VisualCortex/TasicEtAl_cNMF
2.8G data/Part3_VisualCortex
19G data/
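The listing above comes from `du`, which is not available in a plain Windows shell. As a rough stand-in, the same per-directory totals can be computed with the Python standard library. This is a sketch under a few assumptions: `dir_sizes` is a made-up helper name, and it sums file sizes in bytes rather than disk blocks, so the numbers will not match `du` exactly.

```python
import os


def dir_sizes(root, max_depth=2):
    """Map each directory at most max_depth levels below root to its total size in bytes."""
    root = os.path.abspath(root)
    totals = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        size_here = 0
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                size_here += os.path.getsize(file_path)
        # Credit these bytes to this directory and to every ancestor up to root,
        # but only record directories within max_depth levels (like `du -d 2`).
        current = dirpath
        while True:
            rel = os.path.relpath(current, root)
            depth = 0 if rel == "." else rel.count(os.sep) + 1
            if depth <= max_depth:
                totals[current] = totals.get(current, 0) + size_here
            if current == root:
                break
            current = os.path.dirname(current)
    return totals


# Usage (assumes the data was unpacked into ./data):
# for path, size in sorted(dir_sizes("data").items()):
#     print(f"{size / 1024**3:6.1f}G  {path}")
```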
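Of these, Part1_Simulations takes the lion's share at 14G; Part2_Organoids and Part3_VisualCortex likewise contain nothing but data files.
The code itself is just Python Jupyter notebook files, so anyone who knows Python will have no trouble!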
pip install jupyter notebook
# Start Jupyter notebook
jupyter notebook
Part 1 simulations
Step 0. Estimate the simulation parameters
Step 1. Simulate the scRNA-seq data
Step 2. Preprocess the scRNA-seq data
Step 3a. Run cNMF
Step 3b. Run cICA
Step 3c. Run cLDA
Step 3d. Run PCA
Step 3e. Run clustering
Step 3f. Refit to TPM and Z-scores for individual factorizations
Step 4. Compare robustness of matrix factorization methods
Step 5. Compare accuracy of deconvolutions
Step 6. Plot inferred usages for cells over tSNE plots
Step 7. Evaluate cNMF inference of doublets and activity GEP usage
Step 8. Compare matrix factorization runtimes
Step 9. Analyze high doublet rate simulation
Part 2 organoids
Step 1. Preprocess Quadrato Et Al organoid data
Step 2. Run cNMF on Quadrato Et al organoid data and analyze the results
Part 3 visual cortex
Step 1. Preprocess Hrvatin Et Al visual cortex data
Step 2. Preprocess Tasic Et Al visual cortex data
Step 3. Run cNMF on Hrvatin Et Al visual cortex data
Step 4. Run cNMF on Tasic Et Al visual cortex data
Step 5. Analyze cNMF results
For example, take a look at just Step2_Run_cNMF_QuadratoEtAl in Part2_Organoids:
Run cNMF on the Quadrato et al brain organoid data and analyze the cell-type identity programs (in particular the mesodermal programs) and the cell-cycle and hypoxia programs present in this data
Figures and calculations:
Supplementary figure 8a: K selection plot
Supplementary figure 8b: Replicate consensus plots before and after filtering
Figure 2a: heatmap of GEP usages per cell
Supplementary figure 9: Co-usage vs Pearson correlation of the GEPs
Supplementary figure 11: GEP usage overlap with published clusters
Figure 2d: muscle cell marker genes
Figure 2e: activity GEP marker genes
Figure 2c: Gene-set enrichments
Figure 2g: hypoxia GEP usage by cell-type
Figure 2f: Cell cycle GEP usage by cell-type
Co-usage of the proliferative precursor and immature muscle GEPs calculation
Figure 2b: tSNE plot of GEP usage
Supplementary figure 10: tSNE of GEP usage with separate panels for activity GEPs
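To get this kind of overview for any of the notebooks without launching Jupyter, you can read the notebook file directly, since an `.ipynb` file is plain JSON. The sketch below is only an illustration: `notebook_outline` is a made-up helper, the `.ipynb` file name in the usage comment is assumed from the step name above, and it assumes the notebooks are in nbformat 4 (a top-level `cells` list) and mark their sections with Markdown `#` headings, which I have not verified for every notebook.

```python
import json


def notebook_outline(ipynb_path):
    """Return the Markdown heading lines found in a notebook's Markdown cells."""
    with open(ipynb_path, encoding="utf-8") as fh:
        nb = json.load(fh)
    headings = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # nbformat stores a cell's source either as one string or as a list of line strings.
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            if line.lstrip().startswith("#"):
                headings.append(line.strip())
    return headings


# Usage (hypothetical file name):
# for heading in notebook_outline("Step2_Run_cNMF_QuadratoEtAl.ipynb"):
#     print(heading)
```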
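If downloading the full 20G of single-cell files and code is too much trouble, I have also put up an archive containing only the code: https://share.weiyun.com/mSLJxVcm Happy learning!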