HXu Blog

Do something interesting today

Datasets for EC Numbers Prediction

Dataset description and analysis

SIFTS database: The SIFTS database[1] contains EC annotations for entries on the Protein Data Bank (PDB). Several models have been evaluated on this database, including IEConv[2]. I download the summar...
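
A minimal sketch of turning EC annotations into labels at each level of the EC hierarchy. EC numbers have four dot-separated levels, and '-' marks an unspecified level. The sketch assumes the annotations are already loaded as (PDB ID, chain, EC number) tuples; the layout of the downloaded SIFTS file is not covered here, so the rows and the helper names `ec_levels` and `labels_at_level` are hypothetical.

```python
from collections import Counter, defaultdict


def ec_levels(ec):
    """Split an EC number such as '3.4.21.4' into its hierarchical prefixes.

    Unspecified levels are written as '-' (e.g. '3.4.-.-') and are dropped,
    so '3.4.-.-' gives ['3', '3.4'].
    """
    parts = ec.strip().split(".")
    prefixes = []
    for i, part in enumerate(parts):
        if part in ("", "-"):
            break
        prefixes.append(".".join(parts[: i + 1]))
    return prefixes


def labels_at_level(rows, level):
    """Map each (PDB ID, chain) to the set of its EC prefixes at `level` (1 to 4).

    Chains whose annotation is too coarse for the requested level are skipped.
    """
    labels = defaultdict(set)
    for pdb_id, chain, ec in rows:
        prefixes = ec_levels(ec)
        if len(prefixes) >= level:
            labels[(pdb_id, chain)].add(prefixes[level - 1])
    return dict(labels)


# Hypothetical rows for illustration, not real SIFTS records.
rows = [
    ("1abc", "A", "3.4.21.4"),
    ("1abc", "B", "3.4.21.4"),
    ("2xyz", "A", "2.7.11.1"),
    ("3def", "A", "3.4.-.-"),
]
level1 = labels_at_level(rows, 1)
level4 = labels_at_level(rows, 4)
print(Counter(ec for ecs in level1.values() for ec in ecs))  # Counter({'3': 3, '2': 1})
print(len(level4))  # 3: the partial annotation on 3def/A is dropped
```

Keeping a set per chain leaves room for multi-functional enzymes that carry more than one EC number, which turns the task into multi-label classification.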

Flux Balance Analysis

Review, method, application and tools

Flux balance analysis is a mathematical approach for analyzing the flow of metabolites through a metabolic network. Recommended review papers: Park, J. M., Kim, T. Y., & Lee, S. Y. (20...
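
For reference, FBA is usually posed as a linear program under a steady-state assumption. In the notation chosen here (not taken from the post), $S$ is the $m \times n$ stoichiometric matrix (metabolites by reactions), $v$ the flux vector, $c$ the objective weights (often selecting a biomass reaction), and $v_{\min}, v_{\max}$ the lower and upper flux bounds:

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{s.t.} \quad & S\,v = 0, \\
& v_{\min} \le v \le v_{\max}.
\end{aligned}
```

The equality $S\,v = 0$ encodes that no internal metabolite accumulates, and the bounds carry the reaction reversibility and uptake limits.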

Molecule Clustering

Method, example, code...

Similarity-Based Clustering: Here is an example of clustering molecules with Tanimoto distance. First, a similarity matrix whose element in the $i^{th}$ row and $j^{th}$ column is the structure simi...
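
A minimal, dependency-free sketch of the idea, assuming each molecule's fingerprint is already available as a set of on-bit indices (in practice a cheminformatics toolkit such as RDKit would generate them). Tanimoto similarity is $|A \cap B| / |A \cup B|$ and the distance is one minus that. Because the excerpt stops before the clustering step, the grouping here uses Taylor-Butina clustering as one common choice, not necessarily the method in the post; the function names, toy fingerprints, and `cutoff` value are illustrative.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def similarity_matrix(fps):
    """Element [i][j] is the Tanimoto similarity between fingerprints i and j."""
    return [[tanimoto(fi, fj) for fj in fps] for fi in fps]


def butina_cluster(fps, cutoff):
    """Taylor-Butina clustering on Tanimoto distance (1 - similarity).

    Molecules within `cutoff` distance count as neighbours. Molecules are
    visited in order of decreasing neighbour count; each unassigned one
    becomes a centroid and absorbs its still-unassigned neighbours.
    """
    sim = similarity_matrix(fps)
    n = len(fps)
    neighbours = [
        {j for j in range(n) if j != i and 1.0 - sim[i][j] <= cutoff}
        for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: len(neighbours[i]), reverse=True)
    assigned = set()
    clusters = []
    for i in order:
        if i in assigned:
            continue
        members = [i] + sorted(j for j in neighbours[i] if j not in assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


# Toy fingerprints: two obvious families of "molecules".
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {10, 11, 12}, {10, 11, 12, 13}]
clusters = butina_cluster(fps, cutoff=0.45)
print(clusters)  # [[0, 1, 2], [3, 4]]
```

The first cluster centre is always the molecule with the most neighbours, so results depend on `cutoff`: a smaller cutoff yields more, tighter clusters.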

Protein Function Prediction

based on its sequence and/or 3D structure

Recommended review paper: Chowdhury, R, Maranas, CD. From directed evolution to computational enzyme engineering—A review. AIChE J. 2020; 66:e16847. [paper] //Nice review of protein design (ex...

Biomedical Knowledge Graph Construction

data integration...

Open Source Collections: Stanford Biomedical Network Dataset Collection [link]. Databases: List of biological databases (Wikipedia); Reactome: human molecular pathways: metabolism, signaling, r...

Programming A GPU with CUDA

GPU Architectures: Maxwell / Pascal. Memory. CUDA Software Model: Grids, Blocks, Warps & Threads. The hardware is abstracted as a grid of thread blocks, which are indexed from 0. Blocks m...
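
To make the indexing concrete, here is a pure-Python emulation (not CUDA code, and sequential rather than parallel) of a 1-D launch. Each thread's global index is `blockIdx.x * blockDim.x + threadIdx.x`, threads whose index falls past the array length are masked out (the usual `if (i < n)` check in a kernel), and the threads of each block are grouped into warps of 32. The function names `global_thread_index` and `launch_1d` are hypothetical.

```python
WARP_SIZE = 32  # warp size on NVIDIA GPUs, including Maxwell and Pascal


def global_thread_index(block_idx, block_dim, thread_idx):
    """1-D equivalent of CUDA's blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx


def launch_1d(grid_dim, block_dim, n):
    """Emulate a 1-D launch: map each array element to (block, warp, thread)."""
    mapping = {}
    for block_idx in range(grid_dim):        # blocks are indexed from 0
        for thread_idx in range(block_dim):  # threads within a block, from 0
            i = global_thread_index(block_idx, block_dim, thread_idx)
            if i < n:  # bounds check: surplus threads do no work
                warp = thread_idx // WARP_SIZE  # warp within the block
                mapping[i] = (block_idx, warp, thread_idx)
    return mapping


n = 1000
block_dim = 256
grid_dim = (n + block_dim - 1) // block_dim  # ceil(n / block_dim)
mapping = launch_1d(grid_dim, block_dim, n)
print(grid_dim, len(mapping), mapping[999])  # 4 1000 (3, 7, 231)
```

With 1000 elements and 256 threads per block, 4 blocks are launched and the last 24 threads of block 3 are idle.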

[Note] Modeling Polypharmacy Side Effects with Graph Convolutional Network

multi-type link prediction, multimodal graph, GCN...

Modeling Polypharmacy Side Effects with Graph Convolutional Network (home page: http://snap.stanford.edu/decagon). 1. Dataset: The original dataset is a huge multimodal graph. Node: Each node eith...
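
A rough, dependency-free sketch of one message-passing step on a graph with several edge types. Each relation $r$ gets its own weight matrix, the message from neighbour $j$ to node $i$ is scaled by $1/\sqrt{|N_r(i)|\,|N_r(j)|}$ (where $N_r(i)$ is the set of neighbours of $i$ under relation $r$), and a separate self-weight keeps each node's own features. This loosely follows the relation-specific GCN idea behind multi-type link prediction; it is not the Decagon code, and the node names, relation names, and weights are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def relational_gcn_layer(h, edges, W_rel, W_self):
    """One message-passing step over a graph with several edge types.

    h:      dict node -> feature vector
    edges:  dict relation -> list of undirected (u, v) pairs
    W_rel:  dict relation -> weight matrix for that edge type
    W_self: weight matrix applied to each node's own features
    """
    out = {i: matvec(W_self, h[i]) for i in h}
    for r, pairs in edges.items():
        # Neighbour lists for this relation only.
        nbrs = defaultdict(list)
        for u, v in pairs:
            nbrs[u].append(v)
            nbrs[v].append(u)
        for i, js in nbrs.items():
            for j in js:
                # Symmetric normalisation by the per-relation degrees.
                c = 1.0 / sqrt(len(nbrs[i]) * len(nbrs[j]))
                msg = matvec(W_rel[r], h[j])
                out[i] = [o + c * m for o, m in zip(out[i], msg)]
    # ReLU non-linearity.
    return {i: [max(0.0, x) for x in v] for i, v in out.items()}


# Hypothetical toy graph: three nodes, two edge types.
h = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
edges = {"r1": [("a", "b")], "r2": [("b", "c"), ("a", "c")]}
identity = [[1.0, 0.0], [0.0, 1.0]]
W_rel = {"r1": identity, "r2": [[0.5, 0.0], [0.0, 0.5]]}
h_next = relational_gcn_layer(h, edges, W_rel, W_self=identity)
print(h_next)
```

For link prediction, node embeddings like these would then be passed to a decoder that scores candidate edges of each type; that part is not sketched here.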

Speech Processing

DFT, Speech/non-speech, voiced/unvoiced...

1. Sampling and Quantisation: Speech signals are typically sampled in time and quantised in amplitude. Sampling turns a continuous-time signal into a sequence of samples, and quantisation maps the amplitude of each sample to one of a finite set of levels. The process of signal quantisation a...
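
A small sketch of the two steps, assuming a synthetic 440 Hz tone sampled at 8 kHz (a common rate for telephone speech) and a 3-bit uniform mid-rise quantiser; the function names and parameters are illustrative. With a uniform quantiser of step size $\Delta$, the error for in-range samples is at most $\Delta/2$.

```python
import math


def sample(signal, fs, duration):
    """Sample a continuous-time signal (a function of t in seconds) at fs Hz."""
    n = int(round(fs * duration))
    return [signal(k / fs) for k in range(n)]


def quantise(x, bits, full_scale=1.0):
    """Uniform mid-rise quantiser with 2**bits levels spanning [-full_scale, full_scale]."""
    levels = 2 ** bits
    step = 2 * full_scale / levels
    # Index of the quantisation cell that contains x, clipped to the valid range.
    idx = math.floor(x / step)
    idx = max(-levels // 2, min(levels // 2 - 1, idx))
    return (idx + 0.5) * step  # reconstruct at the centre of the cell


def tone(t):
    """A 440 Hz sine with amplitude 0.8, standing in for an analogue signal."""
    return 0.8 * math.sin(2 * math.pi * 440 * t)


fs = 8000                      # 8 kHz sampling rate
x = sample(tone, fs, 0.01)     # 10 ms -> 80 samples
xq = [quantise(v, bits=3) for v in x]
step = 2 * 1.0 / 2 ** 3        # step size for 3 bits over [-1, 1]
max_err = max(abs(a - b) for a, b in zip(x, xq))
print(len(x), len(set(xq)), max_err <= step / 2)
```

Each extra bit halves the step size, which is why more bits give a finer (less noisy) amplitude representation.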

The Vector Space Model for Information Retrieval (IR)

binary, tf, tfidf, stoplist, stemming...

The Vector Space Model: The vector space model maps each document $d\in D$ to a normalized vector $\vec{v}(d)\in V$. Each dimension $i$ of $V$ corresponds to a unique word $w_i$ in the document col...
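
A minimal sketch of the model with the weighting schemes listed above (binary, tf, and tf-idf with $\mathrm{idf}(w) = \log(N/\mathrm{df}(w))$, where $N$ is the number of documents) plus an optional stoplist. Stemming is left out, which is why "cat" and "cats" end up as different dimensions in the example. The idf variant, the crude tokenizer, and the function names are assumptions; other variants such as smoothed idf are also common.

```python
import math
from collections import Counter


def tokenize(text, stoplist=frozenset()):
    """Lower-case, split on whitespace, strip punctuation, drop stop words."""
    tokens = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in tokens if w and w not in stoplist]


def build_vectors(docs, scheme="tfidf", stoplist=frozenset()):
    """Map each document to an L2-normalised vector over the collection vocabulary."""
    tokenized = [tokenize(d, stoplist) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})  # dimension i <-> word w_i
    n_docs = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))  # document frequency
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        if scheme == "binary":
            vec = [1.0 if tf[w] else 0.0 for w in vocab]
        elif scheme == "tf":
            vec = [float(tf[w]) for w in vocab]
        elif scheme == "tfidf":
            vec = [tf[w] * math.log(n_docs / df[w]) for w in vocab]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        norm = math.sqrt(sum(x * x for x in vec))
        vectors.append([x / norm for x in vec] if norm > 0 else vec)
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(a * b for a, b in zip(u, v))


docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]
stop = frozenset({"the", "on", "and"})
vocab, vecs = build_vectors(docs, "tfidf", stop)
print(vocab)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Documents 0 and 1 share only "sat", so their similarity is small but positive, while document 2 shares no term with document 0 and scores zero.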

Unsupervised Learning

k-means, dimensionality reduction...

$k$-means Clustering. Description[1]: Given a set of observations $\{x_1, x_2, \dots, x_n\}$, where each observation is a $d$-dimensional real vector, $k$-means clustering aims to partition the $n$ obs...
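
A minimal sketch of the standard Lloyd iteration for $k$-means, assuming squared Euclidean distance and initialisation from $k$ randomly chosen observations (other initialisations, such as k-means++, are common). The function name and parameters are illustrative.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen observations.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [None] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignment = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: assignments no longer change
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(coords) / len(members) for coords in zip(*members)]
    return centroids, assignment


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Each step can only lower (or keep) the within-cluster sum of squared distances, so the loop terminates, but only at a local optimum; running several seeds and keeping the best result is a common remedy.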