Latent Variable Modeling for Cognitive Assessment Through a Second-Order Exponential Family
Associate Professor of Statistics, Columbia University: 刘京辰
Latent variable models are popular in marketing, e-commerce, social networks, and many other fields where human behaviors are observed and summarized by a few characteristics. In this talk, I discuss a framework for latent variable models based on a low-rank second-order exponential family. In this framework, the computational overhead is substantially reduced, which is crucial especially for nonlinear models and big data analysis. It is also convenient to incorporate additional graphical structures and other covariates. An R package has been developed, and I will illustrate the model and the package through several real data examples.
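One common way a second-order exponential family leads to a latent variable model can be sketched as follows. This is a generic illustration under assumptions of our own: binary responses $x \in \{0,1\}^J$, main effects $b$, and a low-rank interaction matrix $LL^\top$ with $L \in \mathbb{R}^{J \times K}$, $K \ll J$. The symbols $b$, $L$, $\theta$, $\ell_j$ are ours, and the exact model in the talk may differ.

```latex
% Assumed model: binary item responses x in {0,1}^J,
% main effects b in R^J, low-rank interactions L L^T with L in R^{J x K}.
f(x) = \frac{1}{Z}\exp\!\Big(b^\top x + \tfrac{1}{2}\,x^\top L L^\top x\Big),
\qquad Z = \sum_{x \in \{0,1\}^J} \exp\!\Big(b^\top x + \tfrac{1}{2}\,x^\top L L^\top x\Big).

% Gaussian moment-generating identity with a = L^T x,
% where \phi_K is the standard normal density on R^K:
\exp\!\Big(\tfrac{1}{2}\,x^\top L L^\top x\Big)
  = \int_{\mathbb{R}^K} \exp\!\big(\theta^\top L^\top x\big)\,\phi_K(\theta)\,d\theta .

% Hence f is the marginal of a joint density over (x, \theta):
f(x) = \int_{\mathbb{R}^K} \frac{1}{Z}\exp\!\big(b^\top x + \theta^\top L^\top x\big)\,\phi_K(\theta)\,d\theta ,

% and given \theta the items are conditionally independent logistic responses
% (\ell_j^\top denotes the j-th row of L):
P(x_j = 1 \mid \theta) = \frac{\exp(b_j + \ell_j^\top \theta)}{1 + \exp(b_j + \ell_j^\top \theta)} .
```

In this sketch the rank $K$ plays the role of the number of latent traits, which is the sense in which a low-rank interaction matrix corresponds to a low-dimensional latent variable.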
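The `ezdf` package aims to give R user-friendly labeled output similar to that of SPSS or Stata. Rather than defining a new set of tabulation functions, `ezdf` controls existing tabulation functions (such as `pander`) so that their output automatically includes the corresponding labels. In addition, `ezdf` wraps several commonly used tabulation methods.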
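As is well known, R has no built-in notion of variable labels or value labels. For categorical variables, the `factor` type can partly serve the role of value labels. For variable labels, one can name the columns of a `data.frame` directly with the label text, e.g. `df$年龄` ("age"), but this is inconvenient in practice.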
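Several packages can import data in the formats of traditional statistical software such as SPSS or Stata into R, for example `foreign`, `readstata13`, `haven`, and `sas7bdat`. All of these packages preserve the labels defined in the original data when importing. However, each currently has its own strengths and weaknesses, and none can import every version of even a single format, so none offers a one-stop solution. More importantly, each package stores the imported labels in different attributes, which makes it hard to use labels in a consistent way, let alone make R's tables and statistical output label-aware.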
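RWeekly.org provides a one-stop information platform that delivers the latest community news in real time to readers in more than 140 countries through its website, email, Sina Weibo (@rweekly), and other channels. The weekly digest helps R users quickly catch up on the community's progress over the past week. The R community has grown rapidly in recent years, and CRAN now hosts more than 10,000 packages. Learning to discover, learn, and use existing resources and to follow the community's best practices saves time and avoids reinventing the wheel. This talk will present some interesting findings from R Weekly and the stories behind them.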
Persistent Reproducible Reporting with Docker and R
Seven Bridges Genomics Program Management
Genomic Data Scientist: 肖楠
Automatic report generation has a vast range of use cases in reproducible research and commercial applications. Fortunately, for the R community, most of the problems in this area have been elegantly solved by knitr and the R Markdown specification. However, data persistence and operating-system-level reproducibility have rarely been considered in the context of reproducible report generation, and these issues have now become a major concern for current software implementations. In this talk, we will discuss potential approaches to these problems, particularly with the help of modern containerization technologies. We will also demonstrate how to compose a persistent and reproducible R Markdown report with the help of two R packages we developed: docker-r and liftr. Specifically, you will learn how to dockerize your existing R Markdown documents, how to apply this approach to the analysis of petabyte-scale cancer genomics data on the Cancer Genomics Cloud, and how to distribute or reuse such containerized reports.
Learning R Internals and C++ via Rcpp
In high-performance computing with R, users may follow a learning path from R to Rcpp and then to some R internals. However, each of these three parts can be challenging without a proper understanding of the other two. This lecture shares my experience and viewpoints with those who want a better understanding of how R works behind the scenes while advancing their C++ skills.
Building User Profiles from Online Social Behaviors, with Applications in Tencent Social Ads
QQ (800M monthly users) and WeChat (700M monthly users) are the two largest instant messaging and social networking platforms in China. Tencent Social Ads is the advertising system for both WeChat and QQ, serving well over 10B page views per day to hundreds of millions of daily users.
We strive to understand our users as fully as possible across many aspects, so as to serve them the best personalized ads. The rich user behaviors on Tencent's many products lay a solid foundation for user profiling. We develop audience targeting along many dimensions, including demographics, interests, intents, transactions, physical locations, and access environment.
In this presentation, we will share our experience in large-scale user data mining for audience targeting, and discuss the challenges we face and the solutions we have employed.
On equivalence of likelihood maximization of stochastic block model and nonnegative matrix factorization, and beyond
Community detection in complex networks is important for understanding not only a network's topological structure but also its functions. The stochastic block model and nonnegative matrix factorization are two widely used methods for community detection, proposed from different perspectives. This talk studies the relations between them. The log-likelihood function of the stochastic block model can be reformulated within the framework of nonnegative matrix factorization. Despite this model equivalence, the algorithms employed by the two methods differ. Furthermore, we design a new matrix factorization model for signed networks and evaluate its effectiveness.
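One standard way to see such an equivalence is to compare the log-likelihood with the generalized Kullback-Leibler divergence used in NMF. This sketch assumes a Poisson version of the stochastic block model with nonnegative soft memberships. The symbols $A$, $X$, $B$, $\Lambda$ are ours, and the talk's exact formulation may differ.

```latex
% Assumed model: adjacency matrix A (n x n), A_{ij} ~ Poisson(\lambda_{ij}) independently,
% \Lambda = X B X^\top with X \ge 0 (n x K memberships) and B \ge 0 (K x K).
\log L(X, B) = \sum_{i,j}\big(A_{ij}\log\lambda_{ij} - \lambda_{ij} - \log A_{ij}!\big)

% Generalized KL divergence minimized by KL-NMF (convention 0 \log 0 = 0):
D(A \,\|\, \Lambda) = \sum_{i,j}\Big(A_{ij}\log\frac{A_{ij}}{\lambda_{ij}} - A_{ij} + \lambda_{ij}\Big)

% Adding the two, every term involving \lambda_{ij} cancels, so:
\log L(X, B) = -D(A \,\|\, X B X^\top) + \sum_{i,j}\big(A_{ij}\log A_{ij} - A_{ij} - \log A_{ij}!\big)
```

The last sum depends only on $A$. Under these assumptions, maximizing the likelihood over $(X, B)$ is therefore the same problem as a symmetric nonnegative tri-factorization of $A$ under the KL divergence. The two approaches can still use different algorithms to reach the optimum.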