Unable to install Rstanarm for Docker Image #608

Open
TrunckYagora opened this issue Nov 17, 2023 · 0 comments

Hi everyone, we are working on Databricks and would like to build a Docker image with rstanarm preinstalled, so that we don't have to reinstall the library every time we start a cluster. Unfortunately, rstanarm cannot be loaded after installation, and we have not yet been able to identify the exact cause.

Below you can find the Dockerfile to reproduce the error.

Important: the installation takes about 70 minutes.

In addition, you must download the following file and place it in the same folder as the Dockerfile: Rprofile.site
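Because the build is long, it may be worth confirming that both files are in place before starting it. The following is only a minimal sketch: the helper name `check_build_context`, the file name `Dockerfile`, and the image tag `rstanarm-repro` are assumptions made for illustration and are not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: verify that the build context contains both files
# before starting the long build. "Dockerfile" is an assumed file name.
check_build_context() {
  dir="${1:-.}"
  for f in Dockerfile Rprofile.site; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  echo "build context ok: $dir"
}

# Example usage (the tag "rstanarm-repro" is an arbitrary, assumed name):
#   check_build_context . && docker build -t rstanarm-repro . 2>&1 | tee build.log
```

Piping the build output through `tee` keeps a copy of the full log, which makes it easier to search for compiler or installation messages afterwards.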

FROM databricksruntime/minimal:13.3-LTS

# Label version in case we need to force reinstall
LABEL version="1.1"

# Suppress interactive configuration prompts
ENV DEBIAN_FRONTEND=noninteractive

# Set the CRAN mirror
ENV R_CRAN_MIRROR=https://cran.uni-muenster.de/

# Install python 3.10 and virtualenv for Spark and Notebooks
RUN apt-get update \
  && apt-get install -y \
    python3.10 \
    virtualenv

# We add RStudio's debian source to install the latest r-base 4.x version (cran40 repository)
# We are using the more secure long form of pgp key ID of [email protected]
# based on these instructions (avoiding firewall issue for some users):
# https://cran.rstudio.com/bin/linux/ubuntu/#secure-apt
RUN apt-get update \
  && apt-get install --yes software-properties-common apt-transport-https \
  && gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
  && gpg -a --export E298A3A825C0D65DFD57CBB651716619E084DAB9 | sudo apt-key add - \
  && add-apt-repository -y "deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu $(lsb_release -cs)-cran40/" \
  && apt-get update \
  && apt-get install --yes \
    libssl-dev \
    r-base \
    r-base-dev \
  && add-apt-repository -r "deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu $(lsb_release -cs)-cran40/" \
  && apt-key del E298A3A825C0D65DFD57CBB651716619E084DAB9 \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# hwriterPlus is used by Databricks to display output in notebook cells
# hwriterPlus is removed for newer version of R, so we hardcode the dependency to archived version
# Rserve allows Spark to communicate with a local R process to run R code
RUN R -e "options(repos = list(CRAN = 'https://cloud.r-project.org/')); install.packages(c('hwriter', 'TeachingDemos', 'htmltools'))" \
 && R -e "install.packages('https://cran.r-project.org/src/contrib/Archive/hwriterPlus/hwriterPlus_1.0-3.tar.gz', repos=NULL, type='source')" \
 && R -e "install.packages('Rserve', repos='http://rforge.net/')"

# Additional instructions to setup rstudio. If you dont need rstudio, you can
# omit the below commands in your docker file. Even after this you need to use
# an init script to start the RStudio daemon (See README.md for details.)

# Databricks configuration for RStudio sessions.
COPY Rprofile.site /usr/lib/R/etc/Rprofile.site

# Rstudio installation.
RUN apt-get update \
 # Install gdebi-core.
 && apt-get install -y gdebi-core \
 # Download the rstudio-server 2022.12.1 package for Ubuntu 22.04 (jammy) and install it.
 && apt-get install -y wget \
 && wget https://s3.amazonaws.com/rstudio-ide-build/server/jammy/amd64/rstudio-server-2022.12.1-366-amd64.deb \
 && gdebi -n rstudio-server-2022.12.1-366-amd64.deb  \
 && rstudio-server version

# Initialize the default environment that Spark and notebooks will use
RUN virtualenv -p python3.10 --system-site-packages /databricks/python3

# install relevant packages
RUN R -e "install.packages('rstan', repos = c(CRAN = Sys.getenv('R_CRAN_MIRROR')))"
RUN R -e "install.packages('remotes', repos = c(CRAN = Sys.getenv('R_CRAN_MIRROR')))"
RUN R -e "install.packages('rstanarm', repos = c(CRAN = Sys.getenv('R_CRAN_MIRROR')))"
RUN R -e "install.packages('dplyr', repos = c(CRAN = Sys.getenv('R_CRAN_MIRROR')))"

# Load the installed packages to check that they can be attached
RUN R -e "library('rstan')"
RUN R -e "library('dplyr')"
RUN R -e "library('rstanarm')"

# verify rstan installation
#RUN R -e "example(stan_model, package = 'rstan', run.dontrun = TRUE)"
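One thing that can hide the cause: `install.packages()` only emits a warning when a package fails to build, so an `R -e "install.packages(...)"` step can exit successfully even though nothing was installed. A minimal sketch of a stricter step follows. The helper name `install_and_check` is hypothetical, and it assumes `R_CRAN_MIRROR` is set as in the Dockerfile above. Whether this applies to the failure reported here is not confirmed.

```shell
#!/bin/sh
# Hypothetical helper: install one R package, then attach it in the same
# R session. library() raises an error if the package is missing or broken,
# and Rscript exits non-zero on an uncaught error, so the step fails early.
# Assumes R_CRAN_MIRROR is set in the environment.
install_and_check() {
  pkg="$1"
  Rscript -e "install.packages('$pkg', repos = c(CRAN = Sys.getenv('R_CRAN_MIRROR'))); library('$pkg', character.only = TRUE)"
}

# Example usage, in dependency order:
#   install_and_check rstan && install_and_check rstanarm
```

Run this way, a failed installation stops at the package that caused it, with the compiler output directly above the error, rather than surfacing later as a failed `library()` call.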