Power BI Power Query R

How to convert accented characters to unaccented in R or Power BI

Sometimes it’s necessary to convert accented characters to non accented in R or Power BI for text analysis purposes. Accented characters might be a problem in texts with spelling mistakes and spoil the analysis.

Convert accented characters in R

Luckily in R is a base function iconv that makes this task easy. You only have to adjust the target encoding.

iconv("schön, natürlich", to = 'ASCII//TRANSLIT')

iconv("tā ir dzīve", to = 'Latin1')

 

If you want to know how useful RStudio tips and tricks, take a look at this post.

Convert accented characters in Power BI (Power Query)

First of all, R should be installed on your workstation.
Open Power Query editor, go to the Transform tab and choose Run R script.

Use the iconv function to convert the dataset column with accented characters. It worked only if the current encoding parameter (from) was defined.

# 'dataset' holds the input data for this script

dataset$text <- iconv(dataset$text, from = 'UTF-8', to = 'ASCII//TRANSLIT')

output <- dataset

Here is the result.

Here is another example of how to run R script in Power Query.

10 comments on “How to convert accented characters to unaccented in R or Power BI

  1. Just amazing! thanks

  2. That’s incredible. It’s crazy to realize we can use M, Python and R in the same environment and we get to choose which is better for a especific situation.

  3. Hello,

    I tried the script (my column name is “fonction”)

    # ‘dataset’ contient les données d’entrée pour ce script
    dataset$fonction <- iconv(dataset$fonction, from = 'WINDOWS-1252', to = 'ASCII//TRANSLIT')
    output <- dataset

    But I have this error :
    DataSource.Error : ADO.NET : À ρřøБļém øċćűѓřëδ щђϊℓє þŕθćęѕšϊⁿġ ŷöŭŗ Ґ šсѓϊрŧ.
    Ĥзřë åŕė ŧђз ţéćђńΐ¢âľ ðęţªΐļś: [DataFormat.Error] Désolé… Nous n'avons pas pu analyser l'entrée fournie à une valeur Date.
    Détails :
    DataSourceKind=R
    DataSourcePath=R
    Message=À ρřøБļém øċćűѓřëδ щђϊℓє þŕθćęѕšϊⁿġ ŷöŭŗ Ґ šсѓϊрŧ.
    Ĥзřë åŕė ŧђз ţéćђńΐ¢âľ ðęţªΐļś: [DataFormat.Error] Désolé… Nous n'avons pas pu analyser l'entrée fournie à une valeur Date.
    ErrorCode=-2147467259
    ExceptionType=Microsoft.PowerBI.Scripting.R.Exceptions.RUnexpectedException

    Please do you have an idea on how to solve it ?

    Thank you

    • Janis Sturis

      Hello

      I don’t have an exact idea what might cause this problem.
      My first guess is that you should try to change the type for the column “fonction” in Power Query. For example to Text.
      Try that R script with other columns – that will ensure that technically R scripts work in your Power Query.

  4. Hello,

    The column “fonction” is already in type text.
    With other colums I have exactly the same problem,
    I tried also in an other file (much simpler) => same problem.

    In power bi option I defined the R directory end IDE
    First I was in microsoft r open 4, I removed it and installed R 3.6.1 like you => same problem.

    When i type in r studio :
    > iconv(“schön, natürlich”, from = ‘WINDOWS-1252’, to = ‘ASCII//TRANSLIT’)
    [1] “schon, naturlich”
    It’s OK

    It’s the first time I run a R script in power bi, maybe something is “missing” ?

      • So I resolved my first problem, I my query result there was an incorrect date in the column date_begin => 2020-03-00
        I replace it with 2020-03-01 and now it’s OK

        Now I have 2 new problems :
        1/ In the “fonction” column
        “Télécom” become “TAClACcom” but I want “Telecom”
        2/ In the column date_begin I have “Microsoft.OleDb.Date” in place of the date on the file

        I don’t understand why the script modify also the date_begin column.

        • Janis Sturis

          It is great that you are not giving up! That is a good recipe for success.

          1. I tried to change some of the iconv parameters and this worked for me:
          iconv(“Télécom”, to = ‘ASCII//TRANSLIT’)
          [1] “Telecom”

          2. Looks like there is a solution to the date problem: https://stackoverflow.com/questions/41823277/r-script-in-power-bi-returns-date-as-microsoft-oledb-date

          • Thank you very much for your help

            1/ When I try in Rstudio, both are working correctly :
            > iconv(“Télécom”, from = ‘WINDOWS-1252’, to = ‘ASCII//TRANSLIT’)
            [1] “Telecom”
            > iconv(“Télécom”, to = ‘ASCII//TRANSLIT’)
            [1] “Telecom”

            But in power BI, The 2 commands return “TAClACcom” :
            dataset$fonction<- iconv(dataset$fonction, from = 'WINDOWS-1252', to = 'ASCII//TRANSLIT')
            output <- dataset

            dataset$fonction<- iconv(dataset$fonction, to = 'ASCII//TRANSLIT')
            output <- dataset

            I think I have found a solution here :
            https://stackoverflow.com/questions/39148759/remove-accents-from-a-dataframe-column-in-r

            But I don't know how to adapt the code to R in power BI

            2/ The solution in your link solved the problem Thank you.

          • It’s also OK for 1/ with this code

            library(stringi)

            dataset$fonction <- stri_trans_general(dataset$fonction,"Latin-ASCII")

            output <- dataset

Leave a Reply

Your email address will not be published. Required fields are marked *