SimString (Python) installation on Windows

Question:

I am trying to install the SimString Python wrapper on Windows from https://github.com/Georgetown-IR-Lab/simstring. On Linux it works fine, but on Windows the installation fails with an error:

    D:Userssourcerepos>python setup.py install
    running install
    running build
    running build_py
    running build_ext
    building '_simstring' extension
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I. -IC:\ProgramData\Anaconda3\include -IC:\ProgramData\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\winrt" /EHsc /Tpexport.cpp /Fobuild\temp.win-amd64-3.6\Release\export.obj
    export.cpp
    export.cpp(7): fatal error C1083: Cannot open include file: 'iconv.h': No such file or directory
    error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\bin\HostX86\x64\cl.exe' failed with exit status 2

After this I added iconv.h to the project, but now I get a different error:

    running install
    running build
    running build_py
    running build_ext
    building '_simstring' extension
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I. -IC:\ProgramData\Anaconda3\include -IC:\ProgramData\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\winrt" /EHsc /Tpexport.cpp /Fobuild\temp.win-amd64-3.6\Release\export.obj
    export.cpp
    d:\users\aki\source\repos\simstring\cdbpp.h(101): warning C4267: 'initializing': conversion from 'size_t' to 'uint32_t', possible loss of data
    export.cpp(37): error C2664: 'size_t libiconv(libiconv_t,const char **,size_t *,char **,size_t *)': cannot convert argument 2 from 'char **' to 'const char **'
    export.cpp(37): note: Conversion loses qualifiers
    export.cpp(140): note: see reference to function template instantiation 'bool iconv_convert<std::string,std::wstring>(libiconv_t,const source_type &,destination_type &)' being compiled
            with
            [
                source_type=std::string,
                destination_type=std::wstring
            ]
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.12.25827\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2

Any help or guidance is appreciated.

Asked By: the_spectator

Answers:

Ground notes:

  • I managed to get through the build process, but I got stuck at one point, which led me to create [SO]: Compile error for (char based) STL (stream) containers in Visual Studio (I spent quite some time on that issue). I got that working somehow, but other (similar?) errors came up when building SimString, so I had to strip out some (Nix based) code that didn’t compile

  • SimString is written in C++. When C++ (C) code is built, the result is PE or Portable Executable (.exe, .dll). Check [SO]: LNK2005 Error in CLR Windows Form (@CristiFati’s answer) for more details regarding how code gets transformed. When dealing with an .exe that depends on (loads) .dlls, there are certain restrictions:

    • The architecture of the .exe (python.exe in this case), 032bit (pc032, x86) vs. 064bit (pc064, x64 or AMD64), must match that of every .dll it loads (and of every .dll those load, and so on, through the whole dependency tree); otherwise the .dll won’t load

    • The build configuration (Debug vs. Release) should match in some cases. Here’s what could happen if it doesn’t: [SO]: When using fstream in a library I get linker errors in the executable (@CristiFati’s answer), but I don’t think we are in that situation

    • The build tools should also match in some (other) cases. Examples:

      • Compiler type ([SO]: Python extensions with C: staticforward (@CristiFati’s answer))

      • The CRT runtime ([SO]: Errors when linking to protobuf 3 on MS Visual C (@CristiFati’s answer))

      • The CRT runtime version is important in our case. Check [Python.Wiki]: WindowsCompilers for compatibilities between Python and VStudio versions. Note that this only applies to Python versions that were downloaded and installed (if you built Python from source, you should use the same build tool, but I guess that’s not the case here)

        • I see you are using VStudio 2017, so the compatible versions are Python 3.5 and Python 3.6 (restriction #1). I have ~10 Python installations on my machine (some installed, some built by me with different compilers; most of them are pc064; I also have some VEnvs, but that shouldn’t make any difference). I also have 5 VStudio versions installed; in my case, setup.py automatically selects VStudio 2015, which is OK: its compiler toolset (v14.0) is binary compatible with VStudio 2017’s (v14.1)

    • SimString depends on LibIconv, which also comes as a .dll (actually there are more, but we only care about one). Checking the .dll (see below) with Dependency Walker reveals that it’s x86 (pc032) (restriction #2). That means that either:

      • Python 032bit (x86) should be used. This is the variant that I’m going to go with. Given #1 and #2, the only suitable version on my machine is Python 3.6 pc032 (Python 3.5 is my version of choice, and I also have it in 032bit format, but I messed it up and didn’t reinstall it)

      • Build LibIconv from source, which removes restriction #2. But that could take time, and it’s outside the scope of the current question. If a question about building it comes up, I’ll take some time and give it a shot, as I enjoy that kind of task ([SO]: How to build a DLL version of libjpeg 9b? (@CristiFati’s answer))
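
The architecture restriction above can also be checked without Dependency Walker. The following is a minimal sketch, not part of the original answer: python_bits and pe_machine are hypothetical helper names, and the two Machine values are the ones the PE/COFF format assigns to x86 and x64.

```python
import struct

# Machine field values from the PE/COFF header for the two architectures
# discussed above.
PE_MACHINES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def python_bits():
    """Return 32 or 64, based on the running interpreter's pointer size."""
    return struct.calcsize("P") * 8


def pe_machine(path):
    """Read the Machine field of a PE file (.exe, .dll, .pyd)."""
    with open(path, "rb") as f:
        header = f.read(0x40)
        if len(header) < 0x40 or header[:2] != b"MZ":
            raise ValueError("not a PE file: missing MZ signature")
        # Offset 0x3C of the DOS header stores the offset of the PE signature.
        (pe_offset,) = struct.unpack_from("<I", header, 0x3C)
        f.seek(pe_offset)
        if f.read(4) != b"PE\0\0":
            raise ValueError("not a PE file: missing PE signature")
        # The 2-byte Machine field follows the signature directly.
        (machine,) = struct.unpack("<H", f.read(2))
    return PE_MACHINES.get(machine, hex(machine))
```

Calling python_bits() in the interpreter used for the build and pe_machine() on libiconv2.dll should report the same architecture; if they differ, the import is expected to fail at load time.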

Walkthrough:

  • Create a dir and CD to it (should be empty). This will be the %ROOT_DIR%, and all the paths that I’m going to use will be relative to it (except of course for absolute ones), and this will be the default dir (when unspecified)

  • Download SimString sources ([GitHub]: Georgetown-IR-Lab/simstring – simstring-master.zip)

  • Unzip the archive; its content goes into a simstring-master dir (created automatically)

  • Create a dir libiconv. Inside it, download:

    1. [SourceForge]: gnuwin32/GnuWin – libiconv-1.9.2-1-lib.zip

    2. [SourceForge]: gnuwin32/GnuWin – libiconv-1.9.2-1-bin.zip

    3. Extract needed stuff from these files:

      • From #1.:

        • include dir – used at compile phase

        • lib dir – used at link phase

        • Both phases are performed by setup.py (below)

      • From #2.:

        • bin dir – used at runtime (when using (importing) the module)
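
Before building, it may help to confirm that the libiconv dir has the layout described above. A small sketch under stated assumptions: missing_libiconv_files is a hypothetical helper, and the three file names are the ones that show up elsewhere in this thread (iconv.h in the compile error, libiconv.lib on the link line, libiconv2.dll at import time).

```python
from pathlib import Path

# Sub-directory -> file expected in it, per the build log in this answer.
EXPECTED = {
    "include": "iconv.h",    # needed in the compile phase
    "lib": "libiconv.lib",   # needed in the link phase
    "bin": "libiconv2.dll",  # needed at runtime
}


def missing_libiconv_files(libiconv_dir):
    """Return the expected files that are absent under libiconv_dir."""
    root = Path(libiconv_dir)
    return [
        str(Path(sub) / name)
        for sub, name in EXPECTED.items()
        if not (root / sub / name).is_file()
    ]
```

An empty list means all three pieces are in place; anything else names what still has to be extracted from the two archives.
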
  • CD to the simstring-master dir. To build the extension, I’m using setup.py’s build_ext command (invoked indirectly by install, as seen in your output): [Python 3.Docs]: distutils.command.build_ext – Build any extensions in a package

  • Running build_ext will yield your error:

    export.cpp(7): fatal error C1083: Cannot open include file: 'iconv.h': No such file or directory
    

    That is because the Python build system doesn’t know what we did (in the libiconv dir). To let it know, pass the following flags (python setup.py build_ext --help will display all of them):

    1. -I (--include-dirs) – will be translated to [MS.Docs]: /I (Additional include directories)

    2. -L (--library-dirs) – will be translated to [MS.Docs]: /LIBPATH (Additional Libpath)

    3. -l (--libraries) – will be translated to [MS.Docs]: LINK Input Files

    For now, don’t pass #2. and #3., because we won’t get to the link phase (where they are required):

    (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>"e:\Work\Dev\VEnvs\py36x86_test\Scripts\python.exe" setup.py build_ext -I"../libiconv/include"
    running build_ext
    building '_simstring' extension
    C:Installx86MicrosoftVisual Studio Community2015VCBINcl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I. -I../libiconv/include -Ic:Installx86PythonPython3.6include -Ic:Installx86PythonPython3.6include "-IC:Installx86MicrosoftVisual Studio Community2015VCINCLUDE" "-IC:Installx86MicrosoftVisual Studio Community2015VCATLMFCINCLUDE" "-IC:Program Files (x86)Windows Kits10include10.0.16299.0ucrt" "-IC:Program Files (x86)Windows KitsNETFXSDK4.6.1includeum" "-IC:Program Files (x86)Windows Kits10include10.0.16299.0shared" "-IC:Program Files (x86)Windows Kits10include10.0.16299.0um" "-IC:Program Files (x86)Windows Kits10include10.0.16299.0winrt" /EHsc /Tpexport.cpp /Fobuildtemp.win32-3.6Releaseexport.obj
    export.cpp
    export.cpp(112): warning C4297: 'writer::~writer': function assumed not to throw an exception but does
    export.cpp(112): note: destructor or deallocator has a (possibly implicit) non-throwing exception specification
    export.cpp(126): warning C4297: 'writer::~writer': function assumed not to throw an exception but does
    export.cpp(126): note: destructor or deallocator has a (possibly implicit) non-throwing exception specification
    export.cpp(37): error C2664: 'size_t libiconv(libiconv_t,const char **,size_t *,char **,size_t *)': cannot convert argument 2 from 'char **' to 'const char **'
    export.cpp(37): note: Conversion loses qualifiers
    export.cpp(140): note: see reference to function template instantiation 'bool iconv_convert<std::basic_string<char,std::char_traits<char>,std::allocator<char>>,std::wstring>(libiconv_t,const source_type &,destination_type &)' being compiled
    with
    [
        source_type=std::basic_string<char,std::char_traits<char>,std::allocator<char>>,
        destination_type=std::wstring
    ]
    error: command 'C:\Install\x86\Microsoft\Visual Studio Community\2015\VC\BIN\cl.exe' failed with exit status 2
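
The three flags above can be assembled programmatically, which makes it easy to run the compile-only and the full variants. This is only a sketch: build_ext_args and run_build are hypothetical helpers, and the ../libiconv layout is the one set up earlier in this walkthrough.

```python
import subprocess
import sys


def build_ext_args(libiconv_dir, link=True):
    """Return the setup.py arguments used in this walkthrough."""
    args = ["build_ext", "-I" + libiconv_dir + "/include"]
    if link:
        # -L and -l only matter once compilation succeeds and linking starts.
        args += ["-L" + libiconv_dir + "/lib", "-llibiconv"]
    return args


def run_build(libiconv_dir="../libiconv", link=True):
    """Run setup.py with the current interpreter; call from simstring-master."""
    cmd = [sys.executable, "setup.py"] + build_ext_args(libiconv_dir, link)
    return subprocess.run(cmd, check=True)
```

With link=False the argument list matches the command shown above; with the default it matches the full command used later, after export.cpp is patched.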
    
  • Things to do (found by fixing the errors one by one; only export.cpp required changes):

    1. #define ICONV_CONST const (C++ doesn’t implicitly convert char ** to const char **, and this LibIconv build declares that argument as const char **)

    2. #define __SIZEOF_WCHAR_T__ 2 (as sizeof(wchar_t) is 2)

    3. Strip out the code that doesn’t compile (the one I mentioned at the beginning): STL containers with 4 byte chars don’t compile on Win. I wanted to fix the code so that it would compile OOTB once Win supports such chars, but I wasn’t able to, so I had to do whatever was done for OSX. As a consequence, #ifdef __APPLE__ should be replaced by #if defined(__APPLE__) || defined(WIN32) (5 occurrences)

    Note that #1. and #2. could (should) be done either from the cmdline (-D flag, but I wasn’t able to specify a value for a defined flag) or in setup.py (so they are defined only once even if needed in lots of files), but I didn’t spend too much time on it, so I’m defining them directly in the source code.

    Either apply the changes manually, or save:

    --- export.cpp.orig 2016-11-30 18:53:32.000000000 +0200
    +++ export.cpp  2018-02-14 13:36:31.317953200 +0200
    @@ -19,9 +19,18 @@
     #endif/*USE_LIBICONV_GNU*/
    
     #ifndef ICONV_CONST
    +#if defined (WIN32)
    +#define ICONV_CONST const
    +#else
     #define ICONV_CONST
    +#endif
     #endif/*ICONV_CONST*/
    
    +#if defined (WIN32)
    +#define __SIZEOF_WCHAR_T__ 2
    +#endif
    +
    +
     template <class source_type, class destination_type>
     bool iconv_convert(iconv_t cd, const source_type& src, destination_type& dst)
     {
    @@ -269,7 +278,7 @@
         iconv_close(bwd);
     }
    
    -#ifdef __APPLE__
    +#if defined(__APPLE__) || defined(WIN32)
     #include <cassert>
     #endif
    
    @@ -283,7 +292,7 @@
             retrieve_thru(dbr, query, this->measure, this->threshold, std::back_inserter(ret));
             break;
         case 2:
    -#ifdef __APPLE__
    +#if defined(__APPLE__) || defined(WIN32)
     #if __SIZEOF_WCHAR_T__ == 2
             retrieve_iconv<wchar_t>(dbr, query, UTF16, this->measure, this->threshold, std::back_inserter(ret));
     #else
    @@ -294,7 +303,7 @@
     #endif
             break;
         case 4:
    -#ifdef __APPLE__
    +#if defined(__APPLE__) || defined(WIN32)
     #if __SIZEOF_WCHAR_T__ == 4
             retrieve_iconv<wchar_t>(dbr, query, UTF32, this->measure, this->threshold, std::back_inserter(ret));
     #else
    @@ -317,7 +326,7 @@
             std::string qstr = query;
             return dbr.check(qstr, translate_measure(this->measure), this->threshold);
         } else if (dbr.char_size() == 2) {
    -#ifdef __APPLE__
    +#if defined(__APPLE__) || defined(WIN32)
     #if __SIZEOF_WCHAR_T__ == 2
             std::basic_string<wchar_t> qstr;
     #else
    @@ -333,7 +342,7 @@
             iconv_close(fwd);
             return dbr.check(qstr, translate_measure(this->measure), this->threshold);
         } else if (dbr.char_size() == 4) {
    -#ifdef __APPLE__
    +#if defined(__APPLE__) || defined(WIN32)
     #if __SIZEOF_WCHAR_T__ == 4
             std::basic_string<wchar_t> qstr;
     #else
    

    as simstring_win.diff. That is a diff. See [SO]: Run / Debug a Django application’s UnitTests from the mouse right click context menu in PyCharm Community Edition? (@CristiFati’s answer) (Patching UTRunner section) for how to apply patches on Win (basically, every line that starts with one "+" sign goes in, and every line that starts with one "-" sign goes out).
    I also submitted this patch to [GitHub]: Georgetown-IR-Lab/simstring – Support for Win, and it was merged today (180222).
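
If no patch utility is at hand, item #3. from the list above (the guard replacement) can be scripted. A rough sketch, not what the answer actually ran: widen_apple_guards is a hypothetical helper, and it covers only the #ifdef replacement, not the two #define additions.

```python
OLD = "#ifdef __APPLE__"
NEW = "#if defined(__APPLE__) || defined(WIN32)"


def widen_apple_guards(source_text):
    """Replace every '#ifdef __APPLE__' guard; return (new_text, count)."""
    count = source_text.count(OLD)
    return source_text.replace(OLD, NEW), count
```

Run on the unpatched export.cpp, the returned count should be 5, matching the number of occurrences mentioned above; any other value suggests the file differs from the one the diff was made against.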

    (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>"c:\Install\x64\Cygwin\Cygwin\AllVers\bin\patch.exe" -i "../simstring_win.diff"
    patching file export.cpp
    
    (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>rem Looking at export.cpp content, you'll notice the changes
    
    (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>"e:\Work\Dev\VEnvs\py36x86_test\Scripts\python.exe" setup.py build_ext  -I"../libiconv/include" -L"../libiconv/lib" -llibiconv
    running build_ext
    building '_simstring' extension
    C:Installx86MicrosoftVisual Studio Community2015VCBINcl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I. -I../libiconv/include -Ic:Installx86PythonPython3.6include -Ic:Installx86PythonPython3.6include "-IC:Installx86MicrosoftVisual Studio Community2015VCINCLUDE" "-IC:Installx86MicrosoftVisual Studio Community2015VCATLMFCINCLUDE" "-IC:Program Files (x86)Windows Kits10include10.0.16299.0ucrt" "-IC:Program Files (x86)Windows KitsNETFXSDK4.6.1includeum" "-IC:Program Files (x86)Windows Kits10include10.0.16299.0shared" "-IC:Program Files (x86)Windows Kits10include10.0.16299.0um" "-IC:Program Files (x86)Windows Kits10include10.0.16299.0winrt" /EHsc /Tpexport.cpp /Fobuildtemp.win32-3.6Releaseexport.obj
    export.cpp
    export.cpp(121): warning C4297: 'writer::~writer': function assumed not to throw an exception but does
    export.cpp(121): note: destructor or deallocator has a (possibly implicit) non-throwing exception specification
    export.cpp(135): warning C4297: 'writer::~writer': function assumed not to throw an exception but does
    export.cpp(135): note: destructor or deallocator has a (possibly implicit) non-throwing exception specification
    C:Installx86MicrosoftVisual Studio Community2015VCBINcl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I. -I../libiconv/include -Ic:Installx86PythonPython3.6include -Ic:Installx86PythonPython3.6include "-IC:Installx86MicrosoftVisual Studio Community2015VCINCLUDE" "-IC:Installx86MicrosoftVisual Studio Community2015VCATLMFCINCLUDE" "-IC:Program Files (x86)Windows Kits10include10.0.16299.0ucrt" "-IC:Program Files (x86)Windows KitsNETFXSDK4.6.1includeum" "-IC:Program Files (x86)Windows Kits10include10.0.16299.0shared" "-IC:Program Files (x86)Windows Kits10include10.0.16299.0um" "-IC:Program Files (x86)Windows Kits10include10.0.16299.0winrt" /EHsc /Tpexport_wrap.cpp /Fobuildtemp.win32-3.6Releaseexport_wrap.obj
    export_wrap.cpp
    C:Installx86MicrosoftVisual Studio Community2015VCBINlink.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:c:Installx86PythonPython3.6Libs /LIBPATH:../libiconv/lib /LIBPATH:e:WorkDevVEnvspy36x86_testlibs /LIBPATH:e:WorkDevVEnvspy36x86_testPCbuildwin32 "/LIBPATH:C:Installx86MicrosoftVisual Studio Community2015VCLIB" "/LIBPATH:C:Installx86MicrosoftVisual Studio Community2015VCATLMFCLIB" "/LIBPATH:C:Program Files (x86)Windows Kits10lib10.0.16299.0ucrtx86" "/LIBPATH:C:Program Files (x86)Windows KitsNETFXSDK4.6.1libumx86" "/LIBPATH:C:Program Files (x86)Windows Kits10lib10.0.16299.0umx86" libiconv.lib /EXPORT:PyInit__simstring buildtemp.win32-3.6Releaseexport.obj buildtemp.win32-3.6Releaseexport_wrap.obj /OUT:buildlib.win32-3.6_simstring.cp36-win32.pyd /IMPLIB:buildtemp.win32-3.6Release_simstring.cp36-win32.lib
       Creating library buildtemp.win32-3.6Release_simstring.cp36-win32.lib and object buildtemp.win32-3.6Release_simstring.cp36-win32.exp
    Generating code
    Finished generating code
    
    (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>dir /b "build\lib.win32-3.6"
    _simstring.cp36-win32.pyd
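
The earlier note about moving the two defines (and the LibIconv paths) into setup.py could look roughly like this. It is a sketch under assumptions, not the repository’s actual setup.py: windows_extension_kwargs is a hypothetical helper that only returns keyword arguments, which would then be passed to setuptools’ Extension. The extension name and source files are taken from the build log above.

```python
def windows_extension_kwargs(libiconv_dir):
    """Keyword arguments for an Extension(...) call on Win (hypothetical)."""
    return {
        "name": "_simstring",
        "sources": ["export.cpp", "export_wrap.cpp"],
        "include_dirs": [libiconv_dir + "/include"],  # same as -I
        "library_dirs": [libiconv_dir + "/lib"],      # same as -L
        "libraries": ["libiconv"],                    # same as -l
        # Each (name, value) pair becomes /Dname=value for cl.exe.
        "define_macros": [
            ("ICONV_CONST", "const"),
            ("__SIZEOF_WCHAR_T__", "2"),
        ],
    }
```

In a setup.py this would be used as Extension(**windows_extension_kwargs("../libiconv")). The #ifdef __APPLE__ replacements in export.cpp would still be needed, since macros alone don’t change those guards.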
    
  • Finally, it built. The .pyd is just a .dll. This is what it looks like in Dependency Walker:

    [Image: _simstring.pyd in Dependency Walker]

  • Let’s try to see if we can use it:

    (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>"e:\Work\Dev\VEnvs\py36x86_test\Scripts\python.exe" sample.py
    Traceback (most recent call last):
      File "E:\Work\Dev\StackOverflow\q048528041\simstring-master\simstring.py", line 18, in swig_import_helper
        fp, pathname, description = imp.find_module('_simstring', [dirname(__file__)])
      File "e:\Work\Dev\VEnvs\py36x86_test\lib\imp.py", line 296, in find_module
        raise ImportError(_ERR_MSG.format(name), name=name)
    ImportError: No module named '_simstring'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "sample.py", line 3, in <module>
        import simstring
      File "E:\Work\Dev\StackOverflow\q048528041\simstring-master\simstring.py", line 28, in <module>
        _simstring = swig_import_helper()
      File "E:\Work\Dev\StackOverflow\q048528041\simstring-master\simstring.py", line 20, in swig_import_helper
        import _simstring
    ModuleNotFoundError: No module named '_simstring'
    

    That is because when importing simstring, which in turn imports _simstring (the .pyd), Python doesn’t find the latter. To fix this:

    • Add the .pyd path to %PYTHONPATH%

    • As seen in the pic, the .pyd depends on libiconv2.dll, so the OS must know where to look for it. Simplest way is to add its path to %PATH% ([MS.Docs]: Dynamic-Link Library Search Order)

    (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>set PYTHONPATH=%PYTHONPATH%;build\lib.win32-3.6
    
    (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>set PATH=%PATH%;..\libiconv\bin
    
    (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>"e:\Work\Dev\VEnvs\py36x86_test\Scripts\python.exe" sample.py
    ('Barack Hussein Obama II',)
    ('James Gordon Brown',)
    ()
    ('Barack Hussein Obama II',)
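
The two environment changes above can also be made from inside Python, before the import. A minimal sketch: prepare_simstring_import is a hypothetical helper. One thing the answer (written for Python 3.6) doesn’t cover: from Python 3.8 on, Windows no longer consults %PATH% when resolving an extension module’s .dll dependencies, and os.add_dll_directory is the documented replacement, so the sketch calls it when available.

```python
import os
import sys


def prepare_simstring_import(pyd_dir, dll_dir):
    """Make the built .pyd importable and its libiconv2.dll findable."""
    pyd_dir = os.path.abspath(pyd_dir)
    dll_dir = os.path.abspath(dll_dir)
    if pyd_dir not in sys.path:
        # Similar in effect to extending %PYTHONPATH% before starting Python.
        sys.path.insert(0, pyd_dir)
    # Works for Python < 3.8 on Win, where %PATH% is part of the DLL search.
    os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + dll_dir
    # Python 3.8+ on Win: register the directory explicitly.
    if hasattr(os, "add_dll_directory"):
        os.add_dll_directory(dll_dir)
```

From the simstring-master dir, calling prepare_simstring_import("build/lib.win32-3.6", "../libiconv/bin") followed by import simstring would correspond to the two set commands.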
    

Final notes:

  • The module produces some output, identical to the one on Nix (Ubuntu) (where I also built it and encountered no problem); I’m not sure whether it’s semantically correct or not

  • I didn’t run setup.py’s install command (and I’m not going to). One thing that I can think of that could go wrong (although I’m not sure it will) is libiconv2.dll not being copied / included into the .whl. If so, you’ll probably need to modify setup.py (changes should be minor)
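
One possible workaround for the last point is to copy the runtime .dll next to the built .pyd before packaging. This is an untested sketch: bundle_runtime_dll is a hypothetical helper, and it rests on the assumption that Windows finds a dependency located in the same dir as the extension module that needs it.

```python
import shutil
from pathlib import Path


def bundle_runtime_dll(dll_path, pyd_dir):
    """Copy a runtime DLL next to the built extension; return the new path."""
    target = Path(pyd_dir) / Path(dll_path).name
    shutil.copy2(dll_path, target)
    return target
```

For this build, that would be bundle_runtime_dll("../libiconv/bin/libiconv2.dll", "build/lib.win32-3.6"); whether the file then ends up inside the .whl still depends on how setup.py collects package data.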

Answered By: CristiFati

I was able to build that repo under Cygwin. The packages libiconv-devel and python3-devel both need to be installed.

After that, I made one more change to ensure that libiconv would be available for the Windows build. I made that single commit here:

https://github.com/burgersmoke/simstring

Answered By: burgersmoke

Besides my other response about building under Cygwin, I’ve made a few other changes to allow this to build and install seamlessly on Windows using Anaconda. It turns out conda can install iconv very easily.

Much of this is based on the work that CristiFati added in this thread; this change intends to simplify the steps and the installation.

This change currently exists in my own fork. Steps are in the README here. I have also submitted a Pull Request for this.

UPDATE: This pull request has now been merged into the Georgetown repo, so you can get it here:
https://github.com/Georgetown-IR-Lab/simstring

As a side note, one of the motivations for doing this is making this repo easier to set up: https://github.com/Georgetown-IR-Lab/QuickUMLS

Answered By: burgersmoke